WO2022057598A1 - Method and apparatus for image rendering - Google Patents
Method and apparatus for image rendering
- Publication number
- WO2022057598A1 (PCT/CN2021/115203; CN2021115203W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- map
- pixel point
- frame
- pixel
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/10—Geometric effects
- G06T15/20—Perspective computation
- G06T15/205—Image-based rendering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/50—Lighting effects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/593—Depth or shape recovery from multiple images from stereo images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/90—Determination of colour characteristics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Definitions
- the present application relates to the technical field of image processing, and more particularly, to a method and apparatus for image rendering.
- Ray tracing is a technique used to produce or enhance special visual effects in modern films, games and other fields. It achieves global illumination effects such as ambient occlusion, indirect reflection and diffuse reflection by tracing each ray emitted from the camera, and can be used in a rendering framework to ensure a seamless connection between the rendered image and reality.
- the mainstream ray tracing technology is mainly divided into three modes, including offline mode, interactive mode, and real-time mode.
- the offline mode has the best rendering effect but takes a long time.
- the interactive mode balances the rendering effect and time, and the real-time mode sacrifices some rendering effects to meet the real-time requirements.
- movies are presented non-interactively, so they can be rendered offline on a large number of servers during production, whereas games require real-time human-computer interaction, so game manufacturers can only compute each frame through real-time rendering.
- Real-time computing brings a huge amount of computation.
- the ray sampling value of each pixel directly affects the rendering effect: a high sampling value means a huge amount of calculation, while a low sampling value, although it ensures real-time rendering, introduces a lot of noise and degrades the quality of the rendered image.
- the rendering time when the number of samples per pixel (spp) is 1 in the SponzaGlossy scene is 70 ms, and the rendering time at 1 spp in the SanMiguel scene is 260 ms.
- the path tracing algorithm based on OptiX therefore cannot meet the demand. To achieve real-time rendering under limited hardware conditions, it is necessary to use low sampling values combined with noise reduction algorithms. Table 1 shows the optimization effect of existing noise reduction algorithms under the condition of a low sampling value of 1 to 2 spp, where:
- Titan XP and Nvidia Quadro M6000 are high-performance graphics cards, and Intel i7-7700HQ is a high-performance CPU
- SBF: SURE-based filtering
- AAF: axis-aligned filtering for soft shadows
- LBF: learning-based filtering
- NFOR: nonlinearly weighted first-order regression
- AE: interactive reconstruction of Monte Carlo image sequences using a recurrent denoising autoencoder
- KPCN: kernel-predicting convolutional networks
- the noise reduction algorithms above either require a higher sampling value to obtain a high-resolution 1080P rendered image or take too long to meet the demand.
- the spatiotemporal variance-guided filtering (SVGF) noise reduction algorithm can obtain a rendered image with a resolution of 720P within the time requirement under a low sampling value, but a resolution of 720P cannot meet the demand for high-resolution rendering.
- in short, the existing real-time ray tracing technology still has the defects of a large amount of calculation and high hardware requirements;
- the rendering time is long and cannot meet the time requirements of game rendering. Therefore, it is particularly important to obtain high frame rate and high resolution real-time rendering effects without increasing hardware costs.
- the present application provides an image rendering method and apparatus, which can achieve high resolution and high frame rate rendering images under the condition of low sampling values.
- a first aspect provides an image rendering method, the method comprising: acquiring a first image, a second image and a third image, where the first image, the second image and the third image are three consecutive frames of images; updating the light map of the second image according to the first image to obtain the updated light map of the second image; inputting the updated light map of the second image into a super-division denoising network to obtain the super-divided and de-noised image of the second image; updating the light map of the third image according to the second image to obtain the updated light map of the third image; inputting the updated light map of the third image into the super-division denoising network to obtain the super-divided and de-noised image of the third image; obtaining an initial interpolated frame image at a target time according to the super-divided and de-noised image of the second image and the super-divided and de-noised image of the third image, where the target time is a time between the second image and the third image; and inputting the initial interpolated frame image into a bidirectional frame interpolation network to obtain a target interpolated frame image at the target time.
- the image rendering method of the embodiment of the present application can process images with a low sampling value (for example, 1spp), which greatly reduces the requirements for hardware devices;
- updating the illumination map with accumulated sampling values can make up for the noise caused by the insufficient amount of sampling information;
- the image rendering method of the embodiment of the present application uses a super-resolution denoising network to process the image, which can improve the resolution of the image;
- the image rendering method of the embodiment of the present application interpolates frames between two consecutive images and uses a bidirectional frame interpolation network to process the interpolated images, thereby improving the frame rate of the images and ensuring the smoothness of image rendering. A minimal sketch of this overall pipeline is given below.
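The following Python sketch shows one way the described pipeline could be wired together. The helper names (update_light_map, sr_denoise_net, bidir_interp_net) are hypothetical placeholders rather than names defined by the patent, and the plain average used for the initial interpolated frame is a simplification; a motion-vector-based version is sketched further below.

```python
# Minimal sketch of the described pipeline; all helper callables are
# hypothetical placeholders and the simple average stands in for the
# motion-vector-based initial interpolation described later.
def render_triplet(frame_prev, frame_cur, frame_next,
                   update_light_map, sr_denoise_net, bidir_interp_net):
    """frame_* hold the 1 spp light maps (and auxiliary buffers) of three
    consecutive frames; returns the two super-divided, de-noised frames
    plus a refined interpolated frame between them."""
    # 1. Temporally update the light maps using the respective previous frame.
    lit_cur = update_light_map(prev=frame_prev, cur=frame_cur)
    lit_next = update_light_map(prev=frame_cur, cur=frame_next)

    # 2. Super-resolve and denoise each updated light map.
    sr_cur = sr_denoise_net(lit_cur)
    sr_next = sr_denoise_net(lit_next)

    # 3. Build an initial interpolated frame at a target time between the two
    #    frames, then refine it with the bidirectional interpolation network.
    initial_mid = 0.5 * (sr_cur + sr_next)
    target_mid = bidir_interp_net(initial_mid)
    return sr_cur, target_mid, sr_next
```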
- updating the illumination map of the second image according to the first image to obtain the updated illumination map of the second image includes: acquiring the illumination map of the second image, where the illumination map of the second image includes the color values of multiple pixel points and is a direct illumination map or an indirect illumination map; acquiring the second pixel point corresponding to the first pixel point in the first image, where the first pixel point is any one of the multiple pixel points; and updating the color value of the first pixel point according to the color value of the first pixel point and the color value of the second pixel point to obtain the updated illumination map.
- the image rendering method of the embodiment of the present application accumulates time-domain information and combines the historical color information of each pixel point to update the color value of each pixel point, thereby updating the illumination map of the image, making up for the insufficient sampling value and reducing the noise of the rendering result.
- the method further includes: acquiring the color values of the four pixel points closest to the second pixel point, where the four pixel points are on the grid nodes of the first image; and obtaining the color value of the second pixel point according to the color values of the four pixel points.
- the image rendering method according to the embodiment of the present application considers that a pixel point in the next frame of image may correspond to a position in the previous frame of image that is not on a grid node of the previous frame of image, so that the color value of that position cannot be obtained directly; the method therefore obtains the color value of the corresponding pixel point of the previous frame of image by bilinear interpolation, as sketched below.
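A minimal sketch of such a bilinear lookup, assuming the previous-frame image is stored as a NumPy array indexed as image[y, x]; the function name and the border-clamping behaviour are illustrative choices, not details taken from the patent.

```python
import numpy as np

def sample_bilinear(image, x, y):
    """Fetch a value at a non-integer position (x, y) by bilinearly blending
    the four surrounding grid pixels (clamped at the image border)."""
    h, w = image.shape[:2]
    x0f, y0f = np.floor(x), np.floor(y)
    fx, fy = x - x0f, y - y0f                    # fractional offsets
    x0, y0 = int(x0f), int(y0f)
    x0, x1 = np.clip([x0, x0 + 1], 0, w - 1)     # clamp neighbour indices
    y0, y1 = np.clip([y0, y0 + 1], 0, h - 1)
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x1]
    bottom = (1 - fx) * image[y1, x0] + fx * image[y1, x1]
    return (1 - fy) * top + fy * bottom
```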
- before updating the color value of the first pixel point according to the color value of the first pixel point and the color value of the second pixel point, the method further includes: judging that the first pixel point and the second pixel point are consistent, where judging that the first pixel point and the second pixel point are consistent includes: acquiring the depth value, the normal vector value and the patch ID of the first pixel point, and the depth value, the normal vector value and the patch ID of the second pixel point; the square of the difference between the depth value of the first pixel point and the depth value of the second pixel point is less than a first threshold; the square of the difference between the normal vector value of the first pixel point and the normal vector value of the second pixel point is less than a second threshold; and the patch ID of the first pixel point is equal to the patch ID of the second pixel point.
- the method of the embodiment of the present application thus further includes performing a consistency judgment on the first pixel point and the second pixel point. Similarly, if the position of the second pixel point corresponding to the first pixel point in the first image is not on a grid node of the first image, the depth value, normal vector value and patch ID of the second pixel point cannot be obtained directly; similar to the above-mentioned method for obtaining the color value, a bilinear interpolation algorithm needs to be used to obtain the depth value, normal vector value and patch ID of the second pixel point. A sketch of the consistency test is given below.
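A minimal sketch of that consistency test, assuming each pixel is represented as a dict with depth, normal and patch_id entries; the threshold values and the use of a squared-norm test for the normal vectors are illustrative assumptions.

```python
import numpy as np

def is_consistent(p_cur, p_prev, depth_thresh=1e-2, normal_thresh=1e-1):
    """Return True if the current pixel and its reprojected counterpart in
    the previous frame satisfy the three criteria described above."""
    depth_ok = (p_cur["depth"] - p_prev["depth"]) ** 2 < depth_thresh
    normal_ok = np.sum((np.asarray(p_cur["normal"]) -
                        np.asarray(p_prev["normal"])) ** 2) < normal_thresh
    patch_ok = p_cur["patch_id"] == p_prev["patch_id"]
    return bool(depth_ok and normal_ok and patch_ok)
```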
- updating the color value of the first pixel point according to the color value of the first pixel point and the color value of the second pixel point includes: the updated color value of the first pixel point is the sum of the color value of the first pixel point multiplied by a first coefficient and the color value of the second pixel point multiplied by a second coefficient.
- the image rendering method thus provides a way of updating the color value of the first pixel point from the color value of the first pixel point and the color value of the second pixel point, where the first coefficient and the second coefficient are preset values.
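In code, that blend is a single line; the particular choice of coefficients alpha and 1 - alpha is an assumption for illustration (the patent only requires two preset coefficients):

```python
def update_color(c_cur, c_prev, alpha=0.2):
    """Blend the current color with its reprojected history; alpha is the
    first coefficient and (1 - alpha) the second, both preset values."""
    return alpha * c_cur + (1.0 - alpha) * c_prev
```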
- inputting the updated light map of the second image into the super-division denoising network further comprises: acquiring a depth map of the second image and a normal vector map of the second image; fusing the depth map of the second image, the normal vector map of the second image and the updated illumination map of the second image to obtain a first fusion result; and inputting the first fusion result into the super-division denoising network.
- the image rendering method in the embodiment of the present application further includes feature fusion of the depth map, the normal vector map, and the updated light map.
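One plausible form of that fusion is channel-wise concatenation, shown below; the patent only states that the three maps are fused, so concatenation is an assumption made for illustration.

```python
import numpy as np

def fuse_inputs(light_map, depth_map, normal_map):
    """Stack the updated light map, depth map and normal vector map along
    the channel axis to form the network input (H x W x C)."""
    depth = depth_map[..., None] if depth_map.ndim == 2 else depth_map
    return np.concatenate([light_map, depth, normal_map], axis=-1)
```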
- obtaining the initial interpolated frame image at the target moment according to the super-divided and de-noised image of the second image and the super-divided and de-noised image of the third image includes: obtaining the motion vector from the third image to the second image; according to the motion vector from the third image to the second image, determining the first motion vector from the initial interpolated frame image at the target moment to the second image and the second motion vector from the initial interpolated frame image at the target moment to the third image; and obtaining the initial interpolated frame image at the target moment according to the super-divided and de-noised image of the second image, the super-divided and de-noised image of the third image, the first motion vector and the second motion vector.
- the image rendering method of the embodiment of the present application performs frame interpolation on two consecutive frames of images, thereby improving the frame rate of image rendering.
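A simplified sketch of that step, assuming approximately linear motion so that the first motion vector is t times the third-to-second motion vector and the second motion vector is -(1 - t) times it; nearest-neighbour backward warping and an equal-weight average are simplifications for illustration, not requirements of the patent.

```python
import numpy as np

def initial_interpolation(sr_cur, sr_next, mv_next_to_cur, t=0.5):
    """Warp the two super-divided, de-noised frames to the target time t
    (0 = second image, 1 = third image) and average the two warps."""
    h, w = sr_cur.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]

    def backward_warp(img, scale):
        # Sample img at grid + scale * motion vector (nearest neighbour).
        sx = np.clip(np.rint(xs + scale * mv_next_to_cur[..., 0]), 0, w - 1).astype(int)
        sy = np.clip(np.rint(ys + scale * mv_next_to_cur[..., 1]), 0, h - 1).astype(int)
        return img[sy, sx]

    warped_cur = backward_warp(sr_cur, t)             # first motion vector:  t * mv
    warped_next = backward_warp(sr_next, -(1.0 - t))  # second motion vector: -(1 - t) * mv
    return 0.5 * (warped_cur + warped_next)
```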
- inputting the initial frame insertion image into the bidirectional frame insertion network further includes: acquiring the depth map of the second image, the normal vector map of the second image, the depth map of the third image and the normal vector map of the third image; fusing the depth map of the second image, the normal vector map of the second image, the depth map of the third image and the normal vector map of the third image with the initial frame insertion image to obtain a second fusion result; and inputting the second fusion result into the bidirectional frame insertion network.
- the image rendering method in the embodiment of the present application further includes feature fusion of the depth map, the normal vector map, and the initial interpolated frame image.
- the super-division denoising network is a pre-trained neural network model
- the training of the super-division denoising network includes: obtaining multiple sets of super-division denoising original training data, where each set of super-division denoising original training data includes two consecutive frames of images and a standard image corresponding to the next frame of the two consecutive frames of images; judging that the pixel points of the two consecutive frames of images conform to the consistency; obtaining the depth map of the next frame of image, the normal vector map of the next frame of image and the illumination map of the next frame of image, where the illumination map of the next frame of image is a direct illumination map or an indirect illumination map; updating the color values of the pixel points of the next frame of image according to the previous frame of image to obtain the updated illumination map of the next frame of image; fusing the depth map, the normal vector map and the updated illumination map to obtain the updated image; and training the super-division denoising network according to the updated image and the standard image.
- the embodiment of the present application thus also provides a training method for the super-division denoising network: an image with a low sampling value (for example, 1 spp) is obtained, and the rendering result of the next frame of image under a high sampling value (for example, 4096 spp) is used as the standard image for training. A sketch of such a training loop follows.
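A hedged PyTorch sketch of such a training loop; the model argument stands for any suitable network (for example a U-Net), the data loader is assumed to yield (fused low-spp input, 4096 spp standard image) pairs, and the L1 loss and Adam optimizer are assumptions rather than choices stated in the patent.

```python
import torch
import torch.nn as nn

def train_sr_denoise(model, loader, epochs=10, lr=1e-4, device="cuda"):
    """Train a super-division denoising network against high-spp standard images."""
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()
    for _ in range(epochs):
        for fused_input, standard in loader:      # low-spp fused input, 4096 spp target
            fused_input, standard = fused_input.to(device), standard.to(device)
            pred = model(fused_input)              # super-divided, de-noised prediction
            loss = loss_fn(pred, standard)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```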
- the two-way frame insertion network is a pre-trained neural network model
- the training of the two-way frame insertion network includes: obtaining multiple sets of two-way frame insertion original training data;
- each set of two-way frame insertion original training data includes a fourth image, a fifth image and a sixth image, and the fourth image, the fifth image and the sixth image are three consecutive frames of images;
- an interpolated frame image at the middle moment between the fourth image and the sixth image is obtained according to the fourth image and the sixth image; the two-way frame interpolation network is then trained according to the interpolated frame image at the middle moment and the fifth image.
- the embodiment of the present application thus also provides a method for training the bidirectional frame insertion network: consecutive fourth, fifth and sixth images are obtained, frame insertion is performed on the fourth and sixth images to obtain an initial frame insertion result, and the fifth image is then used as the standard to train the neural network, as sketched below.
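A sketch of how such training pairs could be assembled from consecutive frame triplets; initial_interpolation is the warp-and-blend helper sketched earlier, and the way motion vectors are stored per frame is an assumption made only for illustration.

```python
def make_bidir_training_pairs(frames, motion_vectors, initial_interpolation):
    """Build (input, target) pairs: the 4th and 6th frames of each triplet are
    interpolated to the middle moment and the 5th frame is the target."""
    pairs = []
    for i in range(1, len(frames) - 1):
        f4, f5, f6 = frames[i - 1], frames[i], frames[i + 1]
        mv = motion_vectors[i + 1]                       # motion vector from f6 back to f4 (assumed layout)
        initial_mid = initial_interpolation(f4, f6, mv, t=0.5)
        pairs.append((initial_mid, f5))                  # network learns initial_mid -> f5
    return pairs
```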
- a second aspect provides an image rendering apparatus, including an acquisition module configured to acquire a first image, a second image and a third image, where the first image, the second image and the third image are three consecutive frames of images; and a processing module for updating the light map of the second image according to the first image to obtain the updated light map of the second image; the processing module is further used for inputting the updated light map of the second image into the super-division denoising network to obtain the super-divided and de-noised image of the second image; the processing module is further used for updating the illumination map of the third image according to the second image to obtain the updated illumination map of the third image; the processing module is further used for inputting the updated light map of the third image into the super-division denoising network to obtain the super-divided and de-noised image of the third image; the processing module is further used for obtaining the initial interpolated frame image at the target moment according to the super-divided and de-noised image of the second image and the super-divided and de-noised image of the third image; and the processing module is further used for inputting the initial interpolated frame image into the bidirectional frame interpolation network to obtain the target interpolated frame image at the target moment.
- the processing module updating the illumination map of the second image according to the first image to obtain the updated illumination map of the second image includes: acquiring the illumination map of the second image, where the illumination map of the second image includes the color values of multiple pixel points and is a direct illumination map or an indirect illumination map; acquiring the second pixel point corresponding to the first pixel point in the first image, where the first pixel point is any one of the multiple pixel points; and updating the color value of the first pixel point according to the color value of the first pixel point and the color value of the second pixel point to obtain the updated illumination map.
- the processing module is further configured to: obtain the color values of the four pixel points closest to the second pixel point; and obtain the color value of the second pixel point according to the color values of the four pixel points.
- before the processing module updates the color value of the first pixel point according to the color value of the first pixel point and the color value of the second pixel point, the processing module is further configured to judge the consistency of the first pixel point and the second pixel point, where judging the consistency of the first pixel point and the second pixel point includes: obtaining the depth value, the normal vector value and the patch ID of the first pixel point, and the depth value, the normal vector value and the patch ID of the second pixel point; the square of the difference between the depth value of the first pixel point and the depth value of the second pixel point is less than the first threshold; the square of the difference between the normal vector value of the first pixel point and the normal vector value of the second pixel point is less than the second threshold; and the patch ID of the first pixel point is equal to the patch ID of the second pixel point.
- updating the color value of the first pixel point according to the color value of the first pixel point and the color value of the second pixel point includes: the updated color value of the first pixel point is the sum of the color value of the first pixel point multiplied by the first coefficient and the color value of the second pixel point multiplied by the second coefficient.
- the processing module inputting the updated light map of the second image into the super-division denoising network further includes: acquiring the depth map of the second image and the normal vector map of the second image;
- the depth map of the second image, the normal vector map of the second image and the updated illumination map of the second image are fused to obtain the first fusion result; the first fusion result is input into the super-division denoising network.
- the processing module obtaining the initial interpolated frame image at the target moment according to the super-divided and de-noised image of the second image and the super-divided and de-noised image of the third image includes: obtaining the motion vector from the third image to the second image; according to the motion vector from the third image to the second image, determining the first motion vector from the initial interpolated frame image at the target moment to the second image and the second motion vector from the initial interpolated frame image at the target moment to the third image; and determining the initial interpolated frame image at the target moment according to the super-divided and de-noised image of the second image, the super-divided and de-noised image of the third image, the first motion vector and the second motion vector.
- the processing module inputting the initial frame insertion image into the bidirectional frame insertion network further includes: acquiring the depth map of the second image, the normal vector map of the second image, the depth map of the third image and the normal vector map of the third image; fusing the depth map of the second image, the normal vector map of the second image, the depth map of the third image and the normal vector map of the third image with the initial frame insertion image to obtain the second fusion result; and inputting the second fusion result into the bidirectional frame interpolation network.
- the super-division denoising network is a pre-trained neural network model
- the training of the super-division denoising network includes: acquiring multiple sets of super-division denoising original training data, where each set of super-division denoising original training data includes two consecutive frames of images and a standard image corresponding to the next frame of the two consecutive frames of images; judging that the pixel points of the two consecutive frames of images conform to the consistency; obtaining the depth map of the next frame of image, the normal vector map of the next frame of image and the illumination map of the next frame of image, where the illumination map of the next frame of image is a direct illumination map or an indirect illumination map; updating the color values of the pixel points of the next frame of image according to the previous frame of image to obtain the updated illumination map of the next frame of image; fusing the depth map, the normal vector map and the updated illumination map to obtain the updated image; and training the super-division denoising network according to the updated image and the standard image.
- the two-way frame insertion network is a pre-trained neural network model
- the training of the two-way frame insertion network includes: obtaining multiple sets of two-way frame insertion original training data;
- each set of two-way frame insertion original training data includes a fourth image, a fifth image and a sixth image, and the fourth image, the fifth image and the sixth image are three consecutive frames of images;
- an interpolated frame image at the middle moment between the fourth image and the sixth image is obtained according to the fourth image and the sixth image; the two-way frame interpolation network is then trained according to the interpolated frame image at the middle moment and the fifth image.
- a third aspect provides an image rendering apparatus, including: a memory for storing a program; and a processor for executing the program stored in the memory, where when the program stored in the memory is executed by the processor, the processor performs some or all of the operations of any one of the methods in the first aspect.
- a fourth aspect provides an electronic device, including the image rendering apparatus according to any one of the implementations of the above-mentioned second aspect.
- a fifth aspect provides a computer-readable storage medium that stores a computer program executable by a processor;
- when the computer program is executed, the processor performs some or all of the operations in any one of the methods of the above-mentioned first aspect.
- a chip in a sixth aspect, includes a processor, and the processor is configured to perform some or all of the operations in the method described in the first aspect above.
- a seventh aspect provides a computer program or computer program product, the computer program or computer program product comprising computer-readable instructions; when the computer-readable instructions are executed by a processor, the processor performs some or all of the operations in any one of the methods of the above-mentioned first aspect.
- FIG. 1 is a schematic block diagram of ray tracing and rasterization according to an embodiment of the present application
- FIG. 2 is a schematic structural diagram of a U-Net neural network according to an embodiment of the present application.
- FIG. 3 is a schematic block diagram of an electronic device according to an embodiment of the present application.
- FIG. 4 is a schematic block diagram of a system architecture of an existing image rendering method based on real-time ray tracing technology according to an embodiment of the present application;
- FIG. 5 is a schematic block diagram of a system architecture of an image rendering method based on real-time ray tracing technology according to an embodiment of the present application
- FIG. 6 is a schematic flowchart of an image rendering method according to an embodiment of the present application.
- FIG. 7 is a schematic flowchart of training of a super-resolution denoising network according to an embodiment of the present application.
- FIG. 8 is a schematic flowchart of the training of the bidirectional frame insertion network according to the embodiment of the present application.
- FIG. 9 is a schematic block diagram of an image rendering method according to an embodiment of the present application.
- FIG. 10 is a schematic flowchart of acquiring a data set according to an embodiment of the present application.
- FIG. 11 is a schematic block diagram of a rasterization process according to an embodiment of the present application.
- FIG. 12 is a schematic block diagram of obtaining parameters of pixels of the previous frame by bilinear interpolation according to an embodiment of the present application
- FIG. 13 is a schematic block diagram of performing superdivision denoising on an image by using a superdivision denoising network in an embodiment of the present application;
- FIG. 14 is a schematic block diagram of processing an image using a bidirectional frame insertion network according to an embodiment of the present application.
- FIG. 15 is a schematic block diagram of an image rendering apparatus according to an embodiment of the present application;
- FIG. 16 is a schematic block diagram of an apparatus for training a super-division denoising network according to an embodiment of the present application;
- FIG. 17 is a schematic block diagram of an apparatus for training a bidirectional frame insertion network according to an embodiment of the present application.
- FIG. 18 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
- references in this specification to "one embodiment" or "some embodiments" and the like mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
- appearances of the phrases "in one embodiment", "in some embodiments", "in other embodiments", "in still other embodiments", etc. in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically emphasized otherwise.
- the terms "including", "comprising", "having" and their variants mean "including but not limited to" unless specifically emphasized otherwise.
- Ray tracing: a special rendering algorithm in 3D computer graphics that emits a ray from the viewpoint through every pixel on the view plane, continuously determines the intersections of the ray with objects, and takes optical phenomena such as reflection and refraction into account to render the 3D scene.
- Global illumination refers to a rendering technology that considers both the direct illumination from the light source in the scene and the indirect illumination reflected by other objects in the scene, showing the combined effect of direct illumination and indirect illumination.
- Rasterization: the process of converting vertex data into fragments, which turns a graphic into an image composed of raster cells; in essence, it converts geometric primitives into a two-dimensional image.
- Rigid body: a solid of finite size whose deformation can be neglected; the distance between any two particles inside a rigid body does not change whether or not an external force is applied.
- Image super-resolution: a technology that reconstructs an input low-resolution image into a high-resolution image.
- Deep learning: a branch of machine learning; an algorithm that uses artificial neural networks as the architecture to perform representation learning on data.
- FIG. 1 shows a schematic block diagram of ray tracing and rasterization in an embodiment of the present application.
- ray tracing and rasterization are both rendering technologies whose purpose is to project objects in three-dimensional space onto the two-dimensional screen space for display through computational shading.
- the difference between ray tracing and rasterization is that ray tracing rendering assumes that a ray is emitted forward through each point on the screen, calculates where these rays hit the geometry (the triangle shown in Figure 1), and then computes the texture pixel color at these positions; rasterization rendering, by contrast, first performs coordinate transformation on the vertices of the geometry (the triangle shown in Figure 1) and then fills the texture inside the triangle on the two-dimensional screen.
- ray tracing requires a larger amount of computation, but the rendering effect is more realistic.
- the image rendering method of the embodiment of the present application is a rendering method based on ray tracing.
- U-Net is a type of convolutional neural network (CNN).
- U-Net was originally applied to the medical image segmentation task, and has been widely used in various segmentation tasks because of its good effect.
- U-Net supports a small amount of data to train the model. By classifying each pixel point, a higher segmentation accuracy is obtained.
- U-Net uses the trained model to segment images, which is fast.
- FIG. 2 shows a schematic structural diagram of U-Net, and U-Net is briefly introduced below in conjunction with FIG. 2 .
- Figure 2 shows the network structure of U-Net: the encoder part on the left downsamples the input, and the downsampling is realized by max pooling; the decoder part on the right upsamples the output of the encoder to restore the resolution, and the upsampling is achieved by an upsample operation; in the middle, skip connections are used for feature fusion. Since the entire network structure is shaped like a "U", it is called U-Net.
- upsampling and downsampling can increase the robustness to some small perturbations of the input image, such as image translation, rotation, etc., reduce the risk of overfitting, reduce the amount of computation and increase the size of the receptive field.
- the function of upsampling is to restore and decode the abstract features to the size of the original image, and finally obtain a clear and noise-free image.
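As a concrete illustration of this encoder/decoder structure with skip connections, a deliberately small PyTorch U-Net is sketched below; the channel counts, depth and activation choices are illustrative and are not the network specified by the embodiment.

```python
import torch
import torch.nn as nn

class MiniUNet(nn.Module):
    """A small U-Net-style network: max-pool downsampling on the encoder side,
    upsampling on the decoder side, and skip connections in the middle.
    Input height/width are assumed to be divisible by 4."""
    def __init__(self, in_ch=3, out_ch=3, base=16):
        super().__init__()
        def block(ci, co):
            return nn.Sequential(nn.Conv2d(ci, co, 3, padding=1), nn.ReLU(inplace=True),
                                 nn.Conv2d(co, co, 3, padding=1), nn.ReLU(inplace=True))
        self.enc1 = block(in_ch, base)
        self.enc2 = block(base, base * 2)
        self.bottleneck = block(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec2 = block(base * 4 + base * 2, base * 2)
        self.dec1 = block(base * 2 + base, base)
        self.head = nn.Conv2d(base, out_ch, 1)

    def forward(self, x):
        e1 = self.enc1(x)                                     # full resolution
        e2 = self.enc2(self.pool(e1))                         # 1/2 resolution
        b = self.bottleneck(self.pool(e2))                    # 1/4 resolution
        d2 = self.dec2(torch.cat([self.up(b), e2], dim=1))    # skip connection
        d1 = self.dec1(torch.cat([self.up(d2), e1], dim=1))   # skip connection
        return self.head(d1)
```

For the fused inputs described earlier, in_ch would simply be set to the total number of concatenated channels (light map, depth and normal).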
- the image rendering method in this embodiment of the present application may be performed by an electronic device.
- the electronic device may be a mobile terminal (eg, a smart phone), a computer, a personal digital assistant, a wearable device, a vehicle-mounted device, an Internet of Things device, or other devices capable of image rendering processing.
- the electronic device may be a device running an Android system, an iOS system, a Windows system, or another system.
- the graphics rendering method in the embodiment of the present application may be performed by an electronic device, and the specific structure of the electronic device may be shown in FIG. 3 .
- the specific structure of the electronic device will be described in detail below with reference to FIG. 3 .
- the electronic device 300 may include: a central processing unit (CPU) 301 , a graphics processing unit (GPU) 302 , a display device 303 and a memory 304 .
- the electronic device 300 may further include at least one communication bus 310 (not shown in FIG. 3 ) for realizing connection and communication between various components.
- the various components in the electronic device 300 may also be coupled through other connectors, and the other connectors may include various types of interfaces, transmission lines, or buses.
- the various components in the electronic device 300 may also be connected in a radial manner centered on the processor 301 .
- coupling refers to electrical connection or communication with each other, including direct connection or indirect connection through other devices.
- the central processing unit 301 and the graphics processing unit 302 in the electronic device 300 may be located on the same chip, or may be separate chips.
- the functions of the central processing unit 301 , the graphics processing unit 302 , the display device 303 and the memory 304 are briefly introduced below.
- The central processing unit 301 is used to run the operating system 305 and application programs 307.
- the application 307 may be a graphics-type application, such as a game, a video player, and the like.
- the operating system 305 provides a system graphics library interface, and the application 307 generates an instruction stream for rendering graphics or image frames through this interface, or through a driver provided by the operating system 305, such as a graphics library user-mode driver and/or a graphics library kernel-mode driver.
- the system graphics library includes but is not limited to: an embedded open graphics library (open graphics library for embedded system, OpenGL ES), the khronos platform graphics interface (the khronos platform graphics interface) or Vulkan (a cross-platform graphics application program interface) and other system graphics libraries.
- the instruction stream contains a series of instructions, which are usually invocation instructions for the interface of the system graphics library.
- the central processing unit 301 may include at least one of the following types of processors: an application processor, one or more microprocessors, a digital signal processor (DSP), a microcontroller (microcontroller unit, MCU) or artificial intelligence processor, etc.
- the central processing unit 301 may further include necessary hardware accelerators, such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or an integrated circuit for implementing logic operations.
- the processor 301 may be coupled to one or more data buses for transferring data and instructions between the various components of the electronic device 300 .
- The graphics processor 302 is used to receive the graphics instruction stream sent by the central processing unit 301, generate a rendering target through a rendering pipeline, and display the rendering target on the display device 303 through the layer composition display module of the operating system.
- graphics processor 302 may include a general-purpose graphics processor that executes software, such as a GPU or other type of dedicated graphics processing unit, or the like.
- The display device 303 is used to display various images generated by the electronic device 300, which may be the graphical user interface (GUI) of the operating system or image data (including still images and video data) processed by the graphics processor 302.
- the display device 303 may include any suitable type of display screen, such as a liquid crystal display (LCD), a plasma display, or an organic light-emitting diode (OLED) display.
- the memory 304 is a transmission channel between the central processing unit 301 and the graphics processor 302, and may be a double data rate synchronous dynamic random access memory (DDR SDRAM) or other types of cache.
- the rendering pipeline is a series of operations sequentially performed by the graphics processor 302 in the process of rendering graphics or image frames; typical operations include vertex processing (Vertex Processing), primitive processing (Primitive Processing), rasterization (Rasterization), fragment processing (Fragment Processing), and so on.
- In order to transform coordinates from one coordinate system to another, several transformation matrices are generally needed; the most important transformation matrices are the three matrices of model (Model), observation (View) and projection (Projection).
- the coordinates of vertex data generally start in local space (Local Space), where they are called local coordinates (Local Coordinates); after being transformed they become, in turn, world coordinates (World Coordinates), observation coordinates (View Coordinates) and clip coordinates (Clip Coordinates), and finally end up in the form of screen coordinates (Screen Coordinates).
- the local coordinates are the coordinates of the object relative to the local origin, and are also the coordinates of the start of the object.
- the local coordinates are transformed into world space coordinates, and the world space coordinates are in a larger spatial range. These coordinates are relative to the world's global origin, and they are placed relative to the world's origin along with other objects.
- the clip coordinates are processed into the range of -1.0 to 1.0, which determines which vertices will appear on the screen.
- finally, the clip coordinates are transformed into screen coordinates through a process called viewport transform (Viewport Transform).
- the viewport transform transforms coordinates in the range -1.0 to 1.0 into the coordinate range defined by the glViewport function.
- the final transformed coordinates will be sent to the rasterizer to convert it into a fragment (after converting to a fragment, the video image can be displayed according to the fragment).
- the reason for transforming vertices into different spaces is that some operations only make sense, or are more convenient, in a specific coordinate system. For example, when you need to modify an object, it makes more sense to do it in local space; if you want to perform an operation on an object relative to the positions of other objects, it makes more sense to do it in the world coordinate system; and so on. We could also define a transformation matrix that transforms directly from local space to clip space if we wanted to, but that would lose a lot of flexibility. The example below walks through the full chain.
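A small NumPy example of that chain, assuming OpenGL-style column-vector matrices; the perspective matrix shown is the standard OpenGL form, and the concrete numbers are arbitrary illustrations.

```python
import numpy as np

def perspective(fov_y_deg, aspect, near, far):
    """Standard OpenGL-style perspective projection matrix."""
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    m = np.zeros((4, 4))
    m[0, 0] = f / aspect
    m[1, 1] = f
    m[2, 2] = (far + near) / (near - far)
    m[2, 3] = (2 * far * near) / (near - far)
    m[3, 2] = -1.0
    return m

def local_to_clip(v_local, model, view, projection):
    """Transform a local-space vertex through world and view space into
    clip space, i.e. clip = P * V * M * v (column-vector convention)."""
    v = np.append(v_local, 1.0)               # homogeneous coordinates
    return projection @ view @ model @ v

# Example: a vertex at the local origin, a model matrix that moves the
# object 5 units into the scene, and an identity view matrix.
model = np.eye(4)
model[2, 3] = -5.0
clip = local_to_clip(np.array([0.0, 0.0, 0.0]), model, np.eye(4),
                     perspective(60.0, 16 / 9, 0.1, 100.0))
ndc = clip[:3] / clip[3]                      # perspective divide to NDC
```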
- the local space refers to the coordinate space where the object is located, that is, the place where the object first started.
- suppose you created a cube in a modeling software, say Blender. The origin of the cube you create is probably at (0, 0, 0), even though it may end up at a completely different position in the program. It is even possible to create all models with an initial position of (0, 0, 0) (although they will end up at different positions in the world). So all vertices of the created model are in local space: they are local to your object.
- a model matrix is a transformation matrix that places an object in the position or orientation it should be in by displacing, scaling, and rotating it. You can think of it as transforming a house: you first shrink it (it is too big in local space), shift it to a small town on the outskirts, and then rotate it a little to the left on the y axis to match the nearby houses. You can also roughly think of the matrix used in the previous section for placing boxes around the scene as a model matrix: it transforms the box's local coordinates to different locations in the scene/world.
- the observation space is often referred to as the OpenGL (open graphics library, a cross-platform graphics library) camera, and is sometimes also called the camera space (Camera Space) or eye space (Eye Space).
- the viewing space is the result of converting world space coordinates into coordinates in front of the user's field of view; in other words, the viewing space is the space observed from the camera's point of view. This is usually done by a combination of displacements and rotations, translating/rotating the scene so that certain objects are transformed to be in front of the camera. These combined transformations are usually stored in a view matrix (View Matrix), which is used to transform world coordinates into view space. In the next section we will discuss in depth how to create such a view matrix to simulate a camera.
- at the end of a vertex shader run, OpenGL expects all coordinates to fall within a certain range, and any point outside this range should be clipped. The clipped coordinates will be ignored, and the remaining coordinates will become the fragments visible on the screen. This is where the name clip space (Clip Space) comes from.
- the projection matrix specifies a range of coordinates, such as -1000 to 1000 in each dimension.
- the projection matrix will then transform the coordinates within this specified range to the range of normalized device coordinates (-1.0, 1.0). All coordinates outside the range will not be mapped to the range -1.0 to 1.0, so will be clipped.
- for example, the coordinate (1250, 500, 750) will not be visible, because its x coordinate is out of range and is converted to a normalized device coordinate greater than 1.0, so it is clipped.
- OpenGL will reconstruct the triangle into one or more triangles that fit within the clipping range.
- Ray tracing obtains accurate shadows, reflections, and diffuse global illumination effects by tracing every ray emitted from the camera, so simulation and rendering of highly realistic virtual scenes requires a huge amount of computation and power consumption.
- real-time ray tracing with a resolution of 1920 ⁇ 1080 and a frame rate of 30fps, due to the limitation of GPU hardware conditions, only a sampling value of 1 to 2spp can be provided for each pixel value.
- a low sampling value introduces a lot of noise, which reduces the quality of the rendered image, and for higher resolutions such as 4K or 8K the affordable sampling value is even lower. Therefore, it is necessary to optimize the rendering effect under low sampling values, remove the noise caused by insufficient sampling, and output stable global illumination images without increasing the hardware cost, while maintaining real-time ray tracing.
- the existing SVGF algorithm combines information filtering and denoising in the spatial and temporal domains, and calculates the variance in the spatial and temporal domains to distinguish high-frequency texture information and noise areas, and guide the filter to perform multi-layer filtering.
- however, this method cannot accurately estimate the motion vectors of non-rigid motion and shadow parts, so the denoising effect on shadow parts is poor; at the same time, this method adopts traditional bilateral filtering, which cannot dynamically adjust the filter weights, and its multi-layer filtering takes a long time, so the method has poor timeliness.
- Another existing KPCN algorithm divides the image into a specular reflection part and a diffuse reflection part, and then uses a neural network to adaptively adjust the filter kernel weights of the specular reflection part and the diffuse reflection part, and finally combines the two parts to obtain a joint denoising result.
- the disadvantages of this method are that the required sampling value is large (usually 32 spp), the model structure is huge but its function is single (denoising only), the amount of calculation is large and time-consuming, and the time-domain information is not fully utilized for supplementation, so this algorithm can hardly meet real-time requirements.
- a third existing algorithm uses pixel surface information and shadow information to calculate a gradient change value: the larger the gradient value, the more intense the motion of the pixel point, and the more of the pixel's corresponding historical information is discarded.
- This method proposes to judge the intensity of motion based on gradient, which is used to alleviate the ghost phenomenon caused by the inability to accurately estimate the motion vector due to large displacement, but it cannot be used as a separate denoising module.
- This method may be combined with the above-mentioned SVGF algorithm, and may also be used in combination with the image rendering method of the embodiment of the present application.
- the power consumption and computing power of the existing GPU hardware are limited, and the calculation amount is very large under the condition of high sampling value, which cannot meet the real-time requirement of 30fps. If only 1 or 2 rays are traced for each pixel, although the amount of calculation is greatly reduced, a lot of noise is introduced. In addition, the noise characteristics of different material surfaces are different. If the same denoising process is used, the denoising effect will be poor, which further increases the difficulty of the denoising algorithm under low sampling values.
- the motion vector of rigid body motion can be obtained accurately.
- the G-buffer refers to the buffer containing color, normal and world space coordinate information.
- however, the motion vector estimation of non-rigid bodies and shadow parts is not accurate, which causes the rendering effect to degrade.
- the real-time performance of the noise reduction algorithm is further affected by the size of the image resolution, and the real-time performance of the ray tracing noise reduction algorithm under high resolution and high frame rate faces greater challenges.
- therefore, an embodiment of the present application proposes an image rendering method that, under limited hardware conditions, uses low sampling values combined with time-domain information and adopts different optimization strategies for the different noise produced by different materials, so as to achieve high frame rate and high resolution real-time ray-traced image rendering.
- Figure 4 shows a schematic block diagram of the system architecture of an existing image rendering method based on real-time ray tracing technology. As shown in Figure 4, it includes six parts: a model material loading module 401, a ray generation module 402, a ray intersection module 403, a denoising module 404, a post-processing module 405 and a display module 406.
- the first step of image rendering based on real-time ray tracing technology is the loading of model materials, which mainly involves two parts.
- the first part is to add the models that need to be rendered to the scene, and the second part is to add their respective material information and texture information to the models in the scene.
- this step is implemented by the model material loading module 401.
- Generating rays refers to the process of emitting rays from a light source to the imaging plane.
- the number of rays emitted for each pixel greatly affects the final rendering effect.
- a low sampling value results in a blurry image and more noise; a high sampling value results in a clearer image and better effect.
- due to limited hardware conditions, in order to ensure the real-time performance of ray tracing, generally only 1 to 2 rays are emitted per pixel. This step is implemented by the ray generation module 402.
- Ray tracing splits the rendering task of a scene into considering the effect of several rays from the camera on the scene. These rays do not know each other, but know information about the entire scene model.
- Ray intersection means to trace the light emitted by the camera, find the intersection with the scene model, and obtain the material, texture and other information on the surface of the scene model according to the position of the intersection, and calculate the reflected light in combination with the light source information. The calculation of reflected rays is based on Monte Carlo importance sampling. Under the condition of sampling value of 1spp, that is, only 1 ray is traced for each intersection point. Corresponding to the ray generation part, the sample value of the reflected ray also affects the final rendering effect.
- the ray intersection is realized by the ray intersection module 403.
- the denoising module 404 is used to reduce the noise caused by the low sampling value, so as to ensure the rendering effect while ensuring the real-time performance of ray tracing.
- the post-processing module 405 is used to improve the rendering effect by adopting tone mapping and anti-aliasing (temporal anti-aliasing, TAA) techniques.
- tone mapping technology is used to map and transform the colors of the image and adjust its grayscale, so that the processed image can better express the information and characteristics of the original image;
- TAA technology blends information across frames to ease the "jaggies" at the edges of the image, making image edges smoother. A simple tone mapping operator is sketched below for illustration.
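As an illustration of what the tone mapping step does, one common operator (Reinhard) followed by gamma correction is shown below; the patent does not specify which tone mapping operator is actually used.

```python
import numpy as np

def reinhard_tone_map(hdr, gamma=2.2):
    """Compress HDR values into [0, 1] with the Reinhard operator and apply
    gamma correction, as a generic example of a tone mapping step."""
    ldr = hdr / (1.0 + hdr)          # compress high luminance values
    return np.power(np.clip(ldr, 0.0, 1.0), 1.0 / gamma)
```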
- the display module 406 is used to display the final rendered image.
- FIG. 5 shows a schematic block diagram of the system architecture of an image rendering method based on real-time ray tracing technology according to an embodiment of the present application. As shown in FIG. 5, it includes seven parts: a model material loading module 501, a down-sampling ray generation module 502, a ray intersection module 503, a denoising module 504, a frame insertion module 505, a post-processing module 506 and a display module 507.
- the denoising module 504 integrates super-resolution technology to restore the image.
- a frame insertion module 505 is added after the denoising module 504 to solve the real-time problem in high frame rate scenarios.
- FIG. 6 shows a schematic flowchart of an image rendering method according to an embodiment of the present application. As shown in FIG. 6 , it includes steps 601 to 607 , and these steps are described in detail below.
- S601 Obtain the n-1 th frame image, the n th frame image, and the n+1 th frame image.
- the n-1 th frame image, the n th frame image, and the n+1 th frame image are three consecutive images.
- the continuity here means that the n-1 th frame image is before the n th frame image, and the n th frame image is before the n+1 th frame image.
- the n-1th frame image, the nth frame image, and the n+1th frame image are images with a low sampling value (for example, 1spp) produced by the model material loading module 501, the down-sampling ray generation module 502, and the ray intersection module 503 in FIG. 5.
- S602 Update the illumination map of the n-th frame of image according to the n-1-th frame of image, so as to obtain an updated illumination map of the n-th frame of image. This step may be performed by the denoising module 504 in FIG. 5 .
- the light map includes a direct light map and an indirect light map. The direct light map is the light map obtained when light from the light source directly illuminates the observed object and is reflected by the observed object into the user's eyes; the indirect light map is the light map obtained when light from the light source first strikes other objects and, after one or more reflections, finally reaches the observed object and is reflected by it into the user's eyes.
- an image is composed of a plurality of pixel points, each of which has its own color value; the collection of the color values of all the pixel points of an image constitutes the light map of the image.
- the illumination map includes a direct illumination map and an indirect illumination map.
- Direct illumination means that the light of the light source directly illuminates the object
- indirect illumination means that the light of the light source is reflected one or more times and then illuminates the object.
- the direct illumination map is taken as an example below for illustration.
- any pixel in the nth frame image is recorded as the first pixel point, and the pixel corresponding to the acquired first pixel point in the n-1th frame image is recorded as the second pixel point. That is to say, the second pixel is the pixel corresponding to the first pixel in the n-1 th frame image.
- the color value of the first pixel point is then updated. Specifically, the color value of the first pixel point can be multiplied by a first coefficient to obtain a first result, the color value of the second pixel point can be multiplied by a second coefficient to obtain a second result, and the first result and the second result are added to obtain the updated color value of the first pixel point.
- the first coefficient and the second coefficient may be artificially preset values, which are not specifically limited in this embodiment of the present application.
- when the second pixel point does not fall exactly on a grid node of the n-1th frame image, its color value cannot be obtained directly.
- a bilinear interpolation algorithm is therefore used to obtain the color value of the second pixel point. Specifically, first find the four pixels closest to the second pixel point, which must lie on grid nodes of the n-1th frame image; then obtain the color values of these four pixels; finally, using the color values of these four pixels combined with the bilinear interpolation algorithm, the color value of the second pixel point can be calculated.
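A minimal sketch of the bilinear interpolation described above is given below; the array layout (row, column, channel) and the clamping at image borders are assumptions made for the example.

```python
import numpy as np

def bilinear_sample(image: np.ndarray, x: float, y: float) -> np.ndarray:
    """Bilinearly interpolate a color value at a non-integer position (x, y).

    image is indexed as image[row, col, channel]; the four neighbouring
    grid pixels are weighted by their distance to (x, y).
    """
    h, w = image.shape[:2]
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    x0, y0 = max(x0, 0), max(y0, 0)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x1]
    bottom = (1 - fx) * image[y1, x0] + fx * image[y1, x1]
    return (1 - fy) * top + fy * bottom
```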
- the method of the embodiment further includes performing consistency judgment on the first pixel point and the second pixel point.
- if the square of the difference between the depth value of the first pixel point and the depth value of the second pixel point is less than a first threshold, the square of the difference between the normal vector value of the first pixel point and the normal vector value of the second pixel point is less than a second threshold, and the patch ID of the first pixel point is equal to the patch ID of the second pixel point, then the first pixel point and the second pixel point are considered consistent, and the color value of the second pixel point can be used to update the color value of the first pixel point. If the first pixel point is not consistent with the second pixel point, the color value of the first pixel point is not updated with the color value of the second pixel point, and the current color value of the first pixel point is used as its updated color value.
- the depth value and normal vector of the second pixel at this time cannot be obtained directly.
- a bilinear interpolation algorithm is therefore used to obtain the depth value, normal vector value, and patch ID of the second pixel point. Specifically, first find the four pixels closest to the second pixel point.
- these four pixels must lie on grid nodes of the n-1th frame image; then obtain their depth values, normal vector values, and patch IDs; finally, using these values combined with the bilinear interpolation algorithm, the depth value, normal vector value, and patch ID of the second pixel point can be calculated.
- after the color value of each pixel in the nth frame image is updated in the above manner, the updated direct illumination map of the nth frame image is obtained.
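The color update of S602 can be summarized by the small sketch below, which blends the current color with the reprojected history wherever the consistency test passed. The coefficient value and the array shapes are illustrative assumptions; the patent only requires two preset coefficients.

```python
import numpy as np

def update_color(curr_color: np.ndarray,
                 prev_color: np.ndarray,
                 consistent: np.ndarray,
                 alpha: float = 0.2) -> np.ndarray:
    """Blend the current frame's color with the reprojected history.

    curr_color and prev_color are HxWx3 arrays, consistent is an HxW boolean
    mask. Where the consistency test passed, the updated color is
    alpha * current + (1 - alpha) * history; elsewhere the current color is kept.
    """
    blended = alpha * curr_color + (1.0 - alpha) * prev_color
    return np.where(consistent[..., None], blended, curr_color)
```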
- the processing of the indirect illumination map is the same as that of the direct illumination map; for details, reference may be made to the above description of the processing of the direct illumination map.
- S603 Input the updated light map of the nth frame of images into a super-division denoising network to obtain a super-division and denoising image of the nth frame of images. This step may be performed by the denoising module 504 in FIG. 5 .
- the updated light map includes an updated direct light map and an updated indirect light map.
- the depth map of the nth frame image is the collection of the depth values of all pixels in the nth frame image, and the normal vector map of the nth frame image is the collection of the normal vector values of all pixels in the nth frame image; the updated direct light map, the updated indirect light map, the depth map, and the normal vector map of the nth frame image are then fused to obtain a first fusion result.
- the fusion method can be an existing feature fusion method, such as concatenation (concat) or addition (add), which is not specifically limited in this embodiment of the application; finally, the first fusion result is input into the super-division denoising network to obtain the super-division denoising image of the nth frame image.
- the super-division denoising network is a pre-trained neural network model and may have a U-Net network structure as shown in Figure 2. The training process of the super-division denoising network is described in detail below.
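A minimal sketch of the fusion step is shown below: the updated light maps and the G-buffer maps are concatenated along the channel axis before being fed to the network. The channel layout and the commented-out network call are placeholders; the patent only specifies a pre-trained U-Net-like model.

```python
import numpy as np

def build_denoiser_input(direct: np.ndarray,
                         indirect: np.ndarray,
                         depth: np.ndarray,
                         normal: np.ndarray) -> np.ndarray:
    """Fuse the updated light maps with the G-buffer by channel concatenation.

    direct/indirect are HxWx3, depth is HxW, normal is HxWx3; the result is an
    HxWx10 tensor that is fed to the super-division denoising network.
    """
    return np.concatenate([direct, indirect, depth[..., None], normal], axis=-1)

# fused = build_denoiser_input(direct_lm, indirect_lm, depth_map, normal_map)
# denoised = superdivision_denoising_network(fused)  # pre-trained U-Net, assumed
```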
- S604 update the light map of the n+1 th frame image according to the n th frame image, so as to obtain the updated light map of the n+1 th frame image. This step may be performed by the denoising module 504 in FIG. 5 .
- the process of updating the light map of the n+1th frame image according to the nth frame image is similar to the process of updating the light map of the nth frame image according to the n-1th frame image.
- for details, reference may be made to the above description of updating the illumination map of the nth frame image according to the n-1th frame image to obtain the updated illumination map of the nth frame image.
- S605 input the updated light map of the n+1 th frame image into a super-division denoising network, so as to obtain a super-division and denoising image of the n+1 th frame image.
- This step may be performed by the denoising module 504 in FIG. 5 .
- the process of inputting the updated light map of the n+1th frame image into the superdivision denoising network is similar to the process of inputting the updated light map of the nth frame image to the superdivision denoising network.
- for details, reference may be made to the above description of inputting the updated light map of the nth frame image into the super-division denoising network to obtain the super-division denoising image of the nth frame image.
- S606 Obtain an initial interpolated frame image at a target moment according to the super-division denoising image of the nth frame image and the super-division denoising image of the n+1th frame image, where the target moment is a moment between the nth frame image and the n+1th frame image.
- This step may be performed by the frame insertion module 505 in FIG. 5 .
- the initial interpolated frame image at the target moment is obtained according to the super-division denoising images of the nth frame image and the n+1th frame image, where the target moment is a moment between the nth frame image and the n+1th frame image; preferably, the target moment is the intermediate moment between the nth frame image and the n+1th frame image.
- specifically, first obtain the motion vector from the n+1th frame image to the nth frame image: each pixel in the n+1th frame image has a motion vector to the corresponding pixel in the nth frame image, and the collection of the motion vectors of all pixels is the motion vector from the n+1th frame image to the nth frame image.
- then obtain the first motion vector from the initial interpolated frame image at the target moment to the nth frame image, and the second motion vector from the initial interpolated frame image at the target moment to the n+1th frame image. For example, suppose that the motion vector from the n+1th frame image to the nth frame image is M 3→2 and the target moment is time t, where t is a value in (0, 1); then the first motion vector from the initial interpolated frame image at the target moment to the nth frame image is M t→2 = t × M 3→2 ,
- and the second motion vector from the initial interpolated frame image at the target moment to the n+1th frame image is M t→3 = −(1−t) × M 3→2 .
- the initial interpolated frame image at the target moment can be obtained according to the super-divided and de-noised image of the nth frame image, the super-divided and de-noised image of the n+1 th frame image, the first motion vector and the second motion vector.
- the super-divided and de-noised image of the n-th frame image is I 2
- the super-divided and de-noised image of the n+1-th frame image is I 3
- the calculation method of the initial frame insertion image at the target moment is:
- I t = (1−t) × g(I 2 , M t→2 ) + t × g(I 3 , M t→3 )
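The sketch below illustrates this calculation: the motion field is scaled linearly to the target moment, both super-division denoising images are warped by g(·), and the results are blended with the weights (1−t) and t. Nearest-neighbour warping and the linear scaling of the motion field are simplifying assumptions made for the example.

```python
import numpy as np

def warp(image: np.ndarray, motion: np.ndarray) -> np.ndarray:
    """Warp an HxWx3 image by an HxWx2 motion vector field (nearest sampling)."""
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(np.round(xs + motion[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + motion[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]

def initial_interpolation(i2: np.ndarray, i3: np.ndarray,
                          m_3_to_2: np.ndarray, t: float) -> np.ndarray:
    """I_t = (1 - t) * g(I_2, M_t->2) + t * g(I_3, M_t->3)."""
    m_t_to_2 = t * m_3_to_2            # assumed linear scaling of the motion field
    m_t_to_3 = -(1.0 - t) * m_3_to_2
    return (1.0 - t) * warp(i2, m_t_to_2) + t * warp(i3, m_t_to_3)
```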
- S607 Input the initial frame insertion image into the bidirectional frame insertion network to obtain the frame insertion image at the target moment. This step may be performed by the frame insertion module 505 in FIG. 5 .
- the method of the embodiment of the present application can directly input the initial interpolated frame image into the bidirectional frame insertion network. However, in order to compensate for the inaccurate motion vector estimation of non-rigid body motion and shadow motion, the method of the embodiment of the present application further includes: first obtaining the depth map of the nth frame image, the normal vector map of the nth frame image, the depth map of the n+1th frame image, and the normal vector map of the n+1th frame image; then fusing the depth map of the nth frame image, the normal vector map of the nth frame image, the depth map of the n+1th frame image, and the normal vector map of the n+1th frame image with the initial interpolated frame image to obtain a second fusion result.
- the fusion method can be an existing feature fusion method, such as concatenation (concat) or addition (add), which is not specifically limited in this embodiment of the present application; finally, the second fusion result is input into the bidirectional frame insertion network.
- the bidirectional frame insertion network is a pre-trained neural network model, and the bidirectional frame insertion network may have a U-Net network structure as shown in Figure 2. The training process of the bidirectional frame insertion network is described in detail below.
- the image rendering method of the embodiment of the present application can process images with a low sampling value (for example, 1spp), which greatly reduces the requirements for hardware devices;
- updating the illumination map of the low-sampling-value image with historical information can make up for the noise problem caused by the insufficient amount of sampling information;
- the image rendering method of the embodiment of the present application uses a super-resolution denoising network to process the image, which can improve the resolution of the image;
- the image rendering method of the embodiment of the present application interpolates frames between two consecutive images and uses a bidirectional frame insertion network to process the interpolated images, thereby improving the frame rate of the images and ensuring the smoothness of image rendering.
- FIG. 7 shows a schematic flow chart of training of the superdivision denoising network according to the embodiment of the present application. As shown in FIG. 7 , it includes steps 701 to 706 , and these steps are introduced separately below.
- S701 Acquire multiple sets of super-division denoising original training data, where each set of super-division denoising original training data in the multiple sets includes two consecutive frames of images and a standard image corresponding to the later of the two consecutive frames of images.
- the two consecutive frames of images here are images with a low sampling value (eg, 1spp), and the standard image corresponding to the later frame is the rendering result of that frame under a high sampling value (eg, 4096spp).
- Standard images are used as training standards for images with low sample values.
- S703 Acquire the depth map of the next frame of images, the normal vector map of the next frame of images, and the light map of the next frame of images, where the light map of the next frame of images is a direct light map or an indirect light map.
- S704 update the color values of the pixels of the next frame of images according to the two consecutive frames of images, so as to obtain an updated light map of the next frame of images.
- the color values of the pixels of the next frame of image are updated according to the two consecutive frames of images; for the method of updating the color values of the pixels, reference may be made to the description in S602 above. For brevity, details are not repeated here in this embodiment of the present application.
- if the consistency is not satisfied, the current color value of the pixel point of the next frame of image is used as the updated color value of the pixel point. After the color value of each pixel in the next frame of image is updated, the updated light map of the next frame of image is obtained.
- the training of the super-division denoising network here is the same as the training of a general neural network: the high-sampling-value image is used as the standard, so that the training result for the low-sampling-value image approaches the high-sampling-value image.
- when the difference between the training result and the standard image is smaller than a preset value, the training of the super-division denoising network is considered complete.
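A minimal training-loop sketch consistent with the description above is given below (PyTorch-style). The L1 loss, the Adam optimizer and the data-loader interface are illustrative choices, not requirements of the patent.

```python
import torch
import torch.nn as nn

def train_denoiser(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-4):
    """Train the super-division denoising network against high-sample standard images.

    Each batch holds the fused low-sample input (updated light maps + G-buffer)
    and the corresponding 4096 spp standard image used as the training label.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.L1Loss()
    for _ in range(epochs):
        for fused_input, standard in loader:
            optimizer.zero_grad()
            loss = criterion(model(fused_input), standard)
            loss.backward()
            optimizer.step()
    return model
```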
- FIG. 8 shows a schematic flowchart of the training of the bidirectional frame insertion network according to the embodiment of the present application. As shown in FIG. 8 , it includes steps 801 to 803 , and these steps are introduced separately below.
- S801 Acquire multiple groups of bidirectional frame insertion original training data, where each group of bidirectional frame insertion original training data in the multiple groups includes a fourth image, a fifth image, and a sixth image, and the fourth image, the fifth image, and the sixth image are three consecutive frames of images.
- the fourth image, the fifth image, and the sixth image may be super-divided and denoised images obtained through the above steps, or may be other images, which are not limited in this embodiment of the present application.
- S802 Acquire an interpolated frame image at an intermediate moment between the fourth image and the sixth image according to the fourth image and the sixth image.
- the fifth image is used as the standard image, so that the training result for the interpolated frame image at the intermediate moment approaches the fifth image.
- when the difference between the training result and the fifth image is less than a preset value, the training of the bidirectional frame insertion network is considered complete.
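The bidirectional frame insertion network can be trained with the same kind of loop; the sketch below shows a single step in which the fused initial interpolation result is the input and the real middle frame (the fifth image) is the target. All names are assumptions made for illustration.

```python
def train_interp_step(model, optimizer, criterion, fused_interp, middle_frame):
    """One training step: predict the middle frame from the fused interpolation result."""
    optimizer.zero_grad()
    loss = criterion(model(fused_interp), middle_frame)  # fifth image is the target
    loss.backward()
    optimizer.step()
    return loss.item()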
- FIG. 9 shows a schematic block diagram of an image rendering method according to an embodiment of the present application. The following describes the image rendering method in an embodiment of the present application in detail with reference to FIG. 9 .
- the image rendering method of the embodiment of the present application involves a superdivision denoising network and a bidirectional frame insertion network.
- the superdivision denoising network and the bidirectional frame insertion network have the U-Net network model structure in FIG. 2 above.
- the super-division denoising network and the bidirectional frame insertion network need to be pre-trained.
- the training of the super-division denoising network includes: dividing the illumination information obtained through ray tracing into direct illumination and indirect illumination, and updating the color information of the direct illumination and the indirect illumination in combination with the motion vectors and the consistency judgment in the G-buffer; the updated color information of the direct illumination and the indirect illumination is then fused with the corresponding depth information and normal information in the G-buffer, where the fusion method can be concatenation (concat) or addition (add).
- the fused direct illumination and indirect illumination are input into the superdivision denoising network, and the superdivision denoising result of direct illumination and the superdivision denoising result of indirect illumination are obtained respectively.
- the ground truth of the super-division denoising network is the ray tracing rendering result with a sampling value of 4096spp.
- the training of the bidirectional frame insertion network includes: denoting the three consecutive frames of images output by the super-division denoising network as I AI+Denoise_0 , I AI+Denoise_1 and I AI+Denoise_2 ; obtaining from the G-buffer the bidirectional motion vectors between I AI+Denoise_0 and I AI+Denoise_1 and between I AI+Denoise_1 and I AI+Denoise_2 ; and estimating the initial intermediate frame insertion result I AI+Denoise_1_calculate by using the frame insertion calculation formula.
- the Ground Truth of the bidirectional frame insertion network is I AI+Denoise_1 .
- the rasterization rendering shown in Figure 9 maps a series of coordinate values of a three-dimensional object to a two-dimensional plane.
- the process from 3D coordinates to 2D coordinates is usually carried out in steps, passing through multiple coordinate systems, including local space, world space, observation (view) space, clipping space, and screen space.
- Transforming coordinates from one coordinate system to another is achieved by a transformation matrix: the transformation matrix from local space coordinates to world space coordinates is the model matrix M model , the transformation matrix from the world space coordinate system to the observation space coordinate system is the viewing matrix M view , and the transformation matrix from the observation space coordinate system to the clipping space coordinate system is the projection matrix M projection .
- the scenes of two adjacent frames of images are related, so the relative offset of the same pixel on the two adjacent frames of images is the motion vector, and the process of solving the motion vector is motion estimation.
- the motion vector of the pixels in two consecutive frames of images can be obtained.
- the depth information, normal information, mesh ID, motion vector, etc. of the image can be obtained, which are all stored in the G-buffer.
- the obtained lighting information is divided into direct lighting and indirect lighting through ray tracing.
- by accessing the historical color buffer and combining it with the color values of the corresponding pixels in the current frame, continuous accumulation of color values in the time domain can be achieved; this is done because the image contains more noise when the sampling value is small.
- the accumulation of the historical color value and the current color value is equivalent to increasing the sampling value.
- the historical information is accumulated only when the depth information, normal information and mesh ID all satisfy the consistency at the same time.
- the position of each pixel in the current frame is projected into the previous frame, bilinear interpolation is used to obtain the normal information, depth information and mesh ID of the pixel in the previous frame, and the consistency judgment is then carried out; the color caches of direct lighting and indirect lighting are updated according to the judgment result. Finally, the color information of direct illumination and indirect illumination is fused with the corresponding depth information and normal information respectively and input into the super-division denoising network to obtain the super-division denoising images, which are recorded as the 0th frame and the 1st frame, respectively.
- the 0th frame and the 1st frame are two consecutive frames of images.
- the bidirectional motion vectors corresponding to the intermediate time t between the 0th frame and the 1st frame can be obtained by linear operation, and the super-division denoising images of the 0th frame and the 1st frame are mapped using the corresponding bidirectional motion vectors to obtain the initial frame insertion result.
- the initial frame insertion result is fused with the corresponding depth information and normal information in the G-buffer, and then input into the bidirectional frame insertion network to obtain the final frame insertion result, that is, the final t-th frame image.
- the image rendering method of the embodiment of the present application combines the super-division technology and the frame interpolation technology, so as to obtain an image with a high frame rate and a given resolution under the condition of a low sampling value, and at the same time reduce the rendering time.
- the image rendering method according to the embodiment of the present application is described above with reference to FIG. 9 , and the following describes the image rendering method according to the embodiment of the present application in further detail with reference to specific examples.
- Fig. 10 shows a schematic flowchart of acquiring a dataset in an embodiment of the present application.
- the model scene dataset used for training the neural networks in this embodiment of the present application may be a collection of existing public rendering models, or a collection of model scenes developed and built in-house.
- different types of rendering scenes should be selected, such as buildings, cars, homes, games, animals, characters, statues and other scenes.
- the dataset should contain both smooth and complex areas as far as possible; noise is more difficult to remove from complex areas than from smooth areas because they contain more texture.
- the method of the embodiment of the present application further includes performing a series of operations on the acquired image, such as flipping, rotating, stretching, and shrinking, so as to expand the data set as much as possible.
- FIG. 11 shows a schematic block diagram of the rasterization process of the embodiment of the present application.
- the rasterization process is the process of converting three-dimensional coordinates into two-dimensional coordinates.
- the most important transformation matrices are the model (Model) matrix M model , the observation (View) matrix M view , and the projection (Projection) matrix M projection .
- the coordinates of vertex data generally start in the local space (Local Space), where they are called local coordinates (Local Coordinate); after being transformed, the local coordinates become, in turn, world coordinates (World Coordinate), view coordinates (View Coordinate) and clip coordinates (Clip Coordinate), and finally end in the form of screen coordinates (Screen Coordinate).
- M mvp_prev = M projection_prev × M view_prev × M model_prev ;
- M mvp_cur = M projection_cur × M view_cur × M model_cur ;
- aPos represents the three-dimensional coordinates in the local coordinate system.
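A small sketch of how a per-vertex motion vector can be derived from the previous and current MVP matrices is given below; the perspective divide and the sign convention (previous screen position minus current screen position) are assumptions made for illustration.

```python
import numpy as np

def screen_position(mvp: np.ndarray, a_pos: np.ndarray) -> np.ndarray:
    """Project a local-space vertex aPos by a 4x4 MVP matrix and do the perspective divide."""
    clip = mvp @ np.append(a_pos, 1.0)   # homogeneous clip-space coordinates
    return clip[:2] / clip[3]            # normalized device coordinates (x, y)

def motion_vector(mvp_prev: np.ndarray, mvp_cur: np.ndarray,
                  a_pos: np.ndarray) -> np.ndarray:
    """Per-vertex motion vector: previous screen position minus current one (assumed convention)."""
    return screen_position(mvp_prev, a_pos) - screen_position(mvp_cur, a_pos)
```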
- the method of the embodiment of the present application also needs to use the motion vector calculated above and bilinear interpolation to find the position, in the previous frame, that corresponds to the pixel point u in the current frame image I.
- the pixel point u in the current frame is projected to the corresponding position in the previous frame to obtain the pixel point v at that position, and the consistency judgment of the pixel points u and v is then performed.
- the formula for judging consistency is as follows:
- (W z_cur − W z_prev )² < threshold z , (W n_cur − W n_prev )² < threshold n , W id_cur = W id_prev
- W z_cur represents the depth value of the pixel point u, W z_prev represents the depth value of the pixel point v, and the square of the difference between the two needs to be less than the depth threshold threshold z ;
- W n_cur represents the normal value of the pixel point u, W n_prev represents the normal value of the pixel point v, and the square of the difference between the two needs to be less than the normal threshold threshold n ;
- W id_cur represents the meshid of the pixel point u, W id_prev represents the meshid of the pixel point v, and the two need to be equal.
- the depth threshold threshold z and the normal threshold threshold n are empirical values, which can be appropriately adjusted and determined according to the rendering result.
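The consistency test can be summarized by the sketch below; the default threshold values are purely illustrative, since the patent treats them as empirical values to be tuned from the rendering result.

```python
import numpy as np

def is_consistent(depth_cur, depth_prev, normal_cur, normal_prev,
                  mesh_id_cur, mesh_id_prev,
                  threshold_z=1e-3, threshold_n=1e-2):
    """Consistency test between pixel u (current frame) and its reprojected match v."""
    depth_ok = (depth_cur - depth_prev) ** 2 < threshold_z
    normal_ok = np.sum((np.asarray(normal_cur) - np.asarray(normal_prev)) ** 2) < threshold_n
    return bool(depth_ok and normal_ok and mesh_id_cur == mesh_id_prev)
```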
- the general color update formula is: C update = α × C original + (1−α) × C history , where C update represents the updated light map, C original represents the original light map, C history represents the light map in the historical cache, and α represents the scale coefficient between the original light map and the light map in the historical cache; this coefficient can be an artificially set value.
- if the pixel point u and the pixel point v do not satisfy the consistency, it is considered that no pixel corresponding to the pixel point u is found in the previous frame, and the current color value of the pixel point u is used as the updated color value.
- the first part of the data set of the super-resolution denoising network is the direct light map obtained from the rendering pipeline. After consistency judgment, the direct light color value in the historical cache is accumulated to the current direct light map.
- the formula is as follows:
- C direct_update = α 1 × C direct_original + (1−α 1 ) × C direct_history
- C direct_update represents the updated direct light map
- C direct_original represents the original direct light map
- C direct_history represents the direct light map in the history cache
- α 1 represents the scale coefficient of the original direct light map and the direct light map in the history cache.
- the second part of the data set of the super-resolution denoising network is the indirect light map obtained from the rendering pipeline. After the consistency judgment, the indirect light color value in the historical cache is accumulated to the current indirect light map.
- the formula is as follows :
- C indirect_update = α 2 × C indirect_original + (1−α 2 ) × C indirect_history
- C indirect_update represents the updated indirect light map
- C indirect_original represents the original indirect light map
- C indirect_history represents the indirect light map in the history cache
- α 2 represents the scale coefficient of the original indirect light map and the indirect light map in the history cache.
- the third part of the super-resolution denoising network dataset is the depth map I depth and normal vector map I normal_vector of the current frame obtained from the G-buffer.
- the training dataset Dataset of the super-division denoising network thus consists of four parts: the updated direct light map C direct_update , the updated indirect light map C indirect_update , the depth map I depth , and the normal vector map I normal_vector of the current frame.
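A sketch of how one training sample might be bundled is given below; the dictionary keys are assumptions, and the 4096spp standard image mentioned earlier is included as the training target.

```python
import numpy as np

def build_training_sample(c_direct_update, c_indirect_update,
                          i_depth, i_normal_vector, ground_truth):
    """Bundle the four dataset parts with the high-sample standard image (training label)."""
    return {
        "direct": np.asarray(c_direct_update),
        "indirect": np.asarray(c_indirect_update),
        "depth": np.asarray(i_depth),
        "normal": np.asarray(i_normal_vector),
        "target": np.asarray(ground_truth),  # 4096 spp rendering used as ground truth
    }
```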
- FIG. 13 shows a schematic block diagram of performing super-division denoising on an image by using a super-division denoising network in an embodiment of the present application.
- the processing of the direct illumination image of a certain pixel is taken as an example for description.
- the processing of the indirect illumination map is similar to the processing process of the direct illumination map. For details, reference may be made to the processing process of the direct illumination map, which will not be repeated in this embodiment of the present application.
- the current frame buffer includes parameters such as the motion vector of each pixel from the current frame to the previous frame, and the depth information, normal information and meshid of the pixel in the current frame;
- the previous frame buffer includes parameters such as the historical color value of the pixel and the depth information, normal information and meshid of the pixel in the previous frame.
- the motion vector is used to project the previous frame buffer into the space of the current frame, and the consistency judgment is made according to parameters such as depth information, normal information and meshid. If the judgment result is consistent, the historical color value and the current color value are accumulated, and the color value of the current frame is updated; if the judgment result is inconsistent, the current color value is retained.
- the updated color value is used to update the historical color value.
- the updated color value is fused with the depth information and normal information of the current frame, and the fusion result is input into the super-division denoising network to obtain the super-division and denoising image.
- the acquisition of the dataset of the bidirectional frame insertion network includes: denoting the three consecutive frames of images output by the super-division denoising network as I AI+Denoise_0 , I AI+Denoise_1 and I AI+Denoise_2 ; the motion vector from I AI+Denoise_1 to I AI+Denoise_0 obtained from the G-buffer is M 1-0 , and the motion vector from I AI+Denoise_1 to I AI+Denoise_2 obtained from the G-buffer is M 1-2 ; the frame insertion result can then be obtained according to the frame insertion formula:
- I AI+Denoise_1_calculate = (1−t) × g(I AI+Denoise_0 , M t-0 ) + t × g(I AI+Denoise_2 , M t-2 )
- I AI+Denoise_1_calculate is the frame insertion result
- t represents the moment position of the frame insertion result in the 0th frame and the 2nd frame, for example, t can be taken as 0.5
- M t-0 represents the motion vector from the time t to the 0th frame time
- M t-0 is equal to M 1-0
- M t-2 represents the motion vector from time t to the second frame time, where M t-2 is equal to M 1-2 .
- the frame interpolation result I AI+Denoise_1_calculate is used as the input of the bidirectional frame interpolation network, and I AI+Denoise_1 is used as the Ground Truth of the bidirectional frame interpolation network, so that the bidirectional frame interpolation network can be trained.
- FIG. 14 shows a schematic block diagram of processing an image by using a bidirectional frame interpolation network according to an embodiment of the present application.
- M 1-0 represents the motion vector, obtained from the G-buffer, from the next frame image to the previous frame image
- t is a coefficient in the interval (0, 1), representing the time position of the interpolated frame between the previous frame image I 0 and the next frame image I 1
- the bidirectional motion vectors M t-0 and M t-1 at time t can be obtained through linear frame interpolation calculation.
- the calculation formula is as follows:
- I t_initial = (1−t) × g(I 0 , M t-0 ) + t × g(I 1 , M t-1 )
- It_initial represents the initial frame insertion result
- the function g( ) represents the mapping operation
- the depth information and normal information of the previous frame image and the next frame image are obtained from the G-buffer and fused with the initial frame interpolation result to compensate for the inaccurate motion vector estimation of non-rigid motion or shadow motion.
- the fusion result is input into the bidirectional frame interpolation network to obtain the final frame interpolation result.
- The image rendering method of the embodiment of the present application has been described in detail above with reference to FIGS. 7 to 14 ; the image rendering apparatus of the embodiment of the present application is described in detail below with reference to FIGS. 15 to 18 . It should be understood that the image rendering apparatuses shown in FIGS. 15 to 18 can perform the various steps of the image rendering methods of the embodiments of the present application. When describing the image rendering apparatuses shown in FIGS. 15 to 18 below, repeated descriptions are appropriately omitted.
- FIG. 15 is a schematic block diagram of an image rendering apparatus according to an embodiment of the present application. As shown in FIG. 15 , it includes an acquisition module 1501 and a processing module 1502 , which will be briefly introduced below.
- the acquiring module 1501 is configured to acquire a first image, a second image and a third image, where the first image, the second image and the third image are three consecutive frames of images.
- the processing module 1502 is configured to update the illumination map of the second image according to the first image, so as to obtain an updated illumination map of the second image.
- the processing module 1502 is further configured to input the updated light map of the second image into the superdivision denoising network to obtain a superdivision denoising image of the second image.
- the processing module 1502 is further configured to update the illumination map of the third image according to the second image, so as to obtain an updated illumination map of the third image.
- the processing module 1502 is further configured to input the updated light map of the third image into the superdivision denoising network to obtain a superdivision denoising image of the third image.
- the processing module 1502 is further configured to obtain an initial interpolated frame image at a target time according to the super-divided and de-noised image of the second image and the super-divided and de-noised image of the third image, where the target time is the time between the second image and the third image.
- the processing module 1502 is further configured to input the initial frame insertion image into the bidirectional frame insertion network to obtain the frame insertion image at the target moment.
- the processing module 1502 is further configured to execute each step of the method in S602 to S607 in FIG. 6 .
- For details, reference may be made to the above description of FIG. 6 .
- FIG. 16 shows a schematic block diagram of an apparatus for training a superdivision denoising network according to an embodiment of the present application. As shown in FIG. 16 , it includes an acquisition module 1601 and a processing module 1602 , which will be briefly introduced below.
- the acquiring module 1601 is used to acquire multiple sets of super-division denoising original training data, where each set of the multiple sets of super-division denoising original training data includes two consecutive frames of images and a standard image corresponding to the later of the two consecutive frames of images;
- the acquisition module 1601 is also used to acquire the depth map of the next frame of image, the normal vector map of the next frame of image, and the light map of the next frame of image, where the light map of the next frame of image is a direct light map or an indirect light map;
- the processing module 1602 is used for judging whether the pixels of the two consecutive frames of images are consistent;
- the processing module 1602 is also used to update the color values of the pixels of the next frame of images according to two consecutive frames of images, so as to obtain the updated light map of the next frame of images;
- the processing module 1602 is also used to merge the depth map of the next frame of image, the normal vector image of the next frame of image and the updated light map of the next frame of image, to obtain the updated image;
- the processing module 1602 is further configured to train the super-resolution denoising network according to the updated image and the standard image.
- FIG. 17 shows a schematic block diagram of an apparatus for training a bidirectional frame insertion network according to an embodiment of the present application. As shown in FIG. 17 , it includes an acquisition module 1701 and a processing module 1702 , which will be briefly introduced below.
- the acquisition module 1701 is used to acquire multiple groups of bidirectional frame insertion original training data, where each group of bidirectional frame insertion original training data in the multiple groups includes a fourth image, a fifth image and a sixth image, and the fourth image, the fifth image and the sixth image are three consecutive frames of images;
- a processing module 1702 configured to obtain an interpolated frame image at an intermediate moment between the fourth image and the sixth image according to the fourth image and the sixth image;
- the processing module 1702 is further configured to train the bidirectional frame insertion network according to the frame insertion image and the fifth image at the intermediate moment.
- FIG. 18 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
- the electronic device in FIG. 18 includes a communication module 3010 , a sensor 3020 , a user input module 3030 , an output module 3040 , a processor 3050 , a memory 3070 and a power supply 3080 .
- the processor 3050 may include one or more CPUs.
- the electronic device shown in FIG. 18 may execute various steps of the graphics rendering method of the embodiment of the present application. Specifically, one or more CPUs in the processor 3050 may execute each step of the graphics rendering method of the embodiment of the present application.
- the communication module 3010 may include at least one module that enables communication between the electronic device and other electronic devices.
- the communication module 3010 may include one or more of a wired network interface, a broadcast receiving module, a mobile communication module, a wireless Internet module, a local area communication module, and a location (or positioning) information module, and the like.
- the communication module 3010 can acquire the game screen from the game server in real time.
- the sensor 3020 may sense some operations of the user, and the sensor 3020 may include a distance sensor, a touch sensor and the like.
- the sensor 3020 can sense operations such as the user touching the screen or approaching the screen.
- the sensor 3020 can sense some operations of the user on the game interface.
- the user input module 3030 is used to receive input digital information, character information or contact touch operation/non-contact gesture, and receive signal input related to user settings and function control of the system.
- User input module 3030 includes a touch panel and/or other input devices. For example, the user can control the game through the user input module 3030.
- the output module 3040 includes a display panel for displaying information input by the user, information provided to the user, various menu interfaces of the system, and the like.
- the display panel may be configured in the form of a liquid crystal display (liquid crystal display, LCD) or an organic light-emitting diode (organic light-emitting diode, OLED).
- the touch panel may cover the display panel to form a touch display screen.
- the output module 3040 may further include a video output module, an alarm, a haptic module, and the like.
- the video output module can display the game screen after graphics rendering.
- the power supply 3080 may receive external power and internal power under the control of the processor 3050, and provide the power required for the operation of each module of the entire electronic device.
- the processor 3050 may include one or more CPUs, and the processor 3050 may also include one or more GPUs.
- the processor 3050 includes multiple CPUs
- the multiple CPUs may be integrated on the same chip, or may be integrated on different chips respectively.
- the processor 3050 includes multiple GPUs
- the multiple GPUs can be integrated on the same chip or on different chips respectively.
- the processor 3050 includes both a CPU and a GPU
- the CPU and the GPU may be integrated on the same chip.
- the processor of a smart phone generally includes a CPU and a GPU related to image processing; both the CPU and the GPU here can contain multiple cores.
- the memory 3070 may store computer programs including an operating system program 3072, an application program 3071, and the like.
- typical operating systems include Microsoft's Windows and Apple's MacOS for desktop or notebook systems, and Google's Android-based systems for mobile terminals.
- the memory 3070 may be one or more of the following types: flash memory, hard disk type memory, micro multimedia card type memory, card memory (eg SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, or optical disk.
- the memory 3070 can also be a network storage device on the Internet, and the system can perform operations such as updating or reading on the memory 3070 on the Internet.
- the above-mentioned memory 3070 may store a computer program (the computer program is a program corresponding to the graphics rendering method of the embodiment of the present application).
- when the processor 3050 executes the computer program, the processor 3050 can execute the graphics rendering method of the embodiment of the present application.
- the memory 3070 also stores other data 3073 other than computer programs.
- the memory 3070 may store data during processing of the graphics rendering method of the present application.
- the connection relationship of the modules in FIG. 18 is only an example; the electronic device provided in any embodiment of the present application may also use other connection manners, for example, all modules may be connected through a bus.
- Embodiments of the present application further provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program that can be executed by a processor.
- when the computer program is executed by the processor, the processor executes the method described in any one of FIG. 6 to FIG. 8 .
- the disclosed system, apparatus and method may be implemented in other manners.
- the apparatus embodiments described above are only illustrative.
- the division of the units is only a logical function division. In actual implementation, there may be other division methods.
- multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
- the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
- the technical solution of the present application, or the part that contributes to the prior art, or part of the technical solution, can essentially be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
- the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program codes.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Computer Graphics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Geometry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Processing (AREA)
Abstract
The present invention relates to an image rendering method and device capable of obtaining a rendered image with high resolution and a high frame rate under low-sampling-value conditions. The method comprises the steps of: acquiring three consecutive frame images, namely a first image, a second image and a third image; updating an illumination map of the second image according to the first image to obtain an updated illumination map of the second image; inputting the updated illumination map of the second image into a super-resolution and denoising network to obtain a super-resolution and denoised image of the second image; updating an illumination map of the third image according to the second image to obtain an updated illumination map of the third image; inputting the updated illumination map of the third image into the super-resolution and denoising network to obtain a super-resolution and denoised image of the third image; acquiring an initial frame insertion image at a target moment according to the super-resolution and denoised image of the second image and the super-resolution and denoised image of the third image, the target moment being a moment between the second image and the third image; and inputting the initial frame insertion image into a bidirectional frame insertion network to obtain a frame insertion image at the target moment.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010971444.5 | 2020-09-16 | ||
CN202010971444.5A CN112184575B (zh) | 2020-09-16 | 2020-09-16 | 图像渲染的方法和装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022057598A1 true WO2022057598A1 (fr) | 2022-03-24 |
Family
ID=73921318
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/115203 WO2022057598A1 (fr) | 2020-09-16 | 2021-08-30 | Procédé et dispositif de rendu d'image |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112184575B (fr) |
WO (1) | WO2022057598A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116485678A (zh) * | 2023-04-28 | 2023-07-25 | 深圳联安通达科技有限公司 | 基于嵌入式操作系统的图像处理方法 |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112184575B (zh) * | 2020-09-16 | 2024-09-13 | 华为技术有限公司 | 图像渲染的方法和装置 |
CN113067960B (zh) * | 2021-03-16 | 2022-08-12 | 合肥合芯微电子科技有限公司 | 影像插补方法、装置和存储介质 |
CN113592998A (zh) * | 2021-06-29 | 2021-11-02 | 北京百度网讯科技有限公司 | 重光照图像的生成方法、装置及电子设备 |
CN113947547B (zh) * | 2021-10-19 | 2024-04-09 | 东北大学 | 基于多尺度核预测卷积神经网络的蒙特卡洛渲染图降噪方法 |
US11836844B2 (en) * | 2022-03-03 | 2023-12-05 | Nvidia Corporation | Motion vector optimization for multiple refractive and reflective interfaces |
CN116453456B (zh) * | 2023-06-14 | 2023-08-18 | 北京七维视觉传媒科技有限公司 | Led屏幕校准方法、装置、电子设备及存储介质 |
CN116672707B (zh) * | 2023-08-04 | 2023-10-20 | 荣耀终端有限公司 | 生成游戏预测帧的方法和电子设备 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103778656A (zh) * | 2014-02-12 | 2014-05-07 | 腾讯科技(深圳)有限公司 | 一种图像渲染方法、装置及电子设备 |
US20170330496A1 (en) * | 2016-05-16 | 2017-11-16 | Unity IPR ApS | System and method for rendering images in virtual reality and mixed reality devices |
CN109743626A (zh) * | 2019-01-02 | 2019-05-10 | 京东方科技集团股份有限公司 | 一种图像显示方法、图像处理方法和相关设备 |
CN112184575A (zh) * | 2020-09-16 | 2021-01-05 | 华为技术有限公司 | 图像渲染的方法和装置 |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010046989A1 (fr) * | 2008-10-23 | 2010-04-29 | パイオニア株式会社 | Dispositif de conversion de taux de trame, dispositif de traitement d’images, dispositif d’affichage, procédé de conversion de taux de trame, programme associé, et support d'enregistrement sur lequel le programme est enregistré |
WO2014160510A2 (fr) * | 2013-03-13 | 2014-10-02 | Massachusetts Institute Of Technology | Stéréo-endoscopie photométrique |
CN105517671B (zh) * | 2015-05-25 | 2020-08-14 | 北京大学深圳研究生院 | 一种基于光流法的视频插帧方法及系统 |
US10116916B2 (en) * | 2016-03-17 | 2018-10-30 | Nvidia Corporation | Method for data reuse and applications to spatio-temporal supersampling and de-noising |
US10636201B2 (en) * | 2017-05-05 | 2020-04-28 | Disney Enterprises, Inc. | Real-time rendering with compressed animated light fields |
CN110136055B (zh) * | 2018-02-02 | 2023-07-14 | 腾讯科技(深圳)有限公司 | 图像的超分辨率方法和装置、存储介质、电子装置 |
US10922790B2 (en) * | 2018-12-21 | 2021-02-16 | Intel Corporation | Apparatus and method for efficient distributed denoising of a graphics frame |
CN111510691B (zh) * | 2020-04-17 | 2022-06-21 | Oppo广东移动通信有限公司 | 颜色插值方法及装置、设备、存储介质 |
-
2020
- 2020-09-16 CN CN202010971444.5A patent/CN112184575B/zh active Active
-
2021
- 2021-08-30 WO PCT/CN2021/115203 patent/WO2022057598A1/fr active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103778656A (zh) * | 2014-02-12 | 2014-05-07 | 腾讯科技(深圳)有限公司 | 一种图像渲染方法、装置及电子设备 |
US20170330496A1 (en) * | 2016-05-16 | 2017-11-16 | Unity IPR ApS | System and method for rendering images in virtual reality and mixed reality devices |
CN109743626A (zh) * | 2019-01-02 | 2019-05-10 | 京东方科技集团股份有限公司 | 一种图像显示方法、图像处理方法和相关设备 |
CN112184575A (zh) * | 2020-09-16 | 2021-01-05 | 华为技术有限公司 | 图像渲染的方法和装置 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116485678A (zh) * | 2023-04-28 | 2023-07-25 | 深圳联安通达科技有限公司 | 基于嵌入式操作系统的图像处理方法 |
CN116485678B (zh) * | 2023-04-28 | 2024-02-09 | 深圳联安通达科技有限公司 | 基于嵌入式操作系统的图像处理方法 |
Also Published As
Publication number | Publication date |
---|---|
CN112184575B (zh) | 2024-09-13 |
CN112184575A (zh) | 2021-01-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022057598A1 (fr) | Procédé et dispositif de rendu d'image | |
TWI764974B (zh) | 使用一類神經網路過濾影像資料 | |
US10970816B2 (en) | Motion blur and depth of field reconstruction through temporally stable neural networks | |
US9881391B2 (en) | Procedurally defined texture maps | |
US9129443B2 (en) | Cache-efficient processor and method of rendering indirect illumination using interleaving and sub-image blur | |
Navarro et al. | Motion blur rendering: State of the art | |
US11373358B2 (en) | Ray tracing hardware acceleration for supporting motion blur and moving/deforming geometry | |
US10909659B2 (en) | Super-resolution image processing using a machine learning system | |
US11734890B2 (en) | Three-dimensional model recovery from two-dimensional images | |
US11887256B2 (en) | Deferred neural rendering for view extrapolation | |
US11615602B2 (en) | Appearance-driven automatic three-dimensional modeling | |
WO2022143367A1 (fr) | Procédé de rendu d'image et son dispositif associé | |
WO2023233423A1 (fr) | Entraînement de réseau neuronal pour rendu implicite | |
US20240177394A1 (en) | Motion vector optimization for multiple refractive and reflective interfaces | |
WO2024148898A1 (fr) | Procédé et appareil de débruitage d'image, et dispositif informatique et support de stockage | |
Schmitz et al. | High-fidelity point-based rendering of large-scale 3-D scan datasets | |
Gao et al. | Neural global illumination: Interactive indirect illumination prediction under dynamic area lights | |
Schwandt et al. | Environment estimation for glossy reflections in mixed reality applications using a neural network | |
Kolhatkar et al. | Real-time virtual viewpoint generation on the GPU for scene navigation | |
WO2005124693A2 (fr) | Systeme graphique en 3d avec mappage de texture inverse | |
US10453247B1 (en) | Vertex shift for rendering 360 stereoscopic content | |
US10559122B2 (en) | System and method for computing reduced-resolution indirect illumination using interpolated directional incoming radiance | |
Smit et al. | A shared-scene-graph image-warping architecture for VR: Low latency versus image quality | |
US20230410425A1 (en) | Real-time rendering of image content generated using implicit rendering | |
Li et al. | Real-time volume rendering with octree-based implicit surface representation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21868426 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21868426 Country of ref document: EP Kind code of ref document: A1 |