WO2022228383A1 - A graphics rendering method and apparatus - Google Patents
A graphics rendering method and apparatus
- Publication number
- WO2022228383A1 (PCT/CN2022/088979, CN2022088979W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- rendering
- scene data
- area
- rendered
- Prior art date
Classifications
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/50—Controlling the output signals based on the game progress
- A63F13/52—Controlling the output signals based on the game progress involving aspects of the displayed game scene
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/60—Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/005—General purpose rendering architectures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/50—Lighting effects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
Definitions
- the present application relates to the field of image rendering, and in particular, to a graphics rendering method and apparatus.
- the graphics pipeline can readily provide the auxiliary information required by upsampling algorithms, such as the displacement vector (motion vector, mv) of each pixel and the depth information of each pixel, whereas in the ordinary image field such information can only be estimated by algorithms or acquired with the help of additional sensor equipment.
- current GPU hardware can support a programmable rendering pipeline, which usually includes stages such as vertex shading, shape (primitive) assembly, geometry shading, rasterization, fragment shading, and testing and blending, among which the vertex shading, geometry shading, and fragment shading stages support user customization through programming.
- vertex shading is generally used for coordinate transformation of model vertices
- fragment shading is used to calculate the final displayed pixel color.
- as display resolution and scene complexity increase, the computing power required for rendering increases exponentially. Therefore, how to reduce the computing power required for rendering has become an urgent problem to be solved.
- the present application provides a graphics rendering method and device, which reduce the computing power required for rendering by alternating high-definition and low-definition rendering, and improve the clarity of the target rendered image by reusing the high-definition information of historical rendered images, so that devices with lower computing power can also obtain high-definition rendered images.
- the present application provides a graphics rendering method, including: rendering first scene data to obtain a first image, where the first scene data is obtained according to the viewing angle area of a virtual camera, the rendering mode includes at least one of a first rendering mode or a second rendering mode, and the resolution of an image obtained by the first rendering mode is higher than that of an image obtained by the second rendering mode.
- the first rendering mode is a high-definition rendering mode, and the second rendering mode is a low-definition rendering mode.
- if the rendering mode of the first scene data includes the second rendering mode, the first image is up-sampled to obtain an up-sampled image; a historical rendered image is obtained, where the historical rendered image is obtained by rendering second scene data, a target object exists in both the historical rendered image and the first image, and the resolution of the target object in the historical rendered image is higher than its resolution in the first image; the target object in the historical rendered image is projected into the first image to obtain a projected frame; and the up-sampled image and the projected frame are fused to obtain a target rendered image.
- Therefore, at least one of the first rendering mode or the second rendering mode may be selected to render the current viewing angle area. If the rendering mode of the current viewing angle area includes the second rendering mode, the resulting rendered image has a low resolution, so the first image obtained in the second rendering mode may be up-sampled to obtain an up-sampled image, thereby increasing the resolution of the rendered image. The high-definition object in the historical rendered image is then projected into the first image to obtain a projected frame that carries high-definition information and matches the objects of the first image, and the up-sampled image and the projected frame are fused to obtain a high-definition target rendered image.
- the high-definition information in the high-definition rendered image that has been rendered can be reused to supplement the details of the low-resolution image obtained by the second rendering method, thereby obtaining a high-definition rendered image.
- moreover, the computing power required by the second rendering mode is lower than that of the first rendering mode, so the method provided in this application can be deployed on devices with lower computing power, and such devices can also use the method provided in this application.
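To make the flow described above concrete, the following is a minimal Python sketch of one frame of the method, assuming the engine supplies the concrete operations (high/low-definition rendering, upsampling, projection, and fusion) as callables in `ops`; all names here are illustrative placeholders rather than APIs defined by this application.

```python
def render_frame(scene, history, ops):
    """One frame of the rendering flow sketched above.

    `ops` bundles the engine-provided operations; `history` holds the last
    high-definition frame and the scene data it was rendered from.
    """
    if ops["needs_high_def"](scene, history):              # preset condition met -> first (HD) mode
        frame = ops["render_high_def"](scene)
        history["frame"], history["scene"] = frame, scene  # keep as the new historical HD frame
        return frame

    low = ops["render_low_def"](scene)                     # second (low-definition) mode: first image
    up = ops["upsample"](low)                              # up-sampled image, detail still missing
    proj = ops["project"](history["frame"], history["scene"], scene)  # projected frame with HD detail
    return ops["fuse"](up, proj)                           # target rendered image
```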
- the aforementioned rendering of the first scene data to obtain the first image may include: rendering, in the second rendering mode, the data corresponding to a first region of interest (ROI) area in the first scene data to obtain the first image, where the first ROI area is a preset area or an area determined from the viewing angle area corresponding to the first scene data according to a preset method.
- Therefore, when performing low-definition rendering, the low-definition rendering may be performed on the region of interest (ROI) in the first scene data.
- ROI region of interest
- usually, the model or lighting conditions of the ROI area are relatively complex, so performing low-definition rendering on the ROI area can significantly reduce the computing power required for rendering, enabling the method provided in this application to be applied to devices with low computing power.
- the aforementioned projecting of the object in the historical rendered image into the first image to obtain the projected frame may include: using the area including the target object in the historical rendered image as a second ROI area; projecting the area corresponding to the target object in the second ROI area of the historical rendered image into the first image to obtain the projected frame; and fusing the second ROI area in the projected frame with the first image to obtain the target rendered image.
- Therefore, the ROI area in the historical rendered image can also be projected, so as to facilitate the subsequent fusion.
- the aforementioned rendering in the current viewing angle region may further include: rendering the data of the background region in the first scene data by a first rendering method to obtain a background rendering image, where the background region is an area other than the first ROI area in the viewing angle area corresponding to the first scene data; the above method may further include: fusing the target rendering image and the background rendering image to obtain an updated target rendering image.
- Therefore, high-definition rendering can also be performed on the background area; because high-definition rendering of the background area requires relatively little computing power, it can be carried out even on low-computing-power devices. The rendered background part is then fused with the rendering result of the ROI area to form a complete, high-definition rendered image, as illustrated in the sketch below.
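As a rough illustration of composing the two partial renders, the sketch below blends the ROI rendering result with the separately rendered high-definition background using a binary ROI mask; the array shapes and the 0/1 mask convention are assumptions rather than requirements of this application.

```python
import numpy as np

def compose_roi_and_background(roi_render, background_render, roi_mask):
    """Fuse the rendered ROI area with the separately rendered background.

    roi_render, background_render: HxWx3 float arrays at the target resolution.
    roi_mask: HxW array, 1 inside the first ROI area and 0 in the background area.
    """
    mask = roi_mask[..., None].astype(roi_render.dtype)   # broadcast over the color channels
    return mask * roi_render + (1.0 - mask) * background_render

# usage sketch: both partial renders and the mask are produced by the engine
# full_frame = compose_roi_and_background(target_rendered_image, background_image, roi_mask)
```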
- the above method may further include: if the first scene data meets a preset condition, determining that the rendering mode of the first scene data includes the first rendering mode, where the preset condition includes one or more of the following: background switching occurs between the first scene data and the image obtained by the last rendering; or, the motion vector of at least one object in the first scene data is higher than a preset motion value, where the motion vector includes the offset of the at least one object between the first scene data and the second scene data; or, the illumination change between the first scene data and the second scene data exceeds a preset amount of change, where the illumination information includes at least one of illumination intensity, light source type, incident direction of the light source, or number of light sources; or, there is an interval of N frames since the data that was last rendered in the first rendering mode, where N is a positive integer.
- Therefore, whether to use high-definition rendering or low-definition rendering can be determined based on background switching, motion vectors, illumination changes, an interval of N frames, and so on, so that a rendering mode suitable for the scene can be selected to adapt to different 3D rendering scenarios.
- in this way, high-definition rendered images can still be obtained, while low-definition rendering can be selected in suitable scenes to reduce the computing power requirements on the device; a minimal selection sketch follows.
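A purely illustrative way to encode these preset conditions is sketched below; the threshold values and the fields assumed on the scene description (`background_id`, `object_positions`, and so on) are placeholders, not parameters defined by this application.

```python
import numpy as np

def needs_high_def(scene, last_hd_scene, frames_since_hd,
                   motion_threshold=0.05, light_threshold=0.1, interval_n=4):
    """Return True if the first (high-definition) rendering mode should be used."""
    # 1) background switching relative to the scene that was last rendered in HD
    if scene["background_id"] != last_hd_scene["background_id"]:
        return True

    # 2) at least one object moved by more than the preset motion value
    motion = np.linalg.norm(scene["object_positions"] - last_hd_scene["object_positions"], axis=-1)
    if np.any(motion > motion_threshold):
        return True

    # 3) illumination change (intensity, number of light sources, ...) exceeds a preset amount
    if abs(scene["light_intensity"] - last_hd_scene["light_intensity"]) > light_threshold:
        return True
    if scene["num_lights"] != last_hd_scene["num_lights"]:
        return True

    # 4) N frames have passed since the last high-definition rendering
    return frames_since_hd >= interval_n
```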
- the first ROI area is determined by calculation based on one or more of the lighting information, shadow information, lighting reflection information, or material of the target object in the first scene data. Therefore, in the embodiments of the present application, when determining the ROI area, more complex objects can be selected based on information such as lighting, shadows, light reflections, or object material. Rendering these objects in high definition usually requires more computing power, so this application renders them in low definition, which significantly reduces the computing power requirement and improves the possibility of deploying the method provided in this application on a terminal with low computing power.
- the aforementioned fusing of the up-sampled image and the projected frame to obtain the target rendered image may include: obtaining, through a weight coefficient network, a first weight corresponding to each pixel in the first image and a second weight corresponding to each pixel in the projected frame, where the weight coefficient network is a neural network used to calculate the respective weights of at least two input frames of images; and fusing the up-sampled image and the projected frame based on the first weight and the second weight to obtain the target rendered image.
- the up-sampled image and the projected frame can be fused through a neural network.
- the neural network can run on a neural-network processing unit (NPU) in the device, while rendering is usually executed by a graphics processing unit (GPU). Therefore, using the NPU to perform image fusion reduces the amount of GPU computation, further lowering the GPU computing power requirement, improving rendering efficiency, and increasing the possibility of deploying the method provided in this application on terminals with lower computing power.
- the aforementioned fusing of the up-sampled image and the projected frame based on the first weight and the second weight to obtain the target rendered image may include: interpolating the first weight to obtain a third weight corresponding to each pixel in the up-sampled image; and fusing the up-sampled image and the projected frame based on the third weight and the second weight to obtain the target rendered image.
- the resolution of the first image is relatively low, or the image input to the weight coefficient network may be down-sampled, so the resolution of the obtained first weight is also lower.
- the method of interpolation is used to obtain a higher resolution weight, so that the upsampled image and the projected frame are fused based on the higher resolution weight, so as to obtain a high-definition target rendering image.
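The sketch below illustrates such a weighted fusion, assuming the weight coefficient network outputs a low-resolution weight map for the first image that is bilinearly interpolated to the target resolution to obtain the third weight; normalizing the two weights so that they sum to one at every pixel is an added assumption.

```python
import numpy as np
import cv2  # used only to interpolate the low-resolution weight map

def fuse_with_weights(upsampled, projected, w_first, w_second):
    """Fuse the up-sampled image and the projected frame with per-pixel weights.

    upsampled, projected : HxWx3 images at the target resolution.
    w_first  : low-resolution weight map for the first image (from the weight
               coefficient network), interpolated here to the target resolution.
    w_second : HxW weight map for the projected frame.
    """
    h, w = projected.shape[:2]
    w_third = cv2.resize(w_first, (w, h), interpolation=cv2.INTER_LINEAR)

    total = w_third + w_second + 1e-8                 # normalize the two weights per pixel
    a = (w_third / total)[..., None]
    b = (w_second / total)[..., None]
    return a * upsampled + b * projected              # target rendered image
```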
- the above method may further include: correcting the projected frame through a correction network based on the up-sampled image to obtain a corrected projected frame, where the correction network is a neural network used to filter the input image.
- Therefore, the projected frame can be corrected to reduce ghosting in the projected frame and obtain a corrected image.
- In addition, the present application corrects the projected frame through a correction network, so the NPU can be used to perform the correction, reducing the GPU computing power requirement and thereby improving the possibility of deploying the method provided in the present application on a terminal with low computing power.
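A correction network of this kind could, for instance, be a small convolutional filter like the sketch below, which takes the up-sampled image as guidance and predicts a residual for the projected frame; the three-layer architecture and channel counts are assumptions, since the application only specifies a neural network that filters the input image.

```python
import torch
import torch.nn as nn

class CorrectionNet(nn.Module):
    """Illustrative correction network that filters the projected frame to suppress ghosting."""

    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 3, kernel_size=3, padding=1),
        )

    def forward(self, upsampled, projected):
        # concatenate both frames along the channel axis and predict a residual correction
        x = torch.cat([upsampled, projected], dim=1)   # N x 6 x H x W
        return projected + self.body(x)                # corrected projected frame

# usage sketch (NCHW float tensors in [0, 1]):
# corrected = CorrectionNet()(upsampled_tensor, projected_tensor)
```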
- before rendering the first scene data to obtain the first image, the method further includes: rendering the second scene data in the first rendering mode to obtain the historical rendered image.
- high-definition rendering can be performed on the second scene data, thereby obtaining a high-definition historical rendering image.
- projecting the object in the historical rendered image into the first image includes: projecting the target object in the historical rendered image into the first image according to the motion vector of the target object to obtain the projected frame, where the motion vector includes the offset of the target object between the first scene data and the second scene data.
- the projection can be performed according to the motion vector of the object, so that the object in the historical rendering image can be projected more accurately, ghosting in the projected frame can be reduced, and the definition of the projected frame can be improved.
- the aforementioned projecting of the target object in the historical rendered image into the first image according to the motion vector of the target object to obtain the projected frame may include: determining, according to the motion vector of the target object, the position in the first image of the target object from the historical rendered image, and assigning values to that position to obtain the projected frame.
- Therefore, the position in the first image of the object from the historical rendered image can be determined according to the motion vector of the object, and the corresponding pixel values or color values of the object in the historical rendered image can then be assigned to that position in the first image, so that a high-definition projected frame is obtained.
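The following sketch illustrates this motion-vector-based projection as a simple forward warp with a dense per-pixel motion-vector map; a real engine may instead provide per-object or per-vertex motion vectors, so the representation used here is an assumption.

```python
import numpy as np

def project_history(history_frame, motion_vectors, first_image, object_mask):
    """Warp the target object from the historical HD frame into the first image.

    history_frame  : HxWx3 historical high-definition rendered image.
    motion_vectors : HxWx2 per-pixel offsets (dx, dy), in pixels, of the object
                     between the second scene data and the first scene data.
    first_image    : HxWx3 current image used as the canvas (same resolution here).
    object_mask    : HxW bool mask of the pixels belonging to the target object.
    """
    h, w = history_frame.shape[:2]
    projected = first_image.copy()

    ys, xs = np.nonzero(object_mask)                      # object pixels in the historical frame
    xt = np.clip(np.round(xs + motion_vectors[ys, xs, 0]).astype(int), 0, w - 1)
    yt = np.clip(np.round(ys + motion_vectors[ys, xs, 1]).astype(int), 0, h - 1)

    # assign the historical HD color values to the projected positions
    projected[yt, xt] = history_frame[ys, xs]
    return projected
```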
- the present application provides a terminal, including a GPU and an NPU, as follows:
- the GPU is used to render the first scene data to obtain the first image, the first scene data is obtained according to the viewing angle area of the virtual camera, and the rendering mode includes at least one of the first rendering mode or the second rendering mode.
- the resolution of an image obtained by the first rendering mode is higher than the resolution of an image obtained by the second rendering mode;
- the NPU is used to upsample the first image to obtain an upsampled image when the rendering mode of the first scene data includes the second rendering mode;
- the GPU is also used to obtain a historical rendering image, which is obtained by rendering the second scene data.
- the historical rendering image and the first image share a target object, and the resolution of the target object in the historical rendering image is higher than the resolution of the target object in the first image;
- the GPU is also used to project the target object in the historical rendering image into the first image to obtain a projected frame
- the NPU is also used to fuse the upsampled image and the projected frame to obtain the target rendered image.
- the GPU is further configured to render data corresponding to the first ROI region of interest in the first scene data by using the second rendering method to obtain a first image
- the first ROI region is a preset area or an area determined from the viewing angle area corresponding to the first scene data according to a preset method.
- the GPU is further configured to use the area including the target object in the historical rendered image as a second ROI area, and to project the area corresponding to the target object in the second ROI area of the historical rendered image into the first image, so that a projected frame is obtained;
- the NPU is specifically used to fuse the second ROI region and the first image in the projection frame to obtain the target rendered image.
- the GPU is further configured to: render the data of the background area in the first scene data in the first rendering mode to obtain a background rendered image, where the background area is the area other than the first ROI area in the viewing angle area corresponding to the first scene data; and fuse the target rendered image and the background rendered image to obtain an updated target rendered image.
- the GPU is further configured to determine, if the first scene data meets a preset condition, that the rendering mode of the current viewing angle area includes the first rendering mode, where the preset condition includes one or more of the following:
- background switching occurs between the first scene data and the image obtained by the last rendering; or, the motion vector of at least one object in the first scene data is higher than a preset motion value, where the motion vector includes the offset of the at least one object between the first scene data and the second scene data; or, the illumination change between the first scene data and the second scene data exceeds a preset amount of change, where the illumination information includes at least one of illumination intensity, light source type, incident direction of the light source, or number of light sources; or, there is an interval of N frames between the second scene data and the data that was last rendered in the first rendering mode, where N is a positive integer.
- the first ROI area is calculated and determined based on one or more items of lighting information, shadow information, lighting reflection information, or material of the target object of the first scene data.
- the NPU is further configured to: obtain, through a weight coefficient network, a first weight corresponding to each pixel in the first image and a second weight corresponding to each pixel in the projected frame, where the weight coefficient network is a neural network used to calculate the respective weights of at least two input frames of images; and fuse the up-sampled image and the projected frame based on the first weight and the second weight to obtain the target rendered image.
- the NPU is further configured to, before fusing the up-sampled image and the projected frame to obtain the target rendered image, correct the projected frame through a correction network based on the up-sampled image to obtain a corrected projected frame, where the correction network is a neural network used to filter the input image.
- the GPU is further configured to render the second scene data in a first rendering manner to obtain a historically rendered image before rendering the first scene data to obtain the first image.
- an embodiment of the present application provides a graphics rendering device, and the graphics rendering device has a function of implementing the image processing method of the first aspect.
- This function can be implemented by hardware or by executing corresponding software by hardware.
- the hardware or software includes one or more modules corresponding to the above functions.
- an embodiment of the present application provides a graphics rendering device, including: a processor and a memory, where the processor and the memory are interconnected through a line, and the processor invokes program code in the memory to execute the processing-related functions in the graphics rendering method of any one of the above first aspects.
- the graphics rendering device may be a chip.
- an embodiment of the present application provides a graphics rendering device.
- the graphics rendering device may also be called a digital processing chip or a chip.
- the chip includes a processing unit and a communication interface.
- the processing unit obtains program instructions through the communication interface, the program instructions are executed by the processing unit, and the processing unit is configured to execute the processing-related functions in the first aspect or any optional implementation manner of the first aspect.
- an embodiment of the present application provides a computer-readable storage medium, including instructions, which, when executed on a computer, cause the computer to execute the method in the first aspect or any optional implementation manner of the first aspect.
- an embodiment of the present application provides a computer program product including instructions, which, when run on a computer, causes the computer to execute the method in the first aspect or any optional implementation manner of the first aspect.
- Fig. 1 is a schematic diagram of a main frame of artificial intelligence applied by the application
- FIG. 2A is a schematic diagram of a system architecture provided by an embodiment of the present application.
- FIG. 2B is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
- FIG. 3 is a schematic flowchart of a graphics rendering method provided by the present application.
- FIG. 5 is a schematic diagram of a viewing angle area provided by the present application.
- FIG. 6 is a schematic diagram of a projection method provided by the present application.
- FIG. 7 is a schematic diagram of a projection frame provided by the application.
- FIG. 8 is a schematic diagram of a projection frame correction method provided by the present application.
- FIG. 9 is a schematic diagram of a fusion weight calculation method provided by the present application.
- FIG. 10 is a schematic diagram of the difference between low-definition rendering and high-definition rendering provided by the present application;
- FIG. 11 is a schematic diagram of a rendered image provided by the application.
- FIG. 13 is a schematic diagram of a ROI area provided by the application.
- FIG. 14 is a schematic diagram of a low-definition rendered image of a ROI region provided by the application.
- FIG. 15 is a schematic diagram of a rendered image of the ROI area in the current viewing angle area provided by the present application;
- FIG. 16 is a schematic diagram of an output image provided by the present application;
- FIG. 17 is a schematic diagram of the rendering effect for the ROI area in the solution provided by the present application and a commonly used solution;
- FIG. 19 is a schematic structural diagram of another terminal provided by the present application;
- FIG. 20 is a schematic structural diagram of a graphics rendering apparatus provided by the present application;
- FIG. 21 is a schematic structural diagram of another graphics rendering apparatus provided by the present application.
- Figure 1 shows a schematic structural diagram of the main frame of artificial intelligence.
- the above-mentioned artificial intelligence main framework is explained below from two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
- the "intelligent information chain” reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, data has gone through the process of "data-information-knowledge-wisdom".
- the "IT value chain” reflects the value brought by artificial intelligence to the information technology industry from the underlying infrastructure of human intelligence, information (providing and processing technology implementation) to the industrial ecological process of the system.
- the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and is supported by the basic platform. Communication with the outside is carried out through sensors; computing power is provided by intelligent chips, such as hardware acceleration chips including the central processing unit (CPU), neural-network processing unit (NPU), graphics processing unit (GPU), application-specific integrated circuit (ASIC), or field programmable gate array (FPGA); the basic platform includes the distributed computing framework, network-related platform guarantees, and support, which can include cloud storage and computing, interconnection networks, and so on. For example, sensors communicate with the outside to obtain data, and the data is provided to the intelligent chips in the distributed computing system provided by the basic platform for calculation.
- the data on the upper layer of the infrastructure is used to represent the data sources in the field of artificial intelligence.
- the data involves graphics, images, voice, and text, as well as IoT data from traditional devices, including business data from existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
- Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
- machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, etc. on data.
- Reasoning refers to the process of simulating human's intelligent reasoning method in a computer or intelligent system, using formalized information to carry out machine thinking and solving problems according to the reasoning control strategy, and the typical function is search and matching.
- Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
- some general capabilities can be formed based on the results of data processing, such as algorithms or a general system, such as translation, text analysis, computer vision processing, speech recognition, image identification, etc.
- Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution and productize intelligent information decision-making to achieve practical applications. The application fields mainly include intelligent terminals, intelligent transportation, smart healthcare, autonomous driving, smart cities, and so on.
- the embodiments of the present application involve a large number of related applications of neural networks and images.
- related terms and concepts in the fields of neural networks and images that may be involved in the embodiments of the present application are first introduced below.
- a neural network can be composed of neural units. A neural unit can refer to an operation unit that takes inputs $x_s$ and an intercept of 1, and the output of the operation unit can be expressed as $h_{W,b}(x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$, where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit.
- f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
- the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
- a neural network is a network formed by connecting a plurality of the above single neural units together, that is, the output of one neural unit can be the input of another neural unit.
- the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be an area composed of several neural units.
- a deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple intermediate layers.
- the layers of a DNN can be divided according to their positions into three categories: the input layer, the intermediate layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are all intermediate layers, also called hidden layers.
- the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
- although a DNN looks complicated, the work of each layer can be expressed as the linear relational expression $\vec{y} = \alpha(W \vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector (bias parameter), $W$ is the weight matrix (also known as the coefficients), and $\alpha()$ is the activation function.
- each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, the number of coefficient matrices $W$ and offset vectors $\vec{b}$ is also large.
- these parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: suppose that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^3_{24}$, where the superscript 3 represents the layer in which the coefficient $W$ is located, and the subscript corresponds to the output third-layer index 2 and the input second-layer index 4.
- in summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $W^L_{jk}$.
- the input layer does not have a W parameter.
- more intermediate layers allow the network to better capture the complexities of the real world.
- a model with more parameters is more complex and has a larger "capacity", which means that it can complete more complex learning tasks.
- Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vectors W of many layers).
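For illustration, a minimal forward pass through such fully connected layers, with ReLU standing in for the activation α() and arbitrary layer sizes, can be written as follows.

```python
import numpy as np

def dnn_forward(x, weights, biases):
    """Forward pass of a fully connected DNN: each layer computes y = alpha(W x + b)."""
    y = x
    for W, b in zip(weights, biases):
        y = np.maximum(0.0, W @ y + b)   # alpha(Wx + b) with ReLU as the activation
    return y

# illustrative 3-layer network: 4 inputs -> 5 hidden units -> 2 outputs
rng = np.random.default_rng(0)
weights = [rng.standard_normal((5, 4)), rng.standard_normal((2, 5))]
biases = [np.zeros(5), np.zeros(2)]
print(dnn_forward(rng.standard_normal(4), weights, biases))
```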
- a convolutional neural network (CNN) is a deep neural network with a convolutional structure.
- a convolutional neural network consists of a feature extractor consisting of convolutional layers and subsampling layers, which can be viewed as a filter.
- the convolutional layer refers to the neuron layer in the convolutional neural network that convolves the input signal.
- in a convolutional layer of a convolutional neural network, a neuron can be connected to only some of the neurons in adjacent layers.
- a convolutional layer usually contains several feature planes, and each feature plane can be composed of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernel. Sharing weights can be understood as meaning that the way image information is extracted is independent of position.
- the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights by learning during the training process of the convolutional neural network.
- the immediate benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
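The weight sharing described above can be illustrated with a minimal single-channel convolution in which one kernel (one feature plane) is applied at every spatial position of the image; the kernel values are arbitrary.

```python
import numpy as np

def conv2d_single_channel(image, kernel):
    """Valid 2-D convolution with one shared kernel: the same weights are reused everywhere."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
print(conv2d_single_channel(image, kernel).shape)   # (4, 4)
```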
- the loss function can generally include loss functions such as mean square error, cross entropy, logarithm, and exponential.
- for example, the mean squared error can be used as a loss function, defined as $MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$. A specific loss function can be selected according to the actual application scenario.
- the neural network can use the error back propagation (BP) algorithm to correct the size of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller.
- BP error back propagation
- the input signal is passed forward until the output will generate error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges.
- the back-propagation algorithm is a back-propagation movement dominated by error loss, aiming to obtain the parameters of the optimal neural network model, such as the weight matrix.
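As a worked example of training with the mean squared error and gradient descent, the snippet below fits a single linear unit on synthetic data; the analytic gradients play the role of the error information propagated back from the loss.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100)
y = 3.0 * x + 0.5 + 0.01 * rng.standard_normal(100)   # synthetic targets

w, b, lr = 0.0, 0.0, 0.1
for _ in range(200):
    y_hat = w * x + b
    err = y_hat - y
    loss = np.mean(err ** 2)            # MSE = (1/n) * sum((y_hat - y)^2)
    grad_w = np.mean(2.0 * err * x)     # dLoss/dw, back-propagated through y_hat
    grad_b = np.mean(2.0 * err)         # dLoss/db
    w -= lr * grad_w                    # update the parameters so the loss decreases
    b -= lr * grad_b

print(round(w, 2), round(b, 2))         # close to the true values 3.0 and 0.5
```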
- rendering, the process of converting 3D/2D models into display images, is widely used in games, movie special effects, and other fields.
- the rendering process includes modeling, material creation, animation, rendering for display, and so on.
- temporal anti-aliasing (TAA) is an anti-aliasing algorithm widely used by commercial game engines in recent years. It is a temporal-multiplexing framework in the rendering field whose post-processing method is integrated into the rendering process; the existing DLSS technology and ray-tracing denoising algorithms such as SVGF and BMFR are also technologies improved on the basis of TAA.
- each object in the scene traverses all light sources to calculate the final displayed color.
- the pixel color to be displayed is finally synthesized.
- This function is generally supported by hardware, suitable for forward rendering pipeline, and can better handle geometric aliasing.
- a mask can be understood as data similar to an image.
- the image and the mask can be fused so that certain content in the image receives more attention.
- for example, the mask can be used to extract a region of interest (ROI): a pre-made ROI mask is fused with the image to be processed to obtain an ROI image, in which the image values inside the ROI remain unchanged and the image values outside the region are all 0. A mask can also play a shielding role, masking out certain areas of the image so that they do not participate in processing or in the calculation of processing parameters, or so that processing or statistics are applied only to the masked area.
- ROI region of interest
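A minimal example of ROI extraction with a mask, under the convention that the mask is 1 inside the region and 0 outside:

```python
import numpy as np

def apply_roi_mask(image, mask):
    """Keep image values inside the ROI and set values outside the region to 0."""
    return image * mask[..., None]      # broadcast the HxW mask over the color channels

# illustrative 4x4 RGB image with a 2x2 ROI in the top-left corner
image = np.ones((4, 4, 3))
mask = np.zeros((4, 4))
mask[:2, :2] = 1
roi_image = apply_roi_mask(image, mask)
print(roi_image[..., 0])                # 1s inside the ROI, 0s elsewhere
```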
- the foreground can be understood as the subject included in the image, or the object that needs attention, etc.
- the object in the foreground in the image is referred to as the foreground object.
- the foreground can also be understood as the region of interest (ROI) in the image.
- the background is the area of the image other than the foreground. For example, if an image includes a traffic light, the foreground (or called a foreground object) in the image is the area where the traffic light is located, and the background is the area other than the foreground in the image.
- the graphics rendering method provided by the embodiments of the present application may be executed on a server or a terminal device.
- the neural network mentioned below in the present application may be deployed on a server or a terminal.
- the graphics rendering method provided by the present application may be deployed in a terminal by means of a plug-in.
- the terminal device can be a mobile phone with an image processing function, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a video camera, a smart watch, a wearable device (WD), an autonomous vehicle, or the like, which is not limited in this embodiment of the present application.
- the graphics rendering method provided by the present application is deployed on a terminal as an example for illustrative description.
- All or part of the processes in the graphics rendering method provided by the present application may be implemented by a neural network, for example, the steps of upsampling, projection or fusion, etc., may be implemented by a neural network. Usually, the neural network needs to be deployed on the terminal after training.
- referring to FIG. 2A, an embodiment of the present application provides a system architecture 100.
- a data collection device 160 is used to collect training data.
- the training data may include a large number of rendered high-quality images and unrendered three-dimensional models.
- after collecting the training data, the data collection device 160 stores the training data in the database 130, and the training device 120 trains based on the training data maintained in the database 130 to obtain the target model/rule 101.
- the training set mentioned in the following embodiments of the present application may be obtained from the database 130 or obtained through user input data.
- the target model/rule 101 may be a trained neural network in this embodiment of the present application, and the neural network may include one or more networks for calculating fusion weights or correcting projected images.
- the training device 120 processes the input 3D model and compares the output image with the high-quality rendered image corresponding to the input 3D model, until the difference between the image output by the training device 120 and the high-quality rendered image is less than a certain threshold, at which point the training of the target model/rule 101 is completed.
- the above-mentioned target model/rule 101 can be the neural network obtained by training and used to implement the graphics rendering method in the embodiments of the present application, that is, after relevant preprocessing, the data to be processed (such as an image to be rendered or an image that needs further processing after rendering) is input into the target model/rule 101 to obtain the processing result.
- the target model/rule 101 in the embodiment of the present application may specifically be the neural network mentioned below in the present application, and the neural network may be the aforementioned CNN, DNN, or RNN and other types of neural networks.
- the training data maintained in the database 130 may not necessarily come from the collection of the data collection device 160, and may also be received from other devices.
- it should also be noted that the training device 120 may not necessarily train the target model/rule 101 entirely based on the training data maintained in the database 130, and may also obtain training data from the cloud or elsewhere for model training, which is not limited in this application.
- the target model/rule 101 trained according to the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 2A , the execution device 110 can also be referred to as a computing device, and the execution device 110 It can be a terminal, such as a mobile phone terminal, a tablet computer, a laptop computer, augmented reality (AR)/virtual reality (VR), a vehicle terminal, etc., or a server or a cloud device.
- the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices, and the user can input data to the I/O interface 112 through the client device 140. In this embodiment of the present application, the input data may include the data to be processed that is input by the client device.
- the client can be other hardware devices, such as a terminal or a server, and the client can also be software deployed on the terminal, such as an APP, a web page, and the like.
- the preprocessing module 113 and the preprocessing module 114 are used to perform preprocessing according to the input data (such as data to be processed) received by the I/O interface 112.
- the preprocessing module 113 and the preprocessing module 114 may also be absent, or there may be only one of them, in which case the calculation module 111 is used directly to process the input data.
- when the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculation and other related processing, the execution device 110 can call the data, code, and so on in the data storage system 150 for the corresponding processing, and the data and instructions obtained by the processing may also be stored in the data storage system 150.
- the I/O interface 112 returns the processing result to the client device 140 so as to be provided to the user.
- for example, the I/O interface 112 returns the result obtained above to the client device 140 to be provided to the user.
- it is worth noting that the training device 120 can generate corresponding target models/rules 101 based on different training data for different goals or tasks, and the corresponding target models/rules 101 can be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired results.
- the execution device 110 and the training device 120 may be the same device, or located inside the same computing device.
- the present application will introduce the execution device and the training device separately, without limitation.
- the user can manually give input data, and the manual setting can be operated through the interface provided by the I/O interface 112 .
- the client device 140 can automatically send the input data to the I/O interface 112 . If the user's authorization is required to request the client device 140 to automatically send the input data, the user can set the corresponding permission in the client device 140 .
- the user can view the result output by the execution device 110 on the client device 140, and the specific presentation form can be a specific manner such as display, sound, and action.
- the client device 140 can also act as a data collection terminal to collect the input data input into the I/O interface 112 as shown in the figure and the predicted label output from the I/O interface 112 as new sample data, and store them in the database 130 .
- FIG. 2A is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
- for example, in FIG. 2A, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
- a target model/rule 101 is obtained by training according to the training device 120 , and the target model/rule 101 may be the neural network in the present application in this embodiment of the present application.
- the neural network provided in the embodiments of the present application can include a CNN, a deep convolutional neural network (DCNN), a recurrent neural network (RNN), or another constructed neural network, etc.
- the graphics rendering method in this embodiment of the present application may be executed by an electronic device, and the electronic device is the aforementioned execution device.
- the electronic device includes a CPU and a GPU, which can render images.
- other devices such as NPU or ASIC, may also be included, which are merely exemplary descriptions here, and will not be repeated one by one.
- the electronic device may be, for example, a mobile phone, a tablet computer, a notebook computer, a PC, a mobile internet device (MID), a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless electronic device in industrial control, a wireless electronic device in self-driving, a wireless electronic device in remote medical surgery, a wireless electronic device in a smart grid, a wireless electronic device in transportation safety, a wireless electronic device in a smart city, a wireless electronic device in a smart home, and so on.
- the electronic device may be a device running an Android system, an iOS system, a Windows system, or another system.
- the electronic device may run an application that needs to render a 3D scene to obtain a two-dimensional image, such as a game application, a lock screen application, a map application, or a monitoring application.
- FIG. 2B is a schematic structural diagram of an electronic device according to an embodiment of the present application.
- the electronic device 2000 may include: a central processing unit 2001 , a graphics processor 2002 , a display device 2003 and a memory 2004 .
- the electronic device 2000 may further include at least one communication bus (not shown in FIG. 2B ) for realizing connection and communication between various components.
- the various components in the electronic device 2000 may also be coupled through other connectors, and the other connectors may include various types of interfaces, transmission lines, or buses.
- the various components in the electronic device 2000 may also be in a radial connection manner centered on the central processing unit 2001 .
- coupling refers to electrical connection or communication with each other, including direct connection or indirect connection through other devices.
- the central processing unit 2001 and the graphics processing unit 2002 may be located on the same chip, or may be separate chips.
- the functions of the central processing unit 2001 , the graphics processing unit 2002 , the display device 2003 and the memory 2004 are briefly introduced below.
- the application 2006 may be a graphics-type application, such as a game, a video player, and the like.
- the operating system 2005 provides a system graphics library interface, and the application 2006 generates an instruction stream for rendering graphics or image frames through the system graphics library interface and the drivers provided by the operating system 2005, such as the graphics library user-mode driver and/or the graphics library kernel-mode driver.
- the system graphics library includes but is not limited to: the open graphics library for embedded systems (OpenGL ES), the Khronos platform graphics interface, Vulkan (a cross-platform graphics application program interface), and other system graphics libraries.
- the instruction stream contains a series of instructions, which are usually invocation instructions for the interface of the system graphics library.
- the central processing unit 2001 may include at least one of the following types of processors: an application processor, one or more microprocessors, a digital signal processor (DSP), a microcontroller (microcontroller unit, MCU) or artificial intelligence processor, etc.
- DSP digital signal processor
- MCU microcontroller unit
- artificial intelligence processor etc.
- the central processing unit 2001 may further include necessary hardware accelerators, such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or an integrated circuit for implementing logic operations.
- the processor 2001 may be coupled to one or more data buses for transferring data and instructions between the various components of the electronic device 2000 .
- the graphics processor 2002 is used to receive the graphics instruction stream sent by the central processing unit 2001, generate a render target through the rendering pipeline, and display the render target on the display device 2003 through the layer composition and display module of the operating system.
- the rendering pipeline, which may also be referred to as a render pipeline or a pixel pipeline, is a parallel processing unit inside the graphics processor 2002 for processing graphics signals.
- the graphics processor 2002 may include multiple rendering pipelines, and the multiple rendering pipelines may independently process graphics signals in parallel.
- the rendering pipeline can perform a series of operations in the process of rendering graphics or image frames. Typical operations can include: vertex processing (Vertex Processing), primitive processing (Primitive Processing), rasterization (Rasterization), fragment processing (Fragment Processing), and so on.
- graphics processor 2002 may include a general-purpose graphics processor that executes software, such as a GPU or other type of dedicated graphics processing unit, or the like.
- the display device 2003 is used to display various images generated by the electronic device 2000, which may be a graphical user interface (GUI) of the operating system or image data (including still images and video data) processed by the graphics processor 2002.
- the display device 2003 may include any suitable type of display screen, such as a liquid crystal display (LCD), a plasma display, or an organic light-emitting diode (OLED) display.
- the memory 2004 is a transmission channel between the central processing unit 2001 and the graphics processing unit 2002, and may be a double data rate synchronous dynamic random access memory (DDR SDRAM) or other types of cache.
- Commonly used rendering upsampling is similar to upsampling of ordinary images and videos, and is proposed to solve the quality degradation caused by insufficient sampling.
- The difference is that rendering produces discrete sample points in space and time, and low-resolution rendering causes severe aliasing, so the upsampling algorithm in the rendering pipeline is usually an anti-aliasing and interpolation algorithm. In the ordinary image and video field, by contrast, most data comes from cameras, where the color of each pixel is an integral over a pixel area; insufficient low-resolution sampling therefore leads to blurring, and the corresponding upsampling methods are deblurring and interpolation algorithms.
- the current screen resolution of mobile terminals is usually 1080p, and 2K screens and 4K screens will gradually appear.
- higher rendering resolution means greater GPU computing load.
- For example, the geometry shading stage may take about 20% of the rendering time and the fragment shading stage about 80%.
- The theoretical load of rendering at 1080p is 4 times that of rendering at 540p, and the measured increase is about 3.4 times; the theoretical load of rendering at 2K is 1.8 times that of rendering at 1080p, and the measured increase is about 1.6 times.
- the fragment shading stage is more sensitive to resolution changes, and reducing the amount of computation in the fragment shading stage can greatly reduce the GPU load.
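- These ratios are consistent with a simple model in which the geometry cost is independent of resolution while the fragment cost scales with the pixel count; an illustrative calculation based on the 20%/80% split above (not an additional measurement) gives
$$0.2\cdot 1 + 0.8\cdot 4.0 \approx 3.4 \quad (540\text{p}\rightarrow 1080\text{p}), \qquad 0.2\cdot 1 + 0.8\cdot 1.8 \approx 1.6 \quad (1080\text{p}\rightarrow 2\text{K}).$$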
- the present application provides a graphics rendering method, which reduces the computing power required for rendering by performing high-definition rendering and low-definition rendering at intervals, and multiplexes the high-definition information included in the projection frame, thereby obtaining a rendered image with higher definition.
- the graphics rendering method provided by the present application will be introduced in detail below.
- the method provided in this application can be applied to some AR or VR applications, such as shopping, games, projection and other applications.
- the graphics rendering method provided in this application can be used by smart wearable devices (such as VR glasses or AR glasses), mobile terminals (such as mobile phones or tablets), or other devices that can run AR or VR applications.
- VR application programs can be installed in wearable devices to provide users with services.
- VR applications can provide users with a variety of 3D scenes. Users can watch the 3D scenes in VR applications through the display screen of the wearable device, which gives users an immersive experience and improves user experience.
- the three-dimensional model needs to be rendered, so as to provide the user with a visual experience in the form of a rendered image output by the graphics rendering method provided in the present application.
- the method provided in this application may be applied to an AR game
- the AR game application may be installed in the user's mobile terminal or wearable device, and the user can obtain an immersive AR game experience through the mobile terminal or wearable device, improving user experience.
- the 3D scene in the game needs to be rendered, so that the 3D scene can be presented in the form of an image, and the 3D scene in the game can be rendered by the graphics rendering method provided in this application, Thus, a rendered high-definition image is obtained, which improves user experience.
- FIG. 3 is a schematic flowchart of a graphics rendering method provided by the present application, which is described as follows.
- the first scene data may be three-dimensional scene data, two-dimensional scene data, or scene data with more dimensions, etc.
- the first scene data may include multiple two-dimensional or three-dimensional models, and each model may consist of several basic primitives.
- the present application exemplarily takes a three-dimensional scene as an example for illustrative description.
- the three-dimensional scene or three-dimensional data mentioned below may also be replaced by a two-dimensional scene or two-dimensional data, which will not be repeated below.
- the first scene data can be determined by the viewing angle area of the virtual camera.
- For ease of distinction, the viewing angle area corresponding to the first scene data is called the current viewing angle area, and the viewing angle area corresponding to the second scene data of the previous frame or frames is called the adjacent viewing angle area or the previous viewing angle area of the virtual camera.
- the first scene data can be obtained by taking the data corresponding to the current viewing angle area in a larger two-dimensional or three-dimensional scene through the viewing angle area of the virtual camera.
- the present application exemplarily takes the data corresponding to the current viewing angle area obtained from the three-dimensional scene as an example for illustration.
- the three-dimensional scene data mentioned below can also be replaced with two-dimensional scene data or scene data of more dimensions, which is not limited in this application.
- the 3D scene data may be constructed by selecting 3D models from a 3D model library by the server, and these 3D models may include models or objects in shapes such as trees, light sources, houses, buildings, geographic environments, characters or animals.
- the resulting 3D model can form a 3D scene.
- the model in the viewing angle area may also change.
- the three-dimensional scene data may be data in various scenarios, and the three-dimensional data in some scenarios will be exemplarily introduced below.
- the 3D scene data may include data for constructing 3D scenes in AR or VR applications, such as some AR/VR games or AR maps; each 3D model is composed of basic primitives, and a plurality of 3D models then compose a virtual 3D scene similar to reality.
- the virtual 3D scene can then be rendered into a visible image through the following steps, which can be displayed on the display screen of the AR or VR device, so that the user can observe the 3D scene through the display screen and improve the user experience.
- high-definition rendering may also be performed on the second scene data to obtain a high-definition historically rendered image.
- the second scene data may be scene data corresponding to a previous viewing angle area of the virtual camera or an adjacent viewing angle area of the current viewing angle area.
- step 302. Determine whether to perform low-definition rendering on the first scene data, if yes, execute step 303, and if not, execute step 307.
- the rendering methods of scene data can be divided into high-definition rendering (or called the first rendering method) and low-definition rendering (or called the second rendering method).
- the resolution of the image obtained by high-definition rendering is higher than that obtained by low-definition rendering.
- the definition of the image obtained by high-definition rendering is also better than that of the image obtained by low-definition rendering, and the more complex the model, the more computing resources are consumed for rendering.
- The current viewing angle area is the area of the scene that currently needs to be rendered. It can be understood that, when rendering, a virtual camera can be predetermined, and rendering is then performed from the perspective of the virtual camera. For example, when the user plays a game constructed from a 3D scene, the user's display angle of view can follow the movement of the character manipulated by the user; the user's angle of view is the angle of view of the virtual camera, and the visible area shown on the display is the current viewing angle area.
- the preset conditions may include but are not limited to one or more of the following:
- the information of illumination includes at least one of illumination intensity, type of light source, incident direction of light source or number of light sources.
- A change in illumination may cause a large change in the appearance of the 3D model, so high-definition rendering is required to obtain a clearer image; or, there is an interval of N frames between the current viewing angle area and the area where high-definition rendering was last performed, where N is a positive integer; N may be a preset value or a value determined from data entered by the user.
- In this way, high-definition rendering and low-definition rendering are performed at intervals: high-definition rendering can be used for scenes with large changes, while scenes with smaller changes can be rendered in low definition, and the high-definition information in historically rendered images is then reused to improve the clarity of the images output by low-definition rendering, so that high-definition images can still be obtained.
- The computing resources required for rendering are reduced by low-definition rendering, so that the method provided in this application can be deployed not only in servers but also in devices with lower computing power, such as terminal devices like mobile phones, TVs, or tablets, thereby improving the user experience of terminal devices.
- the way of rendering the scene can include high-definition rendering and/or low-definition rendering.
- In some scenarios, high-definition rendering and low-definition rendering can be performed in the same frame: for the foreground part with complex models or lighting conditions, low-definition rendering is used to reduce computing power requirements, while for the background part with simple lighting conditions, whose rendering consumes less computing power, high-definition rendering can be performed, so that the rendering effect of the background part is better and user experience is improved.
- low-definition rendering can be performed on the current viewing angle area to obtain the first image.
- Rendering methods such as PBR, a forward rendering pipeline, or a deferred rendering pipeline may be used.
- The difference between high-definition rendering and low-definition rendering is that the resolution of the image output by low-definition rendering is lower.
- During low-definition rendering, shading can be performed in units of multiple points to reduce the computing power required for rendering; that is, instead of calculating a color value for every point, one color value is calculated for a group of points, thereby reducing the computing resources required for calculating color values and improving rendering efficiency.
- Low-definition rendering may be performed on a first region of interest (ROI) in the first scene data to obtain the first image, where the first ROI is a preset area or an area determined by a preset method.
- the current viewing angle area can be divided into foreground and background parts, the foreground can be used as the ROI area, low-definition rendering can be performed on the ROI area in the current viewing angle area, and high-definition rendering can be performed on the background part.
- The background part may have low model complexity or simple lighting conditions, so the computing power required for rendering it is low, while the ROI area may require high computing power for rendering due to high model complexity or complex lighting conditions. The present application therefore renders the ROI area in low definition to reduce its computing power requirement, thereby reducing the overall computing power requirement, so that the method provided in this application can be deployed on terminals with low computing power to improve user experience.
- the first ROI area may be determined with reference to the information of the illumination, the information of the shadow, the information of the illumination reflection or the material of the object in the current viewing angle area.
- the illumination information may specifically include information such as illumination intensity, light source type, incident direction of the light source, or the number of light sources.
- the shadow information may include shadow-related information such as the area of the shadow or the number of shadowed areas.
- the information of light reflection may include information such as the direction of the light reflection and the intensity of the reflected light.
- the first ROI area may also be an area determined according to data input by the user.
- For example, the user can set the area that needs to be rendered in low definition through the touch screen of the terminal, so that the user can select the ROI area according to the actual application scene and determine which areas are rendered in low definition, improving user experience.
- high-definition rendering may also be performed on the background portion of the current viewing angle area except the first ROI area, thereby obtaining a high-definition rendered image of the background portion.
- Generally, the amount of computation required for rendering the background part is lower than that required for rendering the ROI area. Therefore, in this application, the ROI area can be rendered in low definition and the background part in high definition, so that the rendered image can be obtained with less computation; the solution provided by the present application can thus be applied to devices with low computing power, has strong generalization ability, and improves the experience of users of such devices.
- the first image may be up-sampled to obtain an up-sampled image with a higher resolution.
- the upsampling method may include interpolation or transposed convolution, such as bilinear interpolation, bicubic interpolation, etc.
- The present application exemplarily describes upsampling by interpolation; the interpolation mentioned below can also be replaced by other upsampling operations, which will not be repeated below.
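- As an illustration only (not the specific implementation used in this application), the following Python sketch upsamples a low-resolution rendered image by bilinear interpolation; the function name, the NumPy array layout, and the input resolution are assumptions made for the example.

```python
import numpy as np

def bilinear_upsample(img: np.ndarray, scale: int = 2) -> np.ndarray:
    """Upsample an H x W x C image by bilinear interpolation."""
    h, w, c = img.shape
    out_h, out_w = h * scale, w * scale
    # Map every output pixel center back into the input image.
    ys = (np.arange(out_h) + 0.5) / scale - 0.5
    xs = (np.arange(out_w) + 0.5) / scale - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(ys - y0, 0.0, 1.0)[:, None, None]
    wx = np.clip(xs - x0, 0.0, 1.0)[None, :, None]
    # Blend the four neighbouring input pixels for every output pixel.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

# Example: a 540p-like low-definition buffer upsampled towards 1080p.
low_res = np.random.rand(540, 960, 3).astype(np.float32)
up = bilinear_upsample(low_res, scale=2)  # shape (1080, 1920, 3)
```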
- a historical rendering image is also acquired, and the historical rendering image may be obtained by rendering the second scene data.
- the historical rendering image and the first image include the same target object, and the resolution of the target object in the historical rendering image is higher than the resolution of the target object in the first image.
- the historical rendering image may be the rendered image corresponding to the adjacent viewing angle area of the current viewing angle area or the previous viewing angle area of the virtual camera, that is, the rendered image of one or more previous frames; its resolution is higher than that of the first image, so it can be understood as a high-definition rendered image.
- the target object in the historically rendered image can be projected into the first image to obtain a projected frame, where the target object is an object included in both the first image and the historically rendered image.
- For example, an identifier can be added to the objects in the scene data, and each object can have a unique identifier; that is, the common target object included in the first scene data and the second scene data can be determined through the identifiers of the objects, together with information such as the orientation and motion status of the objects in each scene.
- Specifically, the target object in the historical rendering image can be projected into the first image according to a motion vector to obtain the projection frame, where the motion vector includes the offset of the target object between the adjacent viewing angle area of the three-dimensional scene and the current viewing angle area.
- For example, according to the motion vector of the target object, the position of the target object from the historical rendering image is determined in the first image, and values (such as a color value, a depth value, or values corresponding to different IDs) are assigned at that position to obtain the projection frame.
- Alternatively, the position to which the object moves from the adjacent viewing angle area to the current viewing angle area can be determined according to the moving speed of the object, and that position is then assigned in the first image; in this way, the high-definition object in the historical rendering image is projected into the first image, and a high-definition first image is obtained.
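- A minimal sketch of this kind of motion-vector reprojection is shown below; it is illustrative only, and the array layout, the per-pixel motion-vector convention, and the gather-style warp are assumptions rather than the exact procedure of this application.

```python
import numpy as np

def reproject(history: np.ndarray, current_up: np.ndarray,
              motion: np.ndarray) -> np.ndarray:
    """Project pixels of the high-definition history frame into the current frame.

    history:    H x W x C previous rendered frame (high definition).
    current_up: H x W x C upsampled current frame, used as fallback.
    motion:     H x W x 2 per-pixel offset (dy, dx) pointing from a current-frame
                pixel back to the position of the same object in the history frame.
    """
    h, w, _ = history.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.rint(ys + motion[..., 0]).astype(int)
    src_x = np.rint(xs + motion[..., 1]).astype(int)
    valid = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    projected = current_up.copy()                 # fallback: keep the current pixel
    projected[valid] = history[src_y[valid], src_x[valid]]
    return projected
```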
- a second ROI area may be determined from historical rendered images, and the second ROI area and the aforementioned first ROI area include the same object.
- The second ROI region in the historically rendered image may be projected into the first image to obtain a projection frame. Therefore, in the embodiment of the present application, if only the first ROI area in the current viewing angle area is rendered in low definition, only the objects in the second ROI area of the historical rendering image need to be projected into the first image to obtain the projection frame.
- The projection frame can also be corrected by a correction network together with the up-sampled image to obtain a corrected projection frame. The correction network is a neural network for filtering the input image, and it can filter in the temporal or spatial dimension to make the pixel values of objects in the projection frame more accurate. It can be understood that after the objects in the historical rendering image are projected into the first image, they may overlap with objects in the first image, or they should be occluded by other objects, and so on.
- The correction network corrects the pixel values of the objects in the projection frame to obtain the corrected projection frame, so that the pixel values of the objects in the projection frame are more reasonable.
- Step 304 may be executed first, or step 305 may be executed first, or step 304 and step 305 may be executed at the same time. Specifically, it can be adjusted according to the actual application scenario. It will not be repeated here.
- the high-definition information in the projection frame is fused into the up-sampled image to obtain a high-definition target rendering image.
- For example, a first weight corresponding to each pixel in the first image and a second weight corresponding to each pixel in the projection frame can be obtained through a weight coefficient network, where the weight coefficient network is a neural network for calculating the respective weights of at least two input frames; then, based on the first weight and the second weight, the up-sampled image and the projection frame are fused to obtain the target rendered image.
- Since the first weight is calculated based on the first image, whose resolution is relatively low, the first weight can be interpolated to obtain a weight matrix with a higher resolution, that is, a third weight, so that the second weight and the third weight can subsequently be used to fuse the up-sampled image and the projection frame to obtain a target rendered image with a higher resolution.
- the up-sampled image and the projection frame can also be used as the input of the weight coefficient network, so that the first weight can be calculated by using the up-sampled image with higher resolution, that is, the weight does not need to be interpolated, The resulting first weight is more accurate.
- the weight coefficient network can be obtained by training a large number of samples.
- The samples may include a large number of image pairs together with annotated weight values (for example, manually annotated or calculated by other means), so that the weight coefficient network can output more accurate weight values.
- The present application outputs the weight values by means of a neural network, which can be computed by the NPU of the device, thereby reducing the shading load of the GPU of the device.
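- The fusion step itself can be sketched as a per-pixel weighted blend, for example as below; the weight map is assumed to come from the weight coefficient network, and the code is an illustration of the blend only, not of the network.

```python
import numpy as np

def fuse(up_img: np.ndarray, proj_frame: np.ndarray,
         weight: np.ndarray) -> np.ndarray:
    """Blend the upsampled image and the projection frame pixel by pixel.

    weight: H x W map in [0, 1]; it is the per-pixel weight of the projection
            frame, so (1 - weight) is the weight of the upsampled image.
    """
    w = weight[..., None]                 # broadcast the weight over the color channels
    return (1.0 - w) * up_img + w * proj_frame

# If the weight map was predicted at the lower resolution of the first image,
# it can be enlarged by interpolation before blending, as described above.
```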
- the second ROI area in the historically rendered image is projected into the first image to obtain the projected frame
- Low-definition rendering can be performed on the ROI area, and the definition of the ROI area in the target rendered image can be improved by multiplexing the high-definition ROI in the historically rendered image, while high-definition rendering of the background part requires fewer computing resources; a high-definition rendered image is therefore obtained while consuming fewer computing resources overall. The method provided in this application can thus be deployed on various devices and has strong generalization ability; for example, it can adapt to devices with low computing power, such as mobile phones, tablets, and other electronic devices, improving the experience of users of these electronic devices.
- the second image can be used as a new historical rendering image to assist the subsequent processing of the viewing angle area and improve the clarity of the image.
- In the embodiment of the present application, the computing power requirement can be reduced by performing high-definition rendering and low-definition rendering at intervals, so that the method provided by the present application can be deployed on devices with lower computing power, such as terminal devices, and has strong generalization ability. A terminal device can therefore also render the three-dimensional model, for example in scenarios such as three-dimensional games and AR applications, improving the user experience of the terminal device. Moreover, after low-definition rendering is performed, the high-definition information in the projection frame can be reused to improve the definition of the image corresponding to the current viewing angle area, thereby obtaining a high-definition rendered image.
- the graphics rendering method provided by the present application can be applied to the rendering of a three-dimensional scene, and the area to be rendered in the three-dimensional scene can be changed with the movement of the virtual camera.
- the viewing angle area of the virtual camera can be rendered, and the virtual camera may be moving, so it is necessary to continuously render the constantly changing viewing angle area of the virtual camera.
- The viewing angle area within a certain period of time is taken as a sub-scene that needs to be rendered in one frame; the current viewing angle area is hereinafter referred to as the current frame, and multiple frames of scenes can be rendered continuously.
- FIG. 4 shows the complete process of rendering one of the frames.
- the current viewing angle area 401 of the virtual camera is determined.
- the current viewing angle area is the area that needs to be rendered in the 3D scene.
- the AR glasses can be used as a virtual camera, and the user can view various areas in a three-dimensional scene by controlling the orientation of the AR glasses.
- A three-dimensional scene can be understood as a scene surrounding the virtual camera, and the virtual camera can be understood as being located within the three-dimensional scene.
- The field of view of the virtual camera can be set according to the actual application scene: the larger the field of view, the larger the range of the current viewing angle area; the smaller the preset field of view, the smaller the range of the current viewing angle area.
- The data corresponding to the current viewing angle area, that is, the first scene data, can then be extracted from the data corresponding to the 3D scene for subsequent rendering.
- the rendering method may include low-definition rendering 403 and/or high-definition rendering 408.
- High-definition rendering means setting more points for rendering in the scene, while low-definition rendering means setting fewer points for rendering in the scene; accordingly, the resolution of the image output by high-definition rendering is higher than the resolution of the image output by low-definition rendering.
- The rendering method can be determined in a variety of ways. For example, high-definition rendering may be chosen when the scene or background is switched between the current viewing angle area and the adjacent viewing angle area, for example, when a user playing a 3D game enters an instanced dungeon from the current scene; or when the motion vector of an object in the current viewing angle area is higher than a preset motion value, where the motion vector includes the offset of the object between the adjacent viewing angle area of the 3D scene and the current viewing angle area, such as the optical flow or the movement speed of the object; or when the change of illumination between the current viewing angle area and the adjacent viewing angle area exceeds a preset change amount, where the illumination information includes at least one of the illumination intensity, the type of the light source, the incident direction of the light source, or the number of light sources.
- the change of illumination may cause large changes in the 3D model, so high-definition rendering is required to obtain a clearer image; or, there is an interval of N frames between the current viewing angle area and the area where the last high-definition rendering was performed, and N is a positive integer .
- The rendering mode can thus be determined adaptively according to various scenarios: low-definition rendering can be used in scenes with minor changes, and the high-definition information in the rendered image of the previous frame can be reused to obtain a high-definition image of the current frame, so that a high-definition image is obtained while the amount of calculation is reduced.
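- One compact way to express this adaptive choice is a per-frame decision function such as the sketch below; the thresholds, field names, and the simple combination of conditions are illustrative assumptions, while the conditions themselves are those listed above.

```python
from dataclasses import dataclass

@dataclass
class FrameInfo:                         # hypothetical per-frame statistics
    scene_switched: bool                 # scene or background switch detected
    max_motion_vector: float             # largest per-object screen-space offset, in pixels
    illumination_change: float           # e.g. relative change of total light intensity

def choose_rendering_mode(info: FrameInfo, frames_since_hd: int,
                          n_interval: int = 4,
                          motion_threshold: float = 8.0,
                          light_threshold: float = 0.2) -> str:
    """Return 'high' for high-definition rendering of this frame, 'low' otherwise."""
    if info.scene_switched:                          # scene/background switch
        return "high"
    if info.max_motion_vector > motion_threshold:    # objects move too fast to reuse history
        return "high"
    if info.illumination_change > light_threshold:   # lighting changed noticeably
        return "high"
    if frames_since_hd >= n_interval:                # at most N low-definition frames in a row
        return "high"
    return "low"

print(choose_rendering_mode(FrameInfo(False, 2.5, 0.05), frames_since_hd=1))  # -> low
```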
- high-definition rendering and low-definition rendering of the current viewing angle area have different processing flows. If the rendering mode of the current viewing angle area includes low-definition rendering, steps 403-407 are performed as shown in FIG. 4 . If the rendering mode of the viewing angle area is high-definition rendering, step 408 is executed, which will be described separately below.
- low-definition rendering 403 is performed on the current viewing angle area to obtain a rendered first image.
- the rendering method may include various methods, such as rendering using a forward rendering pipeline, a deferred rendering pipeline, or AI shading.
- GPU hardware may support programmable rendering pipelines, which may typically include vertex shaders, shape (primitive) assembly, geometry shaders, rasterization, fragment shaders, and the like.
- the commonly used vertex shader is generally used for coordinate transformation of model vertices.
- The geometry shader can be used to shade the graphics formed by the vertices. After the geometry output by the geometry shader is rasterized, the fragment shader calculates the final displayed pixel colors and shades the graphics. The shaded graphics output by the fragment shader can then be blended to obtain a complete rendered image, and the rendered image can be further processed, for example by smoothing pixels or denoising, to obtain the final output rendered image.
- rendering can be performed through a physics based rendering (PBR) model.
- For example, the shading of each pixel can be computed with the rendering equation, which can be written as
$$L_o(p,\omega_o) = L_e(p,\omega_o) + \int_{\Omega} f_r(p,\omega_i,\omega_o)\, L_i(p,\omega_i)\, (n\cdot\omega_i)\, \mathrm{d}\omega_i$$
- where $L_o$ is the color of the outgoing light along the direction $\omega_o$ from a point $p$ on the object surface, $L_e$ is the light emitted by the object surface along that direction, $L_i$ is the color of the light incident on the object surface along the direction $\omega_i$, and $f_r$ is the reflection distribution function of the object surface for the incident and outgoing light, which in a PBR model typically contains a normal distribution function, a geometric function, and a Fresnel function.
- It can be seen from the above shading equation that the computational complexity of the reflection component of each pixel's color is linearly related to the number of light sources, and the calculation itself is relatively complex. Therefore, the ROI area in the current viewing angle area can be determined according to the number of light sources in the model, so as to accurately find the areas with high computational complexity in the current viewing angle area.
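- The linear dependence on the number of light sources is easy to see in code form. The sketch below evaluates a simplified diffuse-plus-specular shading for one pixel and loops once over the light sources; it is a toy model for illustration, not the full Cook-Torrance evaluation of a PBR pipeline.

```python
import numpy as np

def shade_pixel(normal, view_dir, albedo, lights):
    """Simplified per-pixel shading; the cost grows linearly with len(lights)."""
    color = np.zeros(3)
    for light in lights:                                  # one pass per light source
        l_dir = light["dir"] / np.linalg.norm(light["dir"])
        n_dot_l = max(float(normal @ l_dir), 0.0)
        diffuse = albedo * n_dot_l
        half = (l_dir + view_dir) / np.linalg.norm(l_dir + view_dir)
        specular = max(float(normal @ half), 0.0) ** 32   # crude specular lobe
        color += light["color"] * (diffuse + specular)
    return color

normal = np.array([0.0, 0.0, 1.0])
view = np.array([0.0, 0.0, 1.0])
lights = [{"dir": np.array([0.3, 0.4, 1.0]), "color": np.ones(3)},
          {"dir": np.array([-0.5, 0.2, 0.8]), "color": np.array([1.0, 0.9, 0.8])}]
print(shade_pixel(normal, view, albedo=np.array([0.8, 0.2, 0.2]), lights=lights))
```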
- Using AI for rendering can reduce the complexity of shading calculations, thereby reducing the load on the GPU.
- When rendering, the model can be rendered to different high-definition or low-definition textures by off-screen rendering, that is, the GPU opens a new buffer outside the current screen buffer and performs the rendering operations there without displaying them on the display, resulting in a rendered image.
- the obtained first image has a lower resolution, and the first image can be up-sampled, thereby achieving the purpose of improving the image resolution.
- Upsampling can be performed by interpolation, such as bilinear interpolation, trilinear interpolation, or bicubic interpolation, or by other algorithms such as zero-padding interpolation or transposed convolution.
- interval low-definition resolution rendering is adopted in the embodiments of the present application.
- the image output by the low-definition rendering may be up-sampled to improve the resolution of the image, thereby obtaining an image with a higher resolution.
- the historical rendering image is a high-definition image obtained after rendering the previous frame, and the method for obtaining the historical rendering image is similar to the method for obtaining the target rendering image. All or part of the objects in the historically rendered image may be projected into the first image obtained by low-definition rendering, so as to obtain a high-definition projected frame.
- the historical rendering image can be projected according to the motion vector, for example, the position of the object in the historical rendering image in the first image is calculated according to the motion vector, and the high-definition object in the historical rendering image is projected to the first image according to the position.
- A high-definition projection frame is thereby obtained. This is equivalent to multiplexing the high-definition information included in the historical rendering image and projecting the high-definition objects included in the historical rendering image into the first image, so that the projection frame obtained after projection carries high-definition information, which is equivalent to improving the sharpness of individual objects in the first image.
- For each pixel of the projection frame, the motion vector (mv) generated by the renderer between frame t-1 and frame t is used to calculate the coordinate position, in the current frame space, of the objects common to the historical rendering image and the current viewing angle area, and a color value is then assigned based on that coordinate position to obtain the projected projection frame; that is, the information in the historical rendering image is assigned to the rendered image of the current viewing angle area, thereby improving the clarity of the rendered image in the current viewing angle area.
- the projection frame can usually be corrected to obtain the corrected projection frame.
- A scenario is taken as an example for introduction. Suppose the current viewing angle area includes objects A and B, where A and B are independent objects in the historical rendering image and may partially overlap after projection.
- When A and B in the historical rendering image are projected into the current frame, both may appear in the same area, causing ghosting of one of the objects. The projection frame therefore needs to be corrected, for example by determining the relative depth relationship between A and B with respect to the virtual camera: the object in the current viewing angle area that is closer to the virtual camera is determined according to depth, and the pixel values of the overlapping area are corrected to the pixel values of that object.
- Specific correction methods can include a variety of methods, such as AABB clipping (AABB clipping and clamping), convex hull clipping (convex hull clipping), or variance clipping (variance clipping), etc., and neural network methods can also be used for correction to reduce GPU workload.
- Taking AABB clipping as an example, the maximum and minimum values of the colors of several pixels (such as 5 or 9) around the position of the back-projected pixel in the current frame are calculated to obtain the AABB bounding box of that pixel; the AABB bounding box represents the color space of the pixels around the position to which the pixel is projected in the current frame.
- If the color of a projected point in the projection frame lies outside the AABB bounding box, the color does not conform to the color distribution around the projected position in the current frame and needs to be corrected. For example, a color vector can be obtained by direct subtraction at the projected point; the color vector is then intersected with the AABB bounding box to obtain the clipped color value of the projected point, and the final corrected color is obtained by interpolation.
- Other correction methods are also similar, the difference is that the method of calculating the bounding box is different, and will not be repeated here.
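- A minimal per-pixel sketch of this kind of neighborhood clamping is given below; it simply clamps each projected pixel to the AABB of its 3x3 neighborhood in the current frame, which is a coarser variant of the clipping-along-the-color-vector procedure described above.

```python
import numpy as np

def aabb_clamp(projected: np.ndarray, current: np.ndarray) -> np.ndarray:
    """Clamp each projected pixel to the color AABB of its 3x3 neighborhood
    in the current (low-definition, upsampled) frame."""
    h, w, _ = current.shape
    padded = np.pad(current, ((1, 1), (1, 1), (0, 0)), mode="edge")
    # Stack the 9 neighbours of every pixel and take per-channel min/max.
    neigh = np.stack([padded[dy:dy + h, dx:dx + w]
                      for dy in range(3) for dx in range(3)], axis=0)
    box_min = neigh.min(axis=0)
    box_max = neigh.max(axis=0)
    return np.clip(projected, box_min, box_max)
```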
- a neural network can be used to implement the correction of the projected frame, and the network is hereinafter referred to as a correction network.
- the input to the rectification network may include projected frames and may also include upsampled images.
- The correction network can be implemented by various networks, such as a CNN, a DNN, a deep convolutional neural network (DCNN), or a recurrent neural network (RNN), or it can be a neural network obtained by structure search; the structure of the neural networks mentioned here and below is not limited in this application.
- The neural networks mentioned in this application can be obtained by training with a training set. For example, a large number of sample pairs can be collected, where each sample pair includes a projection frame, a lower-definition current frame (which can be understood as the above-mentioned up-sampled image), and a high-definition corrected image; the correction network is supervised-trained with a large number of such samples so that it outputs a corrected image of higher definition.
- the correction network may include multi-layer convolutional layers.
- During training, the L1 loss may be calculated pixel by pixel between the output image and the rendered high-definition image, so that the rules for image correction are learned in a data-driven manner.
- the trained network model can be used for inference, and the corrected image is output.
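- As an illustration of what such supervised training could look like, the following is a minimal PyTorch sketch with an arbitrary small convolutional network; the layer sizes, optimizer settings, and tensor shapes are assumptions and not the network actually used.

```python
import torch
import torch.nn as nn

class CorrectionNet(nn.Module):
    """Tiny convolutional correction network: it takes the projected frame and the
    upsampled current frame (concatenated along the channel axis) and predicts a
    corrected projected frame."""
    def __init__(self, ch: int = 3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2 * ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, ch, 3, padding=1),
        )

    def forward(self, projected, upsampled):
        return self.body(torch.cat([projected, upsampled], dim=1))

net = CorrectionNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
l1 = nn.L1Loss()

# One supervised step on a (projected, upsampled, high-definition reference) sample.
projected = torch.rand(1, 3, 270, 480)
upsampled = torch.rand(1, 3, 270, 480)
reference = torch.rand(1, 3, 270, 480)
loss = l1(net(projected, upsampled), reference)   # pixel-wise L1, as described above
opt.zero_grad(); loss.backward(); opt.step()
```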
- the up-sampled image and the projection frame can be fused to obtain a higher-definition target rendering image.
- the fusion method of the up-sampled image and the projection frame may include various methods, such as channel splicing or weighted fusion.
- the fusion method is not limited in this application.
- a fusion weight can be calculated, that is, the value of the weight respectively occupied by the upsampled image and the projected frame during fusion.
- the weights occupied by the up-sampled image and the projection frame can be calculated by combining the motion vector.
- The motion vector of each object is negatively correlated with its weight in the projection frame; that is, the larger the motion vector of an object, the smaller the weight of that object's area in the projection frame, so as to avoid the object becoming unclear due to its movement.
- the respective weights occupied by the up-sampled image and the projected frame may be output through a neural network, and the neural network is hereinafter referred to as a weight coefficient network.
- The weight coefficient network can be implemented with various structures obtained by training, such as a CNN, DNN, DCNN, RNN, or regression network, or it can be a neural network constructed through structure search; this is not limited in this application.
- each pixel value represents the fusion coefficient of the pixel.
- one of the first image or the up-sampled image and the projection frame can be used as the input of the weight coefficient network, and the fusion weight map can be output.
- If the size of the output fusion weight map is the same as that of the first image, the fusion weight map can be interpolated and enlarged so that its size matches the size of the final output image, so as to achieve the fusion of the up-sampled image and the projection frame.
- the image input to the weight coefficient network can also be reduced, so that the image input to the weight coefficient network is smaller and the calculation amount of the weight coefficient network is reduced.
- the fusion weight map can be interpolated and enlarged, so that the size of the fusion weight map matches the size of the final output image, so as to realize the fusion of the up-sampled image and the projection frame. Therefore, in this embodiment, the calculation amount of the weight coefficient network can be reduced by reducing the size of the input image of the weight coefficient network, and the resolution of the fusion weight map can be improved by interpolation, so that the final fusion weight map The size matches the size of the upsampled image and the projected frame for subsequent fusion.
- For example, the up-sampled image and the corrected projection frame are reduced, for example by downsampling, to obtain images with a smaller resolution, and these smaller-resolution images are then used as the input of the weight coefficient network; the output is a fusion weight map for those images, which is then processed, for example by interpolation, to obtain a fusion weight map with a higher resolution that includes the weight values corresponding to the up-sampled image or the corrected projection frame.
- the up-sampled image and the projection frame can be fused to obtain a high-definition target rendering image.
- The neural network used can be a lightweight U-Net suitable for mobile terminals. The input of the correction network can include the projection frame, and its output can include the corrected projection frame. The input of the weight coefficient network can include the up-sampled image and the corrected projection frame, and it outputs a single-channel fusion weight map in which each pixel represents the weight of the corresponding pixel in the up-sampled image or the corrected projection frame.
- the data-driven method is used to solve the network parameters.
- The loss function is calculated from the following quantities: w1, w2 and w3 are different cumulative weight values, which can be preset values or values updated during training; gt denotes the high-definition reference frame used in the training phase; prev_correct is the projection frame after projection correction; color_blend denotes the final rendered image after blending; and the last term is used for multi-frame temporal smoothing, which interpolates over the rendered frames within a range of N frames to make the output smoother.
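- One plausible form of such a loss, consistent with the quantities described above (an assumed reconstruction rather than the exact expression of the application), is
$$\mathcal{L} = w_1\,\lVert \mathrm{color\_blend} - \mathrm{gt}\rVert_1 + w_2\,\lVert \mathrm{prev\_correct} - \mathrm{gt}\rVert_1 + w_3\sum_{k=1}^{N-1}\lVert \mathrm{color\_blend}_{t} - \mathrm{color\_blend}_{t-k}\rVert_1 .$$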
- the trained neural network can be deployed in the device alone to output the corresponding value.
- a high-definition target rendering image can be obtained.
- the difference is that when high-definition rendering is performed, more points are rendered than in low-definition rendering, so that a rendered image with higher resolution can be obtained.
- the difference between high-definition rendering and low-definition rendering is that when rendering, the number of points to be rendered is not the same. Therefore, the resolution of the image obtained by high-definition rendering is higher than that of the image obtained by low-definition rendering, and the image obtained by high-definition rendering is clearer.
- the difference between low-definition rendering and high-definition rendering can be shown in Figure 10.
- The rendered points are discrete points in time and space. During low-definition rendering, the pixel points are shaded once, while during high-definition rendering, each rendered point is shaded four times.
- The resolution of the image obtained by high-definition rendering is therefore obviously higher than that of the image obtained by low-definition rendering, and more information is included in the rendered image.
- the target rendering image can be subjected to post-processing, and the post-processing process may be different in different scenes.
- The rendered image can be as shown in FIG. 11, and the rendered image can be optimized or transmitted to the display module for display, etc.
- the target rendered image can be saved in the memory, so that the rendered image can be directly obtained from the memory subsequently.
- When rendering the next frame, the target rendering image can be used as the historical rendering image, so that the high-definition information included in the historical rendering image can be reused to obtain a high-definition rendered image.
- high-definition rendering or low-definition rendering can be adaptively selected to render the current frame, thereby reducing the amount of computation required for rendering the three-dimensional scene through low-definition rendering.
- In addition, the high-definition information included in the high-definition historical rendering images can be reused to improve the definition of the image output by low-definition rendering; therefore, even when low-definition rendering is performed, high-definition images can still be obtained. The method provided in this application can thus be adapted to devices with lower computing power, has strong generalization ability, and provides a good user experience.
- In some scenarios, rendering the foreground part of the current viewing angle area requires a larger amount of computation, while the background part may require less. Therefore, high-definition rendering or low-definition rendering can be chosen for the foreground part, high-definition rendering is performed on the background part, and the separately rendered foreground and background parts are then fused to obtain a high-definition foreground and background. This scenario is described in more detail below.
- FIG. 12 is a schematic flowchart of another graphics rendering method provided by the present application.
- The ROI area (or called the first ROI area) may be an area determined according to the complexity of the models in the three-dimensional scene, or an area determined according to user input data, etc.; the model to be rendered in low definition is then determined from the ROI area.
- For example, objects in a 3D scene can be classified in advance according to their complexity: objects that require fewer geometric primitives can be classified as simpler models, and objects that require more geometric primitives as more complex models. When rendering, the model complexity of an object can then be determined from its classification, so as to decide whether to perform low-definition rendering.
- the ROI area may be determined according to the information of the light, the information of the shadow, the information of the light reflection or the material of the object in the current viewing angle area.
- the area where the model is located is used as the ROI area.
- the ROI area may be determined according to the area selected by the user.
- the user can select an area from the current viewing angle area of the 3D scene as the ROI area through the touch screen of the terminal, so that the rendering can be performed according to the user's needs and the user experience can be improved.
- The ROI area in the current viewing angle area may also be a high-precision map area determined according to the user's selection, such as a workflow map selected by the user. For example, a metal workflow may include an albedo (color) map, a metalness map, a roughness map, and a normal map; a reflectance-smoothness workflow may include a diffuse map, a specular map, a glossiness/smoothness map, a normal map, and further pre-baked effect maps, etc.
- For example, a character whose rendering is more complex is selected as the ROI area in the 3D scene, and the background part outside the ROI area is filtered out to obtain the character model that needs to be rendered in low definition.
- the character model here is only an exemplary description for the convenience of differentiation, and in an actual application scenario, the three-dimensional scene may not usually be displayed on the display interface.
- an area with complex lighting calculations or complex model materials in rendering can be used as the ROI area to perform interval high-definition rendering or low-definition rendering.
- The low-definition rendering method can significantly reduce the amount of calculation required for rendering and the computing power requirements of the device, so that devices with low computing power can also render the ROI area, and the definition of the ROI area of the current viewing angle area can be improved by subsequently multiplexing historical rendering images.
- low-definition rendering can be performed on the ROI area.
- The rendering process is similar to the aforementioned step 403; the difference lies only in the size of the rendering area, that is, the ROI area may be smaller than the current viewing angle area, and details are not repeated here.
- the image of the rendered ROI region can be upsampled to obtain an upsampled image with a higher resolution, that is, a rendered image of the ROI region with a higher resolution.
- the ROI region (or referred to as the second ROI region) in the historical rendering image is projected into the first image to obtain the projected frame.
- the second ROI region and the aforementioned first ROI region include the same object. If the object in the aforementioned first ROI region includes a "cat", the second ROI region also includes the "cat".
- the specific projection method is similar to the foregoing step 405, and details are not repeated here.
- the projection frame can usually be corrected to obtain the corrected projection frame.
- the correction method is similar to the foregoing step 406, and details are not repeated here.
- the projection frame is corrected to obtain the corrected projection frame
- the rendered image of the ROI area in the current viewing angle area and the corrected projection frame can be obtained.
- the projected frames are fused, and the fusion process is similar to the aforementioned step 407, and will not be repeated here.
- Low-definition rendering can be performed on the ROI area in the current viewing angle area, and the information of the high-definition ROI area included in the historical rendering image is then reused to improve the clarity of the rendered image of the ROI area in the current viewing angle area. Therefore, even when a low-definition rendered image is obtained using low-definition rendering with a lower amount of calculation, the high-definition information in the historical rendering image can be reused so that the rendered image of the ROI area is clearer, and a high-definition rendered image is obtained while consuming less computation.
- a low-definition rendered image is obtained after low-definition rendering of the ROI area in the current viewing angle area, and the ROI area in the historical rendering image is projected into the current viewing angle area to obtain a projected frame.
- the low-definition rendered image and the projected frame are then fused to obtain a clear output image.
- Low-definition rendering produces a lower-resolution image and, accordingly, requires less computing power.
- the low-definition rendered images can be supplemented with more details, resulting in a higher-definition output image.
- high-definition rendering can also be performed on the background part of the current viewing angle area except the ROI area, thereby obtaining a high-definition background rendering image.
- the manner of performing high-definition rendering on the background part may refer to the foregoing step 408, which will not be repeated here.
- The rendered images of the ROI region and the background portion can be fused to obtain a complete rendered image of the current viewing angle area. Therefore, in the embodiment of the present application, the selected model can be rendered onto a low-definition texture, a high-definition texture can be obtained by upsampling and by fusing the information included in the historical rendering image, and the high-definition texture and the high-definition rendered image of the background part are then combined into a complete high-definition rendered image.
- the fusion method may include splicing the rendered images of the ROI region and the background portion.
- the rendered image of the ROI area can be obtained by low-definition rendering and then by multiplexing the high-definition model in the historically rendered image
- the background rendered image can be obtained by performing high-definition rendering on the background part of the current viewing angle area except the ROI area.
- the rendered image of the ROI area and the background rendered image are stitched together, or the rendered image of the character model is projected into the background rendered image, so as to obtain a complete high-definition rendered image of the current viewing angle area.
- f_{t-1} represents the historical rendering image, and f_{t-1→t} represents the projection frame obtained by projecting the ROI extracted from the historical rendering image to the current frame; the projection frame is then corrected through the correction network to obtain the corrected projection frame f_{t-1}^rectify.
- The ROI area is selected from the current viewing angle area and rendered to obtain the rendered ROI area, which is then up-sampled to obtain the up-sampled rendered image f_t^up.
- f_t^out is the rendered image of the fused ROI area, obtained by fusing f_t^up with the corrected projection frame. Then the rendered image of the ROI area in the current frame, that is, f_t^out, and the rendered image of the background area of the current frame are fused to obtain a complete rendered image.
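- Putting the pieces together, the per-frame flow for the ROI path can be sketched as below; this is purely illustrative toy code in which every stage (renderer, correction network, weight network) is replaced by a trivial stand-in, and the resolutions and ROI placement are arbitrary assumptions.

```python
import numpy as np

# Toy stand-ins for the real pipeline stages (illustrative only).
def render_low(shape):   return np.random.rand(*shape).astype(np.float32)   # low-definition ROI render
def render_high(shape):  return np.random.rand(*shape).astype(np.float32)   # high-definition background render

def upsample2x(img):
    # Nearest-neighbour stand-in for the interpolation step.
    return img.repeat(2, axis=0).repeat(2, axis=1)

def render_frame(history_roi):
    f_low = render_low((135, 240, 3))                     # ROI of the current frame, rendered at low resolution
    f_up = upsample2x(f_low)                              # upsampled ROI image f_t^up (270 x 480)
    f_proj = history_roi                                  # stand-in for projecting the ROI of f_{t-1} with motion vectors
    f_rectify = np.clip(f_proj, f_up.min(), f_up.max())   # stand-in for the correction network
    w = np.full(f_up.shape[:2] + (1,), 0.5, dtype=np.float32)  # stand-in for the weight coefficient network
    f_out = (1 - w) * f_up + w * f_rectify                # fused ROI image f_t^out
    frame = render_high((540, 960, 3))                    # background rendered in high definition
    frame[100:370, 200:680] = f_out                       # stitch the ROI back into the full frame
    return frame

full_frame = render_frame(np.random.rand(270, 480, 3).astype(np.float32))
```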
- For steps 1211 and 1212, reference may be made to the aforementioned steps 408 and 409, which are not repeated here.
- The amount of calculation can be reduced by performing high-definition rendering and low-definition rendering at intervals, and by multiplexing the high-definition information included in the historically rendered images, the quality of the image obtained by low-definition rendering of the current viewing angle area can be improved. High-definition rendered images can therefore be obtained with less computational effort, so the method provided by the present application can be applied not only to devices with high computing power but also to devices with low computing power; it has strong generalization ability and can be adapted to more hardware devices.
- In addition, a refined neural network can be used to make full use of the prior information of TAA, and dedicated networks can replace traditional GPU computing methods such as interpolation algorithms, heuristic projection-frame correction algorithms, or manually designed weighting schemes.
- Projection-frame correction and fusion-weight calculation can each be implemented by a neural network, so that these calculations can be performed by the CPU or NPU, reducing the load on the GPU and the GPU computing power needed for rendering and allowing terminal devices with low computing power to output high-definition rendered images as well.
- For example, the ROI area is rendered at 540p and the background area at 1080p; the ROI area is then up-sampled and fused with the ROI area of the high-definition historical rendering image to obtain a high-definition rendered image of the ROI area. The rendered image of the ROI area and the rendered image of the background area are then fused to obtain the final output image.
- FIG. 16 compares the rendered image obtained by the interval high/low-definition rendering of the method provided by the present application with the rendered image obtained by full high-definition rendering. The rendered image obtained by the method provided in the present application can achieve a rendering effect equivalent to or better than that of a 1080p image; for example, the jaggies in local areas are smaller.
- The graphics rendering method provided in this application can be applied to the terminal shown in FIG. 19.
- The terminal can include a GPU, a CPU, and an NPU, and can also include other devices, such as a display screen or a camera, which are not enumerated here one by one.
- The GPU and the NPU can share a buffer, shown in FIG. 19 as the G-Buffer.
- the GPU is used to render the first scene data to obtain the first image
- the first scene data is obtained according to the viewing angle area of the virtual camera
- the rendering mode includes at least one of the first rendering mode or the second rendering mode , the resolution of the image obtained by the first rendering mode is higher than the resolution of the image obtained by the second rendering mode;
- the NPU is used to upsample the first image to obtain an upsampled image when the rendering mode of the first scene data includes the second rendering mode;
- the GPU is also used to obtain a historical rendering image, which is obtained by rendering the second scene data.
- the historical rendering image and the first image share a target object, and the resolution of the target object in the historical rendering image is higher than the resolution of the target object in the first image;
- the GPU is also used to project the target object in the historical rendering image into the first image to obtain a projected frame
- the NPU is also used to fuse the upsampled image and the projected frame to obtain the target rendered image.
- the GPU is further configured to render data corresponding to the first ROI region of interest in the first scene data by using the second rendering method to obtain a first image
- the first ROI region is a preset area or an area determined from the viewing angle area corresponding to the first scene data according to a preset method.
- The GPU is further configured to use the area including the target object in the historical rendering image as the second ROI area, and to project the area corresponding to the target object in the second ROI area of the historical rendering image onto the first ROI area to obtain a projection frame;
- the NPU is specifically used to fuse the second ROI region and the first image in the projection frame to obtain the target rendered image.
- The GPU is further configured to: render the data of the background area in the first scene data by using the first rendering method to obtain a background rendered image, where the background area is the part of the viewing angle area corresponding to the first scene data other than the first ROI area; and fuse the target rendering image and the background rendering image to obtain an updated target rendering image.
- if the current viewing angle area meets a preset condition, the GPU is further used to determine that the rendering mode of the current viewing angle area includes the first rendering mode, where the preset condition includes one or more of the following:
- background switching between the first scene data and the image obtained by the last rendering; or, the motion vector of at least one object in the first scene data is higher than a preset motion value, where the motion vector includes the offset of the at least one object between the first scene data and the second scene data; or, the change in illumination between the first scene data and the second scene data exceeds a preset change amount, where the illumination information includes at least one of illumination intensity, light source type, incident direction of the light source, or number of light sources; or, there is an interval of N frames between the second scene data and the area that was last rendered using the first rendering mode, where N is a positive integer.
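A hedged sketch of the mode-selection logic implied by these preset conditions is given below. The threshold values MOTION_THRESHOLD and LIGHT_THRESHOLD, the interval N_INTERVAL, and the SceneStats field names are placeholders chosen for illustration; the patent does not fix concrete numbers, only the kinds of conditions listed above.

```python
from dataclasses import dataclass

@dataclass
class SceneStats:
    scene_switched: bool        # background switch relative to the last rendered image
    max_motion: float           # largest per-object motion vector magnitude (pixels)
    illumination_change: float  # illumination change between the two scene datas
    frames_since_hd: int        # frames since the first (high-definition) mode was last used

MOTION_THRESHOLD = 8.0          # assumed preset motion value
LIGHT_THRESHOLD = 0.2           # assumed preset illumination change amount
N_INTERVAL = 4                  # assumed interval N

def use_first_rendering_mode(s: SceneStats) -> bool:
    """Return True when the first (high-definition) rendering mode should be used."""
    return (s.scene_switched
            or s.max_motion > MOTION_THRESHOLD
            or s.illumination_change > LIGHT_THRESHOLD
            or s.frames_since_hd >= N_INTERVAL)

print(use_first_rendering_mode(SceneStats(False, 2.0, 0.05, 4)))  # True: N-frame interval reached
```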
- the first ROI area is calculated and determined based on one or more items of lighting information, shadow information, lighting reflection information, or material of the target object of the first scene data.
- the NPU is further configured to: obtain, through a weight coefficient network, a first weight corresponding to each pixel in the first image and a second weight corresponding to each pixel in the projection frame, where the weight coefficient network is a neural network used to calculate the respective weights of at least two input image frames; and fuse the up-sampled image and the projection frame based on the first weight and the second weight to obtain the target rendered image.
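The per-pixel weighting can be sketched with a very small convolutional network, as below in PyTorch. The layer count and channel width are assumptions, and so is the convention that the second weight equals one minus the first (consistent with the blend color = α·current + (1-α)·previous used elsewhere in the description); the patent only requires a neural network that maps at least two input frames to their respective per-pixel weights.

```python
import torch
import torch.nn as nn

class WeightCoefficientNet(nn.Module):
    """Maps two stacked frames to a per-pixel fusion weight in [0, 1]."""
    def __init__(self, ch=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(6, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1),
        )

    def forward(self, upsampled, projected):
        x = torch.cat([upsampled, projected], dim=1)  # stack the two frames channel-wise
        return torch.sigmoid(self.body(x))            # first weight alpha, per pixel

def fuse(upsampled, projected, net):
    alpha = net(upsampled, projected)                     # weight of the up-sampled image
    return alpha * upsampled + (1.0 - alpha) * projected  # (1 - alpha) weights the projection frame

net = WeightCoefficientNet()
up = torch.rand(1, 3, 128, 128)    # up-sampled low-definition render
proj = torch.rand(1, 3, 128, 128)  # projected (and corrected) historical frame
print(fuse(up, proj, net).shape)   # torch.Size([1, 3, 128, 128])
```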
- the NPU is further configured to, before the up-sampled image and the projection frame are fused to obtain the target rendered image, correct the projection frame based on the up-sampled image through a correction network to obtain a corrected projection frame, where the correction network is a neural network used to filter the input image.
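The correction network itself is a learned filter, so the sketch below is only a non-neural stand-in: it bounds each pixel of the projection frame by the colour range found in the corresponding neighbourhood of the up-sampled image, in the spirit of the AABB clipping-and-clamping correction that the description lists among conventional correction methods. The function name clamp_correct and the neighbourhood radius are assumptions of this sketch.

```python
import numpy as np

def clamp_correct(projected, upsampled, radius=1):
    """Clamp each projected pixel to the per-channel min/max of the colours in the
    (2*radius+1)^2 neighbourhood of the up-sampled current frame."""
    h, w, _ = upsampled.shape
    pad = np.pad(upsampled, ((radius, radius), (radius, radius), (0, 0)), mode='edge')
    # Stack the shifted neighbourhood copies and reduce them to per-pixel bounds.
    stack = np.stack([pad[dy:dy + h, dx:dx + w]
                      for dy in range(2 * radius + 1)
                      for dx in range(2 * radius + 1)], axis=0)
    lo, hi = stack.min(axis=0), stack.max(axis=0)
    return np.clip(projected, lo, hi)

proj = np.random.rand(64, 64, 3)       # projection frame, possibly containing ghosting
up = np.random.rand(64, 64, 3)         # up-sampled render of the current frame
print(clamp_correct(proj, up).shape)   # (64, 64, 3)
```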
- the GPU is further configured to render the second scene data in the first rendering mode to obtain the historical rendering image, before rendering the first scene data to obtain the first image.
- the rendering steps in the method provided by the present application can be performed by the GPU, such as steps 301-303, 305, and 307 shown in the aforementioned FIG. 3, steps 403, 405, 408, and 409 shown in the aforementioned FIG. 4, or steps 1203, 1204, 1206, 1209, and 1211 shown in the aforementioned FIG. 12.
- Steps other than rendering, such as steps 304 and 306 in the aforementioned FIG. 3, steps 404, 406, and 407 shown in the aforementioned FIG. 4, or steps 1207, 1208, and 1210 shown in the aforementioned FIG. 12, can be implemented by a neural network and executed by the NPU, thereby reducing the load of the GPU.
- Data interaction between the GPU and the NPU can be achieved through the G-Buffer.
- For example, the GPU can store the image obtained after low-definition or high-definition rendering, or the image obtained after projection, in the G-Buffer, and the NPU can read these images from the G-Buffer for subsequent processing, such as the correction and fusion steps.
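A toy sketch of this shared-buffer handoff follows; a plain Python dictionary stands in for the G-Buffer so that the producer and consumer roles of the GPU stage and the NPU stage can be shown. The key names and the equal-weight placeholder fusion are assumptions, not the actual G-Buffer layout.

```python
import numpy as np

g_buffer = {}  # stand-in for the buffer shared by the GPU and the NPU

def gpu_stage():
    # "GPU" side: write the low-definition render and the projection frame.
    g_buffer['low_res_render'] = np.random.rand(270, 480, 3)
    g_buffer['projected_frame'] = np.random.rand(540, 960, 3)

def npu_stage():
    # "NPU" side: read what the GPU wrote and run upsampling/correction/fusion on it.
    low = g_buffer['low_res_render']
    proj = g_buffer['projected_frame']
    up = low.repeat(2, axis=0).repeat(2, axis=1)  # upsample the low-definition render
    return 0.5 * up + 0.5 * proj                  # placeholder equal-weight fusion

gpu_stage()
print(npu_stage().shape)  # (540, 960, 3)
```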
- the NPU can run the neural network to realize part of the calculation in the rendering process, such as the correction of the projection frame or the image fusion, thereby reducing the computing power required of the GPU during rendering; and, on the basis of this lower computing power requirement, the high-definition information in the historical rendering image can be reused to obtain a high-definition rendered image of the current viewing angle area. Therefore, the graphics rendering method provided by the present application can be deployed not only in devices with relatively high computing power, but also in terminals with relatively low computing power, thereby improving the user experience of terminals with relatively low computing power.
- Referring to FIG. 20, the present application also provides a graphics rendering apparatus, including:
- the rendering module 2002 is configured to render the first scene data to obtain the first image, the first scene data is obtained according to the viewing angle area of the virtual camera, and the rendering mode includes at least one of the first rendering mode or the second rendering mode, The resolution of the image obtained by the first rendering mode is higher than the resolution of the image obtained by the second rendering mode;
- an upsampling module 2003 configured to upsample the first image to obtain an upsampled image if the rendering mode of the current viewing angle region includes the second rendering mode;
- the acquisition module 2001 is used to acquire a historical rendering image, where the historical rendering image is obtained by rendering the second scene data, the historical rendering image and the first image share a target object, and the resolution of the target object in the historical rendering image is higher than the resolution of the target object in the first image;
- the projection module 2004 is used for projecting the target object in the historical rendering image into the first image to obtain a projection frame.
- the rendering module 2002 is specifically configured to use the second rendering mode to render the data corresponding to the first region of interest (ROI) in the first scene data to obtain the first image, where the first ROI region is a preset area or an area determined from the viewing angle area corresponding to the first scene data according to a preset method.
- the projection module 2004 is specifically configured to use the region including the target object in the historical rendering image as the second ROI region, and project the region corresponding to the target object in the second ROI region in the historical rendering image In the first image, the projection frame is obtained;
- the fusion module 2005 is further configured to fuse the second ROI area and the first image in the projection frame to obtain the target rendered image, and the second ROI area and the first ROI area include the same object.
- the rendering module 2002 is specifically configured to render the data of the background area in the first scene data by the first rendering mode to obtain a background rendered image, where the background area is the area in the viewing angle area corresponding to the first scene data other than the first ROI area;
- the fusion module 2005 is further configured to fuse the target rendering image and the background rendering image to obtain an updated target rendering image.
- the apparatus may further include: a determining module 2006, configured to determine that the rendering mode of the first scene data includes the first rendering mode if the first scene data meets a preset condition, where the preset condition includes one or more of the following: background switching between the first scene data and the image obtained by the last rendering; or, the motion vector of at least one object in the first scene data is higher than a preset motion value, where the motion vector includes the offset of the at least one object between the first scene data and the second scene data; or, the change in illumination between the first scene data and the second scene data exceeds a preset change amount, where the illumination information includes at least one of illumination intensity, light source type, incident direction of the light source, or number of light sources; or, there is an interval of N frames between the second scene data and the area last rendered by the first rendering mode, where N is a positive integer.
- the first ROI area is calculated and determined based on one or more items of lighting information, shadow information, lighting reflection information, or material of the target object of the first scene data.
- the fusion module 2005 is specifically configured to: obtain, through a weight coefficient network, a first weight corresponding to each pixel in the first image and a second weight corresponding to each pixel in the projection frame, where the weight coefficient network is a neural network used to calculate the respective weights of at least two input image frames; and fuse the up-sampled image and the projection frame based on the first weight and the second weight to obtain the target rendered image.
- the apparatus further includes: a correction module 2007, configured to, before the up-sampled image and the projection frame are fused to obtain the target rendered image, correct the projection frame based on the up-sampled image through a correction network to obtain a corrected projection frame, where the correction network is a neural network used to filter the input image.
- the rendering module 2002 is further configured to render the second scene data in the first rendering mode to obtain the historical rendering image, before rendering the first scene data to obtain the first image.
- FIG. 21 is a schematic structural diagram of another graphics rendering apparatus provided by the present application, as described below.
- the graphics rendering apparatus may include a processor 2101 and a memory 2102.
- the processor 2101 and the memory 2102 are interconnected by wires, and the memory 2102 stores program instructions and data.
- the memory 2102 stores program instructions and data corresponding to the steps in the aforementioned FIG. 7 to FIG. 14 .
- the processor may also be a processor for processing images, such as a GPU or a CPU for processing images.
- the processor 2101 is configured to execute the method steps executed by the graphics rendering apparatus shown in any of the foregoing embodiments in FIG. 7 to FIG. 14 .
- the graphics rendering apparatus may further include a transceiver 2103 for receiving or sending data.
- Embodiments of the present application also provide a computer-readable storage medium, where a program is stored in the computer-readable storage medium, and when the program is run on a computer, the computer is made to execute the steps in the method described in the foregoing embodiments shown in FIG. 7 to FIG. 14.
- Optionally, the aforementioned graphics rendering apparatus shown in FIG. 21 may be a chip.
- the embodiments of the present application also provide a graphics rendering apparatus, which may also be called a digital processing chip or a chip. The chip includes a processing unit and a communication interface. The processing unit acquires program instructions through the communication interface, the program instructions are executed by the processing unit, and the processing unit is configured to execute the method steps shown in any of the foregoing embodiments in FIG. 3 to FIG. 17.
- the embodiments of the present application also provide a digital processing chip.
- the digital processing chip integrates circuits and one or more interfaces for implementing the above-mentioned processor 2101, processor 2201, or the functions of processor 2101 and processor 2201.
- when a memory is integrated in the digital processing chip, the digital processing chip can perform the method steps of any one or more of the foregoing embodiments.
- when the digital processing chip does not integrate a memory, it can be connected with an external memory through the communication interface.
- the digital processing chip implements the method steps in the above embodiments according to the program codes stored in the external memory.
- the embodiments of the present application also provide a computer program product, which, when running on a computer, causes the computer to execute the steps in the method described in any of the foregoing embodiments in FIG. 3 to FIG. 17 .
- the image processing apparatus or training apparatus provided in this embodiment of the present application may be a chip, and the chip includes: a processing unit and a communication unit.
- the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin or a circuit, etc.
- the processing unit can execute the computer-executed instructions stored in the storage unit, so that the chip in the server executes the graphics rendering method described in the embodiments shown in FIG. 3 to FIG. 17 .
- the storage unit is a storage unit in the chip, such as a register, a cache, etc.
- the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a random access memory (RAM), etc.
- the aforementioned processing unit or processor may include a central processing unit (CPU), a network processor (neural-network processing unit, NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
- a general purpose processor may be a microprocessor or it may be any conventional processor or the like.
- the device embodiments described above are only schematic, where the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
- the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
- the technical solution of the present application may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, and the software product includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the various embodiments of the present application.
- the computer program product includes one or more computer instructions.
- the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
- the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner.
- the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
- the available media may be magnetic media (for example, floppy disks, hard disks, or magnetic tapes), optical media (for example, DVDs), or semiconductor media (for example, solid state disks (SSDs)), and the like.
Abstract
本申请提供一种图形渲染方法以及装置,用于通过间隔高低请渲染的方式来降低渲染所需的算力,并通过复用历史渲染图像的高清信息来提升目标渲染图像的清晰度,从而使算力较低的设备也可以得到高清的渲染图像。该方法包括:对第一场景数据进行渲染得到第一图像,渲染的方式包括第一渲染方式和/或第二渲染方式,第一渲染方式得到的图像的分辨率高于第二渲染方式得到的图像的分辨率;当第一场景数据的渲染方式包括第二渲染方式,则对第一图像进行上采样得到上采样图像;将历史渲染图像中的目标对象投影至第一图像中,得到投影帧,目标对象在历史渲染图像中的分辨率高于目标对象在第一图像中的分辨率;融合上采样图像和投影帧,得到目标渲染图像。
Description
本申请要求于2021年04月30日提交中国专利局、申请号为“202110486261.9”、申请名称为“一种图形渲染方法以及装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
本申请涉及图像渲染领域,尤其涉及一种图形渲染方法以及装置。
在渲染领域,图形管线中能较容易提供上采样算法计算所需要的辅助信息,像素点的位移矢量(motion vector,mv),以及像素点的深度信息等,而普通图像领域通常只能通过算法估计或者额外的传感器设备协助采集。
例如,当前GPU硬件可支持可编程渲染管线,通常包含顶点着色,形状(图元)装配,几何着色,光栅化,片段着色,测试与混合等几个阶段,其中顶点着色、几何着色、片段着色三个阶段支持用户通过编程进行自定义,常用的顶点着色一般用于对模型顶点的坐标变换,片段着色则对最终显示的像素点颜色计算。然而,随着图像的分辨率增加,渲染所需的算力也成倍增加,因此,如何降渲染所需的算力,成为亟待解决的问题。
发明内容
本申请提供一种图形渲染方法以及装置,用于通过间隔高低请渲染的方式来降低渲染所需的算力,并通过复用历史渲染图像的高清信息来提升目标渲染图像的清晰度,从而使算力较低的设备也可以得到高清的渲染图像。
有鉴于此,第一方面,本申请提供一种图形渲染方法,包括:对第一场景数据进行渲染,得到第一图像,第一场景数据根据虚拟相机的视角区域获得,渲染的方式包括第一渲染方式或第二渲染方式中的至少一种,第一渲染方式得到的图像的分辨率高于第二渲染方式得到的图像的分辨率,可以理解为第一渲染方式为高清渲染方式,而第二渲染方式为低清渲染方式;当第一场景数据的渲染方式包括第二渲染方式,则对第一图像进行上采样得到上采样图像;获取历史渲染图像,该历史渲染图像是对第二场景数据进行渲染得到,历史渲染图像和第一图像中共同存在目标对象,且目标对象在历史渲染图像中的分辨率高于目标对象在第一图像中的分辨率;将历史渲染图像中的目标对象投影至第一图像中,得到投影帧;融合上采样图像和投影帧,得到目标渲染图像。
因此,本申请实施方式中,可以从第一渲染方式或者第二渲染方式中选择至少一种方式来对当前视角区域进行渲染,若当前视角区域的渲染方式包括第二渲染方式,即得到的渲染图像的分辨率较低,则可以对第二渲染方式得到的第一图像进行上采样得到上采样图像,从而提高渲染后的图像的分辨率。然后将历史渲染图像中的高清对象投影至第一图像中,从而提高得到具有高清信息且和第一图像的对象匹配的投影帧,随后融合上采样图像和投影帧,即可得到高清的目标渲染图像。相当于在进行低清渲染时,可以复用已渲染得到的高清渲染图像中的高清信息,来补充第二渲染方式得到的低分辨率图像的细节,从而 得到高清的渲染图像。且第二渲染方式所需的算力低于第一渲染方式,从而使本申请提供的方法可以部署于算力较低的设备中,使算力较低的设备也可以通过本申请提供的方法得到高清的渲染图像。
在一种可能的实施方式中,前述的对第一场景数据进行渲染,得到第一图像,可以包括:使用第二渲染方式对第一场景数据中的第一感兴趣ROI区域对应的数据进行渲染,得到第一图像,第一ROI区域是预设区域或者根据预设方式从第一场景数据对应的视角区域中确定的区域。
因此,本申请实施方式中,在进行低清渲染时,可以对第一场景数据中的感兴趣(region of interest,ROI)区域进行低清渲染,通常ROI区域的模型或者光照条件等较为复杂,对ROI区域进行低清渲染可以非常显著地降低渲染所需算力,使本申请提供的方法应用到算力较低的设备中。
在一种可能的实施方式中,前述的将历史渲染图像中的对象投影至第一图像中,得到投影帧,可以包括:将历史渲染图像中包括目标对象的区域作为第二ROI区域;将历史渲染图像中的第二ROI区域中的目标对象对应的区域投影至第一图像中,得到投影帧;融合投影帧中的第二ROI区域和第一图像,得到目标渲染图像。
与前述的实施方式相应地,若对当前视角区域中的ROI区域进行了低清渲染,则在使用历史渲染图像进行投影时,也可以对历史渲染图像中的ROI区域进行投影,从而以便于后续使用历史渲染图形中的ROI区域的高清信息,对当前视角区域的渲染图像中的ROI区域的低清信息进行细节补充,从而提高当前视角区域中ROI区域的清晰度,得到高清图像。
在一种可能的实施方式中,前述的对当前视角区域中进行渲染,还可以包括:通过第一渲染方式对第一场景数据中的背景区域的数据进行渲染,得到背景渲染图像,背景区域是第一场景数据对应的视角区域中除第一ROI区域之外的区域;上述方法还可以包括:融合目标渲染图像和背景渲染图像,得到更新后的目标渲染图像。
因此,本申请实施方式中,除了可以对ROI区域进行,还对背景区域进行高清渲染,背景区域的高清渲染算力需求较低,因此即使对背景部分进行高清渲染,可以适应算力较低的设备,且渲染背景部分还需要融合背景渲染部分和ROI区域的渲染部分,以组成完整的渲染图像,得到高清的渲染图像。
在一种可能的实施方式中,上述方法还可以包括:若第一场景数据符合预设条件,则确定第一场景数据的渲染方式包括第一渲染方式,预设条件包括以下一项或者多项:第一场景数据与上一次进行渲染得到的图像之间的背景切换;或者,第一场景数据中的至少一个对象的运动矢量高于预设运动值,运动矢量包括至少一个对象在第一场景数据和第二场景数据中的偏移量;或者,第一场景数据与第二场景数据之间的光照的变化超过预设变化量,光照的信息包括光照强度、光源类型、光源入射方向或者光源数目中的至少一种;或者,第二场景数据与上一次使用第一渲染方式进行渲染的区域之间间隔N帧,N为正整数。
因此,本申请实施方式中,可以从背景切换、运动矢量、光照变化或者间隔N帧等方式来确定进行高清渲染或是低清渲染,从而可以选择与场景适配的渲染方式,适应不同的三维场景,针对不同的三维场景也能够得到高清的渲染图像,从而在合适的场景下选择低 清渲染,降低对设备的算力需求。
在一种可能的实施方式中,第一ROI区域基于第一场景数据的光照的信息、阴影的信息、光照反射的信息或者目标对象的材质中的一项或者多项计算确定。因此,本申请实施方式中,在渲染ROI区域时,可以基于光照、阴影、光照反射或者对象的材质等信息来选择较为复杂的对象,通常这些对象若使用高清渲染则需要耗费较多的算力,因此本申请对此进行低清渲染,可以显著地降低算力需求,从而提高本申请提供的方法部署于算力较低的终端的可能性。
在一种可能的实施方式中,前述的融合上采样图像和投影帧,得到目标渲染图像,可以包括:通过权重系数网络获取第一图像中的各个像素点对应的第一权重,和投影帧中的各个像素点对应的第二权重,权重系数网络是用于计算输入的至少两帧图像分别对应的权重的神经网络;基于第一权重和第二权重融合上采样图像和投影帧,得到目标渲染图像。
本申请实施方式中,可以通过神经网络来对上采样图像和投影帧进行融合,通常神经网络可以通过设备中的网络处理器(neural-network processing unit,NPU)来实现,而渲染通常是由图形处理器(graphics processing unit,GPU)来执行,因此本申请通过NPU来实现图像融合,可以降低GPU的计算量,进一步降低GPU的算力需求,从而提高渲染效率,并提高本申请提供的方法部署于算力较低的终端的可能性。
在一种可能的实施方式中,前述的基于第一权重和第二权重融合上采样图像和投影帧,得到目标渲染图像,可以包括:对第一权重进行插值,得到上采样图像中的各个像素点对应的第三权重;基于第三权重和第二权重融合上采样图像和投影帧,得到目标渲染图像。本申请实施方式中,第一图像的分辨率较低,或者输入至权重系数网络的图像可能进行了下采样,因此得到的第一权重的分辨率也就越低,因此可以通过对第一权重进行插值的方式,来得到分辨率更高的权重,从而基于分辨率更高的权重看来融合上采样图像和投影帧,从而得到高清的目标渲染图像。
在一种可能的实施方式中,在融合上采样图像和投影帧得到目标渲染图像之前,上述方法还可以包括:通过矫正网络上采样图像对投影帧进行矫正,得到矫正后的投影帧,矫正网络是用于对输入的图像进行滤波的神经网络。
本申请实施方式中,在使用历史渲染图像进行投影之后,可能存在对象之间遮挡或者光照反向错误等问题,因此可以对投影帧进行矫正,从而减少投影帧中的鬼影,得到矫正后的投影帧。并且,本申请可以通过矫正网络来实现投影帧的矫正,因此可以使用NPU来实现矫正,降低GPU的算力需求,从而提高本申请提供的方法部署于算力较低的终端的可能性。
在一种可能的实施方式中,在对第一场景数据进行渲染,得到第一图像之前,方法还包括:通过第一渲染方式对第二场景数据进行渲染,得到历史渲染图像。
因此,本申请实施方式中,可以对第二场景数据进行高清渲染,从而得到高清的历史渲染图像。
在一种可能的实施方式中,若当前视角区域中的ROI区域中包括运动的目标对象,将历史渲染图像中的对象投影至第一图像中,包括:根据目标对象的运动矢量,将历史渲染 图像中的目标对象投影至第一图像中,得到投影帧,运动矢量包括目标对象在第一场景数据和第二场景数据之间的偏移量。
本申请实施方式中,可以根据对象的运动矢量的来进行投影,从而可以对历史渲染图像中的对象进行更准确地投影,减少投影帧中的鬼影,提高投影帧的清晰度。
在一种可能的实施方式中,前述的根据目标对象的运动矢量,将历史渲染图像中的目标对象投影至第一图像中,得到投影帧,可以包括:根据目标对象的运动矢量,确定历史渲染图像中的目标对象在第一图像中的位置,并对位置进行赋值,得到投影帧。
本申请实施方式中,可以根据对象的运动矢量,确定历史渲染图像中的对象在第一图像中的位置,从而根据历史渲染图像中的对象的像素值或者颜色值等对第一图像中对应的位置进行赋值,从而得到高清的投影帧。
第二方面,本申请提供一种终端,包括GPU和NPU,如下:
GPU,用于对第一场景数据进行渲染,得到第一图像,第一场景数据根据虚拟相机的视角区域获得,渲染的方式包括第一渲染方式或第二渲染方式中的至少一种,第一渲染方式得到的图像的分辨率高于第二渲染方式得到的图像的分辨率;
NPU,用于当第一场景数据的渲染方式包括第二渲染方式,对第一图像进行上采样得到上采样图像;
GPU,还用于获取历史渲染图像,该历史渲染图像是对第二场景数据进行渲染得到,历史渲染图像和第一图像中共同存在目标对象,且目标对象在历史渲染图像中的分辨率高于目标对象在第一图像中的分辨率;
GPU,还用于将历史渲染图像中的目标对象投影至第一图像中,得到投影帧;
NPU,还用于融合上采样图像和投影帧,得到目标渲染图像。
其中,本申请第二方面以及任一可选实施方式的效果可以参阅前述第一方面以及任一可选实施方式的效果,此处不再赘述。
在一种可能的实施方式中,GPU,还用于使用第二渲染方式对第一场景数据中的第一感兴趣ROI区域对应的数据进行渲染,得到第一图像,第一ROI区域是预设区域或者根据预设方式从第一场景数据对应的视角区域中确定的区域。
在一种可能的实施方式中,GPU,还用于将历史渲染图像中包括目标对象的区域作为第二ROI区域;将历史渲染图像中的第二ROI区域中的目标对象对应的区域投影至第一图像中,得到投影帧;
NPU,具体用于融合投影帧中的第二ROI区域和第一图像,得到目标渲染图像。
在一种可能的实施方式中,GPU,还用于:通过第一渲染方式对第一场景数据中的背景区域的数据进行渲染,得到背景渲染图像,背景区域是第一场景数据对应的视角区域中除第一ROI区域之外的区域;融合目标渲染图像和背景渲染图像,得到更新后的目标渲染图像。
在一种可能的实施方式中,若当前视角区域符合预设条件,则GPU,还用于确定当前视角区域的渲染方式包括第一渲染方式,预设条件包括以下一项或者多项:
第一场景数据与上一次进行渲染得到的图像之间的背景切换;或者,第一场景数据中 的至少一个对象的运动矢量高于预设运动值,运动矢量包括至少一个对象在第一场景数据和第二场景数据中的偏移量;或者,第一场景数据与第二场景数据之间的光照的变化超过预设变化量,光照的信息包括光照强度、光源类型、光源入射方向或者光源数目中的至少一种;或者,第二场景数据与上一次使用第一渲染方式进行渲染的区域之间间隔N帧,N为正整数。
在一种可能的实施方式中,第一ROI区域基于第一场景数据的光照的信息、阴影的信息、光照反射的信息或者目标对象的材质中的一项或者多项计算确定。
在一种可能的实施方式中,NPU,还用于:通过权重系数网络获取第一图像中的各个像素点对应的第一权重,和投影帧中的各个像素点对应的第二权重,权重系数网络是用于计算输入的至少两帧图像分别对应的权重的神经网络;基于第一权重和第二权重融合上采样图像和投影帧,得到目标渲染图像。
在一种可能的实施方式中,NPU,还用于在融合上采样图像和投影帧得到目标渲染图像之前,通过矫正网络上采样图像对投影帧进行矫正,得到矫正后的投影帧,矫正网络是用于对输入的图像进行滤波的神经网络。
在一种可能的实施方式中,GPU,还用于在对第一场景数据进行渲染,得到第一图像之前,通过第一渲染方式对第二场景数据进行渲染,得到历史渲染图像。
第三方面,本申请实施例提供一种图形渲染装置,该图形渲染装置具有实现上述第一方面图像处理方法的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块。
第四方面,本申请实施例提供一种图形渲染装置,包括:处理器和存储器,其中,处理器和存储器通过线路互联,处理器调用存储器中的程序代码用于执行上述第一方面任一项所示的用于图形渲染方法中与处理相关的功能。可选地,该图形渲染装置可以是芯片。
第五方面,本申请实施例提供了一种图形渲染装置,该图形渲染装置也可以称为数字处理芯片或者芯片,芯片包括处理单元和通信接口,处理单元通过通信接口获取程序指令,程序指令被处理单元执行,处理单元用于执行如上述第一方面或第一方面任一可选实施方式中与处理相关的功能。
第六方面,本申请实施例提供了一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行上述第一方面或第一方面中任一可选实施方式中的方法。
第七方面,本申请实施例提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面或第一方面中任一可选实施方式中的方法。
图1为本申请应用的一种人工智能主体框架示意图;
图2A为本申请实施例提供的一种系统架构示意图;
图2B为本申请实施例提供的一种电子设备的结构示意图;
图3为本申请提供的一种图形渲染方法的流程示意图;
图4为本申请提供的另一种图形渲染方法的流程示意图;
图5为本申请提供的一种视角区域的示意图;
图6为本申请提供的一种投影方式示意图;
图7为本申请提供的一种投影帧的示意图;
图8为本申请提供的一种投影帧矫正方式示意图;
图9为本申请提供的一种融合权重计算方式示意图;
图10为本申请提供的一种低清渲染和高清渲染的区别示意图;
图11为本申请提供的一种渲染后的图像示意图;
图12为本申请提供的另一种图形渲染方法的流程示意图;
图13为本申请提供的一种ROI区域示意图;
图14为本申请提供的一种ROI区域的低清渲染图像示意图;
图15为本申请提供的一种当前视角区域中的ROI区域的渲染图像示意图;
图16为本申请提供的一种输出图像示意图;
图17为本申请提供的方案和常用的方案中针对ROI区域的渲染效果示意图;
图18为本申请提供的一种终端的结构示意图;
图19为本申请提供的另一种终端的结构示意图;
图20为本申请提供的一种图形渲染装置的结构示意图;
图21为本申请提供的另一种图形渲染装置的结构示意图。
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
首先对人工智能系统总体工作流程进行描述,请参见图1,图1示出的为人工智能主体框架的一种结构示意图,下面从“智能信息链”(水平轴)和“IT价值链”(垂直轴)两个维度对上述人工智能主题框架进行阐述。其中,“智能信息链”反映从数据的获取到处理的一列过程。举例来说,可以是智能信息感知、智能信息表示与形成、智能推理、智能决策、智能执行与输出的一般过程。在这个过程中,数据经历了“数据—信息—知识—智慧”的凝练过程。“IT价值链”从人智能的底层基础设施、信息(提供和处理技术实现)到系统的产业生态过程,反映人工智能为信息技术产业带来的价值。
(1)基础设施
基础设施为人工智能系统提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。通过传感器与外部沟通;计算能力由智能芯片,如中央处理器(central processing unit,CPU)、网络处理器(neural-network processing unit,NPU)、图形处理器(graphics processing unit,GPU)、专用集成电路(application specific integrated circuit,ASIC)或现场可编程逻辑门阵列(field programmable gate array,FPGA)等硬件加速芯片)提供;基础平台包括分布式计算框架及网络等相关的平台保障和 支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。
(2)数据
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本,还涉及到传统设备的物联网数据,包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。
(3)数据处理
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。
(4)通用能力
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,翻译,文本的分析,计算机视觉的处理,语音识别,图像的识别等等。
(5)智能产品及行业应用
智能产品及行业应用指人工智能系统在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能终端、智能交通、智能医疗、自动驾驶、智慧城市等。
本申请实施例涉及了大量神经网络和图像的相关应用,为了更好地理解本申请实施例的方案,下面先对本申请实施例可能涉及的神经网络和图像领域的相关术语和概念进行介绍。
(1)神经网络
神经网络可以是由神经单元组成的,神经单元可以是指以xs和截距1为输入的运算单元,该运算单元的输出可以如以下公式所示:
其中,s=1、2、……n,n为大于1的自然数,Ws为xs的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于将非线性特性引入神经网络中,来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入,激活函数可以是sigmoid函数。神经网络是将多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。
(2)深度神经网络
深度神经网络(deep neural network,DNN),也称多层神经网络,可以理解为具有多 层中间层的神经网络。按照不同层的位置对DNN进行划分,DNN内部的神经网络可以分为三类:输入层,中间层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是中间层,或者称为隐层。层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。
虽然DNN看起来很复杂,其每一层可以表示为线性关系表达式:
其中,
是输入向量,
是输出向量,
是偏移向量或者称为偏置参数,w是权重矩阵(也称系数),α()是激活函数。每一层仅仅是对输入向量
经过如此简单的操作得到输出向量
由于DNN层数多,系数W和偏移向量
的数量也比较多。这些参数在DNN中的定义如下所述:以系数w为例:假设在一个三层的DNN中,第二层的第4个神经元到第三层的第2个神经元的线性系数定义为
上标3代表系数W所在的层数,而下标对应的是输出的第三层索引2和输入的第二层索引4。
需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的中间层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,“容量”也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的也就是学习权重矩阵的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。
(3)卷积神经网络
卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器,该特征抽取器可以看作是滤波器。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取图像信息的方式与位置无关。卷积核可以以随机大小的矩阵的形式初始化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。
(4)损失函数
在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为深度神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让预测低一些,不断地调整,直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此,就需要预先定义“如何比较预测值和目标值之 间的差异”,这便是损失函数(loss function)或目标函数(objective function),是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。该损失函数通常可以包括误差平方均方、交叉熵、对数、指数等损失函数。例如,可以使用误差均方作为损失函数,定义为
具体可以根据实际应用场景选择具体的损失函数。
(5)反向传播算法
神经网络可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始的神经网络模型中参数的大小,使得神经网络模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的神经网络模型中参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的神经网络模型的参数,例如权重矩阵。
(6)渲染
在具有显示功能的电子设备中,将3D/2D模型转换为显示图像的过程,在游戏,电影特效等领域应用较广泛,广义上渲染流程包含:建模、构建材质、构建动画以及渲染显示等。
(7)深度学习上采样(deep learning super sampling,DLSS)
主要指利用深度学习对渲染图像进行上采样从而减少渲染着色的耗时。
(8)时域反走样/抗锯齿(Temporal Anti-Aliasing,TAA)
TAA是近年来商业游戏引擎使用较广泛的一种反走样算法,是渲染领域基于时域复用的框架,其以后处理的方式融入渲染流程,现有的DLSS技术,以及光线追踪去噪算法SVGF,BMFR等也是基于TAA的改善技术。
(9)基于物理的渲染(physic based rendering,PBR)
是指一些在不同程度上都基于与现实世界的物理原理更相符的基本理论所构成的渲染技术集合。其核心是使用一种更符合物理规律的方式来模拟物体表面的光线。这种方式与传统的基于Phong光照模型或者Bilnn-Phong光照模型的光照算法相比更真实,且PBR的表现形式与物理性质更接近,因此可以以物理参数为依据来调节表面材质使得光照效果更正常。
(10)前向渲染管线(forward rendering)
也称前向着色(forward shadering),对场景中每个物体分别遍历所有光源计算最后显示的颜色。
(11)延时渲染管线(deferred rendering)
未为解决前向渲染着色计算耗时与光源数目以及物体数目线性相关的问题而提出,在渲染过程中将所有要渲染物体计算光照着色需要的几何信息首先渲染到几张贴图上(G-buffer),然后通过遍历G-buffer的像素点从而计算最终的光照颜色,适用于大量的渲染物体与多光源场景。
(12)多重采样反走样(multisampling anti-aliasing,MSAA)
通过对每个像素点计算多个子像素点的颜色,最终合成要显示的像素颜色,该功能一般是硬件支持,适合前向渲染管线,能较好的处理几何锯齿。
(13)掩膜(mask)
掩膜可以理解为与图像类似的数据,本申请实施方式中,可以通过在融合图像与掩膜,从而使图像中的部分内容的关注度更高。通常,掩膜可以用于提取感兴趣(region of interest,ROI)区域,例如用预先制作的感兴趣区掩模与待处理图像融合,得到感兴趣区图像,感兴趣区内图像值保持不变,而区外图像值都为0。还可以起屏蔽作用,用掩模对图像上某些区域作屏蔽,使其不参加处理或不参加处理参数的计算,或仅对屏蔽区作处理或统计等。
(14)前景、背景
通常,前景可以理解为图像中所包括的主体,或者需要关注的对象等,本申请以下实施方式中,将图像中的前景中的对象,称为前景对象。该前景也可以理解为图像中的感兴趣区域(region of interest,ROI)。背景则是图像中除前景外的其他区域。例如,一张包括了交通信号灯的图像,则该图像中的前景(或者称为前景对象)则为交通信号灯所在的区域,背景即为该图像中除前景之外的区域。
本申请实施例提供的图形渲染方法可以在服务器上被执行,还可以在终端设备上被执行,相应地,本申请以下提及的神经网络,可以部署于服务器,也可以部署于终端上,具体可以根据实际应用场景调整。例如,本申请提供的图形渲染方法,可以通过插件的方式部署于终端中。其中该终端设备可以是具有图像处理功能的移动电话、平板个人电脑(tablet personal computer,TPC)、媒体播放器、智能电视、笔记本电脑(laptop computer,LC)、个人数字助理(personal digital assistant,PDA)、个人计算机(personal computer,PC)、照相机、摄像机、智能手表、可穿戴式设备(wearable device,WD)或者自动驾驶的车辆等,本申请实施例对此不作限定。下面示例性地,以本申请提供的图形渲染方法部署于终端为例进行示例性说明。
本申请提供的图形渲染方法中的全部或者部分流程可以通过神经网络来实现,如其中的上采样、投影或者融合步骤等,可以通过神经网络来实现。而通常神经网络需要在训练之后部署在终端上,如图2A所示,本申请实施例提供了一种系统架构100。在图2A中,数据采集设备160用于采集训练数据。在一些可选的实现中,针对图形渲染来说,训练数据可以包括大量渲染后的高质量图像和未渲染的三维模型等。
在采集到训练数据之后,数据采集设备160将这些训练数据存入数据库130,训练设备120基于数据库130中维护的训练数据训练得到目标模型/规则101。可选地,在本申请以下实施方式中所提及的训练集,可以是从该数据库130中得到,也可以是通过用户的输入数据得到。
其中,目标模型/规则101可以为本申请实施例中进行训练后的神经网络,该神经网络可以包括一个或者多个网络,用于计算融合权重或者对投影后的图像进行矫正等。
下面对训练设备120基于训练数据得到目标模型/规则101进行描述,训练设备120对输入的三维模型进行处理,将输出的图像与输入的三维模型对应的高质量渲染图像进行对 比,直到训练设备120输出的图像与高质量渲染图像的差值小于一定的阈值,从而完成目标模型/规则101的训练。
上述目标模型/规则101能够用于实现本申请实施例的用于图形渲染方法训练得到的神经网络,即,将待处理数据(如待渲染的图像或者已渲染需要进行进一步处理的图像)通过相关预处理后输入该目标模型/规则101,即可得到处理结果。本申请实施例中的目标模型/规则101具体可以为本申请以下所提及的神经网络,该神经网络可以是前述的CNN、DNN或者RNN等类型的神经网络。需要说明的是,在实际的应用中,所述数据库130中维护的训练数据不一定都来自于数据采集设备160的采集,也有可能是从其他设备接收得到的。另外需要说明的是,训练设备120也不一定完全基于数据库130维护的训练数据进行目标模型/规则101的训练,也有可能从云端或其他地方获取训练数据进行模型训练,本申请对此并不作限定。
根据训练设备120训练得到的目标模型/规则101可以应用于不同的系统或设备中,如应用于图2A所示的执行设备110,该执行设备110也可以称为计算设备,所述执行设备110可以是终端,如手机终端,平板电脑,笔记本电脑,增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR),车载终端等,还可以是服务器或者云端设备等。在图2A中,执行设备110配置输入/输出(input/output,I/O)接口112,用于与外部设备进行数据交互,用户可以通过客户设备140向I/O接口112输入数据,所述输入数据在本申请实施例中可以包括:客户设备输入的待处理数据。客户端可以是其他的硬件设备,如终端或者服务器等,客户端也可以是部署于终端上的软件,如APP、网页端等。
预处理模块113和预处理模块114用于根据I/O接口112接收到的输入数据(如待处理数据)进行预处理,在本申请实施例中,也可以没有预处理模块113和预处理模块114(也可以只有其中的一个预处理模块),而直接采用计算模块111对输入数据进行处理。
在执行设备110对输入数据进行预处理,或者在执行设备110的计算模块111执行计算等相关的处理过程中,执行设备110可以调用数据存储系统150中的数据、代码等以用于相应的处理,也可以将相应处理得到的数据、指令等存入数据存储系统150中。
最后,I/O接口112将处理结果,则将处理结果返回给客户设备140,从而提供给用户,例如若第一神经网络用于进行图像分类,处理结果为分类结果,则I/O接口112将上述得到的分类结果返回给客户设备140,从而提供给用户。
需要说明的是,训练设备120可以针对不同的目标或称不同的任务,基于不同的训练数据生成相应的目标模型/规则101,该相应的目标模型/规则101即可以用于实现上述目标或完成上述任务,从而为用户提供所需的结果。在一些场景中,执行设备110和训练设备120可以是相同的设备,或者位于相同的计算设备内部,为便于理解,本申请将执行设备和训练设备分别进行介绍,并不作为限定。
在图2A所示情况下,用户可以手动给定输入数据,该手动给定可以通过I/O接口112提供的界面进行操作。另一种情况下,客户设备140可以自动地向I/O接口112发送输入数据,如果要求客户设备140自动发送输入数据需要获得用户的授权,则用户可以在客户设备140中设置相应权限。用户可以在客户设备140查看执行设备110输出的结果,具体 的呈现形式可以是显示、声音、动作等具体方式。客户设备140也可以作为数据采集端,采集如图所示输入I/O接口112的输入数据及输出I/O接口112的预测标签作为新的样本数据,并存入数据库130。当然,也可以不经过客户设备140进行采集,而是由I/O接口112直接将如图所示输入I/O接口112的输入数据及输出I/O接口112的预测标签,作为新的样本数据存入数据库130。
需要说明的是,图2A仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制,例如,在图2A中,数据存储系统150相对执行设备110是外部存储器,在其它情况下,也可以将数据存储系统150置于执行设备110中。
如图2A所示,根据训练设备120训练得到目标模型/规则101,该目标模型/规则101在本申请实施例中可以是本申请中的神经网络,具体的,本申请实施例提供的神经网络可以包括CNN,深度卷积神经网络(deep convolutional neural networks,DCNN),循环神经网络(recurrent neural network,RNN)或者构建得到的神经网络等等。
本申请实施例中的图形渲染方法可以由电子设备来执行,该电子设备即前述的执行设备。该电子设备中包括CPU和GPU,能够对图像进行渲染处理。当然,还可以包括其他设备,如NPU或者ASIC等,此处仅仅是示例性说明,不再一一赘述。示例性地,该电子设备例如可以是手机(mobile phone)、平板电脑、笔记本电脑、PC、移动互联网设备(mobile internet device,MID)、可穿戴设备,虚拟现实(virtual reality,VR)设备、增强现实(augmented reality,AR)设备、工业控制(industrial control)中的无线电子设备、无人驾驶(self driving)中的无线电子设备、远程手术(remote medical surgery)中的无线电子设备、智能电网(smart grid)中的无线电子设备、运输安全(transportation safety)中的无线电子设备、智慧城市(smart city)中的无线电子设备、智慧家庭(smart home)中的无线电子设备等。该电子设备可以是运行安卓系统、IOS系统、windows系统以及其他系统的设备。在该电子设备中可以运行有需要对3D场景进行渲染而得到二维图像的应用程序,例如游戏应用、锁屏应用、地图应用或监控应用等应用。
为了便于理解,下面结合图2B对电子设备的具体结构进行详细的介绍。可以参阅图2B,图2B为本申请实施例提供的一种电子设备的结构示意图。
在一个可能的实施例中,如图2B所示,电子设备2000可以包括:中央处理器2001、图形处理器2002、显示设备2003和存储器2004。可选地,该电子设备2000还可以包括至少一个通信总线(图2B中未示出),用于实现各个组件之间的连接通信。
应当理解,电子设备2000中的各个组件还可以通过其他连接器相耦合,其他连接器可包括各类接口、传输线或总线等。电子设备2000中的各个组件还可以是以中央处理器2001为中心的放射性连接方式。在本申请的各个实施例中,耦合是指通过相互电连接或连通,包括直接相连或通过其他设备间接相连。
中央处理器2001和图形处理器2002的连接方式也有多种,并不局限于图2B所示的方式。电子设备2000中的中央处理器2001和图形处理器2002可以位于同一个芯片上,也可以分别为独立的芯片。
下面对中央处理器2001、图形处理器2002、显示设备2003和存储器2004的作用进行简单的介绍。
中央处理器2001:用于运行操作系统2005和应用程序2006。应用程序2006可以为图形类应用程序,比如游戏、视频播放器等等。操作系统2005提供了系统图形库接口,应用程序2006通过该系统图形库接口,以及操作系统2005提供的驱动程序,比如图形库用户态驱动和/或图形库内核态驱动,生成用于渲染图形或图像帧的指令流,以及所需的相关渲染数据。其中,系统图形库包括但不限于:嵌入式开放图形库(open graphics library for embedded system,OpenGL ES)、柯罗诺斯平台图形界面(the Khronos platform graphics interface)或Vulkan(一个跨平台的绘图应用程序接口)等系统图形库。指令流包含一些列的指令,这些指令通常为对系统图形库接口的调用指令。
可选地,中央处理器2001可以包括以下至少一种类型的处理器:应用处理器、一个或多个微处理器、数字信号处理器(digital signal processor,DSP)、微控制器(microcontroller unit,MCU)或人工智能处理器等。
中央处理器2001还可进一步包括必要的硬件加速器,如专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)、或者用于实现逻辑运算的集成电路。处理器2001可以被耦合到一个或多个数据总线,用于在电子设备2000的各个组件之间传输数据和指令。
图形处理器2002:用于接收处理器2001发送的图形指令流,通过渲染管线(pipeline)生成渲染目标,并通过操作系统的图层合成显示模块将渲染目标显示到显示设备2003。其中,渲染管线也可以称为渲染流水线、像素流水线或像素管线,是图形处理器2002内部用于处理图形信号的并行处理单元。图形处理器2002中可以包括多个渲染管线,多个渲染管线之间可以相互独立地并行处理图形信号。例如,渲染管线可以在渲染图形或图像帧的过程中执行一些列操作,典型的操作可以包括:顶点处理(Vertex Processing)、图元处理(Primitive Processing)、光栅化(Rasterization)、片段处理(Fragment Processing)等等。
可选地,图形处理器2002可以包括执行软件的通用图形处理器,如GPU或其他类型的专用图形处理单元等。
显示设备2003:用于显示由电子设备2000生成的各种图像,该图像可以为操作系统的图形用户界面(graphical user interface,GUI)或由图形处理器2002处理的图像数据(包括静止图像和视频数据)。
可选地,显示设备2003可以包括任何合适类型的显示屏。例如液晶显示器(liquid crystal display,LCD)或等离子显示器或有机发光二极管(organic light-emitting diode,OLED)显示器等。
存储器2004,是中央处理器2001和图形处理器2002之间的传输通道,可以为双倍速率同步动态随机存储器(double data rate synchronous dynamic random access memory,DDR SDRAM)或者其它类型的缓存。
常用的渲染上采样与常见的图像视频超分类似,都是为了解决采样数不足而导致的时 间效果降低的问题而提出的。不同之处在于,渲染在空间与时间上都是离散的样本点,低分辨率渲染会导致高度的重叠或者走样,因此,渲染管线中的上采样算法通常是反走样和插值算法;而普通图像视频领域的数据源大多来源于相机,其每个像素点的颜色是一个像素区域的积分,低分辨率采样不足导致的是模糊的问题,上采样方法则是去模糊和插值算法。例如,当前移动端(如手机)的屏幕分辨率通常是1080p,而2K屏,4K屏也会逐渐出现。而更高的渲染分辨率意味着更大的GPU计算负载。通常渲染一帧,几何着色阶段耗时占比20%,片段着色阶段耗时占比80%。渲染540p与渲染1080p理论上负载增加4倍,实际测试~3.4倍,渲染1080p与渲染2k理论上负载增加1.8倍,实际测试~1.6倍。同时片段着色阶段对于分辨率变化较为敏感,减少片段着色阶段的计算量能够极大降低GPU的负载。
然而,常用的时域抗锯齿方案都是针对PC端,其计算复杂,无法应用于移动终端。
因此,本申请提供一种图形渲染方法,通过间隔进行高清渲染和低清渲染来降低渲染所需的算力,且复用投影帧中所包括的高清信息,从而得到清晰度更高的渲染图像。下面对本申请提供的图形渲染方法进行详细介绍。
首先,为便于理解,对本申请提供的图形渲染方法的一些应用场景进行示例性介绍。
例如,一种场景,本申请提供的方法可以应用于一些AR或者VR应用中,如购物、游戏、投影等应用,如本申请提供的图形渲染方法可以由智能穿戴设备(如VR眼镜/AR眼镜等)、移动终端(如手机、平板等)或者其他可以承载AR或者VR应用的设备等。例如,可以在穿戴设备中安装VR应用的应用程序,为用户提供服务,VR应用可以为用户提供多种三维场景,用户可以通过穿戴设备的显示屏观看VR应用中的三维场景,相当于给用户沉浸式体验,提高用户体验。而对于这些三维场景的显示,在构建三维模型之后,还需要对三维模型进行渲染,从而通过本申请提供的图形渲染方法输出的渲染后的图像的形式提供给用户视觉体验。
又例如,另一种场景中,本申请提供的方法可以应用在AR游戏中,AR游戏的应用程序可以安装在用户的移动终端或者穿戴设备中,用户可以通过移动终端或者穿戴设备沉浸式体验AR游戏,从而提高用户体验。在移动终端或者穿戴设备运行AR游戏时,需要对游戏中的三维场景进行渲染,从而可以通过图像的形式呈现出三维场景,可以通过本申请提供的图形渲染方法对游戏中的三维场景进行渲染,从而得到渲染后的高清图像,提高用户体验。
参阅图3,本申请提供的一种图形渲染方法的流程示意图,如下所述。
301、获取第一场景数据。
该第一场景数据可以是三维场景数据也可以是二维场景数据,还可以是具有更多维度的场景数据等,该第一场景数据中可以包括多个二维或者三维模型,每个模型可以由多个基础图元构成。本申请示例性地,以三维场景为例进行示例性说明,以下所提及的三维场景或者三维数据也可以替换为二维场景或者二维数据,以下不再赘述。
具体地,该第一场景数据可以通过虚拟相机的视角区域来确定,本申请示例性地,将该第一场景数据对应的视角区域称为当前视角区域,将前一帧或多帧第二场景数据对应的 视角区域称为相邻视角区域或者虚拟相机的上一个视角区域。
可以理解的是,在进行渲染时,通常可以通过虚拟相机的视角区域,从而更大的二维或者三维场景中取当前视角区域对应的数据,即可得到第一场景数据。本申请示例性地,以从三维场景中取当前视角区域对应的数据为例进行示例性说明,以下所提及的三维场景数据也可以替换为二维场景数据或者更多维度的场景数据等,本申请对此并不作限定。
例如,该三维场景数据可以是由服务器从三维模型库中选择出三维模型构建得到,这些三维模型可以包括如树、光源、房屋、建筑、地理环境、人物或者动物等形状的模型或者对象,挑选出的三维模型即可组成三维场景。并且,在实际应用场景中,随着虚拟相机在三维场景中的视角偏移,视角区域内的模型也可能产生变化。
该三维场景数据可以是多种场景下的数据,下面示例性地对一些场景下的三维数据进行介绍。例如,该三维场景数据可以包括AR或者VR应用中构建三维场景的数据,如一些AR/VR游戏、AR地图等,通过基础图元构成各个三维模型,然后由多个三维模型组成与现实类似的虚拟的三维场景,然后可以通过以下的步骤将三维场景渲染为可见的图像,可以在AR或者VR设备的显示屏中显示,从而使用户可以通过显示屏观察到三维场景,提高用户体验。
可选地,在一种可能的实施方式中,在步骤301之前,还可以对第二场景数据进行高清渲染,得到高清的历史渲染图像。该第二场景数据可以是虚拟相机的上一个视角区域或者当前视角区域的相邻视角区域对应的场景数据。
302、判断是否对第一场景数据进行低清渲染,若是,则执行步骤303,若否,则执行步骤307。
其中,对场景数据的渲染方式可以分为高清渲染(或者称为第一渲染方式)和低清渲染(或者称为第二渲染方式),高清渲染得到的图像的分辨率高于低清渲染得到的图像的分辨率,相应地,高清渲染得到的图像的清晰度也比低清渲染得到的图像的清晰度更好,而越复杂的模型,渲染所耗费的计算资源也就越多。
当前视角区域即场景中当前需要渲染的区域。可以理解为,在进行渲染时,可以预先确定一个虚拟相机,然后以虚拟机的视角进行渲染。例如,在用户使用三维场景构建的游戏时,用户的显示视角可以跟随用户所操控的人物移动,用户的视角即为虚拟相机的视角,显示器中显示的可见区域即为当前视角区域。
具体地,可以判断第一场景数据是否符合预设条件,来确定对第一场景数据的渲染方式是否包括高清渲染。该预设条件可以包括但不限于以下一项或者多项:
第一场景数据与第二场景数据之间的场景切换,或者说背景切换,如用户在进行三维游戏时,从当前场景中进入了副本,即进去了新的场景中,则可以认为进行了场景切换;或者,当前视角区域,即第一场景数据中的对象的运动矢量高于预设运动值,运动矢量包括对象在三维场景的第一场景数据和第二场景数据之间的偏移量,即对象在当前视角区域和上一个视角区域之间的偏移量,如光流或者对象的运动速度等,该运动矢量可以理解为两帧视角之间的像素的偏移值;或者,当前视角区域与相邻视角区域之间的光照的变化超过预设变化量,光照的信息包括光照强度、光源类型、光源入射方向或者光源数目中的至 少一种,通常,光照变化可能引起三维模型的变化较大,因此需要高清渲染才能得到更清晰的图像;或者,当前视角区域与上一次进行高清渲染的区域之间间隔N帧,N为正整数,该N可以是预先设定的值,也可以是根据用户输入的数据确定的值。
因此,本申请实施方式中,当发生了场景切换、运动矢量过大、光照变化大或者间隔N帧,即可进行高清渲染,相当于间隔进行高清渲染和低清渲染,一些变化较大的场景可以使用高清渲染,而变化较少的场景则可以进行低清渲染,然后复用历史渲染图像中的高清信息来提升低清渲染输出的图像的清晰度,也可以得到高清的图像。通过低清渲染来减少渲染所需的计算资源,使本申请提供的方法不仅可以部署于服务器中,也可以部署于算力更低的设备中,如手机、电视或者平板等终端设备,从而提高终端设备的用户体验。
需要说明的是,对场景进行渲染的方式可以包括高清渲染和/或低清渲染,对同一个视角区域,可以同时进行高清渲染和低清渲染,如对模型或者光照条件较为复杂的前景部分进行低清渲染,以降低算力需求,对模型或者光照条件较为简单的背景部分则可以进行高清渲染,所耗费的算力也较低,从而使背景部分的渲染效果更好,提高用户体验。
303、对第一场景数据进行低清渲染,得到第一图像。
若第一场景数据的渲染方式包括低清渲染,即可对当前视角区域进行低清渲染,得到第一图像。
具体地,本实施例中的高清渲染或者低清渲染,可以采用PBR、前向渲染管线或延时渲染管线等渲染方式,高清渲染和低清渲染的区别在于,低清渲染输出的图像的分辨率更低。例如,在进行图形渲染时,若采用低清渲染,则可以以多个点为单位进行着色,以减少渲染所需的算力,即无需对每个点都计算其颜色值,以多个点为单位来计算颜色值,从而减少计算颜色值所需的计算资源,提高渲染效率。
在一种可能的实施方式中,可以对第一场数据中的第一感兴趣区域(region of interest,ROI)区域进行低清渲染,得到第一图像,该第一ROI区域是预设区域或者根据预设方式确定的区域。可以理解为,当前视角区域可以分为前景和背景部分,前景可以作为ROI区域,可以对当前视角区域中的ROI区域进行低清渲染,而对背景部分进行高清渲染。通常,背景部分可能模型的复杂度低或者光照条件简单,因此进行渲染所需的算力也较低,而ROI区域可能因模型复杂度高或者光照条件复杂,因此进行渲染所需的算力较大,本申请提供低清渲染的方式来降低算力需求,从而降低整体的算力需求,使本申请提供的方法可以部署于算力较低的终端中,提高用户体验。
其中,确定第一ROI区域时,可以参考当前视角区域中的光照的信息、阴影的信息、光照反射的信息或者对象的材质来确定该第一ROI区域。该光照的信息具体可以包括光照强度、光源类型、光源入射方向或者光源数目等信息,阴影的信息可以包括阴影的面积或者存在阴影的区域的数量等与阴影相关的信息,光照反射的信息可以包括光照反射的方向、反射后的光照强度等信息。
或者,第一ROI区域还可以是根据用户输入的数据确定的区域。例如,若本申请提供的方法部署于终端中,则可以由用户通过终端的触摸屏来设置需要进行低清渲染的区域,从而使用户可以根据实际应用场景来选择ROI区域,从而使用户可以确定对哪些区域进行 低清渲染,提高用户体验。
此外,若对第一场景数据的第一ROI区域进行低清渲染,则还可以对当前视角区域中除该第一ROI区域之外的背景部分进行高清渲染,从而得到高清的背景部分的渲染图像。通常,渲染背景部分所需的计算量低于渲染ROI区域部分所需的计算量,因此本申请中可以对ROI区域进行低清渲染,对背景部分进行高清渲染,仅需耗费较少的计算量,即可得到渲染后的图像,使本申请提供的方案可以应用于算力较低的设备中,泛化能力强,提高使用算力较低设备的用户体验。
304、对第一图像进行上采样得到上采样图像。
在对当前视角区域进行低清渲染得到第一图像之后,因低清渲染输出的图像的分辨率较低,因此可以对第一图像进行上采样,从而得到分辨率更高的上采样图像。
具体地,上采样的方式可以包括插值或者转置卷积等方式,如双线性插值、双立方插值等方式,本申请示例性地,以通过插值方式进行上采样作为示例性说明,以下所提及的插值也可以替换为其他上采样操作,以下不再赘述。
305、将历史渲染图像中的目标对象投影至第一图像中,得到投影帧。
本实施例中,还获取历史渲染图像,该历史渲染图像可以是对第二场景数据进行渲染得到的。该历史渲染图像和第一图像中包括了相同的目标对象,且目标对象在历史渲染图像中的分辨率高于该目标对象在第一图像中的分辨率。
其中,历史渲染图像可以是当前视角的相邻视角或者虚拟相机的上一个视角区域对应的渲染图像,或者前一帧或者多帧渲染得到的图像,其分辨率高于第一图像,可以理解为高清的渲染图像。可以将历史渲染图像中的目标对象投影至第一图像中,即可得到投影帧,该目标对象是第一图像和历史渲染图像都包括的对象。
通常,可以为场景数据中的对象添加标识,每个对象都可以具有一个唯一的标识,即可以通过对象的标识来确定第一场景数据和第二数据所包括的共同的目标对象,并确定目标对象在各个场景中的方位、运动状态等信息。
可选地,可以根据运动矢量,将历史渲染图像中的目标对象投影至第一图像中,得到投影帧,运动矢量包括目标对象在三维场景的相邻视角区域中和当前视角区域中的偏移量。
具体地,根据目标对象的运动矢量,确定历史渲染图像中的目标对象在第一图像中的位置,并对位置进行赋值(如颜色值、深度值或者不同ID对应的值等),得到投影帧。可以理解为,可以根据对象的运动速度,确定对象从相邻视角运动至当前视角的位置,然后在第一图像中对该位置进行赋值,即可将历史渲染图像中高清的对象投影至第一图像中,得到高清的第一图像。
在一种可能的实施方式中,可以从历史渲染图像中确定第二ROI区域,该第二ROI区域和前述的第一ROI区域中包括了相同的对象。可以对历史渲染图像中的第二ROI区域投影至第一图像中,从而得到投影帧。因此,本申请实施方式中,若仅对当前视角区域中的第一ROI区域进行低清渲染,则可以仅将历史渲染图像中的第二ROI区域中的对象投影至第一图像中,从而得到投影帧。
可选地,在得到投影帧之后,还可以通过矫正网络所述上采样图像对所述投影帧进行 矫正,得到矫正后的投影帧,该矫正网络是用于对输入的图像进行滤波的神经网络,可以从时间或者空间维度上进行滤波,使投影帧中对象的像素值更准确。可以理解为,将历史渲染图像中的对象投影至第一图像之后,可能存在该历史渲染图像中的对象和第一图像中的对象重叠,或者该对象应该被其他对象遮掩等,此时可以通过矫正网络,对投影帧中对象的像素值进行矫正,得到矫正后的投影帧,使投影帧中的对象的像素值更合理。
需要说明的是,本申请对步骤304和步骤305的执行顺序不作限定,可以先执行步骤304,也可以先执行步骤305,还可以同时执行步骤304和步骤305,具体可以根据实际应用场景调整,此处不再赘述。
306、融合上采样图像和投影帧,得到目标渲染图像。
在得到上采样图像和投影帧,将投影帧中的高清信息融合至上采样图像中,得到高清的目标渲染图像。
在一种可能的实施方式中,可以通过权重系数网络获取第一图像中的各个像素点对应的第一权重,和投影帧中的各个像素点对应的第二权重,该权重系数网络可以是用于计算输入的至少两帧图像分别对应的权重的神经网络;随后,基于第一权重和第二权重融合上采样图像和投影帧,得到目标渲染图像。
在一种可能的实施方式中,因计算第一权重时是基于第一图像来进行计算的,而第一图像的分辨率较低,因此在得到第一权重之后,对第一权重进行插值,得到分辨率更高的权重矩阵,即第三权重,以使后续可以使用第二权重和第三权重来对上采样图像和投影帧进行融合,得到分辨率更高的目标渲染图像。
在一种可能的实施方式中,也可以将上采样图像和投影帧作为权重系数网络的输入,从而可以使用分辨率更高的上采样图像来计算第一权重,即可无需对权重进行插值,得到的第一权重更准确。
该权重系数网络可以是通过大量样本进行训练得到。例如,该样本可以包括大量的图像对,以及标注(如人工标注或者通过其他方式计算得到)的权重值,从而使权重系数网络可以输出更准确的权重值。且相比于通过大量计算的方式来计算权重值,本申请通过神经网络的方式来输出权重值,可以通过设备的NPU来进行计算,江都设备的GPU的着色负载。
在一种可能的实施方式中,若前述仅对当前视角区域中的第一ROI区域进行了低清渲染,且将历史渲染图像中的第二ROI区域投影至第一图像中得到投影帧之后,还可以对当前视角区域中的背景部分进行高清渲染,得到高清的背景渲染图像,然后融合了上采样图像和投影帧图像的目标渲染图像与背景渲染图像进行融合,从而对目标渲染图像中的背景部分进行补充,得到更新后的目标渲染图像。因此,在本申请实施方式中,可以对ROI区域进行低清渲染,通过复用历史渲染图像中高清的ROI来提高目标渲染图像中ROI区域的清晰度,而背景部分高清渲染所需的计算资源较少,因此消耗较少的计算资源即可得到高清的渲染图像。因此本申请提供的方法可以部署于各种设备,具有较强的泛化能力。如可以适应算力较低的设备,如手机、平板等电子设备,从而提高使用这些电子设备的用户的体验。
307、对第一场景数据进行高清渲染,得到第二图像。
其中,若确定不对当前视角区域进行高清渲染,则确定该当前视角区域的渲染方式为高清渲染,即可对当前视角区域进行高清渲染,得到第二图像。
在对下一个视角区域进行渲染时,该第二图像即可作为新的历史渲染图像,以辅助后续的视角区域进行处理,提高图像的清晰度。
因此,本申请实施方式中,可以通过间隔高清渲染和低清渲染的方式来降低算力需求,从而使本申请提供的方法可以部署于算力更低的设备中,如终端设备,泛化能力强。从而使终端设备也可以实现对三维模型的渲染,如应用在三维游戏,AR应用等场景中,从而提高终端设备的用户的体验。并且,在进行低清渲染之后,还可以复用投影帧中的高清信息来提升当前视角区域对应的图像的清晰度,从而得到高清的渲染图像。
前述对本申请提供的图形渲染方法的流程进行了介绍,为便于理解,下面对本申请提供的图形渲染方法的流程进行更详细的描述。
应理解,本申请提供的图形渲染方法可以应用在对三维场景的渲染中,该三维场景中需要渲染的区域可以随着虚拟相机的移动而改变。通常为提高用户体验,可以对虚拟相机的视角区域进行渲染,而虚拟相机可能在移动,因此需要对虚拟相机的不断改变的视角区域进行连续渲染,本申请中,将某一段时间内的视角区域作为一帧需要渲染的子场景,以下将当前视角区域称为当前帧,即可连续对多帧场景进行渲染,图4中示出了对其中一帧进行渲染的完整流程。
首先,确定虚拟相机的当前视角区域401。
当前视角区域即三维场景中需要渲染的区域。例如,当用户使用AR眼镜来进行AR游戏时,该AR眼镜即可作为虚拟相机,用户通过控制AR眼镜的朝向,观看三维场景中的各个区域。例如,如图5所示,三维场景可以理解为环绕虚拟相机的场景,该虚拟相机可以理解为处理三维场景中,该虚拟相机的视场角可以根据实际应用场景设定,可以预先设定较大的视场角,相应地当前视角区域的范围也就越大,也可以预先设定较小的视场角,相应地当前视角区域的范围也就越小。
应理解,在确定了虚拟相机在三维场景中当前视角区域之后,即可以从三维场景对应的数据中提取到该当前视角区域对应的数据,即第一场景数据,以进行后续的渲染。
然后,确定渲染方式402。
其中,渲染方法可以包括低清渲染403和/或高清渲染408,高清渲染即为场景设置更多的点来渲染,低清渲染则为场景设置更少的点来渲染,高清渲染输出的图像的分辨率高于低清渲染输出的图像的分辨率。
具体地,可以通过多种方式来确定渲染方式,若当前视角区域与相邻视角区域之间的场景切换,或者说背景切换,如用户在进行三维游戏时,从当前场景中进入副本,即进去了新的场景中;或者,当前视角区域中的对象的运动矢量高于预设运动值,运动矢量包括对象在三维场景的相邻视角区域和所述当前视角区域之间的偏移量,如光流或者对象的运动速度等;或者,当前视角区域与相邻视角区域之间的光照的变化超过预设变化量,光照的信息包括光照强度、光源类型、光源入射方向或者光源数目中的至少一种,通常,光照 变化可能引起三维模型的变化较大,因此需要高清渲染才能得到更清晰的图像;或者,当前视角区域与上一次进行高清渲染的区域之间间隔N帧,N为正整数。
因此,本申请实施方式中,可以适应性地根据各种场景来确定渲染方式,在一些变化较小的场景中可以使用低清渲染,并复用前一帧的渲染图像中的高清信息,从而得到当前帧的高清的图像,在降低计算量的基础上也可以得到高清图像。
本申请中,对当前视角区域进行高清渲染和低清渲染具有不同的处理流程,其中,若当前视角区域的渲染方式包括低清渲染,则如图4中所示执行步骤403-407,若当前视角区域的渲染方式为高清渲染,则执行步骤408,下面分别进行说明。
一、低清渲染
403、低清渲染
其中,对当前视角区域进行低清渲染403,得到渲染后的第一图像。
具体地,渲染的方式可以包括多种,如使用前向渲染管线、延时渲染管线或者AI着色等方式来进行渲染。
示例性地,下面对一些可能的渲染方式进行举例说明。
例如,GPU硬件可支持可编程渲染管线,通常可以包括顶点着色器、形状(图元)装配、几何着色器、光栅化或片段着色器等。常用的顶点着色器一般用于模型顶点的坐标变换,几何着色器可以用于对各个顶点形成的图形进行着色,在对几何着色器输出的几何图形进行光栅化后,片段着色器则用于计算最终显示的像素点颜色,并对图形进行着色。然后即可对片段着色器输出的着色后的各个图形进行混合,从而得到完整的渲染图像,并对渲染图像进行测试,如平滑像素点或者去噪等,得到最终输出的渲染图像。
又例如,可以通过基于物理着色模型(physic based rendering,PBR)来进行渲染,在片段着色阶段,每个像素点的着色方程如下所示:
其中L
o为沿着
方向出射光的颜色,L
e为物体表面沿着
方向的自发光颜色,L
i为物体表面沿着
入射光方向的颜色,F为物体表面根据入射光与出射光的反射分布函数,如
其中
为法线分布函数,
为几何函数,
为菲涅尔函数。由上面着色方程的计算公式可知,每个像素的反射部分颜色的计算复杂度与光源数目线性相关,且计算函数也较为复杂,因此可以根据模型的光源数目确定当前视角区域中的ROI区域,准确地找到当前视角区域中计算复杂度较高的区域。
还例如,在AI领域,可以计算较为高效的乘法和加法操作,如卷积核的计算公式如下:
使用AI的方式来进行渲染,可以降低着色计算的复杂度,从而降低GPU的负载。
还例如,在图形渲染管线中,可以通过离屏渲染的方式(offscreen render)分别将模型渲染到高清或者低清不同的贴图上,即在渲染时GPU在当前屏幕缓冲区以外新开辟一个缓冲区进行渲染操作,而不通过显示屏显示,从而得到渲染后的图像。
404、上采样
其中,因对当前视角区域进行了低清渲染,因此得到的第一图像分辨率较低,可以对该第一图像进行上采样,从而实现提高图像分辨率的目的。
具体地,可以通过插值的方式进行上采样,如双线性插值、三线性插值等,也可以是使用双立方(Bicubic)插值,或者补0插值、转置卷积或者双立方插值等其他插值算法。
为了降低GPU负载,本申请实施方式中采用了间隔低清分辨率渲染。在进行了低清渲染之后,为提高最终得到的图像的分辨率,可以对低清渲染输出的图像进行上采样来提高图像的分辨率,从而得到分辨率更高的图像。
405、投影
其中,历史渲染图像为对上一帧进行渲染后得到的高清图像,得到历史渲染图像的方式与目标渲染图像的方式类似。可以将历史渲染图像中的全部或者部分对象投影至低清渲染得到的第一图像中,从而得到高清的投影帧。
可以理解为,相当于将历史渲染图像中所包括的高清的对象投影至当前帧的第一图像中,得到包括了高清信息的投影帧。
具体地,可以根据运动矢量来对历史渲染图像进行投影,如根据运动矢量计算出历史渲染图像中的对象在第一图像中的位置,根据该位置将历史渲染图像中的高清对象投影至第一图像中,得到高清的投影帧。相当于复用历史渲染图像中包括的高清信息,将历史渲染图像中包括的高清的对象投影至第一图像中,从而使投影后得到的投影帧中具有高清的信息,相当于增加了第一图像中各个对象的清晰度。
如图6所示,投影帧(t-1)的每个像素点通过渲染器生成的运动矢量(motion vector,mv),计算出历史渲染图像和当前视角区域中共同的对象在当前帧空间的坐标位置,然后基于该坐标位置赋予颜色值,得到投影后的投影帧,即将历史渲染图像中的信息赋予至当前视角区域的渲染图像中,从而提高当前视角区域的渲染图像的清晰度。
406、矫正
投影后得到的投影帧中可能存在部分对象被遮挡或者鬼影等问题,因此通常可以对投影帧进行矫正,得到矫正后的投影帧。
具体地,由于前一帧的部分像素点,在当前视角区域可能被遮挡,导致投影得到的图像会存在鬼影等负向问题,通常需要对反向投影的投影帧进行矫正。
为便于理解,示例性地以一个场景为例进行介绍。例如,如图7所示,若当前视角区域包括对象A和B,在历史渲染图像中A和B为分别独立的对象,而在当前视角区域中,A 和B可能出现部分重叠,若仅将历史渲染图像中的A和B投影至当前帧中,则可能导致A和B都出现在同一个区域中,导致其中一个对象出现鬼影,因此需要对投影帧进行校正,如确定A和B相对于虚拟相机的深度,然后根据该深度确定当前视角区域中哪一个对象更靠近虚拟相机,并将该重叠区域的像素值矫正为该对象的像素值。
具体的矫正方式可以包括多种,如AABB裁剪(AABB clipping and clamping),凸壳剪裁(convex hull clipping),或者方差裁剪(variance clipping)等,还可以采用神经网络的方式来进行校正,以减少GPU的工作量。
例如,如图8所示,以AABB裁剪为例,对反向投影的像素点在当前帧周围若干干像素点(如5或者9)的颜色计算最大值与最小值,得到该像素点的AABB包围盒,AABB包围盒的虚线框代表反向投影到当前帧的某个像素点周维像素的颜色空间,矩阵边上3个点(颜色不同,图中未示出)代表当前投影位置周围3个点的颜色,投影帧中投影点的颜色如果在AABB包围盒的外面,表示该颜色不符合当前帧被投影点的颜色分布,需要进行矫正,可以直接与被投影点的颜色相减得到颜色向量,然后颜色向量与AABB包围盒求交点,得到投影点的颜色纸,最后进行插值得到最终矫正后的颜色。其他矫正方式也类似,区别在于计算包围盒的方式不相同,此处不在赘述。
可选地,在一种可能的实施方式中,可以采用神经网络来实现对投影帧的矫正,以下将该网络称为矫正网络。该矫正网络的输入可以包括投影帧,还可以包括上采样图像。该矫正网络可以通过多种网络来实现,如CNN,DNN,深度卷积神经网络(deep convolutional neural networks,DCNN),循环神经网络(recurrent neural network,RNN)等等,还可以是通过结构搜索构建得到的神经网络,本申请此处以及以下所提及的神经网络的结构不做限定。
本申请所提及的神经网络,可以是使用训练集进行训练得到的神经网络,例如,可以采集大量的样本对,每个样本对中包括投影帧和清晰度较低的当前帧(可以理解为前述的上采样图像),以及高清的矫正后的图像,通过大量的样本对对矫正网络进行有监督训练,从而使矫正网络输出矫正后的更高清的图像。
示例性地,该矫正网络中可以包括多层卷积层,在训练过程中,可以通过对输出图像与渲染得到的高清图像逐像素计算L1loss,通过数据驱动的方式来学习矫正图像的规则。在推理阶段中,即可利用训练好的网络模型进行推理,输出矫正后的图像。
407、融合
在得到上采样图像和投影帧(或者替换为矫正后的投影帧)之后,即可融合该上采样图像和投影帧,从而得到更高清的目标渲染图像。
其中,上采样图像和投影帧的融合方式可以包括多种,如通道拼接或加权融合等方式,本申请示例性地以加权融合为例进行示例性说明,在实际应用场景中也可以替换为其他融合方式,本申请对此并不作限定。
首先,在融合上采样图像和投影帧之前,可以计算融合权重,即进行融合时上采样图像和投影帧分别所占的权重的值。
例如,可以结合运动矢量来计算上采样图像和投影帧分别所占的权重,如各个对象的 运动矢量和投影帧的权重呈负相关关系,即对象的运动矢量越大,投影帧中该对象的区域所占的权重也就越小,从而避免因对象运动导致的对象不清晰。
可选地,本申请实施方式中,可以通过神经网络来输出上采样图像和投影帧分别所占的权重,以下将该神经网络称为权重系数网络。该权重系数网络可以通过各种结构来实现,通过对该权重系数网络的训练来使,如CNN,DNN,DCNN,RNN或者回归网络等等,还可以是通过结构搜索构建得到的神经网络,本申请对此不做限定。
具体地,在权重系数网络的训练阶段,可以使用大量的样本来对权重系数网络进行训练,使权重系数网络可以输出一个和最终的输出图像的大小相同的单通道图像(以下称为融合权重图),每个像素值即表示该像素点的融合系数。在推理阶段,即可将第一图像或者上采样图像中的其中一个,以及投影帧作为权重系数网络的输入,输出融合权重图。其中,若将第一图像和投影帧作为权重系数网络的输入,则输出的融合权重图的大小可能与第一图像相同,则此时可以对该融合权重图进行插值放大,使该融合权重图的大小和最终输出的图像的大小匹配,实现上采样图像和投影帧的融合。
可选地,还可以对输入至权重系数网络的图像进行缩小处理,从而使输入至权重系数网络的图像跟小,降低权重系数网络的计算量。在得到融合权重图之后,即可对该融合权重图进行插值放大,使该融合权重图的大小和最终输出的图像的大小匹配,实现上采样图像和投影帧的融合。因此,本实施例中,可以通过缩小权重系数网络的输入图像的尺寸的方式,降低权重系数网络的计算量,并通过插值的方式来提高融合权重图的分辨率,从而使最终的融合权重图的大小与上采样图像以及投影帧的大小匹配,以便于后续融合。
例如,如图9所示,将上采样图像和矫正后的投影帧进行缩小处理,如进行下采样得到分辨率更小的图像,然后将分辨率更小的图像作为权重系数网络的输入,输出分辨率更小的图像分别对象的融合权重图,然后对融合权重图进行方法处理,如进行插值,得到分辨率更高的融合权重图,其中包括上采样图像或者矫正后的投影帧对应的权重值。
其次,在计算得到上采样图像和投影帧分别对应的权重之后,即可融合上采样图像和投影帧,得到高清的目标渲染图像。例如,每个像素点的像素值可以表示为:color=α*current+(1-α)*previous,其中,current表示上采样图像中的像素点的值,previous表示投影帧中对应像素点的值,α表示上采样图像对应的权重,(1-α)表示投影帧对应的权重。
下面示例性地介绍一些可能的神经网络。
通常,当神经网络的功能越聚焦,利用的先验知识越多,就可以大大降低神经网络求解的复杂度,因此本申请充分利用先验知识进行精细化专用AI网络设计,分别使用数据驱动的方式学习神经网络的权重,替换传统的矫正算法,插值算法或者手工设计融合系数的方法。精细化的专用AI网络大大缩小网络的求解空间,减少了网络复杂度。例如,使用的神经网络可以是针对移动端的轻量级U-Net网络,矫正网络的输入可以包括投影帧,输出可以包括矫正后的投影帧,权重系数网络的输入可以包括上采样图像和矫正后的投影帧,输出单通道的融合权重图,每个像素点表示上采样图像或者矫正后的投影帧中对应像素点的权重。
在训练阶段,通过对loss函数端到端的优化数据驱动的方式,求解网络参数,loss函数的计算方式可以表示为:
其中w
1,w
2,w
3为不同的累加权重值,可以是预先设定的值或者训练更新的值,gt表示用于训练阶段的高清参考帧,prev
correct为投影矫正后的投影帧,color
blend表示融合后最终渲染的图像。最后一项用于多帧的时域平滑,将渲染的不同帧在N帧范围内求插值,使得输出结果更加平滑。
当训练完成后,在推理阶段,即可单独将训练后的神经网络部署于设备中,从而输出对应的值。
二、高清渲染
408、高清渲染
若确定对当前视角区域进行高清渲染,则可以得到高清的目标渲染图像。其中,高清渲染的渲染过程可以参阅前述步骤403的描述,区别在于在进行高清渲染时,进行着色的点多于低清渲染,从而可以得到分辨率更高的渲染图像。高清渲染和低清渲染的区别在于,在进行渲染时,所渲染的点的数量不相同,如对同一个场景分别进行高清渲染和低清渲染时,低清渲染的点的数量少于高清渲染的点的数量,从而使高清渲染得到的图像的分辨率高于低清渲染得到的图像的分辨率,高清渲染得到的图像更清晰。
例如,低清渲染和高清渲染的区别可以如图10所示,在进行低清渲染或者高清渲染时,渲染的点在时间和空间上是离散的点,在低清渲染时,对每个渲染的像素点着色一次,而在高清渲染时,对每个渲染的点着色四次,在进行光栅化后,显然高清渲染得到的图像的分辨率高于低清渲染得到的图像的分辨率,高清渲染得到的图像中所包括的信息更多。
409、后处理
此外,在通过高清渲染或者低清渲染的方式得到高清的目标渲染图像之后,即可对该目标渲染图像进行后续处理,在不同的场景中后处理的过程也可能不相同。
例如,若当前场景为对VR游戏中的三维场景进行渲染,则得到渲染后的图像之后,该渲染后的图像可以如图11所示,即可将该渲染图像进行图像优化或者传输至显示模块进行显示等。
又例如,在得到目标渲染图像之后,即可将该目标渲染图像保存至存储器中,以便于后续可以直接从存储器中获取到渲染图像。
在对下一帧进行渲染时,即可将目标渲染图像作为历史渲染图像,使在对下一帧进行渲染时,可以复用历史渲染图像中所包括的高清信息,得到高清的渲染图像。
因此,本申请实施方式中,可以适应性地选择高清渲染或者低清渲染对当前帧进行渲染,从而通过低清渲染来降低对三维场景进行渲染所需的计算量。并且,在进行低清渲染 时,可以复用高清的历史渲染图像中所包括的高清信息来提高低清渲染输出的图像的清晰度,因此即使进行的是低清渲染,也可以得到高清的渲染图像,在降低渲染所需算力的同时,也可以得到高清的图像,因此本申请提供的方法可以适应算力更低的设备,泛化能力强,具有较高的用户体验。
通常,在一些场景中,当前视角区域中的前景部分被渲染时所耗费的计算量更大,而北京部分则可能渲染所需的计算量较小,因此,可以选择对前景部分进行高清渲染或者低清渲染,而对背景部分则进行高清渲染,然后融合分别渲染后的前景部分和背景部分,得到高清的前景部分和背景部分。下面对此场景进行更详细的介绍。
参阅图12,本申请提供的另一种图形渲染方法的流程示意图。
其中,流程与前述图4中的步骤类似,对于类似的部分不再赘述,下面对一些有区别的步骤进行介绍。
1201、确定当前视角区域
1202、确定渲染方式
1203,确定ROI区域
其中,该ROI区域(或者称为第一ROI区域)可以是根据三维场景中的模型复杂度来确定的区域,也可以是根据用户输入数据确定的区域等,并从该ROI区域中确定需要进行低清渲染的模型。
例如,可以预先对三维场景中的对象进行分类,将对象按照复杂度进行分类,对于一些需要较少几何图形组成的对象分为较简单的模型,而对于需要较多几何图形组成的对象则分为复杂模型等,在进行渲染时,即可根据对象的分类确定对象的模型复杂度,从而确定是否对其进行低清渲染。
又例如,可以根据当前视角区域中的光照的信息、阴影的信息、光照反射的信息或者对象的材质来确定ROI区域。如光照强度、光源类型、光源入射方向、光源数目、阴影的面积、存在阴影的区域的数量、光照反射的方向或者反射后的光照强度等信息来确定模型的复杂度,从而将复杂度高的模型所在的区域作为ROI区域。
还例如,可以根据用户选择的区域来确定ROI区域。如本申请提供的方法可以部署于终端中,用户可以通过终端的触控屏从三维场景的当前视角区域中选择区域作为ROI区域,从而可以根据用户的需求来进行渲染,提高用户体验。
还例如,当前视角区域中的ROI区域可以是根据用户的选择确定的高精度贴图区域,如可以是用户选定的工作流贴图,如金属工作流:颜色贴图(albedo)、金属度贴图(metalness)、粗糙度贴图(roughness)、法线贴图(normal);反射度-光滑度工作流:漫反射贴图(diffuse)、反射贴图(specular)、光泽度(glossness/smoothness)、法线贴图以及其他预烘焙的效果贴图等。
例如,如图13所示,在三维场景中选择出其中一个渲染较复杂的人物作为ROI区域,并将该ROI区域的背景部分过滤,得到需要进行低清渲染的人物模型。需要说明的是,此处的人物模型仅仅是为便于区分的示例性说明,在实际应用场景中该三维场景中通常可能不会在显示界面中显示。
因此,本申请实施方式中,可以对渲染中的光照计算复杂或者模型材质复杂的区域作为ROI区域来进行间隔高清渲染或低清渲染,相比于进行高清渲染,低清渲染的方式可以显著地降低渲染所需的计算量,降低对设备的算力需求,使算力较低的设备也可以对ROI区域进行渲染,并通过后续的复用历史渲染图像来提升当前视角区域的ROI
1204、低清渲染
在确定当前视角区域中的ROI区域之后,即可对该ROI区域进行低清渲染,其渲染过程与前述步骤403类似,区别仅在于渲染区域的大小,即ROI区域可能小于当前视角区域,此处不再赘述。
1205、上采样。
在对当前视角区域中的ROI区域进行渲染之后,即可对渲染后的ROI区域的图像进行上采样,得到分辨率更高的上采样图像,即分辨率更高的ROI区域的渲染图像。上采样过程可以参阅前述步骤404,此处不再赘述。
1206、投影
其中,与前述步骤405的区别在于,此处是将历史渲染图像中的ROI区域(或者称为第二ROI区域)投影至第一图像中,得到投影帧。该第二ROI区域和前述的第一ROI区域包括了相同的对象,如前述第一ROI区域中的对象包括“猫”,则该第二ROI区域中也包括了该“猫”。
具体的投影方式与前述步骤405类似,此处不再赘述。
1207、矫正
其中,投影后得到的投影帧中可能存在部分对象被遮挡或者鬼影等问题,因此通常可以对投影帧进行矫正,得到矫正后的投影帧。矫正方式与前述步骤406类似,此处不再赘述。
1208、融合ROI区域
在进行上采样得到分辨率更高的当前视角区域中ROI区域的渲染图像,以及对投影帧进行矫正,得到矫正后的投影帧之后,即可对当前视角区域中ROI区域的渲染图像以及矫正后的投影帧进行融合,其融合过程与前述步骤407类似,此处不再赘述。
因此,本申请实施方式中,可以对当前视角区域中的ROI区域进行低清渲染,然后复用历史渲染图像中包括的高清的ROI区域的信息来提高当前视角区域中ROI区域的渲染图像的清晰度,从而即使在使用较低计算量的低清渲染得到低清的渲染图像的情况下,也可以复用历史渲染图像中的高清信息,使ROI区域的渲染图像更高清,在消耗较低的计算量的情况下得到高清的渲染图像。
示例性地,如图14所示,对当前视角区域中的ROI区域的低清渲染后得到低清渲染图像,并将历史渲染图像中的ROI区域投影到当前视角区域中,得到投影帧。然后融合该低清渲染图像和投影帧,即可得到清晰的输出图像。显然,低清渲染得到的图像分辨率更低,当然所需的算力也就越低。通过融合高分辨率的投影帧,从而为低清渲染得到的图像补充更多的细节,得到更高清的输出图像。
1209、高清渲染
除了对当前视角区域中的ROI区域进行低清渲染之外,还可以对当前视角区域中除ROI区域之外的背景部分进行高清渲染,从而得到高清的背景渲染图像。
其中,对背景部分进行高清渲染的方式可以参阅前述步骤408,此处不再赘述。
1210、融合ROI区域和背景部分的渲染图像
在得到ROI区域和背景部分的高清的渲染图像之后,即可对ROI区域和背景部分的渲染图像进行融合,从而得到当前视角区域的完整的渲染图像。因此,本申请实施方式中,可以将选定的模型渲染到低清的贴图上,通过上采样并且融合历史渲染图像中所包括的信息,得到高清的贴图,并融合ROI区域部分的高清贴图和背景部分的高清渲染图像,得到完整的高清的渲染图像。
其中,融合方式可以包括对ROI区域和背景部分的渲染图像进行拼接。例如,ROI区域的渲染图像可以是通过低清渲染然后通过复用历史渲染图像中的高清模型得到,背景渲染图像则可以是对当前视角区域中除ROI区域之外的背景部分进行高清渲染得到,将ROI区域的渲染图像和背景渲染图像进行拼接,或者将人物模型的渲染图像投影至背景渲染图像中,从而得到当前视角区域的完整的高清渲染图像。
示例性地,对当前视角区域中的ROI区域进行低清渲染并复用历史渲染图像中的高清信息的方式可以参阅图15。其中,f
t-1表示历史渲染图像,f
t-1
t表示从历史渲染图像中提取的ROI投影到当前帧得到的投影帧,然后通过矫正网络针对投影帧进行矫正,得到矫正后的投影帧f
t-1
recitify。然后从当前视角区域中选取ROI区域进行渲染,得到渲染后的ROI区域,然后对渲染后的ROI区域进行上采样,具体可以是通过上采样神经网络来进行上采样,从而降低GPU负载,得到上采样后的渲染图像f
t
up。然后将f
t
up和f
t-1
recitify作为回归网络(即权重系数网络)的输入,输出f
t
up对应的权重α,然后融合f
t
up和f
t-1
recitify,融合的方式可以表示为:
f
t-out即融合后的ROI区域的渲染图像。然后将当前帧中ROI区域的渲染图像即f
t-out和当前帧的背景区域的渲染图像进行融合,得到完整的渲染图像。
1211、高清渲染
1212、后处理
其中,步骤1211和步骤1212可以参阅前述步骤408和步骤409,此处不再赘述。
因此,本申请实施方式中,可以通过间隔高清渲染和低清渲染的方式来降低计算量,通过复用历史渲染图像中所包括的高清信息,来提升当前视角区域中低清渲染得到的图像的清晰度,从而在耗费较少计算量的基础上也可以得到高清的渲染图像。因此本申请提供的方法不仅可以应用于算力较高的设备中,也可以应用于算力较低的设备中,泛化能力强,可以适应更多的硬件设备。
并且,本申请实施方式中,可以通过精细化的神经网络,充分利用了TAA的先验信息,使用专用网络代替传统的GPU的计算方式,如插值算法那、投影帧启发式矫正算法那或者手工涉及权重的方式等。如投影帧矫正或者融合权重计算等可以分别采用神经网络来实现,从而可以通过CPU或者NPU来实现投影帧矫正或者融合权重等的计算,以降低GPU的负载, 降低GPU在进行渲染时所需的算力,使算力较低的终端设备也可以输出高清的渲染图像。
为便于进一步理解,下面对本申请提供的图形渲染方法的效果进行示例性介绍。
首先,在使用平台制作游戏时,对ROI区域以540p渲染,背景区域1080p渲染,然后再对ROI区域进行上采样,并融合高清的历史渲染图像中的ROI区域,得到高清的ROI区域渲染图像,然后将ROI区域渲染图像和背景区域的显然图像进行融合,即可得到最终的输出图像。如图16所示,为通过本申请提供的方式得到的渲染图像(如图16中示出的间隔高低请渲染)和通过高清渲染得到的渲染图像(如图16中示出的高清渲染)的对比。显然,通过本申请提供的方式得到的渲染图像,可以达到与1080p图像相当或者更优的渲染效果,如局部区域的锯齿更小。
更详细的,本申请提供的方案和常用的方案中针对ROI区域的渲染效果可以如图17所示,其中,(a)为直接进行渲染得到的渲染图像,(b)为通过TAA方式渲染得到的图像,(c)为通过本申请提供的方式得到的渲染图像。显然,通过本申请提供的方式得到的渲染图像更清晰,细节更丰富。
前述对本申请提供的图形渲染方法的流程进行了详细介绍,下面结合终端的结构,对本申请提供的图形渲染方法的应用的设备进行示例性介绍。
首先,常用的终端中GPU和NPU组成的结构可以如图18所示,其中GPU和NPU不共用缓存器,渲染、上采样或者融合等都由GPU来来执行,因此GPU负载较大,无法在移动终端中首先图形渲染。
本申请提供的图形方法可以应用于如图19所示的终端中,该终端可以包括GPU和CPU,当然,除了GPU和NPU还可以包括其他设备,如显示屏或者摄像头等,此处不一一赘述。其中,GPU和NPU可以共用缓存器,如图19中所示出G-Buffer缓存器。
具体地,GPU,用于对第一场景数据进行渲染,得到第一图像,第一场景数据根据虚拟相机的视角区域获得,渲染的方式包括第一渲染方式或第二渲染方式中的至少一种,第一渲染方式得到的图像的分辨率高于第二渲染方式得到的图像的分辨率;
NPU,用于当第一场景数据的渲染方式包括第二渲染方式,对第一图像进行上采样得到上采样图像;
GPU,还用于获取历史渲染图像,该历史渲染图像是对第二场景数据进行渲染得到,历史渲染图像和第一图像中共同存在目标对象,且目标对象在历史渲染图像中的分辨率高于目标对象在第一图像中的分辨率;
GPU,还用于将历史渲染图像中的目标对象投影至第一图像中,得到投影帧;
NPU,还用于融合上采样图像和投影帧,得到目标渲染图像。
其中,本申请第二方面以及任一可选实施方式的效果可以参阅前述第一方面以及任一可选实施方式的效果,此处不再赘述。
在一种可能的实施方式中,GPU,还用于使用第二渲染方式对第一场景数据中的第一感兴趣ROI区域对应的数据进行渲染,得到第一图像,第一ROI区域是预设区域或者根据预设方式从第一场景数据对应的视角区域中确定的区域。
在一种可能的实施方式中,GPU,还用于将历史渲染图像中包括目标对象的区域作为第 二ROI区域;将历史渲染图像中的第二ROI区域中的目标对象对应的区域投影至第一图像中,得到投影帧;
NPU,具体用于融合投影帧中的第二ROI区域和第一图像,得到目标渲染图像。
在一种可能的实施方式中,GPU,还用于:通过第一渲染方式对第一场景数据中的背景区域的数据进行渲染,得到背景渲染图像,背景区域是第一场景数据对应的视角区域中除第一ROI区域之外的区域;融合目标渲染图像和背景渲染图像,得到更新后的目标渲染图像。
在一种可能的实施方式中,若当前视角区域符合预设条件,则GPU,还用于确定当前视角区域的渲染方式包括第一渲染方式,预设条件包括以下一项或者多项:
第一场景数据与上一次进行渲染得到的图像之间的背景切换;或者,第一场景数据中的至少一个对象的运动矢量高于预设运动值,运动矢量包括至少一个对象在第一场景数据和第二场景数据中的偏移量;或者,第一场景数据与第二场景数据之间的光照的变化超过预设变化量,光照的信息包括光照强度、光源类型、光源入射方向或者光源数目中的至少一种;或者,第二场景数据与上一次使用第一渲染方式进行渲染的区域之间间隔N帧,N为正整数。
在一种可能的实施方式中,第一ROI区域基于第一场景数据的光照的信息、阴影的信息、光照反射的信息或者目标对象的材质中的一项或者多项计算确定。
在一种可能的实施方式中,NPU,还用于:通过权重系数网络获取第一图像中的各个像素点对应的第一权重,和投影帧中的各个像素点对应的第二权重,权重系数网络是用于计算输入的至少两帧图像分别对应的权重的神经网络;基于第一权重和第二权重融合上采样图像和投影帧,得到目标渲染图像。
在一种可能的实施方式中,NPU,还用于在融合上采样图像和投影帧得到目标渲染图像之前,通过矫正网络上采样图像对投影帧进行矫正,得到矫正后的投影帧,矫正网络是用于对输入的图像进行滤波的神经网络。
在一种可能的实施方式中,GPU,还用于在对第一场景数据进行渲染,得到第一图像之前,通过第一渲染方式对第二场景数据进行渲染,得到历史渲染图像。
例如,本申请提供的方法中的渲染步骤可以由GPU来执行,如前述图3所示步骤301-303、步骤305、步骤307,或者如前述图4中所示出的步骤403、405、408或者409等步骤,或者如前述图12中所示出的步骤1203、1204、1206、1209或者1211等步骤。而除渲染之外步骤,则可以由NPU来执行,如前述图3中的步骤304、步骤306,前述图4中所示出的404、406、407等步骤,或者如前述图12中所示出的步骤1207、1208或者1210等步骤,可以通过神经网络来实现,并可以由NPU来执行,从而降低GPU的负载。GPU可以NPU之间可以通过G-Buffer来实现数据交互,如可以由GPU将低清渲染或者高清渲染后得到的图像或者投影后得到的图像存放于G-Buffer中,NPU可以从G-Buffer中读取由GPU存储的低清渲染或者高清渲染后得到的图像,或者投影后得到的图像,以进行后续的处理,如进行矫正、融合等步骤。
因此,本申请提供的终端中,可以由NPU来运行神经网络实现渲染过程中的部分计算, 如投影帧的矫正或者图像融合等,从而降低进行渲染时对GPU的算力需求,且在所需算力较低的基础上能够复用历史渲染图像中的高清信息得到当前视角区域的高清渲染图像。因此使本申请提供的图形渲染的方法不仅可以部署于算力较高的设备中,也可以部署于算力较低的终端中,从而提高算力较低的终端的用户的体验。
参阅图20,本申请还提供一种图形渲染装置,包括:
渲染模块2002,用于对第一场景数据进行渲染,得到第一图像,第一场景数据根据虚拟相机的视角区域获得,渲染的方式包括第一渲染方式或第二渲染方式中的至少一种,第一渲染方式得到的图像的分辨率高于第二渲染方式得到的图像的分辨率;
上采样模块2003,用于若当前视角区域的渲染方式包括第二渲染方式,则对第一图像进行上采样得到上采样图像;
获取模块2001,用于获取历史渲染图像,该历史渲染图像是对第二场景数据进行渲染得到,历史渲染图像和第一图像中共同存在目标对象,且目标对象在历史渲染图像中的分辨率高于目标对象在第一图像中的分辨率;
投影模块2004,用于将历史渲染图像中的目标对象投影至第一图像中,得到投影帧。
在一种可能的实施方式中,渲染模块2002,具体用于使用第二渲染方式对第一场景数据中的第一感兴趣ROI区域对应的数据进行渲染,得到第一图像,第一ROI区域是预设区域或者根据预设方式从第一场景数据对应的视角区域中确定的区域。
在一种可能的实施方式中,投影模块2004,具体用于将历史渲染图像中包括目标对象的区域作为第二ROI区域,将历史渲染图像中的第二ROI区域中的目标对象对应的区域投影至第一图像中,得到投影帧;
融合模块2005,还用于融合投影帧中的第二ROI区域和第一图像,得到目标渲染图像,第二ROI区域和第一ROI区域中包括相同的对象。
在一种可能的实施方式中,渲染模块2002,具体用于通过第一渲染方式对第一场景数据中的背景区域的数据进行渲染,得到背景渲染图像,背景区域是第一场景数据对应的视角区域中除第一ROI区域之外的区域;
融合模块2005,还用于融合目标渲染图像和背景渲染图像,得到更新后的目标渲染图像。
在一种可能的实施方式中,该装置还可以包括:确定模块2006,用于若第一场景数据符合预设条件,则确定第一场景数据的渲染方式包括第一渲染方式,预设条件包括以下一项或者多项:第一场景数据与上一次进行渲染得到的图像之间的背景切换;或者,第一场景数据中的至少一个对象的运动矢量高于预设运动值,运动矢量包括至少一个对象在第一场景数据和第二场景数据中的偏移量;或者,第一场景数据与第二场景数据之间的光照的变化超过预设变化量,光照的信息包括光照强度、光源类型、光源入射方向或者光源数目中的至少一种;或者,第二场景数据与上一次使用第一渲染方式进行渲染的区域之间间隔N帧,N为正整数。
在一种可能的实施方式中,第一ROI区域基于第一场景数据的光照的信息、阴影的信息、光照反射的信息或者目标对象的材质中的一项或者多项计算确定。
在一种可能的实施方式中,融合模块2005,具体用于:通过权重系数网络获取第一图像中的各个像素点对应的第一权重,和投影帧中的各个像素点对应的第二权重,权重系数网络是用于计算输入的至少两帧图像分别对应的权重的神经网络;基于第一权重和第二权重融合上采样图像和投影帧,得到目标渲染图像。
在一种可能的实施方式中,该装置还包括:矫正模块2007,用于在融合上采样图像和投影帧得到目标渲染图像之前,通过矫正网络上采样图像对投影帧进行矫正,得到矫正后的投影帧,矫正网络是用于对输入的图像进行滤波的神经网络。
在一种可能的实施方式中,渲染模块2002,还用于在对第一场景数据进行渲染,得到第一图像之前,通过第一渲染方式对第二场景数据进行渲染,得到历史渲染图像。
请参阅图21,本申请提供的另一种图形渲染装置的结构示意图,如下所述。
该训练装置可以包括处理器2101和存储器2102。该处理器2101和存储器2102通过线路互联。其中,存储器2102中存储有程序指令和数据。
存储器2102中存储了前述图7-图14中的步骤对应的程序指令以及数据。更具体地,该处理器还可以是用于处理图像的处理器,如GPU或者用于处理图像的CPU等。
处理器2101用于执行前述图7-图14中任一实施例所示的图形渲染装置执行的方法步骤。
可选地,该图形渲染装置还可以包括收发器2103,用于接收或者发送数据。
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有程序,当其在计算机上运行时,使得计算机执行如前述图7-图14所示实施例描述的方法中的步骤。
可选地,前述的图21中所示的图形渲染装置为芯片。
本申请实施例还提供了一种图形渲染装置,该训练装置也可以称为数字处理芯片或者芯片,芯片包括处理单元和通信接口,处理单元通过通信接口获取程序指令,程序指令被处理单元执行,处理单元用于执行前述图3-图17中任一实施例所示的方法步骤。
本申请实施例还提供一种数字处理芯片。该数字处理芯片中集成了用于实现上述处理器2101、处理器2201,或者处理器2101、处理器2201的功能的电路和一个或者多个接口。当该数字处理芯片中集成了存储器时,该数字处理芯片可以完成前述实施例中的任一个或多个实施例的方法步骤。当该数字处理芯片中未集成存储器时,可以通过通信接口与外置的存储器连接。该数字处理芯片根据外置的存储器中存储的程序代码来实现上述实施例中的方法步骤。
本申请实施例中还提供一种包括计算机程序产品,当其在计算机上运行时,使得计算机执行如前述图3-图17中任一实施例描述的方法中的步骤。
本申请实施例提供的图像处理装置或者训练装置可以为芯片,芯片包括:处理单元和通信单元,所述处理单元例如可以是处理器,所述通信单元例如可以是输入/输出接口、管脚或电路等。该处理单元可执行存储单元存储的计算机执行指令,以使服务器内的芯片执行上述图3-图17所示实施例描述的图形渲染方法。可选地,所述存储单元为所述芯片内的存储单元,如寄存器、缓存等,所述存储单元还可以是所述无线接入设备端内的位于所 述芯片外部的存储单元,如只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)等。
具体地,前述的处理单元或者处理器可以包括中央处理器(central processing unit,CPU)、网络处理器(neural-network processing unit,NPU)、图形处理器(graphics processing unit,GPU)、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)或现场可编程逻辑门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者也可以是任何常规的处理器等。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、只读存储器(read only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如, DVD)、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。
The terms "first", "second", "third", "fourth", and so on (if any) in the specification, claims, and foregoing accompanying drawings of this application are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way may be interchanged in appropriate cases, so that the embodiments described herein can be implemented in an order other than the order illustrated or described herein. Moreover, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product, or device.
Finally, it should be noted that the foregoing are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any person skilled in the art can readily conceive of changes or replacements within the technical scope disclosed in this application, and such changes or replacements shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Claims (30)
- A graphics rendering method, comprising: rendering first scene data to obtain a first image, wherein the first scene data is obtained according to a viewing-angle region of a virtual camera, a mode of the rendering comprises at least one of a first rendering mode or a second rendering mode, and the resolution of an image obtained by the first rendering mode is higher than the resolution of an image obtained by the second rendering mode; when the rendering mode of the first scene data comprises the second rendering mode, upsampling the first image to obtain an upsampled image; obtaining a historical rendered image, wherein the historical rendered image is obtained by rendering second scene data, the historical rendered image and the first image have a common target object, and the resolution of the target object in the historical rendered image is higher than the resolution of the target object in the first image; projecting the target object in the historical rendered image onto the first image to obtain a projected frame; and fusing the upsampled image and the projected frame to obtain a target rendered image.
- The method according to claim 1, wherein the rendering first scene data to obtain a first image comprises: rendering, by using the second rendering mode, data corresponding to a first region of interest (ROI) in the first scene data to obtain the first image, wherein the first ROI is a preset region or a region determined from the viewing-angle region corresponding to the first scene data in a preset manner.
- The method according to claim 2, wherein the projecting the object in the historical rendered image onto the first image to obtain a projected frame comprises: using a region of the historical rendered image that comprises the target object as a second ROI; and projecting the region corresponding to the target object in the second ROI of the historical rendered image onto the first image to obtain the projected frame; and the fusing the upsampled image and the projected frame to obtain a target rendered image comprises: fusing the second ROI in the projected frame with the first image to obtain the target rendered image.
- The method according to claim 3, wherein the rendering in the current viewing-angle region further comprises: rendering, by using the first rendering mode, data of a background region in the first scene data to obtain a background rendered image, wherein the background region is the part of the viewing-angle region corresponding to the first scene data other than the first ROI; and the method further comprises: fusing the target rendered image with the background rendered image to obtain an updated target rendered image.
- The method according to any one of claims 1 to 4, wherein the method further comprises: when the first scene data meets a preset condition, determining that the rendering mode of the first scene data comprises the first rendering mode, wherein the preset condition comprises one or more of the following: a background switch between the first scene data and the data corresponding to the previous viewing-angle region of the virtual camera; or a motion vector of at least one object in the first scene data being higher than a preset motion value, wherein the motion vector comprises the offset of the at least one object between the first scene data and the second scene data; or a change in illumination between the first scene data and the second scene data exceeding a preset change amount, wherein the illumination information comprises at least one of illumination intensity, light source type, light source incident direction, or number of light sources; or the second scene data being N frames away from the region last rendered by using the first rendering mode, wherein N is a positive integer.
- The method according to any one of claims 2 to 4, wherein the first ROI is determined by calculation based on one or more of the illumination information, shadow information, or illumination reflection information of the first scene data, or the material of the target object.
- The method according to any one of claims 1 to 6, wherein the fusing the upsampled image and the projected frame to obtain a target rendered image comprises: obtaining, through a weight coefficient network, a first weight corresponding to each pixel in the first image and a second weight corresponding to each pixel in the projected frame, wherein the weight coefficient network is a neural network used to calculate the respective weights of at least two input image frames; and fusing the upsampled image and the projected frame based on the first weight and the second weight to obtain the target rendered image.
- The method according to any one of claims 1 to 7, wherein before the fusing the upsampled image and the projected frame to obtain a target rendered image, the method further comprises: correcting the projected frame based on the upsampled image through a correction network to obtain a corrected projected frame, wherein the correction network is a neural network used to filter an input image.
- The method according to any one of claims 1 to 8, wherein before the rendering first scene data to obtain a first image, the method further comprises: rendering the second scene data by using the first rendering mode to obtain the historical rendered image.
- A terminal, comprising a GPU and an NPU, wherein the GPU is configured to render first scene data to obtain a first image, the first scene data is obtained according to a viewing-angle region of a virtual camera, a mode of the rendering comprises at least one of a first rendering mode or a second rendering mode, and the resolution of an image obtained by the first rendering mode is higher than the resolution of an image obtained by the second rendering mode; the NPU is configured to, when the rendering mode of the first scene data comprises the second rendering mode, upsample the first image to obtain an upsampled image; the GPU is further configured to obtain a historical rendered image, wherein the historical rendered image is obtained by rendering second scene data, the historical rendered image and the first image have a common target object, and the resolution of the target object in the historical rendered image is higher than the resolution of the target object in the first image; the GPU is further configured to project the target object in the historical rendered image onto the first image to obtain a projected frame; and the NPU is further configured to fuse the upsampled image and the projected frame to obtain a target rendered image.
- The terminal according to claim 10, wherein the GPU is further configured to render, by using the second rendering mode, data corresponding to a first region of interest (ROI) in the first scene data to obtain the first image, wherein the first ROI is a preset region or a region determined from the viewing-angle region corresponding to the first scene data in a preset manner.
- The terminal according to claim 11, wherein the GPU is further configured to: use a region of the historical rendered image that comprises the target object as a second ROI; and project the region corresponding to the target object in the second ROI of the historical rendered image onto the first image to obtain the projected frame; and the NPU is specifically configured to fuse the second ROI in the projected frame with the first image to obtain the target rendered image.
- The terminal according to claim 12, wherein the GPU is further configured to: render, by using the first rendering mode, data of a background region in the first scene data to obtain a background rendered image, wherein the background region is the part of the viewing-angle region corresponding to the first scene data other than the first ROI; and fuse the target rendered image with the background rendered image to obtain an updated target rendered image.
- The terminal according to any one of claims 10 to 13, wherein when the current viewing-angle region meets a preset condition, the GPU is further configured to determine that the rendering mode of the current viewing-angle region comprises the first rendering mode, wherein the preset condition comprises one or more of the following: a background switch between the first scene data and the data corresponding to the previous viewing-angle region of the virtual camera; or a motion vector of at least one object in the first scene data being higher than a preset motion value, wherein the motion vector comprises the offset of the at least one object between the first scene data and the second scene data; or a change in illumination between the first scene data and the second scene data exceeding a preset change amount, wherein the illumination information comprises at least one of illumination intensity, light source type, light source incident direction, or number of light sources; or the second scene data being N frames away from the region last rendered by using the first rendering mode, wherein N is a positive integer.
- The terminal according to any one of claims 11 to 13, wherein the first ROI is determined by calculation based on one or more of the illumination information, shadow information, or illumination reflection information of the first scene data, or the material of the target object.
- The terminal according to any one of claims 10 to 15, wherein the NPU is further configured to: obtain, through a weight coefficient network, a first weight corresponding to each pixel in the first image and a second weight corresponding to each pixel in the projected frame, wherein the weight coefficient network is a neural network used to calculate the respective weights of at least two input image frames; and fuse the upsampled image and the projected frame based on the first weight and the second weight to obtain the target rendered image.
- The terminal according to any one of claims 10 to 16, wherein the NPU is further configured to: before the upsampled image and the projected frame are fused to obtain the target rendered image, correct the projected frame based on the upsampled image through a correction network to obtain a corrected projected frame, wherein the correction network is a neural network used to filter an input image.
- The terminal according to any one of claims 10 to 17, wherein the GPU is further configured to: before the first scene data is rendered to obtain the first image, render the second scene data by using the first rendering mode to obtain the historical rendered image.
- A graphics rendering apparatus, comprising: a rendering module, configured to render first scene data to obtain a first image, wherein the first scene data is obtained according to a viewing-angle region of a virtual camera, a mode of the rendering comprises at least one of a first rendering mode or a second rendering mode, and the resolution of an image obtained by the first rendering mode is higher than the resolution of an image obtained by the second rendering mode; an upsampling module, configured to upsample the first image to obtain an upsampled image when the rendering mode of the current viewing-angle region comprises the second rendering mode; an obtaining module, configured to obtain a historical rendered image, wherein the historical rendered image is obtained by rendering second scene data, the historical rendered image and the first image have a common target object, and the resolution of the target object in the historical rendered image is higher than the resolution of the target object in the first image; a projection module, configured to project the target object in the historical rendered image onto the first image to obtain a projected frame; and a fusion module, configured to fuse the upsampled image and the projected frame to obtain a target rendered image.
- The apparatus according to claim 19, wherein the rendering module is specifically configured to render, by using the second rendering mode, data corresponding to a first region of interest (ROI) in the first scene data to obtain the first image, wherein the first ROI is a preset region or a region determined from the viewing-angle region corresponding to the first scene data in a preset manner.
- The apparatus according to claim 20, wherein the projection module is specifically configured to use a region of the historical rendered image that comprises the target object as a second ROI, and project the region corresponding to the target object in the second ROI of the historical rendered image onto the first image to obtain the projected frame; and the fusion module is further configured to fuse the second ROI in the projected frame with the first image to obtain the target rendered image, wherein the second ROI and the first ROI comprise the same object.
- The apparatus according to claim 21, wherein the rendering module is specifically configured to render, by using the first rendering mode, data of a background region in the first scene data to obtain a background rendered image, wherein the background region is the part of the viewing-angle region corresponding to the first scene data other than the first ROI; and the fusion module is further configured to fuse the target rendered image with the background rendered image to obtain an updated target rendered image.
- The apparatus according to any one of claims 19 to 22, wherein the apparatus further comprises: a determining module, configured to determine, when the first scene data meets a preset condition, that the rendering mode of the first scene data comprises the first rendering mode, wherein the preset condition comprises one or more of the following: a background switch between the first scene data and the data corresponding to the previous viewing-angle region of the virtual camera; or a motion vector of at least one object in the first scene data being higher than a preset motion value, wherein the motion vector comprises the offset of the at least one object between the first scene data and the second scene data; or a change in illumination between the first scene data and the second scene data exceeding a preset change amount, wherein the illumination information comprises at least one of illumination intensity, light source type, light source incident direction, or number of light sources; or the second scene data being N frames away from the region last rendered by using the first rendering mode, wherein N is a positive integer.
- The apparatus according to any one of claims 20 to 22, wherein the first ROI is determined by calculation based on one or more of the illumination information, shadow information, or illumination reflection information of the first scene data, or the material of the target object.
- The apparatus according to any one of claims 19 to 24, wherein the fusion module is specifically configured to: obtain, through a weight coefficient network, a first weight corresponding to each pixel in the first image and a second weight corresponding to each pixel in the projected frame, wherein the weight coefficient network is a neural network used to calculate the respective weights of at least two input image frames; and fuse the upsampled image and the projected frame based on the first weight and the second weight to obtain the target rendered image.
- The apparatus according to any one of claims 19 to 25, wherein the apparatus further comprises: a correction module, configured to, before the upsampled image and the projected frame are fused to obtain the target rendered image, correct the projected frame based on the upsampled image through a correction network to obtain a corrected projected frame, wherein the correction network is a neural network used to filter an input image.
- The apparatus according to any one of claims 19 to 26, wherein the rendering module is further configured to: before the first scene data is rendered to obtain the first image, render the second scene data by using the first rendering mode to obtain the historical rendered image.
- A graphics rendering apparatus, comprising a processor, wherein the processor is coupled to a memory, the memory stores a program, and when the program instructions stored in the memory are executed by the processor, the steps of the method according to any one of claims 1 to 9 are implemented.
- A computer-readable storage medium, comprising a program, wherein when the program is executed by a processing unit, the steps of the method according to any one of claims 1 to 9 are performed.
- A computer program product, wherein the computer program product comprises software code, and the software code is used to perform the steps of the method according to any one of claims 1 to 9.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110486261.9 | 2021-04-30 | ||
CN202110486261.9A CN115253300A (zh) | 2021-04-30 | 2021-04-30 | Graphics rendering method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022228383A1 (zh) | 2022-11-03 |
Family
ID=83744664
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/088979 WO2022228383A1 (zh) | Graphics rendering method and apparatus | 2021-04-30 | 2022-04-25 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115253300A (zh) |
WO (1) | WO2022228383A1 (zh) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118131884A (zh) * | 2022-12-02 | 2024-06-04 | Huawei Technologies Co., Ltd. | Rendering method and apparatus, and electronic device |
CN116012474B (zh) * | 2022-12-13 | 2024-01-30 | 昆易电子科技(上海)有限公司 | Simulation test image generation and re-injection method and system, industrial personal computer, and apparatus |
CN116524133B (zh) * | 2023-06-30 | 2024-04-02 | Tencent Technology (Shenzhen) Co., Ltd. | Virtual vegetation generation method, apparatus, device, and storage medium |
CN117333364A (zh) * | 2023-09-27 | 2024-01-02 | Zhejiang University | Temporal upsampling method and apparatus for real-time rendering of participating media |
- 2021-04-30: CN application CN202110486261.9A filed; published as CN115253300A (zh), status: pending
- 2022-04-25: PCT application PCT/CN2022/088979 filed; published as WO2022228383A1 (zh), status: application filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130243334A1 (en) * | 2011-09-28 | 2013-09-19 | U.S. Army Research Laboratory Attn: Rdrl-Loc-I | System and Method for Image Enhancement and Improvement |
US20140313200A1 (en) * | 2013-04-23 | 2014-10-23 | Square Enix Co., Ltd. | Information processing apparatus, method of controlling the same, and storage medium |
CN108737724A (zh) * | 2017-04-17 | 2018-11-02 | Intel Corporation | System and method for 360 video capture and display |
CN111047516A (zh) * | 2020-03-12 | 2020-04-21 | Tencent Technology (Shenzhen) Co., Ltd. | Image processing method and apparatus, computer device, and storage medium |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116894911A (zh) * | 2023-03-28 | 2023-10-17 | NetEase (Hangzhou) Network Co., Ltd. | Three-dimensional reconstruction method and apparatus, electronic device, and readable storage medium |
CN117014591A (zh) * | 2023-04-03 | 2023-11-07 | 神力视界(深圳)文化科技有限公司 | Three-dimensional scene rendering processing method and electronic device |
CN117061791A (zh) * | 2023-10-12 | 2023-11-14 | 深圳云天畅想信息科技有限公司 | Cloud video frame adaptive collaborative rendering method and apparatus, and computer device |
CN117061791B (zh) * | 2023-10-12 | 2024-01-26 | 深圳云天畅想信息科技有限公司 | Cloud video frame adaptive collaborative rendering method and apparatus, and computer device |
CN117541703A (zh) * | 2024-01-09 | 2024-02-09 | Tencent Technology (Shenzhen) Co., Ltd. | Data rendering method, apparatus, device, and computer-readable storage medium |
CN117541703B (zh) * | 2024-01-09 | 2024-04-30 | Tencent Technology (Shenzhen) Co., Ltd. | Data rendering method, apparatus, device, and computer-readable storage medium |
CN118193728A (zh) * | 2024-05-17 | 2024-06-14 | Xi'an Jiaotong-Liverpool University | Method and apparatus for solving mathematical problems based on question-type classification |
CN118245232A (zh) * | 2024-05-27 | 2024-06-25 | 北京趋动智能科技有限公司 | Image rendering task scheduling method and apparatus based on edge-cloud collaboration |
CN118397166A (zh) * | 2024-06-27 | 2024-07-26 | 杭州群核信息技术有限公司 | Image rendering method and apparatus, electronic device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115253300A (zh) | 2022-11-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022228383A1 (zh) | 2022-11-03 | Graphics rendering method and apparatus |
CN110910486B (zh) | | Indoor scene illumination estimation model, method, apparatus, storage medium, and rendering method |
CN110084874B (zh) | | Image style transfer for three-dimensional models |
CN111656407B (zh) | | Fusing, texturing, and rendering views of dynamic three-dimensional models |
WO2021164731A1 (zh) | | Image enhancement method and image enhancement apparatus |
US10970816B2 (en) | Motion blur and depth of field reconstruction through temporally stable neural networks | |
US10909659B2 (en) | Super-resolution image processing using a machine learning system | |
WO2022057598A1 (zh) | | Image rendering method and apparatus |
US20220237879A1 (en) | Direct clothing modeling for a drivable full-body avatar | |
DE102021121109A1 (de) | | Reconstruction of three-dimensional models from two-dimensional images |
US11989846B2 (en) | Mixture of volumetric primitives for efficient neural rendering | |
US11615602B2 (en) | Appearance-driven automatic three-dimensional modeling | |
US11276150B2 (en) | Environment map generation and hole filling | |
CN110381268A (zh) | | Method and apparatus for generating a video, storage medium, and electronic device |
WO2024002211A1 (zh) | | Image processing method and related apparatus |
Du et al. | Video fields: fusing multiple surveillance videos into a dynamic virtual environment | |
WO2024148898A1 (zh) | | Image noise reduction method and apparatus, computer device, and storage medium |
WO2021151380A1 (en) | Method for rendering virtual object based on illumination estimation, method for training neural network, and related products | |
Jiang et al. | A neural refinement network for single image view synthesis | |
US11875478B2 (en) | Dynamic image smoothing based on network conditions | |
Van Hoorick et al. | Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis | |
WO2023217867A1 (en) | Variable resolution variable frame rate video coding using neural networks | |
RU2757563C1 (ru) | | Method for rendering a 3D portrait of a person with modified lighting, and a computing device therefor |
EP4401041A1 (en) | Apparatus and method with image processing | |
RU2792721C2 (ru) | | Method for asynchronous reprojection of a 3D scene image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | EP: The EPO has been informed by WIPO that EP was designated in this application |
Ref document number: 22794853; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | EP: PCT application non-entry in European phase |
Ref document number: 22794853; Country of ref document: EP; Kind code of ref document: A1 |