WO2022142546A1 - Data-driven method and apparatus for sparse rendering, and storage medium - Google Patents

Data-driven method and apparatus for sparse rendering, and storage medium

Info

Publication number
WO2022142546A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
texture
sparse
rendering
gpu
Prior art date
Application number
PCT/CN2021/121481
Other languages
English (en)
French (fr)
Inventor
王伟亮
Original Assignee
完美世界(北京)软件科技发展有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 完美世界(北京)软件科技发展有限公司
Publication of WO2022142546A1 publication Critical patent/WO2022142546A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/60Memory management

Definitions

  • the present invention relates to the field of display driving, and in particular, to a data driving method and device for sparse rendering, a storage medium, and an electronic device.
  • GPU-driven rendering on mobile (GPU Driven on Mobile) in the related art includes a TBDR (Tile Based Deferred Rendering) mode and a Sparse Texture mode.
  • Rendering under the Sparse Texture mode is also called sparse rendering.
  • TBDR (Tile Based Deferred Rendering) can be regarded as an extension of TBR (Tile Based Rendering) and follows a similar principle; however, through an HSR (Hidden Surface Removal) operation it further reduces, before the Pixel Shader executes, the number of fragments that do not need to be rendered, lowering the bandwidth requirement.
  • The advantages of the ST (Sparse Texture) branch are: Sparse Texture plus Indirect Draw can maximize batching within the same shader, and geometry information can be reconstructed through the Visibility Buffer, so no GBuffer Pass is needed. The disadvantages are: among the three schemes for merging textures, support for Bindless Texture is very low and not yet practical, and a software virtual-texture (VT) implementation performs worse than Sparse Texture. Sparse Texture is available on the A11 and later GPUs in iOS devices, so on iOS devices the Sparse Texture branch is taken; on Android devices support for this feature is also very low.
  • The advantage of the TBDR branch is that the GPU architecture of current mobile devices is almost entirely tile-based, so the TBDR branch flow obtains a large performance gain in bandwidth transmission on a controllable Tile Buffer and does not require Special Hardware Feature support; the disadvantage is that its performance is not as good as the Sparse Texture-based branch flow.
  • a data-driven method for sparse rendering, comprising: constructing hierarchical Z-buffer data of a rendering object in a mobile device; performing frustum culling and/or occlusion culling on the rendering object in the GPU of the mobile device based on the hierarchical Z-buffer data to obtain texture data; transferring the texture data from the GPU to a visibility buffer; and constructing geometric data of the rendering object in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information, and transmitting the geometric data to the GPU for rendering.
  • a data-driven apparatus for sparse rendering, comprising: a building module, configured to construct hierarchical Z-buffer data of a rendering object in a mobile device; a processing module, configured to perform frustum culling and/or occlusion culling on the rendering object in the GPU based on the hierarchical Z-buffer data to obtain texture data; a transmission module, configured to transfer the texture data from the GPU to a visibility buffer; and a rendering module, configured to construct geometric data of the rendering object in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information, and to transmit the geometric data to the GPU for rendering.
  • a computer-readable medium on which computer programs/instructions are stored, and when the computer programs/instructions are executed by a processor, implement the steps of the above data-driven method for sparse rendering.
  • an electronic device, comprising a memory, a processor, and a computer program/instructions stored on the memory; when the processor executes the computer program/instructions, the steps of the above data-driven method for sparse rendering are implemented.
  • a computer apparatus/device/system, comprising a memory, a processor, and a computer program/instructions stored on the memory; when the processor executes the computer program/instructions, the steps of the above data-driven method for sparse rendering are implemented.
  • a computer program product comprising a computer program/instructions, when the computer program/instructions are executed by a processor, the steps of the above data-driven method for sparse rendering are implemented.
  • hierarchical Z-buffer data of the rendering object is constructed in the mobile device; frustum culling and/or occlusion culling is performed on the rendering object in the GPU of the mobile device based on the hierarchical Z-buffer data to obtain texture data; the texture data is transferred from the GPU to the visibility buffer; geometric data of the rendering object is constructed in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information, and the geometric data is transmitted to the GPU for rendering. All textures related to the texture data to be drawn in the current frame are obtained through sparse textures to construct the corresponding geometric data.
  • At the same time, by transmitting and storing the hierarchical Z-buffer data in the tile memory, data in system memory is diverted to the tile memory for relay transmission, which reduces the storage pressure and transmission bandwidth of system memory, solves the technical problem in the related art of excessive system-memory usage and bandwidth overhead during tile-based deferred rendering, balances the memory distribution in the GPU-driven pipeline, and improves the data transmission speed and frame rendering speed of the whole GPU-driven process.
  • FIG. 1 is a block diagram of the hardware structure of a mobile phone for a data-driven method for sparse rendering according to an embodiment of the present invention;
  • FIG. 2 is a flowchart of a data-driven method for sparse rendering according to an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of a sparse map in an embodiment of the present invention.
  • FIG. 4 is a timing diagram of a sparse map according to an embodiment of the present invention.
  • FIG. 5 is a structural block diagram of a data driving apparatus for sparse rendering according to an embodiment of the present invention.
  • FIG. 6 is a structural diagram of an electronic device according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of a computer apparatus/equipment/system according to an embodiment of the present invention.
  • Figure 8 is a block diagram of a computer program product according to an embodiment of the present invention.
  • FIG. 1 is a block diagram of the hardware structure of a mobile phone for a data-driven method for sparse rendering according to an embodiment of the present invention.
  • the mobile phone 10 may include one or more (only one is shown in FIG. 1 ) processor 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, optionally, the above-mentioned mobile phone may further include a transmission device 106 and an input/output device 108 for communication functions.
  • FIG. 1 is only a schematic diagram, which does not limit the structure of the above-mentioned mobile phone.
  • cell phone 10 may also include more or fewer components than shown in FIG. 1 , or have a different configuration than that shown in FIG. 1 .
  • The memory 104 can be used to store programs of the mobile phone, for example, software programs and modules of application software, such as the program corresponding to the data-driven method for sparse rendering in the embodiment of the present invention; by running the program stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the above-mentioned method.
  • Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, memory 104 may further include memory located remotely from processor 102, which may be connected to handset 10 via a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • Transmission means 106 are used to receive or transmit data via a network.
  • the specific example of the above-mentioned network may include the wireless network provided by the communication provider of the mobile phone 10 .
  • the transmission device 106 includes a network adapter (Network Interface Controller, NIC for short), which can be connected to other network devices through a base station so as to communicate with the Internet.
  • the transmission device 106 may be a radio frequency (Radio Frequency, RF for short) module, which is used to communicate with the Internet in a wireless manner.
  • FIG. 2 is a flowchart of a data driving method for sparse rendering according to an embodiment of the present invention. As shown in FIG. 2 , the process includes:
  • Step S202: construct HZB (Hierarchical-Z Buffer) data of a rendering object in a mobile device;
  • The mobile device in this embodiment may be an iOS device, or an electronic device with a similar CPU architecture; the rendering object is the image data to be rendered and displayed using the GPU.
  • Step S204: perform frustum culling and/or occlusion culling on the rendering object in the GPU of the mobile device based on the hierarchical Z-buffer data to obtain texture data;
  • In this embodiment, Occlusion Culling means that an object is not rendered when it is occluded by other objects and is therefore not visible to the camera. Occlusion culling is not automatic in 3D graphics computation, because in the vast majority of cases the objects farthest from the camera are rendered first and objects closer to the camera are rendered later, overwriting what was rendered before. Occlusion culling differs from frustum culling: frustum culling only skips objects outside the camera's field of view, while objects that are occluded by other objects yet still inside the field of view are not culled; when occlusion culling is used, frustum culling still provides benefit. By culling against the frustum and occluders, the amount of data to render can be reduced and the rendering speed increased without changing the output image; a minimal sketch of such a culling test follows.
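  • For illustration only, the following C++ sketch shows how a culling pass of this kind might test one object's bounding sphere against the view frustum and then against a hierarchical Z buffer. The types, names, and the single-texel HZB fetch are simplifying assumptions for the sketch, not the implementation of this application.

    #include <algorithm>
    #include <array>
    #include <cmath>
    #include <vector>

    struct Plane  { float nx, ny, nz, d; };        // plane equation: n.p + d = 0
    struct Sphere { float x, y, z, radius; };      // bounding sphere of a rendering object

    // Frustum culling: the object is kept only if its bounding sphere is not
    // completely behind any of the six frustum planes.
    bool InsideFrustum(const std::array<Plane, 6>& frustum, const Sphere& s) {
        for (const Plane& p : frustum) {
            float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
            if (dist < -s.radius) return false;    // fully outside this plane
        }
        return true;
    }

    // Hierarchical-Z occlusion culling: compare the object's nearest depth against the
    // conservative (farthest) depth stored in the HZB mip level that covers its screen rect.
    // Assumes a non-empty mip chain and a 0 = near, 1 = far depth convention.
    struct HzbMip { int width, height; std::vector<float> maxDepth; };

    bool PassesHzbTest(const std::vector<HzbMip>& hzb,
                       int x0, int y0, int x1, int y1, float objectNearDepth) {
        int longestSide = std::max(std::max(x1 - x0, y1 - y0), 1);
        int level = std::min((int)hzb.size() - 1, (int)std::ceil(std::log2((float)longestSide)));
        const HzbMip& mip = hzb[level];
        int tx = std::clamp(x0 >> level, 0, mip.width - 1);   // simplified: one texel fetch
        int ty = std::clamp(y0 >> level, 0, mip.height - 1);
        float farthestOccluder = mip.maxDepth[(size_t)ty * mip.width + tx];
        return objectNearDepth <= farthestOccluder;           // may be visible
    }

  • In a GPU-driven pipeline these tests would typically run in a compute shader over every instance, writing only the surviving draws into an indirect argument buffer.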
  • Step S206: transfer the texture data from the GPU to a visibility buffer;
  • Step S208: construct geometric data of the rendering object in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information, and transmit the geometric data to the GPU for rendering.
  • Through the above steps, hierarchical Z-buffer data of the rendering object is constructed in the mobile device; frustum culling and/or occlusion culling is performed on the rendering object in the GPU of the mobile device based on the hierarchical Z-buffer data to obtain texture data; the texture data is transferred from the GPU to the visibility buffer; geometric data of the rendering object is constructed in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information, and the geometric data is transmitted to the GPU for rendering. All textures related to the texture data to be drawn in the current frame are obtained through sparse textures to construct the corresponding geometric data. At the same time, by transmitting and storing the hierarchical Z-buffer data in the tile memory, data in system memory is diverted to the tile memory for relay transmission, which reduces the storage pressure and transmission bandwidth of system memory, solves the technical problem in the related art of excessive system-memory usage and bandwidth overhead during tile-based deferred rendering, balances the memory distribution in the GPU-driven pipeline, and improves the data transmission speed and frame rendering speed of the whole GPU-driven process.
  • Optionally, after constructing the hierarchical Z-buffer data of the rendering object in the mobile device, the method further includes: transmitting the hierarchical Z-buffer data to the Hi-Z Buffer in the Tile Memory of the mobile device; and transmitting the hierarchical Z-buffer data from the Tile Memory to the GPU of the mobile device.
  • The Tile Memory in this embodiment is a storage area on the mobile device that is independent of the CPU, the GPU, and System Memory; it is an integrated memory chip located on the device close to the GPU. The Tile Memory is connected with the system memory of the mobile device and, together with the CPU, the GPU, and the system memory, forms the GPU Driven Pipeline of the mobile device.
  • In one implementation of this embodiment, constructing the geometric data of the rendering object in the Visibility Buffer using sparse textures based on the texture data and the preset texture-coordinate information includes: requesting pixel data of the sparse texture from the renderer, where the sparse texture is the smallest texture unit loaded into memory; loading data content according to the mapping state of the sparse texture, where the mapping state includes a mapped state and an unmapped state; and constructing a geometric coordinate system based on the preset texture-coordinate information, and constructing the geometric data of the rendering object in that coordinate system using the data content and the texture data.
  • Optionally, after loading the data content according to the mapping state of the sparse texture, the method further includes: judging whether the pixel data has finished loading in the texture of the current level; if the pixel data has not finished loading in the texture of the current level, rolling the remaining pixel data back to the next mip level until loading is complete.
  • Optionally, loading the data content according to the mapping state of the sparse texture includes: if the mapping state of the sparse texture is the mapped state, loading the texture information of the sparse texture into the Cache of the GPU and then returning the texture information to the renderer; if the mapping state of the sparse texture is the unmapped state, loading preset empty data into the Cache and then returning the preset empty data to the renderer.
  • Optionally, requesting the pixel data of the sparse texture from the renderer includes: judging whether the currently requested pixel data is stored in the Cache of the GPU; if the currently requested pixel data is stored in the Cache, transferring the pixel data from the Cache to the renderer; if the currently requested pixel data is not stored in the Cache, incrementing the count of the batching buffer and determining to load the data content according to the mapping state of the sparse texture. The request path is sketched below.
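  • The decision sequence above can be summarised in code. The following is a CPU-side C++ illustration (the names and types are assumptions, not the actual driver API): check the GPU cache first, and only on a miss advance the counter and consult the sparse tile's mapping state.

    #include <cstdint>
    #include <unordered_map>

    enum class TileState { Mapped, Unmapped };

    struct TexelBlock { uint32_t rgba = 0; };          // stand-in for the decoded texel payload

    struct SparseTile {
        TileState state = TileState::Unmapped;
        TexelBlock data;                               // only meaningful when state == Mapped
    };

    struct SparseTextureCache {
        std::unordered_map<uint64_t, TexelBlock> lines;  // key: texel/tile address
        uint64_t missCounter = 0;                        // models the batching/hardware counter

        // One sample request: cache hit -> return immediately; miss -> count it,
        // then load either the mapped tile's data or preset empty data.
        TexelBlock Sample(uint64_t address, const SparseTile& tile) {
            if (auto it = lines.find(address); it != lines.end())
                return it->second;                       // hit: straight back to the renderer
            ++missCounter;                               // the counter advances only on a miss
            TexelBlock block;                            // default-initialised = empty data
            if (tile.state == TileState::Mapped)
                block = tile.data;                       // mapped: load the real texture data
            lines[address] = block;                      // fill the cache either way
            return block;                                // unmapped: empty content is returned
        }
    };

  • The renderer would call Sample once per texel request; which tiles are actually mapped is decided elsewhere, for example from the recorded counter values.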
  • FIG. 3 is a schematic flowchart of sparse texture sampling in an embodiment of the present invention.
  • When the Shader issues a sampling command, it first checks whether the sampled pixels are in the Cache; if they are, the pixels are returned directly to the Shader.
  • Sparse Texture supports a Hardware Counter in order to evaluate frequently accessed texture regions and improve the Cache hit rate.
  • The Counter is incremented when a Cache miss occurs, not when a requested sample pixel does not exist.
  • Sampling a Sparse Texture in the Shader uses a fallback mechanism: when the requested mip level has not finished loading, it falls back to the next mip level for display. A sketch of this fallback follows.
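  • A minimal C++ sketch of the fallback behaviour: when the requested mip is not yet resident, the sampler walks down to coarser mips until it finds one that has finished loading. The residency bookkeeping shown here is an assumption for illustration only.

    #include <vector>

    // residentMips[i] is true once mip level i of the sparse texture has finished loading
    // (index 0 = finest, last index = coarsest). Returns the mip level that should actually
    // be displayed for a request, falling back to coarser levels until a resident one is found.
    int ResolveMipWithFallback(const std::vector<bool>& residentMips, int requestedMip) {
        int coarsest = (int)residentMips.size() - 1;
        for (int mip = requestedMip; mip <= coarsest; ++mip) {
            if (residentMips[mip]) return mip;   // first resident level at or below the request
        }
        return coarsest;                         // worst case: show the coarsest level anyway
    }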
  • When rendering light-grid culling data, the solution of this embodiment further executes the following flow: obtaining the light-grid culling data from the GPU; transferring the light-grid culling data from the GPU into the Light List of the Tile Memory; and transferring the light-grid culling data from the Light List to the GPU for rendering.
  • After the Tile Memory feature is set, buffers are placed in the Tile Memory as much as possible to save bandwidth overhead. The TBDR flow of this embodiment performs the following optimization in its timing: the Hi-Z Buffer, the Light List, and the Visibility Buffer can be moved from System Memory into the Tile Memory, and Geometry info is reconstructed in the Visibility Buffer in combination with the Sparse Texture, so that a GBuffer Pass is no longer needed.
  • In one implementation of this embodiment, constructing the hierarchical Z-buffer data of the rendering object in the mobile device includes: reading the depth data of the previous frame of the rendering object from the DepthBuffer (depth buffer) of the system memory of the mobile device; transferring the depth data from the depth buffer to the GPU of the mobile device; and constructing the hierarchical Z-buffer data of the rendering object in the GPU based on the depth data. A sketch of such a depth down-sampling follows.
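  • For illustration, a CPU-side C++ sketch of how a hierarchical Z buffer could be built by repeatedly downsampling the previous frame's depth buffer. In the embodiment this reduction would run on the GPU; the code only shows the reduction itself, and the data layout is an assumption.

    #include <algorithm>
    #include <vector>

    struct DepthMip { int width, height; std::vector<float> depth; };

    // Build an HZB mip chain from the previous frame's depth buffer. Each level stores,
    // per texel, the farthest (maximum) depth of the 2x2 block below it, which is the
    // conservative value for occlusion tests under a 0 = near, 1 = far convention
    // (with reversed-Z the reduction would take the minimum instead).
    std::vector<DepthMip> BuildHzb(DepthMip level0) {
        std::vector<DepthMip> chain;
        chain.push_back(std::move(level0));
        while (chain.back().width > 1 || chain.back().height > 1) {
            const DepthMip& src = chain.back();
            DepthMip dst{ std::max(src.width / 2, 1), std::max(src.height / 2, 1), {} };
            dst.depth.resize((size_t)dst.width * dst.height);
            for (int y = 0; y < dst.height; ++y) {
                for (int x = 0; x < dst.width; ++x) {
                    float farthest = 0.0f;
                    for (int dy = 0; dy < 2; ++dy)
                        for (int dx = 0; dx < 2; ++dx) {
                            int sx = std::min(2 * x + dx, src.width - 1);
                            int sy = std::min(2 * y + dy, src.height - 1);
                            farthest = std::max(farthest, src.depth[(size_t)sy * src.width + sx]);
                        }
                    dst.depth[(size_t)y * dst.width + x] = farthest;
                }
            }
            chain.push_back(std::move(dst));
        }
        return chain;
    }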
  • a data driving apparatus for sparse rendering is also provided, which is used to implement the above-mentioned embodiments and preferred implementations, and what has been described will not be repeated.
  • the term "module” may be a combination of software and/or hardware that implements a predetermined function.
  • Although the apparatus described in the following embodiments is preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
  • FIG. 5 is a structural block diagram of a data driving device for sparse rendering according to an embodiment of the present invention, which is applied to a mobile terminal.
  • As shown in FIG. 5, the apparatus includes: a building module 50, a processing module 52, a transmission module 54, and a rendering module 56, wherein:
  • the building module 50 is configured to construct hierarchical Z-buffer data of a rendering object in a mobile device;
  • the processing module 52 is configured to perform frustum culling and/or occlusion culling on the rendering object in the GPU based on the hierarchical Z-buffer data to obtain texture data;
  • the transmission module 54 is configured to transfer the texture data from the GPU to a visibility buffer;
  • the rendering module 56 is configured to construct geometric data of the rendering object in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information, and to transmit the geometric data to the GPU for rendering.
  • Optionally, the rendering module includes: a requesting unit, configured to request pixel data of the sparse texture from the renderer, where the sparse texture is the smallest texture unit loaded into memory; a loading unit, configured to load data content according to the mapping state of the sparse texture, where the mapping state includes a mapped state and an unmapped state; and a construction unit, configured to construct a geometric coordinate system based on the preset texture-coordinate information and to construct the geometric data of the rendering object in that coordinate system using the data content and the texture data.
  • Optionally, the loading unit further includes: a first loading subunit, configured to, if the mapping state of the sparse texture is the mapped state, load the texture information of the sparse texture into the cache of the GPU and then return the texture information to the renderer; and a second loading subunit, configured to, if the mapping state of the sparse texture is the unmapped state, load preset empty data into the cache and then return the preset empty data to the renderer.
  • Optionally, the requesting unit further includes: a judging subunit, configured to judge whether the currently requested pixel data is stored in the cache of the GPU; and a processing subunit, configured to, if the currently requested pixel data is stored in the cache, transfer the pixel data from the cache to the renderer, and, if the currently requested pixel data is not stored in the cache, increment the batching count and determine to load the data content according to the mapping state of the sparse texture.
  • Optionally, the rendering module further includes: a judging unit, configured to judge, after the loading unit loads the data content according to the mapping state of the sparse texture, whether the pixel data has finished loading in the texture of the current level; and a rollback unit, configured to roll the remaining pixel data back to the next mip level until loading is complete if the pixel data has not finished loading in the texture of the current level.
  • Optionally, the building module includes: a reading unit, configured to read the depth data of the previous frame of the rendering object from the depth buffer of the system memory of the mobile device; and a construction unit, configured to transfer the depth data from the depth buffer to the GPU of the mobile device and to construct the hierarchical Z-buffer data of the rendering object in the GPU based on the depth data.
  • It should be noted that the above modules can be implemented by software or hardware; for the latter, this can be done in the following way, but is not limited to it: the above modules are all located in the same processor, or the above modules are located in different processors in any combination.
  • FIG. 6 is a structural diagram of an electronic device according to an embodiment of the present invention. As shown in FIG. 6 , it includes a processor 61, a communication interface 62, a memory 63, and a communication bus 64. Among them, the processor 61, the communication interface 62, and the memory 63 complete the communication with each other through the communication bus 64.
  • The memory 63 is used to store a computer program; the processor 61 is used to implement the following steps when executing the program stored in the memory 63: constructing hierarchical Z-buffer data of a rendering object in a mobile device; performing frustum culling and/or occlusion culling on the rendering object in the GPU of the mobile device based on the hierarchical Z-buffer data to obtain texture data; transferring the texture data from the GPU to a visibility buffer; and constructing geometric data of the rendering object in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information, and transmitting the geometric data to the GPU for rendering.
  • Optionally, constructing the geometric data of the rendering object in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information includes: requesting pixel data of the sparse texture from the renderer, where the sparse texture is the smallest texture unit loaded into memory; loading data content according to the mapping state of the sparse texture, where the mapping state includes a mapped state and an unmapped state; and constructing a geometric coordinate system based on the preset texture-coordinate information and constructing the geometric data of the rendering object in that coordinate system using the data content and the texture data.
  • Optionally, loading the data content according to the mapping state of the sparse texture includes: if the mapping state of the sparse texture is the mapped state, loading the texture information of the sparse texture into the cache of the GPU and then returning the texture information to the renderer; if the mapping state of the sparse texture is the unmapped state, loading preset empty data into the cache and then returning the preset empty data to the renderer.
  • Optionally, requesting the pixel data of the sparse texture from the renderer includes: judging whether the currently requested pixel data is stored in the cache of the GPU; if the currently requested pixel data is stored in the cache, transferring the pixel data from the cache to the renderer; if the currently requested pixel data is not stored in the cache, incrementing the batching count and determining to load the data content according to the mapping state of the sparse texture.
  • Optionally, after loading the data content according to the mapping state of the sparse texture, the method further includes: judging whether the pixel data has finished loading in the texture of the current level; if not, rolling the remaining pixel data back to the next mip level until loading is complete.
  • Optionally, constructing the hierarchical Z-buffer data of the rendering object in the mobile device includes: reading the depth data of the previous frame of the rendering object from the depth buffer of the system memory of the mobile device; transferring the depth data from the depth buffer to the GPU of the mobile device; and constructing the hierarchical Z-buffer data of the rendering object in the GPU based on the depth data.
  • the communication bus mentioned in the above terminal may be a Peripheral Component Interconnect (PCI for short) bus or an Extended Industry Standard Architecture (Extended Industry Standard Architecture, EISA for short) bus or the like.
  • the communication bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface is used for communication between the above-mentioned terminal and other devices.
  • the memory may include random access memory (Random Access Memory, RAM for short), or may include non-volatile memory (non-volatile memory), such as at least one disk memory.
  • the memory may also be at least one storage device located away from the aforementioned processor.
  • the above-mentioned processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, referred to as CPU), a network processor (Network Processor, referred to as NP), etc.; may also be a digital signal processor (Digital Signal Processing, referred to as DSP) , Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, and discrete hardware components.
  • a computer-readable storage medium is also provided, in which instructions are stored; when the instructions are run on a computer, the computer is caused to execute the data-driven method for sparse rendering described in any one of the foregoing embodiments.
  • a computer program product including instructions, which, when run on a computer, enables the computer to execute the data-driven method for sparse rendering described in any of the foregoing embodiments.
  • Various component embodiments of the present invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof.
  • a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all functions of some or all of the components in the data driving apparatus for sparse rendering according to the embodiments of the present invention.
  • The present invention can also be implemented as programs/instructions (for example, computer programs/instructions and computer program products) of a device or apparatus for performing part or all of the methods described herein.
  • Such programs/instructions implementing the present invention may be stored on a computer-readable medium, or may exist in the form of one or more signals; such signals may be downloaded from an Internet website, or provided on a carrier signal, or provided in any other form.
  • Computer-readable media includes both persistent and non-permanent, removable and non-removable media, and storage of information may be implemented by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cartridges, disk storage, quantum memory, graphene-based storage media or other magnetic storage devices or any other non-transmission media can be used to store information that can be accessed by computing devices.
  • FIG. 7 schematically shows a computer apparatus/device/system that can implement the data-driven method for sparse rendering according to the present invention.
  • The computer apparatus/device/system includes a processor 710 and a computer-readable medium in the form of a memory 720.
  • The memory 720 is an example of a computer-readable medium and has a storage space 730 for storing computer programs/instructions 731.
  • When the computer programs/instructions 731 are executed by the processor 710, each step of the data-driven method for sparse rendering described above can be implemented.
  • FIG. 8 schematically shows a block diagram of a computer program product implementing the method according to the invention.
  • The computer program product includes computer programs/instructions 810 which, when executed by a processor such as the processor 710 shown in FIG. 7, can implement each step of the data-driven method for sparse rendering described above.


Abstract

A data-driven method and apparatus for sparse rendering, a storage medium, and an electronic device. The method includes: constructing hierarchical Z-buffer data of a rendering object in a mobile device (S202); performing frustum culling and/or occlusion culling on the rendering object in the GPU of the mobile device based on the hierarchical Z-buffer (HZB) data to obtain texture data (S204); transferring the texture data from the GPU to a visibility buffer (S206); and constructing geometric data of the rendering object in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information, and transmitting the geometric data to the GPU for rendering (S208). The method solves the technical problem in the related art of excessive system-memory usage and bandwidth overhead during tile-based deferred rendering, balances the memory distribution in the GPU-driven pipeline, and improves the data transmission speed and frame rendering speed of the whole GPU-driven process.

Description

Data-driven method and apparatus for sparse rendering, and storage medium
Cross-Reference
This application claims priority to the Chinese patent application filed on December 29, 2020 with application number 202011598856.5 and the invention title "Data-driven method and apparatus for sparse rendering, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of display driving, and in particular to a data-driven method and apparatus for sparse rendering, a storage medium, and an electronic device.
Background
GPU-driven rendering on mobile (GPU Driven on Mobile) in the related art includes the TBDR (Tile Based Deferred Rendering) mode and the Sparse Texture mode. Rendering under the Sparse Texture mode is also called sparse rendering. TBDR can be regarded as an extension of TBR (Tile Based Rendering) and follows a similar principle, but through the HSR (Hidden Surface Removal) operation it further reduces the number of fragments that do not need to be rendered before the Pixel Shader executes, lowering the bandwidth requirement.
Before the Pixel Shader executes, a depth test is performed on every pixel generated by the rasterizer and occluded pixels are culled; this is the principle of HSR. In theory, after HSR culling, the upper bound of pixels that TBDR needs to render per frame is the number of screen pixels (without considering alpha blending), whereas traditional TBR may need to render six times the screen's pixels when running a slightly more complex game. Comparing the advantages and disadvantages of the Sparse Texture branch and the TBDR branch: the advantages of the ST branch are that Sparse Texture plus Indirect Draw can maximize batching within the same shader, and that geometry information can be reconstructed through the Visibility Buffer without a GBuffer Pass; the disadvantages are that, among the three schemes for merging textures, support for Bindless Texture is very low and not yet practical, and the performance of a software virtual-texture (VT) implementation is worse than Sparse Texture. Sparse Texture is available on the A11 and later GPUs in iOS devices, so on iOS devices the Sparse Texture branch is taken; on Android devices support for this feature is also very low. The advantages of the TBDR branch are that the GPU architecture of current mobile devices is almost entirely tile-based, so the TBDR branch flow obtains a huge performance improvement in bandwidth transmission on a controllable Tile Buffer and does not require Special Hardware Feature support; the disadvantage is that its performance is not as good as the Sparse Texture-based branch flow.
In the GPU Driven rendering pipeline of the related art, all buffers of GPU Driven TBDR are located in System Memory, which causes System Memory to incur a large bandwidth overhead; constrained by the area of the device and its internal mainboard, mobile devices cannot be provisioned with high bandwidth as PCs can, which in turn limits the GPU rendering capability of mobile devices.
No effective solution to the above problems in the related art has been found so far.
Summary of the Invention
The present invention proposes the following technical solutions to overcome, or at least partially solve or alleviate, the above problems:
According to one aspect of the present invention, a data-driven method for sparse rendering is provided, including: constructing hierarchical Z-buffer data of a rendering object in a mobile device; performing frustum culling and/or occlusion culling on the rendering object in the GPU of the mobile device based on the hierarchical Z-buffer data to obtain texture data; transferring the texture data from the GPU to a visibility buffer; and constructing geometric data of the rendering object in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information, and transmitting the geometric data to the GPU for rendering.
According to another aspect of the present invention, a data-driven apparatus for sparse rendering is provided, including: a building module, configured to construct hierarchical Z-buffer data of a rendering object in a mobile device; a processing module, configured to perform frustum culling and/or occlusion culling on the rendering object in the GPU based on the hierarchical Z-buffer data to obtain texture data; a transmission module, configured to transfer the texture data from the GPU to a visibility buffer; and a rendering module, configured to construct geometric data of the rendering object in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information, and to transmit the geometric data to the GPU for rendering.
According to yet another aspect of the present invention, a computer-readable medium is provided on which computer programs/instructions are stored; when executed by a processor, the computer programs/instructions implement the steps of the above data-driven method for sparse rendering.
According to yet another aspect of the present invention, an electronic device is provided, including a memory, a processor, and a computer program/instructions stored on the memory; when the processor executes the computer program/instructions, the steps of the above data-driven method for sparse rendering are implemented.
According to yet another aspect of the present invention, a computer apparatus/device/system is provided, including a memory, a processor, and a computer program/instructions stored on the memory; when the processor executes the computer program/instructions, the steps of the above data-driven method for sparse rendering are implemented.
According to yet another aspect of the present invention, a computer program product is provided, including computer programs/instructions; when executed by a processor, the computer programs/instructions implement the steps of the above data-driven method for sparse rendering.
Through the present invention, hierarchical Z-buffer data of a rendering object is constructed in the mobile device; frustum culling and/or occlusion culling is performed on the rendering object in the GPU of the mobile device based on the hierarchical Z-buffer data to obtain texture data; the texture data is transferred from the GPU to the visibility buffer; geometric data of the rendering object is constructed in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information, and the geometric data is transmitted to the GPU for rendering. All textures related to the texture data to be drawn in the current frame are obtained through sparse textures to construct the corresponding geometric data; at the same time, by transmitting and storing the hierarchical Z-buffer data in the tile memory, data in system memory is diverted to the tile memory for relay transmission, which reduces the storage pressure and transmission bandwidth of system memory, solves the technical problem in the related art of excessive system-memory usage and bandwidth overhead during tile-based deferred rendering, balances the memory distribution in the GPU-driven pipeline, and improves the data transmission speed and frame rendering speed of the whole GPU-driven process.
Brief Description of the Drawings
By reading the following detailed description of the preferred implementations, the above and various other advantages and benefits of the present invention will become clear to those of ordinary skill in the art. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a mobile phone for a data-driven method for sparse rendering according to an embodiment of the present invention;
FIG. 2 is a flowchart of a data-driven method for sparse rendering according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of sparse texture sampling in an embodiment of the present invention;
FIG. 4 is a timing diagram of the sparse texture according to an embodiment of the present invention;
FIG. 5 is a structural block diagram of a data-driven apparatus for sparse rendering according to an embodiment of the present invention;
FIG. 6 is a structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a computer apparatus/device/system according to an embodiment of the present invention;
FIG. 8 is a block diagram of a computer program product according to an embodiment of the present invention.
Detailed Description
The present invention is further described below with reference to the drawings and specific implementations. The following description only illustrates the basic principles of the present invention and does not limit it.
Embodiment 1
The method embodiment provided in Embodiment 1 of the present application can be executed in a mobile phone, a tablet, or a similar mobile terminal. Taking running on a mobile phone as an example, FIG. 1 is a block diagram of the hardware structure of a mobile phone for a data-driven method for sparse rendering according to an embodiment of the present invention. As shown in FIG. 1, the mobile phone 10 may include one or more (only one is shown in FIG. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data; optionally, the mobile phone may further include a transmission device 106 for communication functions and an input/output device 108. Those of ordinary skill in the art will understand that the structure shown in FIG. 1 is only schematic and does not limit the structure of the mobile phone. For example, the mobile phone 10 may include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
The memory 104 can be used to store programs of the mobile phone, for example, software programs and modules of application software, such as the program corresponding to the data-driven method for sparse rendering in the embodiment of the present invention. By running the program stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the above method. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 104 may further include memory located remotely from the processor 102, and such remote memory may be connected to the mobile phone 10 via a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 106 is used to receive or send data via a network. A specific example of the above network may include a wireless network provided by the communication provider of the mobile phone 10. In one example, the transmission device 106 includes a network interface controller (NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 106 may be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
This embodiment provides a data-driven method for sparse rendering. FIG. 2 is a flowchart of a data-driven method for sparse rendering according to an embodiment of the present invention; as shown in FIG. 2, the flow includes:
Step S202: construct HZB (Hierarchical-Z Buffer) data of a rendering object in a mobile device;
The mobile device in this embodiment may be an iOS device, or an electronic device with a similar CPU architecture; the rendering object is the image data to be rendered and displayed using the GPU.
Step S204: perform frustum culling and/or occlusion culling on the rendering object in the GPU of the mobile device based on the hierarchical Z-buffer data to obtain texture data;
In this embodiment, Occlusion Culling means that an object is not rendered when it is occluded by other objects and is therefore not visible to the camera. Occlusion culling is not automatic in 3D graphics computation, because in the vast majority of cases the objects farthest from the camera are rendered first and objects closer to the camera are rendered later, overwriting what was rendered before. Occlusion culling differs from frustum culling: frustum culling only skips objects outside the camera's field of view, while objects that are occluded by other objects yet still inside the field of view are not culled; when occlusion culling is used, frustum culling still provides benefit. By culling against the frustum and occluders, the amount of data to render can be reduced and the rendering speed increased without changing the output image.
Step S206: transfer the texture data from the GPU to a visibility buffer;
Step S208: construct geometric data of the rendering object in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information, and transmit the geometric data to the GPU for rendering.
Through the above steps, hierarchical Z-buffer data of the rendering object is constructed in the mobile device; frustum culling and/or occlusion culling is performed on the rendering object in the GPU of the mobile device based on the hierarchical Z-buffer data to obtain texture data; the texture data is transferred from the GPU to the visibility buffer; geometric data of the rendering object is constructed in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information, and the geometric data is transmitted to the GPU for rendering. All textures related to the texture data to be drawn in the current frame are obtained through sparse textures to construct the corresponding geometric data; at the same time, by transmitting and storing the hierarchical Z-buffer data in the tile memory, data in system memory is diverted to the tile memory for relay transmission, which reduces the storage pressure and transmission bandwidth of system memory, solves the technical problem in the related art of excessive system-memory usage and bandwidth overhead during tile-based deferred rendering, balances the memory distribution in the GPU-driven pipeline, and improves the data transmission speed and frame rendering speed of the whole GPU-driven process.
Optionally, after constructing the hierarchical Z-buffer data of the rendering object in the mobile device, the method further includes: transmitting the hierarchical Z-buffer data to the Hi-Z Buffer in the Tile Memory of the mobile device; and transmitting the hierarchical Z-buffer data from the Tile Memory to the GPU of the mobile device.
The Tile Memory in this embodiment is a storage area on the mobile device that is independent of the CPU, the GPU, and System Memory; it is an integrated memory chip located on the device close to the GPU. The Tile Memory is connected with the system memory of the mobile device and, together with the CPU, the GPU, and the system memory, forms the GPU Driven Pipeline of the mobile device.
In one implementation of this embodiment, constructing the geometric data of the rendering object in the Visibility Buffer using sparse textures based on the texture data and the preset texture-coordinate information includes: requesting pixel data of the sparse texture from the renderer, where the sparse texture is the smallest texture unit loaded into memory; loading data content according to the mapping state of the sparse texture, where the mapping state includes a mapped state and an unmapped state; and constructing a geometric coordinate system based on the preset texture-coordinate information, and constructing the geometric data of the rendering object in that coordinate system using the data content and the texture data.
Optionally, after loading the data content according to the mapping state of the sparse texture, the method further includes: judging whether the pixel data has finished loading in the texture of the current level; if the pixel data has not finished loading in the texture of the current level, rolling the remaining pixel data back to the next mip level until loading is complete.
In Sparse Texture mode, all textures related to the primitives to be drawn in the current frame are obtained through the Sparse Texture, so the Geometry info related to them can be reconstructed; an extra set of uv information (the preset texture-coordinate information) is kept in the Visibility Buffer and combined with the Sparse Texture to rebuild the Geometry info, so that a GBuffer Pass is no longer needed, which further improves rendering efficiency, as illustrated in the sketch below.
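By way of illustration only, the following C++ sketch shows the idea of keeping a uv set next to each visibility-buffer entry and interpolating geometry from it; the entry layout and helper names are assumptions made for the sketch, not the data format of this application.

    #include <cstdint>
    #include <vector>

    struct Float3   { float x, y, z; };
    struct Triangle { Float3 p0, p1, p2; };

    // One visibility-buffer texel: which instance/triangle is visible here, plus the extra
    // uv information (the preset texture-coordinate information) stored alongside it.
    struct VisibilityEntry {
        uint32_t instanceId;
        uint32_t triangleId;
        float u, v;                 // barycentric-style coordinates inside the triangle
    };

    // Rebuild the position for one visibility-buffer entry by looking up the triangle and
    // interpolating with the stored coordinates. Normals, UVs for the sparse-texture fetch,
    // and other shading inputs can be rebuilt the same way, which is why no GBuffer pass
    // is needed.
    Float3 ReconstructPosition(const VisibilityEntry& e, const std::vector<Triangle>& triangles) {
        const Triangle& t = triangles[e.triangleId];
        float w = 1.0f - e.u - e.v;
        return { w * t.p0.x + e.u * t.p1.x + e.v * t.p2.x,
                 w * t.p0.y + e.u * t.p1.y + e.v * t.p2.y,
                 w * t.p0.z + e.u * t.p1.z + e.v * t.p2.z };
    }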
Optionally, loading the data content according to the mapping state of the sparse texture includes: if the mapping state of the sparse texture is the mapped state, loading the texture information of the sparse texture into the Cache of the GPU and then returning the texture information to the renderer; if the mapping state of the sparse texture is the unmapped state, loading preset empty data into the Cache and then returning the preset empty data to the renderer.
Optionally, requesting the pixel data of the sparse texture from the renderer includes: judging whether the currently requested pixel data is stored in the Cache of the GPU; if the currently requested pixel data is stored in the Cache, transferring the pixel data from the Cache to the renderer; if the currently requested pixel data is not stored in the Cache, incrementing the count of the batching buffer and determining to load the data content according to the mapping state of the sparse texture.
FIG. 3 is a schematic flowchart of sparse texture sampling in an embodiment of the present invention. When the Shader issues a sampling command, it first checks whether the sampled pixels are in the Cache; if so, the pixels are returned directly to the Shader. If not, the following steps continue: the Counter is incremented and the pixel data is requested from the Sparse Tile; if the Sparse Tile is in the mapped state, the texture data is loaded into the Cache and then returned to the Shader; if it is in the unmapped state, zero bytes of data are loaded into the Cache and empty content is returned to the Shader.
By introducing the Counter mechanism, Sparse Texture supports a Hardware Counter in order to evaluate frequently accessed texture regions and improve the Cache hit rate. The Counter is incremented when a Cache miss occurs, not when a requested sample pixel does not exist; a sketch of how such counters could drive residency decisions follows.
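A minimal C++ sketch of how such counters might feed the residency decision; the per-tile counter array, the threshold, and the policy are assumptions for illustration, not a policy specified by this application.

    #include <cstdint>
    #include <vector>

    // One access counter per sparse tile, incremented on cache misses for that tile.
    // Tiles whose counters stay below the threshold are cold and become candidates for
    // unmapping, so that their memory can be reused for frequently accessed regions.
    std::vector<uint32_t> SelectTilesToUnmap(const std::vector<uint32_t>& missCounters,
                                             uint32_t hotThreshold) {
        std::vector<uint32_t> coldTiles;
        for (uint32_t tile = 0; tile < (uint32_t)missCounters.size(); ++tile) {
            if (missCounters[tile] < hotThreshold)
                coldTiles.push_back(tile);
        }
        return coldTiles;
    }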
Sampling a Sparse Texture in the Shader uses a fallback mechanism: when the requested mip level has not finished loading, it falls back to the next mip level for display.
In the GPU rendering process, in addition to the geometric data there is also Light Grid Culling data; by placing the buffer that stores the light-grid culling data in the Tile Memory, system-memory bandwidth can be further saved and rendering speed improved.
When rendering the light-grid culling data, the solution of this embodiment further executes the following flow: obtaining the light-grid culling data from the GPU; transferring the light-grid culling data from the GPU into the Light List of the Tile Memory; and transferring the light-grid culling data from the Light List to the GPU for rendering. A sketch of building such a per-tile light list follows.
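For illustration, a CPU-side C++ sketch of per-tile light culling that produces a light list of the kind kept in the Light List buffer; the screen-space circle test and the tile size are simplifying assumptions, not the culling used by this application.

    #include <algorithm>
    #include <vector>

    struct PointLight { float x, y, radius; };   // screen-space centre and radius of influence

    // For every screen tile, collect the indices of the lights whose screen-space circle
    // overlaps that tile; the resulting per-tile lists are the light-grid culling data.
    std::vector<std::vector<int>> BuildLightGrid(const std::vector<PointLight>& lights,
                                                 int tilesX, int tilesY, float tileSize) {
        std::vector<std::vector<int>> grid((size_t)tilesX * tilesY);
        for (int i = 0; i < (int)lights.size(); ++i) {
            const PointLight& l = lights[i];
            int x0 = std::max(0, (int)((l.x - l.radius) / tileSize));
            int x1 = std::min(tilesX - 1, (int)((l.x + l.radius) / tileSize));
            int y0 = std::max(0, (int)((l.y - l.radius) / tileSize));
            int y1 = std::min(tilesY - 1, (int)((l.y + l.radius) / tileSize));
            for (int ty = y0; ty <= y1; ++ty)
                for (int tx = x0; tx <= x1; ++tx)
                    grid[(size_t)ty * tilesX + tx].push_back(i);
        }
        return grid;
    }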
FIG. 4 is a timing diagram of the sparse texture according to an embodiment of the present invention. After the Tile Memory Feature is set, buffers are placed in the Tile Memory as much as possible to save bandwidth overhead. The TBDR flow of this embodiment performs the following optimization in its timing: the Hi-Z Buffer, the Light List, and the Visibility Buffer can be moved from System Memory into the Tile Memory, and the Geometry info is rebuilt in the Visibility Buffer in combination with the Sparse Texture, so that a GBuffer Pass is no longer needed.
In one implementation of this embodiment, constructing the hierarchical Z-buffer data of the rendering object in the mobile device includes: reading the depth data of the previous frame of the rendering object from the DepthBuffer of the system memory of the mobile device; transferring the depth data from the depth buffer to the GPU of the mobile device; and constructing the hierarchical Z-buffer data of the rendering object in the GPU based on the depth data.
From the description of the above implementations, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to make a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) execute the methods described in the various embodiments of the present invention.
Embodiment 2
This embodiment also provides a data-driven apparatus for sparse rendering, which is used to implement the above embodiments and preferred implementations; what has already been described will not be repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
FIG. 5 is a structural block diagram of a data-driven apparatus for sparse rendering according to an embodiment of the present invention, applied to a mobile terminal. As shown in FIG. 5, the apparatus includes: a building module 50, a processing module 52, a transmission module 54, and a rendering module 56, wherein:
the building module 50 is configured to construct hierarchical Z-buffer data of a rendering object in a mobile device;
the processing module 52 is configured to perform frustum culling and/or occlusion culling on the rendering object in the GPU based on the hierarchical Z-buffer data to obtain texture data;
the transmission module 54 is configured to transfer the texture data from the GPU to a visibility buffer;
the rendering module 56 is configured to construct geometric data of the rendering object in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information, and to transmit the geometric data to the GPU for rendering.
Optionally, the rendering module includes: a requesting unit, configured to request pixel data of the sparse texture from the renderer, where the sparse texture is the smallest texture unit loaded into memory; a loading unit, configured to load data content according to the mapping state of the sparse texture, where the mapping state includes a mapped state and an unmapped state; and a construction unit, configured to construct a geometric coordinate system based on the preset texture-coordinate information and to construct the geometric data of the rendering object in that coordinate system using the data content and the texture data.
Optionally, the loading unit further includes: a first loading subunit, configured to, if the mapping state of the sparse texture is the mapped state, load the texture information of the sparse texture into the cache of the GPU and then return the texture information to the renderer; and a second loading subunit, configured to, if the mapping state of the sparse texture is the unmapped state, load preset empty data into the cache and then return the preset empty data to the renderer.
Optionally, the requesting unit further includes: a judging subunit, configured to judge whether the currently requested pixel data is stored in the cache of the GPU; and a processing subunit, configured to, if the currently requested pixel data is stored in the cache, transfer the pixel data from the cache to the renderer, and, if the currently requested pixel data is not stored in the cache, increment the batching count and determine to load the data content according to the mapping state of the sparse texture.
Optionally, the rendering module further includes: a judging unit, configured to judge, after the loading unit loads the data content according to the mapping state of the sparse texture, whether the pixel data has finished loading in the texture of the current level; and a rollback unit, configured to roll the remaining pixel data back to the next mip level until loading is complete if the pixel data has not finished loading in the texture of the current level.
Optionally, the building module includes: a reading unit, configured to read the depth data of the previous frame of the rendering object from the depth buffer of the system memory of the mobile device; and a construction unit, configured to transfer the depth data from the depth buffer to the GPU of the mobile device and to construct the hierarchical Z-buffer data of the rendering object in the GPU based on the depth data.
It should be noted that the above modules can be implemented by software or hardware; for the latter, this can be done in the following way, but is not limited to it: the above modules are all located in the same processor, or the above modules are located in different processors in any combination.
Embodiment 3
An embodiment of the present application further provides an electronic device. FIG. 6 is a structural diagram of an electronic device according to an embodiment of the present invention; as shown in FIG. 6, it includes a processor 61, a communication interface 62, a memory 63, and a communication bus 64, where the processor 61, the communication interface 62, and the memory 63 communicate with each other through the communication bus 64. The memory 63 is used to store a computer program; the processor 61 is used to implement the following steps when executing the program stored in the memory 63: constructing hierarchical Z-buffer data of a rendering object in a mobile device; performing frustum culling and/or occlusion culling on the rendering object in the GPU of the mobile device based on the hierarchical Z-buffer data to obtain texture data; transferring the texture data from the GPU to a visibility buffer; and constructing geometric data of the rendering object in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information, and transmitting the geometric data to the GPU for rendering.
Optionally, constructing the geometric data of the rendering object in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information includes: requesting pixel data of the sparse texture from the renderer, where the sparse texture is the smallest texture unit loaded into memory; loading data content according to the mapping state of the sparse texture, where the mapping state includes a mapped state and an unmapped state; and constructing a geometric coordinate system based on the preset texture-coordinate information, and constructing the geometric data of the rendering object in that coordinate system using the data content and the texture data.
Optionally, loading the data content according to the mapping state of the sparse texture includes: if the mapping state of the sparse texture is the mapped state, loading the texture information of the sparse texture into the cache of the GPU and then returning the texture information to the renderer; if the mapping state of the sparse texture is the unmapped state, loading preset empty data into the cache and then returning the preset empty data to the renderer.
Optionally, requesting the pixel data of the sparse texture from the renderer includes: judging whether the currently requested pixel data is stored in the cache of the GPU; if the currently requested pixel data is stored in the cache, transferring the pixel data from the cache to the renderer; if the currently requested pixel data is not stored in the cache, incrementing the batching count and determining to load the data content according to the mapping state of the sparse texture.
Optionally, after loading the data content according to the mapping state of the sparse texture, the method further includes: judging whether the pixel data has finished loading in the texture of the current level; if the pixel data has not finished loading in the texture of the current level, rolling the remaining pixel data back to the next mip level until loading is complete.
Optionally, constructing the hierarchical Z-buffer data of the rendering object in the mobile device includes: reading the depth data of the previous frame of the rendering object from the depth buffer of the system memory of the mobile device; transferring the depth data from the depth buffer to the GPU of the mobile device; and constructing the hierarchical Z-buffer data of the rendering object in the GPU based on the depth data.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the above terminal and other devices.
The memory may include random access memory (RAM) or non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located away from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment provided by the present application, a computer-readable storage medium is further provided, in which instructions are stored; when run on a computer, the instructions cause the computer to execute the data-driven method for sparse rendering described in any one of the above embodiments.
In yet another embodiment provided by the present application, a computer program product containing instructions is further provided; when run on a computer, it causes the computer to execute the data-driven method for sparse rendering described in any one of the above embodiments.
The various component embodiments of the present invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the data-driven apparatus for sparse rendering according to the embodiments of the present invention. The present invention can also be implemented as programs/instructions (for example, computer programs/instructions and computer program products) of a device or apparatus for performing part or all of the methods described herein. Such programs/instructions implementing the present invention may be stored on a computer-readable medium, or may exist in the form of one or more signals; such signals may be downloaded from an Internet website, or provided on a carrier signal, or provided in any other form.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible to computing devices.
FIG. 7 schematically shows a computer apparatus/device/system that can implement the data-driven method for sparse rendering according to the present invention; the computer apparatus/device/system includes a processor 710 and a computer-readable medium in the form of a memory 720. The memory 720 is an example of a computer-readable medium and has a storage space 730 for storing computer programs/instructions 731. When the computer programs/instructions 731 are executed by the processor 710, each step of the data-driven method for sparse rendering described above can be implemented.
FIG. 8 schematically shows a block diagram of a computer program product implementing the method according to the present invention. The computer program product includes computer programs/instructions 810; when the computer programs/instructions 810 are executed by a processor such as the processor 710 shown in FIG. 7, each step of the data-driven method for sparse rendering described above can be implemented.
Specific embodiments of this specification have been described above, and they, together with other embodiments, fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the specific order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
It should also be noted that the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
It should be understood that the above embodiments are only for the purpose of illustrating the present invention and are not intended to limit it. Without departing from the basic spirit and characteristics of the present invention, those skilled in the art may also implement the present invention in other ways. The scope of the present invention shall be defined by the appended claims, and any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of one or more embodiments of this specification shall be covered therein.

Claims (12)

  1. A data-driven method for sparse rendering, characterized by comprising:
    constructing hierarchical Z-buffer data of a rendering object in a mobile device;
    performing frustum culling and/or occlusion culling on the rendering object in a GPU of the mobile device based on the hierarchical Z-buffer data to obtain texture data;
    transferring the texture data from the GPU to a visibility buffer;
    constructing geometric data of the rendering object in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information, and transmitting the geometric data to the GPU for rendering.
  2. The method according to claim 1, characterized in that constructing the geometric data of the rendering object in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information comprises:
    requesting pixel data of the sparse texture from the renderer, wherein the sparse texture is the smallest texture unit loaded into memory;
    loading data content according to the mapping state of the sparse texture, wherein the mapping state comprises a mapped state and an unmapped state;
    constructing a geometric coordinate system based on the preset texture-coordinate information, and constructing the geometric data of the rendering object in the geometric coordinate system using the data content and the texture data.
  3. The method according to claim 2, characterized in that loading the data content according to the mapping state of the sparse texture comprises:
    if the mapping state of the sparse texture is the mapped state, loading the texture information of the sparse texture into a cache of the GPU and then returning the texture information to the renderer; if the mapping state of the sparse texture is the unmapped state, loading preset empty data into the cache and then returning the preset empty data to the renderer.
  4. The method according to claim 2, characterized in that requesting the pixel data of the sparse texture from the renderer comprises:
    judging whether the currently requested pixel data is stored in the cache of the GPU;
    if the currently requested pixel data is stored in the cache, transferring the pixel data from the cache to the renderer; if the currently requested pixel data is not stored in the cache, incrementing the batching count and determining to load the data content according to the mapping state of the sparse texture.
  5. The method according to claim 2, characterized in that after loading the data content according to the mapping state of the sparse texture, the method further comprises:
    judging whether the pixel data has finished loading in the texture of the current level;
    if the pixel data has not finished loading in the texture of the current level, rolling the remaining pixel data back to the next mip level until loading is complete.
  6. The method according to claim 1, characterized in that constructing the hierarchical Z-buffer data of the rendering object in the mobile device comprises:
    reading depth data of the previous frame of the rendering object from a depth buffer of a system memory of the mobile device;
    transferring the depth data from the depth buffer to the GPU of the mobile device, and constructing the hierarchical Z-buffer data of the rendering object in the GPU based on the depth data.
  7. A data-driven apparatus for sparse rendering, characterized by comprising:
    a building module, configured to construct hierarchical Z-buffer data of a rendering object in a mobile device;
    a processing module, configured to perform frustum culling and/or occlusion culling on the rendering object in a GPU based on the hierarchical Z-buffer data to obtain texture data;
    a transmission module, configured to transfer the texture data from the GPU to a visibility buffer;
    a rendering module, configured to construct geometric data of the rendering object in the visibility buffer using sparse textures based on the texture data and preset texture-coordinate information, and to transmit the geometric data to the GPU for rendering.
  8. The apparatus according to claim 7, characterized in that the rendering module comprises:
    a requesting unit, configured to request pixel data of the sparse texture from the renderer, wherein the sparse texture is the smallest texture unit loaded into memory;
    a loading unit, configured to load data content according to the mapping state of the sparse texture, wherein the mapping state comprises a mapped state and an unmapped state;
    a construction unit, configured to construct a geometric coordinate system based on the preset texture-coordinate information, and to construct the geometric data of the rendering object in the geometric coordinate system using the data content and the texture data.
  9. An electronic device, comprising a memory and a processor, characterized in that a computer program is stored in the memory, and the processor is configured to run the computer program to execute the method according to any one of claims 1 to 6.
  10. A computer apparatus/device/system, comprising a memory, a processor, and a computer program/instructions stored on the memory, wherein the processor implements the steps of the method according to any one of claims 1 to 6 when executing the computer program/instructions.
  11. A computer-readable medium on which computer programs/instructions are stored, wherein the computer programs/instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 6.
  12. A computer program product, comprising computer programs/instructions, wherein the computer programs/instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 6.
PCT/CN2021/121481 2020-12-29 2021-09-28 Data-driven method and apparatus for sparse rendering, and storage medium WO2022142546A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011598856.5 2020-12-29
CN202011598856.5A CN112614041B (zh) 2020-12-29 2020-12-29 Data-driven method and apparatus for sparse rendering, storage medium, and electronic device

Publications (1)

Publication Number Publication Date
WO2022142546A1 true WO2022142546A1 (zh) 2022-07-07

Family

ID=75248899

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/121481 WO2022142546A1 (zh) Data-driven method and apparatus for sparse rendering, and storage medium

Country Status (2)

Country Link
CN (1) CN112614041B (zh)
WO (1) WO2022142546A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117541744A (zh) * 2024-01-10 2024-02-09 埃洛克航空科技(北京)有限公司 Rendering method and apparatus for city-level real-scene three-dimensional images

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112614041B (zh) 2020-12-29 2022-10-25 完美世界(北京)软件科技发展有限公司 Data-driven method and apparatus for sparse rendering, storage medium, and electronic device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080238919A1 (en) * 2007-03-27 2008-10-02 Utah State University System and method for rendering of texel imagery
US20100231588A1 (en) * 2008-07-11 2010-09-16 Advanced Micro Devices, Inc. Method and apparatus for rendering instance geometry
WO2019088865A1 (ru) * 2017-11-01 2019-05-09 Вебгирз А Гэ Method and system for removing invisible surfaces of a three-dimensional scene
CN112017271A (zh) * 2019-05-31 2020-12-01 苹果公司 Graphics system and method for using sparse textures
CN112614041A (zh) * 2020-12-29 2021-04-06 完美世界(北京)软件科技发展有限公司 Data-driven method and apparatus for sparse rendering, storage medium, and electronic device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1889128A (zh) * 2006-07-17 2007-01-03 北京航空航天大学 Method for GPU-based precomputed radiance transfer all-frequency shadows
CN101527051B (zh) * 2009-03-26 2011-11-16 北京像素软件科技股份有限公司 Method and apparatus for rendering the sky based on the principle of atmospheric scattering
GB2534225B (en) * 2015-01-19 2017-02-22 Imagination Tech Ltd Rendering views of a scene in a graphics processing unit
US10078883B2 (en) * 2015-12-03 2018-09-18 Qualcomm Incorporated Writing graphics data from local memory to system memory
GB2546810B (en) * 2016-02-01 2019-10-16 Imagination Tech Ltd Sparse rendering
CN108648254B (zh) * 2018-04-27 2022-05-17 中科创达软件股份有限公司 Image rendering method and apparatus
CN111311478B (zh) * 2020-03-23 2024-02-09 西安芯云半导体技术有限公司 Prefetching method and apparatus for GPU rendering core data, and computer storage medium
CN111476858B (zh) * 2020-04-10 2023-03-14 浙江无端科技股份有限公司 WebGL-based 2D engine rendering method, apparatus and device
CN111508053B (zh) * 2020-04-26 2023-11-28 网易(杭州)网络有限公司 Model rendering method and apparatus, electronic device, and computer-readable medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080238919A1 (en) * 2007-03-27 2008-10-02 Utah State University System and method for rendering of texel imagery
US20100231588A1 (en) * 2008-07-11 2010-09-16 Advanced Micro Devices, Inc. Method and apparatus for rendering instance geometry
WO2019088865A1 (ru) * 2017-11-01 2019-05-09 Вебгирз А Гэ Method and system for removing invisible surfaces of a three-dimensional scene
CN112017271A (zh) * 2019-05-31 2020-12-01 苹果公司 Graphics system and method for using sparse textures
CN112614041A (zh) * 2020-12-29 2021-04-06 完美世界(北京)软件科技发展有限公司 Data-driven method and apparatus for sparse rendering, storage medium, and electronic device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117541744A (zh) * 2024-01-10 2024-02-09 埃洛克航空科技(北京)有限公司 Rendering method and apparatus for city-level real-scene three-dimensional images
CN117541744B (zh) * 2024-01-10 2024-04-26 埃洛克航空科技(北京)有限公司 Rendering method and apparatus for city-level real-scene three-dimensional images

Also Published As

Publication number Publication date
CN112614041B (zh) 2022-10-25
CN112614041A (zh) 2021-04-06

Similar Documents

Publication Publication Date Title
WO2022142546A1 (zh) Data-driven method and apparatus for sparse rendering, and storage medium
US7042462B2 (en) Pixel cache, 3D graphics accelerator using the same, and method therefor
US9489313B2 (en) Conditional page fault control for page residency
US9047686B2 (en) Data storage address assignment for graphics processing
CN116917927A (zh) Method and apparatus for tensor object support in machine-learning workloads
US9135172B2 (en) Cache data migration in a multicore processing system
US8009172B2 (en) Graphics processing unit with shared arithmetic logic unit
WO2022142547A1 (zh) Data-driven method and apparatus for tile-based deferred rendering
CN109064535B (zh) Hardware-accelerated implementation method for texture mapping in a GPU
WO2022062719A1 (zh) Video processing method, computer-readable storage medium, and electronic device
EP1721298A2 (en) Embedded system with 3d graphics core and local pixel buffer
KR20060044124A (ko) Graphics system and memory device for 3D graphics acceleration
US20080055326A1 (en) Processing of Command Sub-Lists by Multiple Graphics Processing Units
CN116529772A (zh) Compressed geometry rendering and streaming
US9053040B2 (en) Filtering mechanism for render target line modification
US20210200255A1 (en) Higher graphics processing unit clocks for low power consuming operations
US11257277B2 (en) Methods and apparatus to facilitate adaptive texture filtering
CN116348904A (zh) Optimizing GPU kernels with the SIMO approach for downscaling using the GPU cache
TW202236205A (zh) Rasterization of compute workloads
US8081182B2 (en) Depth buffer for rasterization pipeline
CN117435521B (zh) Texture video-memory mapping method, apparatus and medium based on GPU rendering
TW202301116A (zh) Intra-wave texture looping
TW202311940A (zh) Optimization of depth and shadow pass rendering in a tile-based architecture
JP2005063329A (ja) Three-dimensional drawing apparatus and drawing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21913291

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21913291

Country of ref document: EP

Kind code of ref document: A1