CN117274030A

CN117274030A - Vulkan drawing flow optimization method for mobile terminal

Info

Publication number: CN117274030A
Application number: CN202311284471.5A
Authority: CN
Inventors: 胡思超; 米楠
Original assignee: Faceunity Technology Co ltd; Zhejiang University ZJU
Current assignee: Faceunity Technology Co ltd; Zhejiang University ZJU
Priority date: 2023-10-07
Filing date: 2023-10-07
Publication date: 2023-12-22

Abstract

The invention discloses a Vulkan drawing flow optimization method for mobile terminals, designed around the characteristics of the Vulkan rendering API and of mobile GPUs. The method reduces the number of times a render pass is started and discards unnecessary frame buffer contents wherever possible, thereby reducing the frame buffer read and write overhead incurred when render passes are used. The invention aims to reduce the data bandwidth a mobile terminal requires when drawing; it provides a set of methods that encapsulate the Vulkan graphics API and improve mobile performance without increasing the complexity of upper-layer calls.

Description

Vulkan drawing flow optimization method for mobile terminal

Technical Field

The invention belongs to the field of rendering in computer graphics, and particularly relates to a Vulkan drawing flow optimization method of a mobile terminal.

Background

IMR (Immediate Mode Rendering), the architecture used on PCs, reads and writes the frame buffer and depth buffer once for every triangle during rendering, so it requires either a large amount of bandwidth or a large cache. High bandwidth brings high power consumption and demanding heat dissipation, and a mobile terminal is constrained in size and cannot accommodate a large cache, so IMR is poorly suited to mobile terminals. The TBR (Tile-Based Rendering) architecture was therefore developed for mobile terminals, with the aim of reducing the external memory accesses the GPU needs during rendering. TBR divides the screen into small blocks (tiles), for example 16x16 or 32x32 pixels each. Primitives are distributed to the tiles for computation, and each tile has its own on-chip cache; once all computation for a tile has finished locally, that tile's frame buffer contents are written back to main memory, and when all tiles are done the final frame buffer is obtained. It follows that a mobile rendering implementation should minimize read and write operations between the GPU and external memory throughout the rendering flow.
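To give a sense of scale for the external memory traffic discussed above, the following is a small arithmetic sketch. The screen size, pixel format, and sample count are assumptions chosen for illustration and do not come from the application.

```python
import math

def tile_count(width, height, tile=16):
    """Number of tiles needed to cover a width x height render target."""
    return math.ceil(width / tile) * math.ceil(height / tile)

def attachment_bytes(width, height, bytes_per_pixel=4, samples=1):
    """Bytes moved when one attachment is loaded from or stored to main memory once."""
    return width * height * bytes_per_pixel * samples

# Hypothetical 1080 x 2400 screen with a 4-byte-per-pixel color attachment.
tiles = tile_count(1080, 2400, tile=16)                # 68 columns * 150 rows
one_store = attachment_bytes(1080, 2400)               # single-sample write-back
msaa_store = attachment_bytes(1080, 2400, samples=4)   # 4x multisampled write-back
print(tiles, one_store, msaa_store)
```

Under these assumptions, every avoided load or store of this attachment saves roughly 10 MB of traffic per frame, and four times that for a 4x multisampled attachment.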

Frame Graph is a relatively new rendering framework: by collecting all the information for a complete frame, it can analyze the dependencies between render pass nodes and perform optimizations based on them.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides a Vulkan drawing flow optimization method for mobile terminals. First, it reduces the number of render passes (Render Pass) in a frame by automatically merging compatible Render Passes into one; for a given render target, this avoids one Render Pass writing the target back to video memory only for the next Render Pass to read it back into the tiles. Second, it automatically determines, for each texture bound as a render target of a Render Pass, whether the computed result needs to be stored and whether the previously existing contents need to be loaded; textures that do not need to be accessed are automatically marked as not accessed, which saves bandwidth. Finally, for a render target with MSAA enabled, it automatically determines whether the multisampled (Multiple Sample) texture is needed again as a render target; if not, the texture is resolved in place to a single-sample (Single Sample) texture when the Render Pass ends.

This eliminates the extra storage overhead caused by MSAA and, by optimizing the mobile Vulkan drawing flow, mitigates the poor performance caused by the overhead of reading and writing the frame buffer (Frame Buffer).

The main technical scheme of the invention is as follows:

A Vulkan drawing flow optimization method for a mobile terminal encapsulates the VkRenderPass-related calls provided by Vulkan and optimizes frame buffer read and write operations inside the encapsulating functions, and comprises the following steps:

(1) Encapsulating three function interfaces: the begin-render-pass function BeginRenderPass, the drawing function Draw, and the end-render-pass function EndRenderPass;

BeginRenderPass takes a group of render targets (Render Target) and the bound depth map as input, obtains and stores a Vulkan render pass VkRenderPass and a frame buffer object VkFramebuffer, and checks whether the objects stored by the previous BeginRenderPass call are compatible with the render pass and frame buffer obtained by the current call;

if they are not compatible, the vkCmdEndRenderPass function provided by Vulkan is called to terminate the previously started render pass, and the new objects are stored until they are actually used;

if they are compatible, the started render pass is not terminated, and the render passes are thereby merged automatically;

after BeginRenderPass has obtained and stored the render pass and frame buffer, the drawing function Draw determines whether a new render pass needs to be started through the Vulkan API and, if so, starts it; the EndRenderPass interface exists only to keep the semantics well defined when the upper layer calls this group of functions, and does not itself execute the end-render-pass function provided by Vulkan;

(2) The compatibility check in BeginRenderPass can produce a merge only if the homogeneous render pass started by the previous draw has not already been terminated for some other reason before BeginRenderPass is called; specifically:

render passes that write to the same frame buffer object can be merged, but if the resources each render pass needs for drawing differ, those resources must be updated before the render passes start;

therefore, within a group of homogeneous render passes, to prevent the update operations of a later render pass from breaking the render pass that is already open, the update operations for the resources required by later render passes must be moved ahead of the whole group, which is achieved by means of the Frame Graph;

(3) Once the dependencies between render passes linked by resources have been determined in step (2), redundant write-back can be avoided for resources that nothing depends on by discarding them when the render pass ends; the Frame Graph abstracts each render pass as a render node and treats the render passes merged in step (1) as a single render node; for each render node, the Frame Graph determines whether each render target in its bound frame buffer is depended on by a subsequent node or depends on a previous node; for render targets with no such dependency, redundant reads and writes are avoided by setting the corresponding flags when the render pass is created.

(4) A multisample anti-aliasing (MSAA) render pass must be resolved into an ordinary, non-multisampled texture to produce the final result, and writing the multisampled target back to video memory at the end of the render pass and then performing the resolve is too expensive; the render targets judged in step (3) therefore need special handling to avoid redundant reads and writes; specifically, when the render pass is created, a corresponding resolve target is specified for the multisampled target, that is, one additional render target is bound to the render pass, so that the resolve can be completed in place when the render pass ends, the multisampled target can be discarded, and only the resolved target needs to be written back to video memory.

Further, the step (1) comprises the following sub-steps:

(1.1) maintaining caches of render passes and frame buffers; each time BeginRenderPass is called, the corresponding render pass and frame buffer are looked up in the caches according to the render targets and depth map in the input parameters, and if the lookup fails, a new render pass and frame buffer are created and stored in their respective caches;

(1.2) judging render pass compatibility within BeginRenderPass, where the keys used to look up and create the render pass and frame buffer depend only on the parameters of the render targets and the depth-stencil map, and not on the operations of the render pass;

(1.3) in the Draw function, determining whether a new render pass needs to be started through the Vulkan API and, if so, starting it;

(1.4) in the EndRenderPass function, preserving the integrity of the rendering hardware interface semantics by ensuring that every begin call has a matching end call.
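The cache lookup in sub-steps (1.1) and (1.2) can be sketched as follows. This is a minimal Python model, not Vulkan code: `Target`, `pass_key`, `framebuffer_key`, and `Cache` are hypothetical names, and keying the frame buffer cache on image identity is an assumption, since the application only states that the keys depend on render target parameters and not on load, store, or clear operations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Target:
    """Hypothetical stand-in for a render target or depth map."""
    image_id: int
    fmt: str
    samples: int = 1

def pass_key(colors, depth=None):
    """Render pass key: formats and sample counts only.
    Load, store, and clear operations are deliberately left out."""
    d = (depth.fmt, depth.samples) if depth else None
    return (tuple((t.fmt, t.samples) for t in colors), d)

def framebuffer_key(colors, depth=None):
    """Frame buffer key (assumption): the concrete images that are bound."""
    return (tuple(t.image_id for t in colors), depth.image_id if depth else None)

class Cache:
    """Look an object up by key and create it on a miss."""
    def __init__(self, prefix):
        self.prefix = prefix
        self.objects = {}

    def get(self, key):
        if key not in self.objects:
            # A real implementation would create the Vulkan object here.
            self.objects[key] = f"{self.prefix}#{len(self.objects)}"
        return self.objects[key]

render_passes, framebuffers = Cache("VkRenderPass"), Cache("VkFramebuffer")
color, depth = Target(1, "RGBA8"), Target(2, "D24S8")
other_color = Target(3, "RGBA8")

first = (render_passes.get(pass_key([color], depth)),
         framebuffers.get(framebuffer_key([color], depth)))
second = (render_passes.get(pass_key([color], depth)),
          framebuffers.get(framebuffer_key([color], depth)))
third = (render_passes.get(pass_key([other_color], depth)),
         framebuffers.get(framebuffer_key([other_color], depth)))
```

In this model a repeated call with the same targets hits both caches, while a different image with the same format reuses the render pass object but gets a new frame buffer.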

Further, the step (2) includes the following sub-steps:

(2.1) using the architecture of the Frame Graph, most of the resource update operations required by the render passes are placed in the construction phase of the Frame Graph, while the actual drawing takes place in the execution phase that follows construction;

(2.2) when the result of one render node is depended on by a subsequent render node, switching the state of that result in the middle of a run of homogeneous render passes would break the merged pass; to avoid this, the state switches the resource requires are recorded in advance using the Frame Graph analysis, and they are scheduled to execute after Vulkan's end-render-pass function has actually been executed, that is, between different homogeneous render nodes, so that the result never undergoes a state switch as a resource inside a merged group of homogeneous render passes.
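The deferral described in sub-step (2.2) can be illustrated with a small Python model. `Recorder` and its methods are hypothetical, and commands are logged as strings rather than issued to Vulkan; the sketch only shows a requested state switch being held back until the open render pass has really ended.

```python
class Recorder:
    """Hypothetical command recorder; commands are logged as strings."""
    def __init__(self):
        self.commands = []
        self.in_pass = False
        self.pending = []  # state switches waiting for the open pass to end

    def flush_transitions(self):
        """Emit every recorded state switch, then forget them."""
        for resource, layout in self.pending:
            self.commands.append(f"barrier {resource} -> {layout}")
        self.pending.clear()

    def begin_pass(self, name):
        self.flush_transitions()
        self.commands.append(f"begin {name}")
        self.in_pass = True

    def request_transition(self, resource, new_layout):
        """Record a state switch; emit it now only if no pass is open."""
        self.pending.append((resource, new_layout))
        if not self.in_pass:
            self.flush_transitions()

    def end_pass(self):
        self.commands.append("end")
        self.in_pass = False
        self.flush_transitions()

r = Recorder()
r.begin_pass("merged A+B")
r.request_transition("shadow_map", "SHADER_READ_ONLY")  # requested mid-pass
r.end_pass()
r.begin_pass("C")
print(r.commands)
```

The barrier lands between the merged pass and pass C, so the merged pass is never split by it.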

The beneficial effects of the invention are as follows:

The invention provides a method for optimizing the mobile Vulkan drawing flow. Decision logic added at the Frame Graph layer orders the render pass nodes; based on that order, the load and store operations of each color and depth attachment in each render pass node are decided, and texture objects with MSAA enabled receive special handling. This saves frame buffer read and write overhead when a mobile terminal draws with Vulkan, alleviates to some extent the performance bottleneck caused by limited bandwidth, and improves overall performance.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not restrict the application.

Drawings

FIG. 1 shows the low-level implementation of the Render Pass merging logic;

FIG. 2 is a flow chart of the Frame Graph decision logic for computing the Load/Store flags;

FIG. 3 is an example of a Frame Graph after the Load/Store flags have been computed.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application.

The purpose of the embodiments of the application is to provide a method that addresses the poor performance caused by frame buffer read and write overhead in the Vulkan drawing flow of a mobile terminal.

The first step: the function interfaces for beginning a render pass (BeginRenderPass), drawing (Draw), and ending a render pass (EndRenderPass) are encapsulated. The BeginRenderPass function takes a group of render targets and the bound depth map (together, the Render Targets) and obtains the VkRenderPass and VkFramebuffer objects.

At this point, as shown in FIG. 1, it is checked whether the objects set by the previous BeginRenderPass call are compatible with the VkRenderPass and VkFramebuffer just obtained. If not, vkCmdEndRenderPass is called to terminate the previously started VkRenderPass, and the new objects are saved until they are actually used; if they are compatible, the already started VkRenderPass does not need to be terminated, which reduces the number of calls to vkCmdBeginRenderPass and vkCmdEndRenderPass and achieves automatic merging of VkRenderPass objects.

(a) Caches of VkFramebuffer and VkRenderPass objects are maintained. Each time BeginRenderPass is called, the corresponding VkFramebuffer and VkRenderPass are looked up in the caches according to the Render Targets in the input parameters; if the lookup fails, new VkFramebuffer and VkRenderPass objects are created and stored in their respective caches;

(b) The VkRenderPass obtained in BeginRenderPass is not started immediately; instead, vkCmdBeginRenderPass is called inside the encapsulated Draw function, just before the vkCmdDraw call;

(c) The EndRenderPass function does not call vkCmdEndRenderPass to terminate the render pass; so that render passes can be merged as often as possible, it only sets a mark, and the real vkCmdEndRenderPass call happens when BeginRenderPass finds that the render targets have changed;

(d) The keys used to look up and create VkFramebuffer and VkRenderPass objects depend only on the parameters of the Render Targets and not on the operations of the Render Pass itself (for example, whether the Render Targets are loaded or cleared), which is what makes the VkRenderPass compatibility check reliable.
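The wrapper behavior in items (a) to (d) can be sketched with the following Python model. It is an illustration under stated assumptions, not the applicant's implementation: `PassRecorder` is a hypothetical class, Vulkan calls are logged as strings instead of being issued, compatibility is reduced to equality of the target tuple, and `finish` (an end-of-frame flush) is an assumption that the text does not describe.

```python
class PassRecorder:
    """Hypothetical wrapper; Vulkan calls are logged as strings, not issued."""
    def __init__(self):
        self.calls = []
        self.active = None     # targets of the render pass that is really open
        self.requested = None  # targets saved by the latest begin_render_pass

    def begin_render_pass(self, targets):
        targets = tuple(targets)
        if self.active is not None and self.active != targets:
            # Targets switched: this is where the real end call happens.
            self.calls.append("vkCmdEndRenderPass")
            self.active = None
        self.requested = targets

    def draw(self, vertex_count):
        if self.requested is None:
            raise RuntimeError("draw called outside begin/end")
        if self.active is None:
            # Start lazily, right before the first draw that needs the pass.
            self.calls.append("vkCmdBeginRenderPass")
            self.active = self.requested
        self.calls.append(f"vkCmdDraw({vertex_count})")

    def end_render_pass(self):
        # Only a mark: the real vkCmdEndRenderPass is deferred.
        self.requested = None

    def finish(self):
        """Assumed end-of-frame flush that closes any pass still open."""
        if self.active is not None:
            self.calls.append("vkCmdEndRenderPass")
            self.active = None

r = PassRecorder()
for targets, count in [(["color0", "depth0"], 3),
                       (["color0", "depth0"], 6),
                       (["color1"], 3)]:
    r.begin_render_pass(targets)
    r.draw(count)
    r.end_render_pass()
r.finish()
print(r.calls)
```

The first two wrapper-level passes share targets, so the log contains one begin/end pair around both draws, followed by a second pair for the pass that switches targets.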

And a second step of: in addition to the automatic merge logic in the first step, some API calls may interrupt the VkRenderPass that may have been merged because they need to be outside of the scope of the VkRenderPass. Mainly there are two cases:

(a) Some updates to the data or state of the resources needed for rendering (e.g., vkCmdCopyBuffer, vkCmdCopyImage, convert vk imagelayout, etc. operations). These operations need to be advanced as far as possible before all VkRenderPass starts, while also reducing unnecessary state updates to the resources. This can be achieved by Frame Graph, in which the dependency relationship between the resources such as Buffer and Image and the Render Pass node is determined, and for the resources that do not depend on all Render Pass of the current Frame, the operation of updating the Frame is advanced to the beginning of Frame drawing; for the resources with the dependency relationship (for example, the resource result output by a certain node is input by a certain subsequent node as the resource), the position where the VkRenderpass is switched is searched between the two nodes for insertion;

(b) Because the operation of computer Pass will inevitably interrupt VkRenderPass, it is necessary to combine the Render nodes allowed to be combined as much as possible according to the dependency relationship between computer Pass and Render Pass in the Frame Graph (for example, for the case that rendering channel C depends on both computing channel a and rendering channel B, but computing channel a and rendering channel B are independent of each other, the Frame Graph may put the execution of computing channel a before rendering channel B).
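The reordering example in case (b) can be reproduced with a simple greedy scheduler. This is a sketch under assumptions: the function names are hypothetical, and "run a ready compute node first" is one possible heuristic chosen for illustration, not a rule stated in the application.

```python
def schedule(nodes, deps):
    """Order nodes so dependencies are respected and ready compute nodes run
    first, which tends to leave render nodes adjacent to each other.

    nodes: dict name -> "compute" or "render" (declaration order breaks ties)
    deps:  dict name -> set of names it depends on
    """
    done, order = set(), []
    remaining = list(nodes)
    while remaining:
        ready = [n for n in remaining if deps.get(n, set()) <= done]
        if not ready:
            raise ValueError("dependency cycle in frame graph")
        # Prefer a ready compute node; otherwise take the first ready render node.
        pick = next((n for n in ready if nodes[n] == "compute"), ready[0])
        order.append(pick)
        done.add(pick)
        remaining.remove(pick)
    return order

def render_pass_starts(order, nodes, targets):
    """Count how many real render passes would have to be started."""
    starts, current = 0, None
    for n in order:
        if nodes[n] == "compute":
            current = None  # a compute pass ends whatever render pass is open
        elif targets[n] != current:
            starts += 1
            current = targets[n]
    return starts

# Render pass C depends on compute pass A and render pass B; B and C share targets.
nodes = {"B": "render", "A": "compute", "C": "render"}
deps = {"C": {"A", "B"}}
targets = {"B": "scene_targets", "C": "scene_targets"}

order = schedule(nodes, deps)
naive_starts = render_pass_starts(["B", "A", "C"], nodes, targets)
merged_starts = render_pass_starts(order, nodes, targets)
print(order, naive_starts, merged_starts)
```

In declaration order the compute pass splits B and C into two render passes; with A moved first, B and C can share one.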

The third step: determine, in the Frame Graph, whether each rendering result of each Render node is depended on by a subsequent Render node or Compute node. One further point must be considered: the actual merging of Render nodes described above happens at the low level and is invisible to the Frame Graph, so the Frame Graph must determine for itself whether adjacent Render nodes will be merged. It does so by checking whether the outputs of the two nodes are the same (that is, whether their render targets are the same); if nodes will be merged, the merged whole is treated as one node. A decision is then made for each output resource (that is, each Attachment in the Frame Buffer) associated with each Render node. The decision logic is shown in FIG. 2, and the flow is as follows:

(a) For each Attachment of each Render node, subsequent nodes (excluding other Render nodes that will be merged with the current one) are traversed to determine whether any of them depends on its result. If no subsequent node needs the current Attachment, the rendering result can be discarded when the current Render node ends instead of being written back to memory, by setting storeOp to VK_ATTACHMENT_STORE_OP_DONT_CARE in the VkAttachmentDescription; otherwise storeOp is set to VK_ATTACHMENT_STORE_OP_STORE.

(b) Similarly, for each bound Attachment, each Render node must determine whether a previous Render node (excluding other Render nodes that will be merged with the current one) has written to it. If an earlier Render node has written to the current Attachment and stored it, loadOp must be set to VK_ATTACHMENT_LOAD_OP_LOAD in the VkAttachmentDescription; otherwise it is set to VK_ATTACHMENT_LOAD_OP_DONT_CARE.
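The decisions in (a) and (b) can be sketched as a pass over an ordered list of nodes. This is a Python model under assumptions: `RenderNode` and `decide_ops` are hypothetical names, nodes are assumed to be already merged, and the `exported` parameter (attachments consumed outside the graph, such as the image that is presented) is an addition for illustration that the text does not mention.

```python
from dataclasses import dataclass, field

@dataclass
class RenderNode:
    """Hypothetical frame graph node, already merged with compatible neighbors."""
    name: str
    writes: list                               # attachments this node renders to
    reads: list = field(default_factory=list)  # resources sampled as inputs

def decide_ops(nodes, exported=()):
    """Return {(node, attachment): (loadOp, storeOp)} for nodes in execution order."""
    ops = {}
    for i, node in enumerate(nodes):
        earlier, later = nodes[:i], nodes[i + 1:]
        for att in node.writes:
            written_before = any(att in p.writes for p in earlier)
            needed_after = att in exported or any(
                att in n.reads or att in n.writes for n in later)
            load = ("VK_ATTACHMENT_LOAD_OP_LOAD" if written_before
                    else "VK_ATTACHMENT_LOAD_OP_DONT_CARE")
            store = ("VK_ATTACHMENT_STORE_OP_STORE" if needed_after
                     else "VK_ATTACHMENT_STORE_OP_DONT_CARE")
            ops[(node.name, att)] = (load, store)
    return ops

graph = [
    RenderNode("shadow", writes=["shadow_depth"]),
    RenderNode("scene", writes=["color", "depth"], reads=["shadow_depth"]),
    RenderNode("ui", writes=["color"]),
]
ops = decide_ops(graph, exported={"color"})
for key, value in ops.items():
    print(key, value)
```

In this example the scene depth buffer is neither loaded nor stored, because no other node touches it, while the shadow map is stored because the scene node samples it.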

The fourth step: a texture resource with MSAA enabled and its corresponding resolve texture are managed as a single object. When the Frame Graph judges the render targets of Render nodes in the third step, the handling differs depending on whether multisample anti-aliasing (MSAA) is enabled for the render target object. An object with MSAA enabled must by default be resolved (Resolve) when it is used as an input resource of a node, and must be the multisampled (Multiple Sample) object when it is the output of a node. It follows that if no subsequent node uses a given MSAA render target as a render target again, and it is needed only as an input resource of subsequent nodes, the multisampled object can be discarded, which greatly reduces storage overhead. FIG. 3 shows a simple example of the result of this logic.

(a) When the Frame Graph judges an MSAA texture in the third step, it determines whether the texture must be written back to memory; if no subsequent node writes to it again, the storeOp of the MSAA object can be set to VK_ATTACHMENT_STORE_OP_DONT_CARE.

(b) The load and store operations of the corresponding resolve texture are fixed to VK_ATTACHMENT_LOAD_OP_DONT_CARE and VK_ATTACHMENT_STORE_OP_STORE.

(c) For an MSAA depth-stencil attachment, the in-place resolve can also be performed using the VkSubpassDescriptionDepthStencilResolve structure provided by the VK_KHR_depth_stencil_resolve extension.
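Items (a) and (b) amount to a small decision table, sketched here in Python. The function name and the dictionary layout are hypothetical; only the enum names and the rule itself come from the text. The loadOp of the multisampled attachment is left out because it follows the general rule of the third step.

```python
def msaa_attachment_ops(rendered_again_later):
    """Pick ops for a multisampled attachment and its resolve attachment.

    rendered_again_later: True if a later node uses the multisampled target
    as a render target again, so its samples must survive the pass.
    """
    msaa_store = ("VK_ATTACHMENT_STORE_OP_STORE" if rendered_again_later
                  else "VK_ATTACHMENT_STORE_OP_DONT_CARE")
    return {
        # The multisampled image is written back only if it is rendered to again.
        "msaa": {"storeOp": msaa_store},
        # The resolve image is fully overwritten by the in-place resolve,
        # so its ops are fixed.
        "resolve": {
            "loadOp": "VK_ATTACHMENT_LOAD_OP_DONT_CARE",
            "storeOp": "VK_ATTACHMENT_STORE_OP_STORE",
        },
    }

sampled_only = msaa_attachment_ops(rendered_again_later=False)
drawn_again = msaa_attachment_ops(rendered_again_later=True)
print(sampled_only["msaa"], drawn_again["msaa"])
```

When the result is only sampled by later nodes, the multisampled image never leaves tile memory and only the single-sample resolve target is written back.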

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.

It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof.

Claims

1. A Vulkan drawing flow optimization method for a mobile terminal, characterized in that the VkRenderPass-related calls provided by Vulkan are encapsulated and frame buffer read and write operations are optimized inside the encapsulating functions, the method comprising the following steps:

(1) Encapsulating three functions, the begin-render-pass function BeginRenderPass, the drawing function Draw, and the end-render-pass function EndRenderPass, and calling the relevant Vulkan APIs inside the encapsulated function bodies;

therefore, within a group of homogeneous render passes, to prevent the update operations of a later render pass from breaking the render pass that is already open, the update operations for the resources required by later render passes must be moved ahead of the whole group, which is achieved by means of the Frame Graph;

(3) Once the dependencies between render passes linked by resources have been determined in step (2), redundant write-back can be avoided for resources that nothing depends on by discarding them when the render pass ends; the Frame Graph abstracts each render pass as a render node and treats the render passes merged in step (1) as a single render node; for each render node, the Frame Graph determines whether each render target in its bound frame buffer is depended on by a subsequent node or depends on a previous node; for render targets with no such dependency, redundant reads and writes are avoided by setting the corresponding flags when the render pass is created;

2. The method for optimizing a mobile-side Vulkan drawing process according to claim 1, wherein the step (1) comprises the following sub-steps:

3. The method for optimizing a mobile-side Vulkan drawing process according to claim 1, wherein the step (2) comprises the following sub-steps:

(2.1) using the architecture of the Frame Graph, most of the resource update operations required by the render passes are placed in the construction phase of the Frame Graph, while the actual drawing takes place in the execution phase that follows construction;