CN114528090A - Vulkan-based method for realizing graphic rendering and related device - Google Patents

Vulkan-based method for realizing graphic rendering and related device

Info

Publication number
CN114528090A
Authority
CN
China
Prior art keywords
opengl
event
task
thread
events
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011236910.1A
Other languages
Chinese (zh)
Inventor
李宏伟
罗谈发
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202011236910.1A priority Critical patent/CN114528090A/en
Priority to PCT/CN2021/127780 priority patent/WO2022095808A1/en
Publication of CN114528090A publication Critical patent/CN114528090A/en
Pending legal-status Critical Current

Classifications

    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G06T15/20 Perspective computation
    • G06T15/205 Image-based rendering
    • G06F2209/5018 Thread allocation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Image Generation (AREA)

Abstract

The embodiments of the present application provide a Vulkan-based method for realizing graphics rendering and a related device, wherein the method comprises the following steps: processing an asynchronously executed OpenGL ES instruction into a first event through a first thread, wherein the first event comprises description information of the asynchronously executed OpenGL ES instruction; processing the first event through a second thread to obtain a second task (job), which comprises one first event or a plurality of first events, the plurality of first events being obtained by processing the same or different asynchronously executed OpenGL ES instructions; saving the second task to a command buffer through a third thread; and sending the second task in the command buffer to the GPU (graphics processing unit) through the Vulkan interface, wherein the GPU is used for completing graphics rendering according to the second task. By adopting the embodiments of the present application, the rendering efficiency of the GPU can be improved.

Description

Vulkan-based method for realizing graphic rendering and related device
Technical Field
The application relates to the technical field of computer graphics, in particular to a Vulkan-based method for realizing graphics rendering and a related device.
Background
With the rapid development of terminal technologies, terminals such as computers and mobile phones can realize various functions by installing different application software; for example, an application may provide functions such as beautification, makeup, and animated effects. Owing to years of accumulated code, these functions are currently built on renderers and image engines programmed against the Open Graphics Library (OpenGL) or OpenGL for Embedded Systems (OpenGL ES). Although many vendors plan to replace OpenGL ES with Vulkan in their applications, OpenGL or OpenGL ES must still be supported by most applications for compatibility.
To provide OpenGL ES functionality on a computer system that does not use a native OpenGL ES driver, a straightforward approach is to implement OpenGL ES directly on top of Vulkan. However, the run-time efficiency of OpenGL ES implemented directly on Vulkan in this way is far lower than that of a native OpenGL ES driver, which in turn degrades the graphics-rendering performance of the GPU.
Disclosure of Invention
The embodiment of the application discloses a Vulkan-based method and a related device for realizing graphic rendering, which can improve the rendering efficiency of a GPU.
In the method, a first thread processes an asynchronously executed OpenGL ES instruction into a first event (event), where the first event may include description information of the asynchronously executed OpenGL ES instruction; a second task (job) is then obtained by processing the first event through a second thread, where the second task may comprise one first event or a plurality of first events, the plurality of first events being obtained by processing the same or different asynchronously executed OpenGL ES instructions; the second task is then stored in the command buffer through a third thread; and finally, the second task in the command buffer is sent to the GPU through the Vulkan interface, where the GPU is used to complete graphics rendering according to the second task.
In the embodiment of the present application, OpenGL ES graphics rendering is implemented with Vulkan as the backend. Because Vulkan supports multi-thread parallel processing, the embodiment exploits this multi-thread advantage to process different events or tasks in different threads (for example, a first thread, a second thread, and a third thread): an asynchronously executed OpenGL ES instruction is processed into a first event by the first thread, the first event is received and processed into a second task (job) by the second thread, the second task is stored into the command buffer in the third thread, and the command buffer is sent to the GPU through the Vulkan interface, where the GPU completes the graphics rendering. Moreover, while the third thread processes the second task, the second thread is processing other received first events and the first thread is processing other received asynchronously executed OpenGL ES instructions. Since the threads work in parallel, processing efficiency is improved, the frequency of submission to the GPU can be increased, and the rendering efficiency is thereby improved.
In a possible implementation manner of the first aspect, before the asynchronously executed OpenGL ES instructions are processed into the first event by the first thread, the OpenGL ES instructions may be further classified into the asynchronously executed OpenGL ES instructions and the synchronously executed OpenGL ES instructions according to a type of an OpenGL ES application program interface API.
It can be seen that by classifying OpenGL ES instructions into asynchronously executed OpenGL ES instructions and synchronously executed OpenGL ES instructions, different processing can be performed on OpenGL ES instructions in a targeted manner, and processing efficiency is improved.
In a possible implementation manner of the first aspect, OpenGL ES instructions of the state setting and obtaining type belong to synchronously executed OpenGL ES instructions; OpenGL ES instructions of the object generation and destruction type also belong to synchronously executed OpenGL ES instructions; and both OpenGL ES instructions of the data upload type and OpenGL ES instructions of the rendering type belong to asynchronously executed OpenGL ES instructions.
In a possible implementation manner of the first aspect, after classifying, according to the type of the OpenGL ES application program interface API, a plurality of OpenGL ES instructions into an OpenGL ES instruction to be executed asynchronously and an OpenGL ES instruction to be executed synchronously, description information of the OpenGL ES instruction to be executed synchronously may be saved to a data structure by the first thread, and a saving result may be returned after the saving is completed.
In a possible implementation manner of the first aspect, after classifying, according to the type of the OpenGL ES application program interface (API), a plurality of OpenGL ES instructions into asynchronously executed OpenGL ES instructions and synchronously executed OpenGL ES instructions, the description information of a synchronously executed OpenGL ES instruction may be obtained from a data structure by the first thread, and the obtained description information is then sent to the graphics processor, where the graphics processor is configured to complete graphics rendering according to the description information of the synchronously executed OpenGL ES instruction; the first thread may be a rendering thread of the OpenGL ES application.
As can be seen, the first thread may be the rendering thread of the OpenGL ES application, so that when synchronously executed OpenGL ES instructions are processed in the first thread, the OpenGL ES API remains visible to a single thread of the upper-layer application.
In a possible implementation manner of the first aspect, obtaining the second task (job) by processing the first event through the second thread may be embodied as: in the case that the currently processed first event belongs to a preset first type, executing the step of processing the first event through the second thread to obtain the second task.
It can be seen that the frequency of the first event being processed into the second task is controlled by the type of the first event in the second thread, and the frequency of the first event being submitted to the GPU can be controlled, so that instability of rendering performance of the GPU is avoided, and efficiency of rendering application of the GPU is improved.
In a possible implementation manner of the first aspect, obtaining the second task (job) through the second thread processing may be embodied as: in the case that the currently processed first event does not belong to a preset first type, if the number of currently accumulated and cached first events is greater than a first threshold, executing the step of processing the first event through the second thread to obtain the second task.
It can be seen that, in the second thread, the frequency of the first event being processed into the second task is controlled by the type of the first event and the number of the cached first events, and further, the frequency of submitting the first event to the GPU can be controlled, so that instability of rendering performance of the GPU is avoided, and the efficiency of rendering application of the GPU is improved.
In a possible implementation manner of the first aspect, obtaining the second task (job) through the second thread processing may be embodied as: in the case that the currently processed first event does not belong to a preset first type and the number of currently accumulated and cached first events is greater than a second threshold and less than or equal to the first threshold, if the load of the GPU is less than the first load threshold (that is, the GPU can still take on processing at this moment), executing the step of obtaining the second task through the second thread processing, where the second threshold is less than the first threshold. The first load threshold may be set empirically, or may be determined according to the amount of data to be processed in graphics rendering. It is readily understood that the "equal to" case could also be placed in the other branch of the decision, e.g., greater than or equal to the second threshold and less than the first threshold. It should be noted that the current GPU load being less than the first load threshold can also be understood as the GPU operating frequency being lower than its frequency during normal operation, or the electronic device running the GPU not suffering from severe heat generation or rapid power consumption. The load here mainly refers to the load of the GPU rendering process.
It can be seen that, in the second thread, the frequency at which first events are processed into second tasks is controlled by the type of the first event, the number of cached first events, and the load of the GPU; further, the frequency at which second tasks are submitted to the GPU can be controlled, so that instability of GPU rendering performance is avoided and the efficiency of GPU rendering applications is improved.
In a possible implementation manner of the first aspect, obtaining the second task (job) by processing the first event through the second thread may be embodied as: in the case that the currently processed first event does not belong to a preset first type and the number of currently accumulated and cached first events is greater than a second threshold and less than or equal to the first threshold, if the load of the GPU is greater than or equal to the first load threshold, caching the first event because the GPU cannot take on further processing, where the second threshold is less than the first threshold. It is readily understood that the "equal to" case could also be placed in the other branch of the decision, e.g., greater than or equal to the second threshold and less than the first threshold. It should be noted that the GPU load being greater than or equal to the first load threshold can also be understood as the GPU operating frequency being higher than its frequency during normal operation, or the electronic device running the GPU generating heat severely and consuming power quickly. The load here mainly refers to the load of the GPU rendering process. It can be seen that, in the second thread, the frequency at which first events are processed into second tasks is controlled by the type of the first event, the number of cached first events, and the load of the GPU; for example, when the GPU cannot take on processing, the first event can be cached rather than submitted to the GPU, avoiding instability of GPU rendering performance.
In a possible implementation manner of the first aspect, the second task may include a plurality of first events, and caching the second task in the command buffer through the third thread is embodied as: sorting the plurality of first events in the second task through the third thread, and caching the sorted second task into the command buffer.
It can be seen that the GPU completes graphics rendering according to the second task in the command buffer; the second task includes a plurality of first events, each containing description information of asynchronously executed OpenGL ES instructions, according to which the GPU finally completes the rendering. By sorting the plurality of first events of the second task in the third thread, Vulkan's ability to directly drive low-level GPU hardware optimization can be used to consolidate the randomly ordered first events, so that GPU hardware resources are used more effectively.
In a possible implementation manner of the first aspect, the sorting, by the third thread, the plurality of first events in the second task may specifically be represented as: and sequencing the plurality of first events in the second task through the third thread based on pipeline states corresponding to the plurality of first events in the second task.
It can be seen that the second task includes multiple first events, several of which may correspond to the same pipeline state, and each pipeline state requires rendering work. Therefore, by using Vulkan to directly drive low-level GPU hardware optimization, and sorting the multiple first events of the second task in the third thread according to the optimization strategy that multiple first events may correspond to the same pipeline state, GPU hardware resources can be used more effectively.
In one possible implementation of the first aspect, first events of the plurality of first events in the second task that correspond to the same pipeline state are ordered together.
It can be seen that, because each pipeline state performs image rendering through a rendering instruction, when first events corresponding to the same pipeline state are ordered together, a single rendering call can complete their rendering; the rearranged first events thus reduce the number of switches during rendering and improve the operating efficiency of the GPU.
In a possible implementation manner of the first aspect, the second task includes a plurality of first events, and caching the second task in the command buffer through the third thread is specifically embodied as: ordering together, through the third thread, those first events of the second task that modify the same variable; modifying the ordered first events in the second task based on the variable; and caching the sorted and modified second task into the command buffer.
It can be seen that the GPU completes graphics rendering according to the second task in the command buffer; the second task includes a plurality of first events containing description information of asynchronously executed OpenGL ES instructions, according to which the GPU finally completes the rendering. Several first events may modify the same variable, and every modified first event must be uploaded to the GPU to complete the rendering. Therefore, by using Vulkan to directly drive low-level GPU hardware optimization, and ordering the first events of the second task in the third thread according to the strategy that multiple first events modify the same variable, the randomly ordered first events are consolidated and GPU hardware resources can be used more effectively.
In one possible implementation of the first aspect, first events of the plurality of first events that modify the same variable are ordered together.
It can be seen that each modified first event must be uploaded to the GPU to complete graphics rendering. When first events that modify the same variable are ordered together, it is not necessary to upload to the GPU after each individual modification; instead, the events can be uploaded once after all of them have been modified, reducing the switching frequency and improving the operating efficiency of the GPU.
In one possible implementation of the first aspect, processing the asynchronously executed OpenGL ES instruction into the first event through the first thread comprises: acquiring description information of the asynchronously executed OpenGL ES instruction through the first thread; and encapsulating the description information together with the asynchronously executed OpenGL ES instruction as the first event.
The second aspect of the embodiment of the present application discloses an apparatus for implementing graphics rendering based on Vulkan, including:
a context module, configured to process, by a first thread, an OpenGL ES instruction that is executed asynchronously into a first event, where the first event includes description information of the OpenGL ES instruction that is executed asynchronously;
the event management module is used for processing the first event through a second thread to obtain a second task (job), where the second task comprises one first event or a plurality of first events, the plurality of first events being obtained by processing the same or different asynchronously executed OpenGL ES instructions;
the event packaging module is used for storing the second task to the command cache region through a third thread;
and the task scheduling module is used for sending the second task in the command buffer to the GPU (graphics processing unit) through the Vulkan interface, where the GPU is used to complete graphics rendering according to the second task.
In a possible implementation manner of the second aspect, the context module is further configured to classify the plurality of OpenGL ES instructions into an OpenGL ES instruction executed asynchronously and an OpenGL ES instruction executed synchronously according to a type of an OpenGL ES application program interface API.
In a possible implementation manner of the second aspect, OpenGL ES instructions of the state setting and obtaining type and OpenGL ES instructions of the object generation and destruction type both belong to synchronously executed OpenGL ES instructions; both OpenGL ES instructions of the data upload type and OpenGL ES instructions of the rendering type belong to asynchronously executed OpenGL ES instructions.
In a possible implementation manner of the second aspect, the context module is further configured to save, by the first thread, description information of the OpenGL ES instructions executed synchronously to the data structure.
In a possible implementation manner of the second aspect, the context module is further configured to obtain, by the first thread, description information of a synchronously executed OpenGL ES instruction from the data structure, and send the description information to the graphics processor, where the graphics processor is configured to complete graphics rendering according to the description information of the synchronously executed OpenGL ES instruction.
In a possible implementation manner of the second aspect, the event management module is specifically configured to: when the first event belongs to a preset first type, execute the step of processing the first event through the second thread to obtain the second task (job).
In a possible implementation manner of the second aspect, the event management module is specifically configured to: and when the first event does not belong to a preset first type, if the number of the first events currently accumulated and cached is greater than a first threshold value, executing a step of processing the first event through a second thread to obtain a second task.
In a possible implementation manner of the second aspect, the event management module is specifically configured to: and when the first event does not belong to a preset first type and the number of the first events currently accumulated and cached is greater than a second threshold and smaller than a first threshold, if the load of the GPU is smaller than the first load threshold, executing a step of processing the first event through a second thread to obtain a second task, wherein the second threshold is smaller than the first threshold.
In a possible implementation manner of the second aspect, the event management module is specifically configured to: in the case that the first event does not belong to a preset first type and the number of currently accumulated and cached first events is greater than the second threshold and less than or equal to the first threshold, cache the first event if the load of the GPU is greater than or equal to the first load threshold.
In a possible implementation manner of the second aspect, the event wrapper module is specifically configured to: sequencing the plurality of first events in the second task through the third thread; and caching the sequenced second task to a command cache region.
In a possible implementation manner of the second aspect, the event wrapper module is specifically configured to sort, by the third thread, the plurality of first events in the second task based on pipeline statuses corresponding to the plurality of first events.
In one possible implementation of the second aspect, first events of the plurality of first events that correspond to the same pipeline state are ordered together.
In a possible implementation manner of the second aspect, the event wrapper module is specifically configured to: sequencing the first events, which are modified by corresponding to the same variable, of the plurality of first events in the second task through a third thread; modifying the sequenced first event in the second task based on the variable; and caching the second task which is modified after being sequenced into a command cache region.
In one possible implementation of the second aspect, first events of the plurality of first events that modify the same variable are ordered together.
In a possible implementation manner of the second aspect, the context module is specifically configured to: acquire description information of the asynchronously executed OpenGL ES instruction through the first thread; and encapsulate the description information together with the asynchronously executed OpenGL ES instruction as the first event.
With regard to the technical effects brought about by the second aspect or the specific implementation, reference may be made to the introduction of the technical effects of the first aspect or the corresponding implementation.
A third aspect of embodiments of the present application provides an electronic device, including a processor and a memory; the processor is configured to execute the computer instructions stored by the memory to cause the electronic device to implement the method described in the first aspect or any one of the possible implementations of the first aspect.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium having stored thereon computer instructions, which, when executed by one or more processors, cause the one or more processors to implement the method described in the first aspect or any one of the possible implementations of the first aspect.
A fifth aspect of embodiments of the present application provides a computer program product, wherein the computer program product has stored therein computer instructions that, when executed by one or more processors, cause the one or more processors to implement the method described in the first aspect or any one of the possible implementations of the first aspect.
Drawings
The drawings used in the embodiments of the present application are described below.
Fig. 1 is a schematic diagram of an architecture for implementing OpenGL ES according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an architecture for implementing graphics rendering based on Vulkan according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a method for implementing graphics rendering based on Vulkan according to an embodiment of the present application;
FIG. 4 is a structural diagram of a component for implementing graphics rendering based on Vulkan according to an embodiment of the present application;
fig. 5A is a schematic flowchart illustrating a process of classifying OpenGL ES instructions at an OpenGL ES implementation layer according to an embodiment of the present application;
fig. 5B is a schematic processing flow diagram of an OpenGL ES instruction executed synchronously according to an embodiment of the present application;
fig. 5C is a schematic processing flow diagram of an asynchronously executed OpenGL ES instruction according to an embodiment of the present application;
FIG. 5D is a flowchart illustrating a process of obtaining a second task according to an embodiment of the present application;
FIG. 5E is a flowchart illustrating processing a second task according to an embodiment of the present application;
FIG. 5F is a schematic diagram of a reordering according to strategy one provided by an embodiment of the present application;
FIG. 5G is a schematic diagram of a rearrangement according to strategy two provided by the embodiments of the present application;
FIG. 5H is a flowchart illustrating a second task in a command buffer sent to the GPU according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The embodiments of the present application will be described below with reference to the drawings. It is noted that, in the present application, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs; rather, the use of the word "exemplary" or "for example" is intended to present relevant concepts in a concrete fashion.
The following is a brief introduction to relevant art and terminology related to this application to facilitate understanding.
1. Open Graphics Library (OpenGL)
OpenGL is a cross-language, cross-platform application programming interface (API) for rendering 2D and 3D vector graphics. The interface consists of nearly 350 different function calls, used to draw complex three-dimensional scenes from simple graphics primitives. Another program interface system is Direct3D, which is used only on Microsoft Windows. OpenGL is commonly used in virtual reality, scientific visualization programs, and electronic game development.
Efficient implementations of OpenGL (using graphics acceleration hardware) exist on Windows, on many UNIX platforms, and on Mac OS. These implementations are typically provided by the display device vendor and depend heavily on the vendor's hardware. The OpenGL API defines a number of functions that can be called by the client program, as well as some named integer constants (e.g., the constant GL_TEXTURE_2D corresponds to the decimal integer 3553). Although the definitions of these functions look similar to the C programming language, they are language independent.
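For instance, the named constant mentioned above can be checked directly; 0x0DE1 is the hexadecimal token value that the OpenGL headers assign to GL_TEXTURE_2D:

```python
# The OpenGL headers define named integer constants for the API;
# GL_TEXTURE_2D is 0x0DE1, i.e. the decimal integer 3553 cited above.
GL_TEXTURE_2D = 0x0DE1

print(GL_TEXTURE_2D)  # 3553
```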
2. Embedded development graphics library (OpenGL for Embedded Systems, OpenGL ES)
OpenGL ES is a subset of the OpenGL three-dimensional graphics API, designed for embedded devices such as cell phones, PDAs, and game consoles. The API is defined by the Khronos Group, a graphics software and hardware industry association that focuses primarily on open standards in the graphics and multimedia areas.
OpenGL ES is tailored from OpenGL: it removes many non-essential features, such as the glBegin/glEnd calls and complex primitives like quadrilaterals (GL_QUADS) and polygons (GL_POLYGON). Over the years, two major versions have emerged: OpenGL ES 1.x for fixed-pipeline hardware and OpenGL ES 2.x for programmable-pipeline hardware. OpenGL ES 1.0 is based on the OpenGL 1.3 specification and OpenGL ES 1.1 on the OpenGL 1.5 specification; both support two profiles, common and common lite. The lite profile supports only fixed-point numbers, while the common profile supports both fixed-point and floating-point numbers. OpenGL ES 2.0, defined with reference to the OpenGL 2.0 specification and published as the common profile in August 2005, introduced support for programmable pipelines.
3. Vulkan (a cross-platform drawing application program interface)
Vulkan is a cross-platform 2D and 3D drawing application program interface (API), first announced by the Khronos Group at the 2015 Game Developers Conference (GDC).
Khronos first named the Vulkan API the "Next Generation OpenGL initiative" or "glNext", but these names were dropped after Vulkan was formally announced. Like OpenGL, Vulkan targets real-time 3D programs (e.g., video games), and it is also designed to provide high performance and a low CPU management burden (overhead).
Like OpenGL, Vulkan was developed by the Khronos Group. It is the successor to AMD Mantle and inherits the latter's powerful low-overhead architecture, giving software developers full access to the performance, efficiency, and capabilities of Radeon GPUs and multi-core CPUs.
Relative to OpenGL, Vulkan significantly reduces the CPU's "API overhead" (the background work the CPU performs to analyze a game's hardware requirements) while providing important features, performance, and image quality, and it can expose GPU hardware features that are not normally accessible through OpenGL.
4. EGL
EGL is the interface between rendering APIs (e.g., OpenGL ES) and the underlying native platform windowing system. The EGL API is a standalone API, independent of the OpenGL ES version standards, whose main function is to create contexts for OpenGL instructions, among other tasks. In general, OpenGL is an API that operates the GPU: it controls the running state of the graphics rendering pipeline state machine by driving instructions to the GPU. However, when interaction with a local windowing system is involved, an intermediate layer is required, preferably one that is platform independent. EGL is therefore designed to bridge OpenGL and the native window system.
5. Application Program Interface (API)
An API is a predefined function or convention for connecting the different components of a software system. It provides a set of routines that applications and developers can access on the basis of certain software or hardware, without needing to access the source code or understand the details of the internal workings.
Programmers use such interfaces in programming; through them, the system and application programs can access resources in the system and obtain the services of the operating system during execution, and the interface is the only way for a program to obtain operating system services. Most operating system program interfaces consist of a set of system calls, each of which is a subroutine that performs a specific function.
An API is a set of definitions, procedures, and protocols through which computer software communicates. One of the primary functions of an API is to provide a common set of functions. An API also acts as middleware, providing data sharing across platforms. In programming practice, programming interfaces are designed to reasonably divide the responsibilities of a software system. Good interface design can reduce the interdependence of the parts of a system, improve the cohesion of its constituent units, and reduce the coupling between them, thereby improving the maintainability and extensibility of the system.
6. Back end (Backend)
In the field of software engineering and programming, the back end works in the background, controls the content presented by the front end, and is mainly responsible for the program architecture, managing databases, and so on. The back end interacts with the database to process the corresponding business logic; it must consider how to implement functionality, access data, and ensure the stability and performance of the platform.
7. Front end (Frontend)
In the field of software engineering and programming, the front end is the part of a software system that interacts directly with users. For the Web, the front end usually refers to the foreground part of a website, comprising its presentation layer and structural layer: the structure of the Web page, the visual appearance of the Web, and the interaction of the Web layer.
Front-end technology is generally divided into front-end design and front-end development: front-end design can generally be understood as the visual design of a website, and front-end development is the foreground code implementation of the website.
8. Driver (Device Driver)
A driver, in computer software terminology, is a program that drives a device in a computer. A driver is a special program added to an operating system that contains information about a hardware device; this information enables the computer to communicate with the corresponding device. A driver is written by a hardware manufacturer for a given operating system, and without drivers the hardware in a computer cannot work.
9. Render channel (Render Pass)
Render Pass generally refers to multi-pass rendering techniques in which an object is rendered multiple times, with the result of each rendering pass accumulated into the final rendering result. These passes typically cover: lighting, shadows, reflections, highlights, and global illumination. Multiple passes are used to achieve effects that a single pass cannot; the passes are interdependent, with later passes using the data (depth and geometric information) produced by earlier passes, and the data from the last pass being the data in the frame buffer, so the relationship between passes can be intuitively compared to a single process.
10. Rendering Pipeline (Rendering Pipeline)
The rendering pipeline, also called the render pipeline, is a parallel processing unit in the display chip that processes graphics signals independently. The rendering pipeline converts a series of vertex data, textures, and other information into an image that can be seen by the human eye. This process is carried out jointly by the CPU and the GPU.
11. Command buffer (Command buffer)
A command buffer is a collection of commands. It is submitted to the appropriate hardware queue for the GPU to process; the driver then fetches the command buffer to validate and compile it before actual GPU processing begins. The command buffer records the various Vulkan API commands that the application expects to execute. Once a command buffer has been recorded, it can be reused over and over again. Command buffers record commands in the order specified by the application. These commands are used to execute different types of jobs, including binding vertex buffers, binding pipelines, recording Render Pass commands, setting viewports and scissors, specifying drawing commands, and controlling copy operations on image and buffer contents.
12. Uniform variable
Uniform is a type of data used in the rendering process of a graphics rendering system. It is a variable-type modifier; in OpenGL ES it is a constant value within the shader and is used to store data that shaders need, such as a transformation matrix, lighting parameters, or a color.
13. Draw call (Draw call)
A draw call is a command from the CPU to the GPU that merely points to a list of primitives that need to be rendered. A "draw instruction" may be a single instruction, such as an API call function, or a set of multiple instructions; for example, in OpenGL, one draw call usually contains multiple API call functions, and such a draw call may be considered one drawing instruction that completes one draw. One draw call can draw one drawing target, several draw calls can together draw one drawing target, or one draw call can draw several drawing targets.
14. Asynchronous execution
Asynchronous execution means that, in the asynchronous execution mode, the order in which statements finish executing is not necessarily the same as the order in which they start. For example, after a client application sends a query instruction to the server, it immediately executes the statement following the query instruction without waiting for the server to return the query result. That is, when a task is executed asynchronously, control may transfer to another task before the first task completes. The asynchronous execution mode frees an application from the constraint of a single task and improves its flexibility and execution efficiency.
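The behaviour described here can be sketched as follows; the query function is a hypothetical stand-in for a server round trip, not part of the embodiments:

```python
# Minimal sketch of asynchronous execution: the caller submits a query and
# immediately runs the next statement instead of blocking on the result.
from concurrent.futures import ThreadPoolExecutor

def run_query(sql):
    # Hypothetical stand-in for a server-side query.
    return f"result of {sql}"

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(run_query, "SELECT 1")  # issue the query
    next_statement = "executed without waiting"  # runs before the result is back
    result = future.result()                     # block only when the result is needed

print(next_statement)
print(result)  # result of SELECT 1
```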
15. Synchronous execution
Synchronous execution describes any process consisting of multiple tasks that must be executed in sequence: the next task can be executed only after the previous task has returned its result.
At present, the state of development in the field of computer graphics is as follows: in terminals and desktop systems, the API layer (or driver) of the graphics system is changing continuously. OpenGL ES and Vulkan currently provide support for the application layer at the same time, but the future trend is that Vulkan may gradually replace OpenGL ES, with OpenGL ES remaining only to maintain backward compatibility.
Due to upgrades of GPU hardware and operating systems, the API layer of the graphics system must also be updated continuously and synchronously, which causes the following problems: 1. maintaining OpenGL ES and Vulkan simultaneously makes the software system huge and bloated; 2. the overlap of the two API modules leads to engineering complexity; 3. maintaining two API systems simultaneously also wastes significant engineering resources.
To provide OpenGL ES APIs in systems that do not ship an OpenGL ES API driver, a common method is to implement OpenGL ES directly with Vulkan as the bottom layer. Referring to fig. 1, fig. 1 is a schematic diagram of an architecture for implementing OpenGL ES according to an embodiment of the present application. It can be seen from fig. 1 that the architecture 10 is configured with one or more back ends, such as OpenGL (105A), Vulkan (105B), and Direct3D (105C); each back end has its corresponding driver, such as OpenGL Driver (106A) for OpenGL (105A), Vulkan Driver (106B) for Vulkan (105B), and Direct3D Driver (106C) for Direct3D (105C). With the architecture 10, OpenGL ES can be implemented with different back ends according to different platforms and configurations. For example, if the application is installed on an electronic device running the Android system, the Vulkan Driver (106B) may be used to implement OpenGL ES graphics rendering.
Through the architecture 10 shown in fig. 1, the electronic device may obtain, from the EGL/OpenGL entry layer, the OpenGL ES instruction issued by an application program's API call, and obtain the state data corresponding to the OpenGL ES instruction through the processing module 103. Since the architecture 10 supports one or more back ends, the state data corresponding to OpenGL ES instructions may be saved in the front end 104. According to the back end adopted (for example, Vulkan (105B)), the state data stored in the front end 104 is converted into state data corresponding to that back end; corresponding back-end instructions are then generated from the back-end state data and submitted to the corresponding driver (for example, Vulkan Driver (106B)), which in turn submits them to the GPU to complete the call, thereby realizing OpenGL ES image rendering.
The inventor of the present application found that, from obtaining the call data of an OpenGL ES API, to storing that data in the front end, to converting the data stored in the front end into data corresponding to the back end, the conversion and operation of the OpenGL ES API are all performed within a single thread, so the advantage of Vulkan multithreading cannot be fully exploited. Compared with a native OpenGL ES driver implementation, this results in additional CPU run time spent performing conversion operations within a single thread, reducing rendering efficiency. In addition, although the architecture design satisfies universality (supporting various back ends), the lack of a specific architecture design and suitable optimization methods for a particular back end (e.g., Vulkan) means that the advantages of Vulkan are not fully utilized and the resource utilization is limited.
In order to solve the above problem, the present application proposes a method and a related apparatus for implementing graphics rendering based on Vulkan, where the method includes:
the electronic device receives OpenGL ES instructions called by an application program and classifies the plurality of OpenGL ES instructions into asynchronously executed OpenGL ES instructions and synchronously executed OpenGL ES instructions. An asynchronously executed OpenGL ES instruction is processed into a first event by a first thread, while the description information of a synchronously executed OpenGL ES instruction is saved into, or obtained from, a data structure. The first event is sent to a second thread through a lock-free queue and processed by the second thread to obtain a second task, where the second task comprises one first event or a plurality of first events. The second task is then cached into a command buffer of the command buffer area by a third thread, and the second task in the command buffer area is sent to the GPU through the Vulkan interface. Finally, the GPU completes the graphics rendering according to the second task. Based on the proposed multithreaded architecture, various optimization technologies (such as dynamically controlling GPU instruction submission and GPU instruction reordering) can be adopted to make full use of CPU and GPU resources and improve operating efficiency.
With the embodiments of the present application, OpenGL ES graphics rendering is realized with Vulkan as the back end. Because Vulkan supports multithreaded parallel processing, the embodiments of the present application exploit this multithreading advantage by processing different events or tasks in different threads (a first thread, a second thread, and a third thread). At the same time, using Vulkan as the back end to implement OpenGL ES graphics rendering allows OpenGL ES APIs to be used on more devices and systems, improving the universality of the OpenGL ES APIs and allowing devices that do not support OpenGL ES APIs to run OpenGL ES applications. For device manufacturers and systems, the maintenance cost of OpenGL ES drivers can be reduced: multiple graphics API drivers need not be maintained on devices that support multiple graphics APIs, simplifying the software system.
Referring to fig. 2, fig. 2 is a schematic diagram of an architecture for implementing graphics rendering based on Vulkan according to an embodiment of the present application. As can be seen in fig. 2, the architecture 200 may include an application 201, an EGL/OpenGL entry 202, a context module 203, an event management module 204, an event packaging module 205, a task scheduling module 206, and a Vulkan Driver 207. When the application 201 runs, it may call an OpenGL ES API, and the called OpenGL ES instruction and related parameters are obtained through the EGL/OpenGL entry 202. The context module 203 then processes the called OpenGL ES instruction and related parameters, converting them into corresponding state data and GPU instructions, and sends these to the event management module 204, which packages the state data in batches; the event packaging module 205 then converts the packaged instructions and state information into a Vulkan command buffer; finally, the task scheduling module 206 sends the data in the command buffer to the GPU, and the GPU completes graphics rendering. The functions of the modules are introduced as follows:
the application programs 201 include, but are not limited to, game applications, video applications, chat communication applications, photo applications, and the like. The application is an application using an OpenGL ES API. The application may be installed on an electronic device, which may be, but is not limited to, a laptop computer, a desktop computer, a mobile phone, a mobile terminal (e.g., a smart phone), a wearable device, an in-vehicle device, an internet of things device, or other device capable of graphics rendering, such as a device running an android system, an IOS system, a windows system, and other systems.
The EGL/OpenGL ES entry 202 is configured to obtain an OpenGL ES instruction of the application 201 when the application 201 makes an API call.
The context module 203 runs on a first thread. Since the plurality of OpenGL ES instructions can be classified into asynchronous OpenGL ES instructions and synchronous OpenGL ES instructions, in one possible implementation manner, for a synchronous OpenGL ES instruction, the context module 203 is configured to save the description information of the synchronous OpenGL ES instruction into a data structure according to the type of the instruction (for example, when the OpenGL ES instruction is of the state setting type), or to obtain the description information of the synchronous OpenGL ES instruction from the data structure according to the type of the instruction (for example, when the OpenGL ES instruction is of the state obtaining type). The context module 203 may further send the description information related to the synchronous OpenGL ES instruction to the GPU through the Vulkan Driver 207, so that the GPU completes the image rendering.
For an asynchronous OpenGL ES instruction, the context module 203 may process the asynchronous OpenGL ES instruction into a first event through the first thread according to the type of the instruction (for example, when the OpenGL ES instruction is of the drawing type), and then send the first event to the event manager module 204 through a lock-free queue. The first event includes the description information of the asynchronous OpenGL ES instruction.
In one possible implementation manner, the context module 203 may obtain, through the first thread, the description information of the asynchronous OpenGL ES instruction from the corresponding data module, and then encapsulate the asynchronous OpenGL ES instruction and its description information into a first event. It should be noted that the description information includes the state values of the OpenGL ES instruction. OpenGL ES is a set of states: states all have default values, can be changed through OpenGL ES, and are saved in the context, and various states may be involved in rendering, such as the blending mode, texture pictures, and program used by OpenGL ES.
The event manager module 204 runs on the second thread and is configured to dynamically control the frequency of submission of first events to the event packaging module 205 according to one or more of the type of the first event, the number of cached first events, or the GPU load. If it determines that the first event needs to be sent onward, the first event is processed by the second thread to obtain a second task (job), and the second task is sent to the event packaging module 205. A second task comprises one first event or a plurality of first events.
In a possible implementation manner, if the currently processed first event belongs to a preset first type, the event manager module 204 performs the step of processing in the second thread to obtain a second task.
In a possible implementation manner, if the currently processed first event does not belong to the preset first type and the number of currently cached first events is greater than a first threshold, the event manager module 204 performs the step of processing in the second thread to obtain a second task.
In a possible implementation manner, when the currently processed first event does not belong to the preset first type, and the number of currently cached first events is greater than a second threshold and less than or equal to the first threshold, if the load of the current GPU is less than a first load threshold, indicating that the GPU can take on more work at this time, the event manager module 204 performs the step of processing the first event in the second thread to obtain a second task. The second threshold is less than the first threshold; the first load threshold may be set empirically, or determined comprehensively according to the amount of data to be processed in graphics rendering. It is readily understood that the "equal to" case can also be placed in the other branch of the decision, i.e., the case of being greater than or equal to the second threshold and less than the first threshold. It should be noted that the current load of the GPU being less than the first load threshold can also be understood as the GPU operating frequency being lower than its frequency during normal operation, or as the electronic device running the GPU not suffering from serious heating or fast power consumption; the load here is mainly the load of the GPU rendering process. In a possible implementation manner, when the currently processed first event does not belong to the preset first type, and the number of currently cached first events is greater than the second threshold and less than or equal to the first threshold, if the load of the current GPU is greater than the first load threshold, the event manager module 204 caches the first event. Here, again, the second threshold is less than the first threshold.
It is readily understood that the "equal to" case can also be placed in the other branch of the decision, i.e., the case of being greater than or equal to the second threshold and less than the first threshold. It should be noted that the load of the GPU being greater than or equal to the first load threshold can also be understood as the GPU operating frequency being higher than its frequency during normal operation, or as the electronic device running the GPU heating seriously and consuming power quickly; the load here is mainly the load of the GPU rendering process.

The event wrapper module 205 runs in a third thread and is configured to sort the first events included in a second task according to a specific optimization policy, cache the sorted second task in the command buffer area, and send the second task in the command buffer area to the task scheduling module 206.
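The event manager's dynamic submission control described above can be sketched as follows; the threshold values and the choice of which event types count as the preset "first type" are illustrative assumptions, not values fixed by the embodiments:

```python
# Sketch of the submission decision: event type, number of cached first
# events relative to the two thresholds, and the current GPU load.
FIRST_THRESHOLD = 32        # illustrative value
SECOND_THRESHOLD = 8        # illustrative value; less than FIRST_THRESHOLD
FIRST_LOAD_THRESHOLD = 0.8  # illustrative value

def should_submit(event_type, cached_count, gpu_load,
                  first_types=("swap_buffers",)):
    if event_type in first_types:       # preset first type: submit at once
        return True
    if cached_count > FIRST_THRESHOLD:  # too many events buffered: submit
        return True
    if SECOND_THRESHOLD < cached_count <= FIRST_THRESHOLD \
            and gpu_load < FIRST_LOAD_THRESHOLD:
        return True                     # GPU has headroom: submit early
    return False                        # otherwise keep caching the event

print(should_submit("draw", cached_count=40, gpu_load=0.9))   # True
print(should_submit("draw", cached_count=20, gpu_load=0.5))   # True
print(should_submit("draw", cached_count=20, gpu_load=0.95))  # False
print(should_submit("draw", cached_count=4, gpu_load=0.1))    # False
```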
In one possible implementation, the event wrapper module 205 orders, through the third thread, the plurality of first events in a second task based on the pipeline states corresponding to those first events, so that first events corresponding to the same pipeline state are ordered together.
In a possible implementation manner, the event wrapper module 205 uses the third thread to order together the first events in a second task that are modified by the same variable; it then modifies the ordered first events in the second task based on that variable, and finally caches the reordered and modified second task into the command buffer area.
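Strategy one, grouping the first events of a second task by pipeline state, can be sketched as follows; the event representation and field names are illustrative, and a real implementation would also have to respect ordering dependencies between events:

```python
# Sketch of reordering strategy one: a stable sort that groups the first
# events of a second task by their corresponding pipeline state, so events
# sharing a pipeline state end up adjacent and can be recorded together.
def reorder_by_pipeline(events):
    # sorted() is stable, so the relative order of events that share one
    # pipeline state is preserved.
    return sorted(events, key=lambda e: e["pipeline"])

task = [
    {"id": 0, "pipeline": "A"},
    {"id": 1, "pipeline": "B"},
    {"id": 2, "pipeline": "A"},
    {"id": 3, "pipeline": "B"},
]
print([e["id"] for e in reorder_by_pipeline(task)])  # [0, 2, 1, 3]
```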
The task scheduling (job dispatcher) module 206 runs on the second thread and is configured to send a second task in the command buffer area to the GPU through the Vulkan interface, where the GPU completes graphics rendering according to the second task.
It should be noted that, in the embodiments of the present application, the first thread, the second thread, and the third thread are three mutually independent threads, and different processing flows are performed in different threads. For example, an asynchronously executed OpenGL ES instruction is processed into a first event in the first thread, the first event is processed into a second task (job) in the second thread, and the second task is cached into a command buffer in the third thread. Moreover, while the third thread processes one second task, the second thread processes other received first events, and the first thread processes other received asynchronously executed OpenGL ES instructions. It can be seen that the threads work in parallel, which improves processing efficiency; the frequency of submission to the GPU can therefore be increased, achieving the goal of improved rendering efficiency.
Referring to fig. 3, fig. 3 is a schematic flowchart illustrating a method for implementing graphics rendering based on Vulkan according to an embodiment of the present application, and optionally, the method may be implemented based on the architecture shown in fig. 2. The Vulkan-based graphics rendering method shown in FIG. 3 at least comprises the following steps:
step S301: the electronic device classifies the multiple OpenGL ES instructions as asynchronous OpenGL ES instructions and synchronous OpenGL ES instructions.
Specifically, the electronic device may be a terminal device supporting platforms such as Windows, Android, Unix-like, Linux, and MacOS, and may be, but is not limited to, a laptop computer, a desktop computer, a mobile phone, a smartphone, a tablet computer, a multimedia player, an e-reader, an intelligent in-vehicle device, an intelligent appliance, an artificial intelligence device, a wearable device, an internet of things device, or a virtual reality/augmented reality/mixed reality device.
For OpenGL ES APIs, there are generally four types: state setting and obtaining, object generation and destruction, data uploading and downloading, and drawing. Please refer to Table 1, which describes the OpenGL ES APIs provided in the embodiments of the present application. As can be seen from Table 1, in the embodiments of the present application the OpenGL ES instructions can be classified into asynchronous OpenGL ES instructions and synchronous OpenGL ES instructions according to the type of the OpenGL ES API. For example, the OpenGL ES instructions corresponding to APIs of the state setting and obtaining type and of the object generation and destruction type are classified as synchronous OpenGL ES instructions, while OpenGL ES instructions corresponding to APIs of the data uploading type and of the drawing type are classified as asynchronous OpenGL ES instructions.
TABLE 1 OpenGL ES API description
State setting and obtaining: executed synchronously
Object generation and destruction: executed synchronously
Data uploading and downloading: executed asynchronously
Drawing: executed asynchronously
Therefore, after the electronic device acquires OpenGL ES instructions of the application program through the EGL/OpenGL ES entry layer, it can use the pre-set classification of OpenGL ES instructions to process synchronous OpenGL ES instructions through step S302A and asynchronous OpenGL ES instructions through step S302B.
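The classification of Table 1 can be sketched as a simple mapping from API type to execution mode; the type names used here are illustrative:

```python
# Sketch of the classification step: the four OpenGL ES API types are
# mapped to synchronous or asynchronous execution, as in Table 1.
SYNC = "synchronous"
ASYNC = "asynchronous"

API_CLASS = {
    "state_set_get": SYNC,       # state setting and obtaining
    "object_gen_destroy": SYNC,  # object generation and destruction
    "data_up_download": ASYNC,   # data uploading and downloading
    "draw": ASYNC,               # drawing
}

def classify(api_type):
    return API_CLASS[api_type]

print(classify("draw"))           # asynchronous
print(classify("state_set_get"))  # synchronous
```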
Step S302A: the electronic device processes the synchronized OpenGL ES instructions through the first thread.
Specifically, when synchronous OpenGL ES instructions (e.g., state setting and obtaining, object generation and destruction) are processed by the first thread in the context module, the electronic device saves the description information of the synchronous OpenGL ES instruction into the data structure corresponding to the context module, or obtains that description information from the data structure. After the electronic device calls the OpenGL ES API and completes the corresponding work, it must return a result indicating that the call has finished before the next API can be called; the returned result may be empty, i.e., a result that contains no information.
For example, if the first OpenGL ES instruction the electronic device receives in the context module is of the state setting type and the content carried by the instruction is to write A=1 and B=2, it stores A=1 and B=2 in the data structure corresponding to the context module and then returns a result indicating that storage is complete. If the second OpenGL ES instruction received in the context module is of the state obtaining type and its content is to read A and B, the electronic device obtains the state values of A and B from the data structure corresponding to the context module and then sends them to the graphics processor (GPU), which may complete image rendering according to the description information of the synchronously executed OpenGL ES instruction. If the third OpenGL ES instruction received in the context module is of the object destruction type and its content is to delete A and B, the electronic device deletes the state values of A and B from the data structure corresponding to the context module and returns a result indicating that deletion is complete.
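A minimal sketch of the example above, with the context module's data structure modeled as a dictionary; all names and return values are illustrative:

```python
# Sketch of the context module's handling of synchronous instructions:
# state values live in a per-context data structure.
context_state = {}

def set_state(**kv):           # state setting type, e.g. write A=1, B=2
    context_state.update(kv)
    return "stored"            # acknowledgement returned to the caller

def get_state(*names):         # state obtaining type, e.g. read A and B
    return {n: context_state[n] for n in names}

def destroy_state(*names):     # object destruction type, e.g. delete A and B
    for n in names:
        del context_state[n]
    return "deleted"

print(set_state(A=1, B=2))      # stored
print(get_state("A", "B"))      # {'A': 1, 'B': 2}
print(destroy_state("A", "B"))  # deleted
```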
Step S302B: the electronic device processes an asynchronously executed OpenGL ES instruction into a first event through the first thread.
Specifically, after the electronic device calls the OpenGL ES API, when the Context module processes an asynchronously executed OpenGL ES instruction (for example, a data upload type or a rendering type) through the first thread, the first event may be generated from the state values stored in the data structure corresponding to the Context module together with the OpenGL ES instruction itself. It can be understood that the first event includes the state values the asynchronous OpenGL ES instruction depends on at this moment as well as the content information of the instruction. For example, if the electronic device receives in the context module an OpenGL ES instruction of the drawing type whose content is A + B = C, the electronic device obtains the state values of A and B from the data structure corresponding to the context module and then generates the first event from the description information, such as the state values of A and B and the expression A + B = C.
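The key point in the paragraph above is that the first event snapshots the state values at packing time, so it stays valid even if the context state changes afterwards. A minimal sketch, with all names (`make_first_event`, the dictionary fields) being assumptions:

```python
# Illustrative sketch: an asynchronously executed instruction is packed into a
# first event that copies (not references) the state values it depends on.
def make_first_event(state, instruction, operands):
    return {
        "instruction": instruction,                         # e.g. a drawing command
        "operands": operands,                               # e.g. A + B = C reads A, B
        "state_snapshot": {k: state[k] for k in operands},  # copied at packing time
    }

state = {"A": 1, "B": 2}
event = make_first_event(state, "draw", ["A", "B"])
state["A"] = 99                       # a later mutation must not affect the event
print(event["state_snapshot"])        # {'A': 1, 'B': 2}
```

Because the snapshot is copied, each first event is self-contained, which is what allows it to be processed later on a different thread.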
The electronic device may then return immediately after sending the first event to the Event manager module. Since the Context module runs on the first thread and the Event manager module runs on the second thread, data can optionally be sent and received between the two modules, which run on different threads, through a lock-free FIFO queue.
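The hand-off between the two threads can be modeled as below. Note the hedge: Python's `SimpleQueue` is thread-safe but lock-based, so it only stands in for the FIFO hand-off described here, not for an actual lock-free queue implementation.

```python
# Minimal stand-in for the Context module (first thread) handing first events
# to the Event manager module (second thread) through a FIFO queue.
import threading
from queue import SimpleQueue

fifo = SimpleQueue()
received = []

def event_manager():                  # second thread: consumes first events
    while True:
        ev = fifo.get()
        if ev is None:                # sentinel value ends the consumer loop
            break
        received.append(ev)

t = threading.Thread(target=event_manager)
t.start()
for i in range(3):                    # first thread: send and return at once
    fifo.put({"event_id": i})
fifo.put(None)
t.join()
print(received)                       # events arrive in FIFO order
```

The producer never blocks waiting for processing, which mirrors how the Context module returns to the application immediately after enqueueing a first event.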
Different first events have different properties (such as whether waiting for the execution result is required). For example, for the eglSwapBuffers interface, it is necessary to wait for the GPU to finish rendering the current frame and return the result. Therefore, the Context module in the first thread must block until the Event manager module and the Job dispatcher module in the second thread finish sending instructions to the GPU, then wait for the GPU to finish rendering the frame, and finally present the content of the frame on the screen.
For another example, for a first event obtained by processing an OpenGL ES API such as a data upload or rendering instruction in the Context module, the electronic device may continue to receive and execute the next OpenGL ES API called by the application program, without waiting for a return result, directly after sending the first event to the Event manager module through the Context module. Because the operation of sending instructions to the GPU does not happen in the first thread, this part of the CPU load is shared by other threads, the application logic obtains more computational resources in the first thread, and the performance of the entire application is improved.
It should be noted that the first thread may be a rendering thread corresponding to an application program.
Step S303: the electronic device obtains a second task through processing by the second thread.
Specifically, when the electronic device receives in the Event manager module a first event sent by the Context module, the Event manager module may determine, according to the type of the first event, whether it needs to be processed into a second task (job) in the second thread. If so, the first event is packaged into a second task; if not, the first event is cached.
In a possible implementation, the first event obtained by the current processing may belong to a preset first type: the OpenGL ES API instruction is one of glFinish/glFlush/glReadPixels, or the OpenGL ES API is a large-block data upload type (such as glBufferData or glBufferSubData) whose upload data is greater than a preset threshold. In that case the first event is immediately processed by the second thread in the Event manager module to obtain the second task. "Immediate processing" means that a first event requiring immediate processing is placed first in the processing order, that is, after such a first event is received it is processed first whenever possible. For example, if the asynchronously executed OpenGL ES instruction contained in the first event received by the electronic device in the Event manager module is one of glFinish/glFlush/glReadPixels, the first event obtained by the current processing is processed into the second task. The second task may include the currently processed first event, or the currently processed first event together with the cached first events.
In a possible implementation, if the first event obtained by the current processing does not belong to the preset first type, it does not need to be processed into a second task immediately, so the first event can be cached in the Event manager module, and the first events cached there are packaged into a second task when processing is needed. For example, if the asynchronously executed OpenGL ES instruction contained in the first event received by the electronic device in the Event manager module is not one of glFinish/glFlush/glReadPixels and is not a large-block data upload such as glBufferData or glBufferSubData, the first event may be cached first.
In a possible implementation, when the currently processed first event does not belong to the preset first type, if the number of currently cached first events is greater than a first threshold, the cached first events may be packaged into a second task through the second thread. It should be noted that the first threshold may be set according to the actual situation and may be the maximum number of first events packaged together; this is not limited in the embodiments of the application. For example, when the asynchronously executed OpenGL ES instruction contained in the first event received by the electronic device in the Event manager module is not one of glFinish/glFlush/glReadPixels and is not a large-block data upload such as glBufferData or glBufferSubData, if the number of first events cached in the Event manager module is greater than the first threshold, the currently processed first event may be packaged as the second task, or the currently processed first event and the cached first events may be packaged together as the second task.
In a possible implementation, when the first event obtained by the current processing does not belong to the preset first type and the number of currently cached first events is greater than a second threshold and smaller than the first threshold, if the load of the current GPU is low, that is, smaller than a first load threshold, the first event obtained by the current processing may be packaged into a second task through the second thread. The second threshold is smaller than the first threshold. Both thresholds may be set according to the actual situation and are not limited in the embodiments of the application. For example, when the asynchronously executed OpenGL ES instruction contained in the first event received by the electronic device in the Event manager module is not one of glFinish/glFlush/glReadPixels and is not a large-block data upload such as glBufferData or glBufferSubData, if the number of first events cached in the Event manager module is within the allowed cache count but the GPU load is light at this time, the currently processed first event may be packaged as the second task, or the currently processed first event and the cached first events may be packaged together as the second task.
In a possible implementation, when the number of currently cached first events is greater than the second threshold and smaller than the first threshold, if the load of the current GPU is too high, that is, greater than or equal to the first load threshold, the first event may be cached in the Event manager module, and the first events cached there are packaged into a second task when processing is needed. For example, when the asynchronously executed OpenGL ES instruction contained in the first event received by the electronic device in the Event manager module is not one of glFinish/glFlush/glReadPixels and is not a large-block data upload such as glBufferData or glBufferSubData, if the number of first events cached in the Event manager module is within the allowed cache count but the GPU is overloaded and not suitable for receiving a submission, the first event obtained by the current processing may be cached and submitted later. It should be noted that the first load threshold may be set empirically, or determined from the amount and timing of data processed during graphics rendering; different GPUs may correspond to different first load thresholds, which may be set according to the actual situation and are not limited in this embodiment.
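The dispatch policy described across the implementations above can be condensed into one decision function. This is a hedged sketch: the concrete values of the first threshold `N`, second threshold `M`, and the load threshold are left open by the source, so the numbers below are illustrative assumptions.

```python
# Sketch of the Event manager's pack-or-cache decision. Threshold values are
# placeholders; the patent leaves them to be set "according to the actual situation".
FIRST_TYPE = {"glFinish", "glFlush", "glReadPixels"}  # plus large-block uploads
N, M = 8, 2                 # first threshold (max batch) and second threshold
LOAD_THRESHOLD = 0.8        # first load threshold (assumed 0..1 scale)

def decide(event_api, cached_count, gpu_load):
    """Return 'pack' to build a second task now, or 'cache' to buffer the event."""
    if event_api in FIRST_TYPE:
        return "pack"                         # preset first type: process immediately
    if cached_count > N:
        return "pack"                         # cache exceeds the first threshold
    if M < cached_count < N and gpu_load < LOAD_THRESHOLD:
        return "pack"                         # GPU is lightly loaded: feed it now
    return "cache"                            # otherwise keep accumulating

print(decide("glFlush", 0, 0.9))         # pack: preset first type
print(decide("glDrawElements", 9, 0.9))  # pack: more than N events cached
print(decide("glDrawElements", 5, 0.3))  # pack: between M and N, GPU idle
print(decide("glDrawElements", 5, 0.9))  # cache: between M and N, GPU busy
```

The effect of the policy is exactly the dynamic submission-frequency control the text describes: urgent events and full caches flush immediately, an idle GPU pulls work forward, and a busy GPU causes batching.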
It should be noted that, if an API that can detect the GPU load exists, that API may be called to obtain the GPU load. If no such API exists, the GPU load can be estimated. For example, the load of the GPU may be estimated from the time of the last submission (the time at which a second task was last submitted to the GPU), the size of the last submission (the number of triangle-drawing instructions in the first events included in that submission), the number of triangle-drawing instructions in the cached first events, the previous frame time (the time taken to draw the previously completed frame), and the previous frame count (the number of triangles in that frame). Assuming that the time for the GPU to process a first event is proportional to the number of triangle-drawing instructions it contains, the load of the current GPU can be roughly expressed as:
GPU load ≈ (size of last submission × previous frame time / previous frame count) / (current time − time of last submission)
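Under the stated proportionality assumption, the estimate can be computed as below. Since the source leaves the exact formula ambiguous, this is one plausible reading: derive a per-triangle time from the previous frame, estimate how long the last submission should keep the GPU busy, and compare that with the time already elapsed. The function name and the interpretation of the ratio are assumptions.

```python
# Hedged sketch of a GPU-load estimate: a ratio above 1 suggests the GPU is
# still busy with the last submission; well below 1 suggests it is idle.
def gpu_load(last_submit_count, prev_frame_time, prev_frame_count,
             now, last_submit_time):
    per_triangle = prev_frame_time / prev_frame_count   # seconds per triangle
    estimated_busy = last_submit_count * per_triangle   # expected processing time
    elapsed = now - last_submit_time                    # time since submission
    return estimated_busy / elapsed if elapsed > 0 else float("inf")

# Previous frame: 1200 triangles in 16 ms; last submission: 600 triangles, 4 ms ago.
load = gpu_load(600, 0.016, 1200, now=1.004, last_submit_time=1.000)
print(round(load, 2))   # 2.0 -> estimated work is twice the elapsed time: still busy
```

In practice this value would be compared against the first load threshold to choose between packaging and caching.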
After a second task is obtained through processing by the second thread, the second task is sent to the Event wrapper module.
When packaging first events in the Event manager module, the electronic device must determine the submission frequency to the GPU: first events should be submitted as frequently as possible so that the GPU has no idle time, but a higher submission frequency increases the I/O burden. By dynamically controlling the submission frequency to the GPU, the GPU is kept at a high and stable load as far as possible, so that its hardware resources are fully utilized. Therefore, the electronic device packages a certain number of first events into a second task at the Event manager module and then sends it to the Event wrapper module.
Step S304: the electronic device caches the second task to the command cache region through the third thread.
Specifically, the electronic device may receive, in the Event wrapper module, the second task sent from the Event manager module, store the second task into the command buffer (Command buffer) through the third thread, and then send the second task in the command buffer to the Job dispatcher module.
In a possible implementation, in the process of saving the first events in the second task to the command buffer, the first events in the second task may be sorted according to an optimization rule, and the sorted first events are then saved in the Command buffer, so the first events in the second task sent to the Job dispatcher module are optimally ordered. For example, if the order of the first events in the second task received by the electronic device in the Event wrapper module is first event 1, first event 2, first event 3, first event 4, then after sorting according to the optimization rule the order may be first event 2, first event 4, first event 1, first event 3, and the first events in the sorted second task are saved in the Command buffer.
Step S305: the electronic device sends the second task in the command buffer to the GPU through the Vulkan interface.
Specifically, the electronic device may receive, in the Job dispatcher module, the second task in the command buffer sent by the Event wrapper module, and then send the second task in the command buffer to the GPU through the corresponding Vulkan API, so that the GPU completes graphics rendering.
The architecture 200 shown in fig. 2 can be loaded into the Android system. Referring to fig. 4, fig. 4 is a schematic structural diagram of a component for implementing graphics rendering based on Vulkan according to an embodiment of the present application. As can be seen from fig. 4, in the software framework of an OpenGL ES application running on an electronic device that uses the Android system with Vulkan as the backend implementation of the OpenGL ES API, the modules involved in the embodiments of the present application include one or more of the following: an application APP 401 using OpenGL ES, an EGL/OpenGL ES entry layer 402, an OpenGL ES implementation layer 403, a Vulkan driver 404, and a GPU 405.
The OpenGL ES implementation layer 403 includes one or more of the following: a context module 403A, an event management module 403B, an event wrapper module 403C, and a task scheduling module 403D. Also, the OpenGL ES implementation layer 403 includes three threads, which are a first thread, a second thread, and a third thread, respectively. The first thread is a rendering thread of OpenGL ES. The context module 403A runs on a first thread, and is in the same thread as an OpenGL ES API call of an OpenGL ES application, the event management module 403B runs on a second thread, the event wrapper module 403C runs on a third thread, and the task scheduling module 403D runs on the second thread.
The specific implementation of the OpenGL ES API with the backend of Vulkan based on the multithreaded architecture implemented by the component structure shown in fig. 4 is as follows:
after the OpenGL ES application (APP 401) calls an OpenGL ES API, different processing flows may be performed in the OpenGL ES implementation layer 403 according to the API. The OpenGL ES APIs can be classified into at least four categories: state setting and obtaining (e.g., glViewport, glGetIntegerv), object generation and destruction (e.g., glGenBuffers, glDeleteBuffers), data uploading and downloading (e.g., glBufferData, glReadPixels), and drawing (e.g., glDrawElements). These four categories can further be divided into two kinds: the state class and the object class are executed synchronously, i.e., in the application's own thread, while data and rendering are executed asynchronously, i.e., not in the application's thread but in a separate independent thread. Referring to fig. 5A, fig. 5A is a schematic flowchart of classifying OpenGL ES instructions in the OpenGL ES implementation layer according to an embodiment of the present application. As can be seen from fig. 5A, after an OpenGL ES instruction obtained by the APP calling an OpenGL ES API is received, it is determined whether the instruction is a synchronously executed OpenGL ES instruction; if so, the processing flow for synchronous OpenGL ES instructions is performed; if not, the processing flow for asynchronous OpenGL ES instructions is performed.
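The fig. 5A classification step can be sketched as a small lookup. The table below lists only the representative APIs named above and is illustrative, not exhaustive:

```python
# Sketch of the entry-layer classification: four categories, reduced to the
# synchronous/asynchronous split used by the implementation layer.
CATEGORY = {
    "glViewport": "state", "glGetIntegerv": "state",        # state set/get
    "glGenBuffers": "object", "glDeleteBuffers": "object",  # object create/destroy
    "glBufferData": "data", "glReadPixels": "data",         # data upload/download
    "glDrawElements": "draw",                               # drawing
}

def is_synchronous(api):
    # state and object classes run in the application's own thread;
    # data and draw classes are handed off to another thread as first events.
    return CATEGORY[api] in ("state", "object")

print(is_synchronous("glViewport"))      # True  -> synchronous processing flow
print(is_synchronous("glDrawElements"))  # False -> asynchronous processing flow
```

A real implementation would key this off the entry-layer dispatch table rather than a dictionary, but the two-way split is the same.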
Referring to fig. 5B, fig. 5B is a schematic processing flow diagram of a synchronously executed OpenGL ES instruction according to an embodiment of the present application. Fig. 5B shows the handling of an OpenGL ES instruction in the context module 403A without multi-thread processing: an API of the state read-write class or the resource management class (such as object generation and destruction, and data upload and download) may pass through the state manager module and the object manager module inside the context module 403A and return after completing the corresponding work. These two kinds of OpenGL ES API calls involve little computation, so executing them synchronously benefits the performance of the whole application. In terms of module functionality, object management maintains a 1:N mapping between OpenGL ES and Vulkan objects (one OpenGL ES object may require the support of several Vulkan objects); OpenGL ES APIs of the object generation and destruction type also act on the object management module. State management saves the current state of the whole system and manages the state required by the OpenGL ES specification by creating a context. As can be seen from fig. 5B, for a state setting and obtaining type API, since a data structure corresponding to the state exists in the context module 403A, and the context module 403A runs on the OpenGL ES rendering thread of the APP 401, the state value of a synchronously executed OpenGL ES instruction can be saved directly in the data structure of the context module 403A; alternatively, the state value of the current OpenGL ES instruction can be obtained from the data structure, and the state value currently saved by the context module 403A is written to the address passed in the API call.
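The 1:N object relationship mentioned above can be sketched as follows. The handle names are made up for illustration; real code would hold Vulkan handles such as a `VkBuffer` plus its `VkDeviceMemory`.

```python
# Sketch of the object management module's 1:N mapping: one OpenGL ES object
# backed by several Vulkan objects, all released together on destruction.
class ObjectManager:
    def __init__(self):
        self.mapping = {}          # OpenGL ES handle -> list of Vulkan handles

    def create(self, gles_handle, vulkan_handles):
        self.mapping[gles_handle] = list(vulkan_handles)

    def destroy(self, gles_handle):
        # destroying the GL object returns every backing Vulkan object for release
        return self.mapping.pop(gles_handle, [])

om = ObjectManager()
om.create("gl_buffer_1", ["vk_buffer_1", "vk_memory_1"])
print(om.destroy("gl_buffer_1"))   # ['vk_buffer_1', 'vk_memory_1']
print(om.mapping)                  # {}
```

This is why object generation and destruction APIs can be handled synchronously: they only touch this bookkeeping, not the GPU command stream.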
Referring to fig. 5C, fig. 5C is a schematic processing flow diagram of an asynchronously executed OpenGL ES instruction according to an embodiment of the present application, showing how the context module 403A processes OpenGL ES APIs of the data upload and rendering types. The OpenGL ES instructions of these two kinds of APIs are packed into first event objects inside the system. To ensure that each packed first event is independent, when each first event is generated, the object management module and the state management module are accessed to obtain the state values of the asynchronous OpenGL ES instruction, the state values are stored in the first event, and after the first event is sent to the event management module 403B, the API call returns. Because the first event has not yet been processed at that point, the processing of these two kinds of APIs is asynchronous.
Referring to fig. 5D, fig. 5D is a flowchart of the processing performed by the event management module 403B on a first event from the context module 403A according to an embodiment of the present application. The event management module 403B runs on the second thread, in a separate thread from the context module 403A, and therefore constantly checks for first events sent by the context module 403A running on the first thread. As can be seen from fig. 5D, the first event obtained by the current processing may belong to a preset first type: the OpenGL ES API instruction is one of glFinish/glFlush/glReadPixels, or the OpenGL ES API is a large-block data upload type (such as glBufferData or glBufferSubData) whose upload data is greater than a preset threshold. In that case the first event is immediately processed by the second thread in the Event manager module to obtain the second task; "immediate processing" means that such a first event is placed first in the processing order, that is, after it is received it is processed first whenever possible.
When the currently processed first event does not belong to the preset first type, that is, when it does not need to be packaged into a second task immediately, if the number of currently cached first events is greater than the first threshold N, the cached first events can be packaged into a second task through the second thread. The first threshold N may be set according to the actual situation and may be the maximum number of first events packaged together; this is not limited in the embodiments of the application.
When the number of currently cached first events is greater than the second threshold M and smaller than the first threshold N, if the load of the current GPU is low and the GPU can continue processing, the first events obtained by the current processing are packaged into a second task through the second thread. The second threshold M is smaller than the first threshold N: the first threshold N may be the maximum number of first events packaged together, and the second threshold M the minimum. Both thresholds may be set according to the actual situation and are not limited in the embodiments of the application.
When the number of currently cached first events is greater than the second threshold M and smaller than the first threshold N, if the load of the current GPU is too high and the GPU cannot process any more, the first events can be cached in the Event manager module, and the first events cached there are packaged into a second task when processing is needed.
By dynamically controlling the number of first events packaged together and the timing of packaging, the event management module 403B dynamically controls the frequency of submission to the GPU, ensuring that the GPU maintains a high-load, stable operating state and that its hardware resources are fully utilized.
Referring to fig. 5E, fig. 5E is a flowchart of processing a second task according to an embodiment of the present application, that is, the process by which the event wrapper module processes a second task from the event management module. In the third thread on which the event wrapper module 403C runs, processing the second task mainly includes: extracting the first events in the second task, analyzing the information in the first events, determining whether the order of the first events can be rearranged according to the rearrangement optimization strategy, then using the Vulkan API to store the OpenGL ES instructions contained in the first events into the Command buffer, and sending the Command buffer to the task scheduling module 403D.
The rearrangement optimization strategy includes one or more of the following: adjusting the order of drawing instructions according to their corresponding pipeline states (strategy one), and adjusting the order of uniform upload instructions and drawing instructions in scenarios where the same uniform is repeatedly modified and used (strategy two).
The arrangement rule of strategy one is as follows: if there are draw instructions (draw calls) among the OpenGL ES instructions in the first events included in the second task, then, because each draw instruction corresponds to a pipeline state, multiple draw instructions corresponding to the same pipeline state may be merged together. For example, referring to fig. 5F, fig. 5F is a schematic diagram of rearrangement according to strategy one provided in this embodiment. As can be seen from fig. 5F, before rearrangement the draw instructions in the OpenGL ES instructions in the first events included in the second task are ordered as draw instruction 1, draw instruction 2, draw instruction 3, draw instruction 4, where draw instruction 1 corresponds to pipeline state 1, draw instruction 2 to pipeline state 2, draw instruction 3 to pipeline state 1, and draw instruction 4 to pipeline state 2. After rearrangement they are ordered as draw instruction 1, draw instruction 3, draw instruction 2, draw instruction 4, where draw instructions 1 and 3 both correspond to pipeline state 1, and draw instructions 2 and 4 both correspond to pipeline state 2. Before rearrangement, the four draw instructions require four pipeline-state switches; after rearrangement according to strategy one, only two switches are needed.
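Strategy one amounts to a stable sort on the pipeline-state key, which preserves relative order within each state while grouping equal states together. A sketch mirroring the fig. 5F example (this assumes, as a simplification, that the draws are independent and may be reordered safely):

```python
# Sketch of strategy one: group draw instructions sharing a pipeline state to
# cut the number of pipeline-state switches.
def reorder_by_pipeline(draws):
    # Python's sorted() is stable: original order is kept within each state.
    return sorted(draws, key=lambda d: d["pipeline"])

def count_switches(draws):
    return sum(1 for i, d in enumerate(draws)
               if i == 0 or d["pipeline"] != draws[i - 1]["pipeline"])

draws = [{"id": 1, "pipeline": 1}, {"id": 2, "pipeline": 2},
         {"id": 3, "pipeline": 1}, {"id": 4, "pipeline": 2}]
print(count_switches(draws))             # 4 switches before rearrangement
reordered = reorder_by_pipeline(draws)
print([d["id"] for d in reordered])      # [1, 3, 2, 4]
print(count_switches(reordered))         # 2 switches after rearrangement
```

A production implementation would additionally check rendering dependencies before reordering, since draws that overlap on screen cannot always be swapped.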
The arrangement rule of strategy two is as follows: if the OpenGL ES instructions in the first events included in the second task need to modify the same uniform many times, a uniform with a larger memory may be used instead, the data of each modification is stored at a different memory offset of that larger uniform, the multiple modified values are uploaded once, and the drawing instructions then draw from them. For example, referring to fig. 5G, fig. 5G is a schematic diagram of rearrangement according to strategy two provided in the embodiment of the present application. As can be seen from fig. 5G, before rearrangement, after uniform 1 in the OpenGL ES instructions in the first events included in the second task is modified, the modified data is immediately used for drawing; after uniform 1 is modified again, the newly modified data is used for drawing again. For this behavior, in which the same uniform is modified many times and each modification is immediately used for drawing, the upload frequency may be too high and resources may be wasted. After rearrangement, the multiple modified uniform values are uploaded once before the drawing instructions are executed, reducing many uniform data uploads to a single upload.
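Strategy two can be sketched as the following transformation of an instruction list. The 16-byte slot size and the tuple encoding of operations are assumptions for illustration; in practice each draw would be made to read its own offset within the larger uniform buffer so that semantics are preserved.

```python
# Sketch of strategy two: repeated "upload uniform, then draw" pairs become one
# batched upload into different offsets of a larger uniform, followed by draws.
def coalesce_uniform_updates(ops):
    """ops: list of ('upload', value) and ('draw',) tuples in original order."""
    values, draws = [], []
    for op in ops:
        if op[0] == "upload":
            offset = len(values) * 16        # assumed 16-byte slot per value
            values.append((offset, op[1]))
        else:
            # each draw is assumed to index the offset of its own uniform slot
            draws.append(("draw", len(draws)))
    return [("upload_all", values)] + draws  # one upload, then all draws

ops = [("upload", 1.0), ("draw",), ("upload", 2.0), ("draw",)]
out = coalesce_uniform_updates(ops)
print(out[0])          # ('upload_all', [(0, 1.0), (16, 2.0)])
print(len(out) - 1)    # still 2 draws, but 1 upload instead of 2
```

The saving grows with the number of modify-then-draw pairs: N small uploads collapse to a single larger one.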
Referring to fig. 5H, fig. 5H is a schematic flowchart of a process of sending the second task in the command buffer to the GPU according to the embodiment of the present application, that is, a process of sending the second task in the command buffer to the GPU by the task scheduling module. The event wrapper module 403C running on the third thread continuously sends the second task stored in the command buffer to the lock-free queue shared with the task scheduling module 403D, the task scheduling module 403D running on the second thread takes out the command buffer from the shared lock-free queue, submits the command buffer to the GPU through the Vulkan interface, and finally, the GPU completes graphics rendering according to the second task.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device 60 comprises at least one memory 601 and at least one processor 602 and a communication interface 604. Optionally, a bus 603 may be included, wherein the memory 601, the processor 602, and the communication interface 604 are connected via the bus 603.
The memory 601 is used to provide a storage space, and the storage space may store data such as an operating system and computer instructions. The memory 601 may be a combination of one or more of RAM, ROM, EPROM, or CD-ROM, among others.
The processor 602 is a module for performing arithmetic operation and/or logical operation, and may specifically be one or a combination of plural processing modules such as a CPU, a GPU, an MPU, an ASIC, an FPGA, and a CPLD.
The communication interface 604 is used for receiving and/or transmitting data from/to the outside, and may be a wired link interface such as an ethernet cable, or may be a wireless link (Wi-Fi, bluetooth, universal wireless transmission, vehicle-mounted short-range communication technology, etc.) interface. Optionally, the communication interface 604 may also include a transmitter (e.g., a radio frequency transmitter, an antenna, etc.), or a receiver, etc. coupled to the interface.
The processor 602 in the electronic device 60 is configured to read the computer instructions stored in the memory 601 for executing the aforementioned method for implementing graphics rendering based on Vulkan, such as the method for implementing graphics rendering based on Vulkan described in the embodiment shown in fig. 3. The processor 602 in the electronic device 60 is configured to read the computer instructions stored in the memory 601, and is configured to perform the following operations:
processing, by the first thread, an asynchronously executed OpenGL ES instruction into a first event, where the first event may include description information of the asynchronously executed OpenGL ES instruction; then obtaining a second task (job) through the second thread, where the second task includes one first event or a plurality of first events, the plurality of first events being obtained by processing the same or different asynchronously executed OpenGL ES instructions; then storing the second task into a command buffer through the third thread; and finally sending the second task in the command buffer to the image processor (GPU) through the Vulkan interface, where the GPU is configured to complete graphics rendering according to the second task.
In the embodiment of the present application, OpenGL ES graphics rendering is implemented with Vulkan as the backend. Because Vulkan supports multi-thread parallel processing, the embodiment exploits this advantage by processing different events and tasks in different threads (for example, a first thread, a second thread, and a third thread): an asynchronously executed OpenGL ES instruction is processed into a first event through the first thread; the first event is received and processed into a second task (job) through the second thread; the second task is stored into the command buffer in the third thread; and the command buffer is sent to the GPU through the Vulkan interface, with the GPU completing graphics rendering. Moreover, while the third thread processes the second task, the second thread processes other received first events and the first thread processes other received asynchronously executed OpenGL ES instructions. Since the threads run in parallel, processing efficiency improves, the frequency of submission to the GPU can be raised, and rendering efficiency is thereby improved.
In one possible implementation, before processing an OpenGL ES instruction executed asynchronously by a first thread into a first event, the method further includes: the multiple OpenGL ES instructions are classified into OpenGL ES instructions executed asynchronously and OpenGL ES instructions executed synchronously according to the type of OpenGL ES application program interface API.
In one possible implementation, OpenGL ES instructions of the state setting and obtaining type and of the object generation and destruction type both belong to synchronously executed OpenGL ES instructions; OpenGL ES instructions of the data upload type and of the rendering type both belong to asynchronously executed OpenGL ES instructions.
In one possible implementation, after classifying the plurality of OpenGL ES instructions into an OpenGL ES instruction to be executed asynchronously and an OpenGL ES instruction to be executed synchronously according to the type of the OpenGL ES application program interface API, saving, by the first thread, description information of the OpenGL ES instruction to be executed synchronously to a data structure.
In one possible implementation, after classifying the OpenGL ES instructions into asynchronously executed OpenGL ES instructions and synchronously executed OpenGL ES instructions according to the type of the OpenGL ES application program interface (API), the method further includes: after obtaining the description information of the synchronously executed OpenGL ES instructions from the data structure through the first thread, sending the description information to the graphics processor, where the graphics processor is configured to complete graphics rendering according to the description information of the synchronously executed OpenGL ES instructions. The first thread may be a rendering thread of an OpenGL ES application.
In a possible implementation, processing the first event by the second thread to obtain the second task (job) includes: in a case where the first event currently being processed belongs to a preset first type, executing the step of processing the first event by the second thread to obtain the second task.
In a possible implementation, processing the first event by the second thread to obtain the second task includes: in a case where the first event currently being processed does not belong to the preset first type, if the number of first events currently accumulated in the cache is greater than a first threshold, executing the step of processing the first event by the second thread to obtain the second task.
In a possible implementation, processing the first event by the second thread to obtain the second task includes: in a case where the first event currently being processed does not belong to the preset first type and the number of first events currently accumulated in the cache is greater than a second threshold and less than or equal to the first threshold, if the current load of the GPU is less than a first load threshold, executing the step of processing the first event by the second thread to obtain the second task, where the second threshold is less than the first threshold.
In a possible implementation, processing the first event by the second thread to obtain the second task includes: in a case where the first event currently being processed does not belong to the preset first type and the number of first events currently accumulated in the cache is greater than the second threshold and less than or equal to the first threshold, if the current load of the GPU is greater than or equal to the first load threshold, caching the first event.
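The flush-or-buffer policy of the four implementations above (mirroring claims 6-9) can be sketched as a single decision function. The threshold values are illustrative assumptions; the patent specifies the comparison structure, not the magnitudes.

```cpp
#include <cstddef>

// Decision the second thread makes for each incoming first event.
enum class Action { FlushNow, Buffer };

struct FlushPolicy {
    std::size_t first_threshold  = 64;   // hard cap on buffered events (assumed)
    std::size_t second_threshold = 16;   // soft cap, consulted with GPU load (assumed)
    double      load_threshold   = 0.75; // first load threshold (assumed)

    Action decide(bool is_first_type, std::size_t buffered, double gpu_load) const {
        if (is_first_type) return Action::FlushNow;               // preset first type: flush
        if (buffered > first_threshold) return Action::FlushNow;  // too much accumulated work
        if (buffered > second_threshold && buffered <= first_threshold)
            return gpu_load < load_threshold ? Action::FlushNow   // GPU has headroom: submit
                                             : Action::Buffer;    // GPU busy: keep batching
        return Action::Buffer;                                    // small backlog: keep batching
    }
};
```

The intent of the load check is that a lightly loaded GPU can absorb an early submission, while a busy GPU benefits from larger batches.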
In one possible implementation, the second task includes a plurality of first events, and caching the second task in the command buffer by the third thread includes: ordering the plurality of first events in the second task by the third thread; and caching the ordered second task in the command buffer.
In one possible implementation, ordering the plurality of first events in the second task by the third thread includes: ordering the plurality of first events in the second task by the third thread based on the pipeline states corresponding to the plurality of first events.
In one possible implementation, among the plurality of first events in the second task, first events corresponding to the same pipeline state are ordered together.
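Grouping events by pipeline state can be sketched with a stable sort, so that the command buffer switches pipelines as few times as possible while issue order within each pipeline group is preserved. The struct fields are illustrative assumptions.

```cpp
#include <algorithm>
#include <vector>

struct DrawEvent {
    int pipeline_state;  // identifier of the Vulkan pipeline this event needs
    int seq;             // original issue order (kept for within-group ordering)
};

// Stable sort groups events sharing a pipeline state while preserving the
// relative order in which same-state events were issued.
inline void order_by_pipeline(std::vector<DrawEvent>& job) {
    std::stable_sort(job.begin(), job.end(),
                     [](const DrawEvent& a, const DrawEvent& b) {
                         return a.pipeline_state < b.pipeline_state;
                     });
}
```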
In one possible implementation, the second task includes a plurality of first events, and caching the second task in the command buffer by the third thread includes: ordering, by the third thread, first events among the plurality of first events in the second task that modify the same variable; modifying the ordered first events in the second task based on the variable; and caching the ordered and modified second task in the command buffer.
In one possible implementation, first events among the plurality of first events that modify the same variable are ordered together.
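One way to read the two implementations above is that once writes to the same variable sit adjacently, all but the final write can be folded away before recording. The sketch below collapses repeated writes per variable; the `Write` structure and the folding-to-last-value behavior are assumptions for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

struct Write {
    std::string variable;  // e.g. a uniform name (illustrative)
    int value;
};

// Keeps only the last write per variable, preserving the first-seen order
// of the variables themselves.
inline std::vector<Write> coalesce_writes(const std::vector<Write>& events) {
    std::map<std::string, int> last;    // last value written per variable
    std::vector<std::string> order;     // first-seen order of variables
    for (const auto& w : events) {
        if (!last.count(w.variable)) order.push_back(w.variable);
        last[w.variable] = w.value;
    }
    std::vector<Write> out;
    for (const auto& v : order) out.push_back({v, last[v]});
    return out;
}
```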
In one possible implementation, processing the asynchronously executed OpenGL ES instruction into the first event by the first thread includes: obtaining description information of the asynchronously executed OpenGL ES instruction through the first thread; and encapsulating the asynchronously executed OpenGL ES instruction and its description information as the first event.
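The encapsulation step can be sketched as a small struct pairing the captured description information with a deferred callable. All member names are assumptions; the patent does not specify the layout of the first event.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct GlEventDesc {
    std::string api_name;    // e.g. "glDrawArrays"
    std::vector<long> args;  // captured scalar arguments (assumed representation)
};

struct FirstEvent {
    GlEventDesc desc;              // description information of the deferred call
    std::function<void()> replay;  // deferred execution against the Vulkan backend
};

// Builds the first event from a captured call and its description information.
inline FirstEvent make_event(std::string api, std::vector<long> args,
                             std::function<void()> fn) {
    return FirstEvent{GlEventDesc{std::move(api), std::move(args)}, std::move(fn)};
}
```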
Embodiments of the present application further provide a computer-readable storage medium, in which computer instructions are stored, and when the computer instructions are executed on one or more processors, the method for implementing graphics rendering based on Vulkan described in the embodiment shown in fig. 3 is implemented.
Embodiments of the present application further provide a computer program product having stored therein computer instructions, which when executed by one or more processors, cause the one or more processors to implement the method for implementing graphics rendering based on Vulkan as described in the embodiment shown in fig. 3.
The embodiment of the present application further provides a chip system, which includes at least one processor, a memory, and an interface circuit, where the interface circuit is configured to provide information input/output for the at least one processor, and the memory stores computer instructions; when the computer instructions are executed on the at least one processor, the method for implementing graphics rendering based on Vulkan as described in the embodiment shown in fig. 3 is implemented.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are wholly or partially implemented. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored on, or transmitted over, a computer-readable storage medium. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.
The steps in the method embodiment of the present application may be sequentially adjusted, combined, and deleted according to actual needs.
The modules in the device embodiment of the application can be combined, divided and deleted according to actual needs.

Claims (33)

1. A method for realizing graphics rendering based on Vulkan is characterized by comprising the following steps:
processing, by a first thread, an asynchronously executed OpenGL ES instruction into a first event, the first event including description information of the asynchronously executed OpenGL ES instruction;
processing the first event through a second thread to obtain a second task, wherein the second task comprises one first event or a plurality of first events, and the plurality of first events are obtained by processing according to the same or different asynchronously executed OpenGL ES instructions;
saving the second task to a command cache region through a third thread;
and sending a second task in the command cache region to a graphics processor through a Vulkan interface, wherein the graphics processor is used for finishing graphics rendering according to the second task.
2. The method of claim 1, wherein prior to processing the asynchronously executed OpenGL ES instruction as the first event via the first thread, further comprising:
the OpenGL ES instructions are classified into asynchronously executed OpenGL ES instructions and synchronously executed OpenGL ES instructions according to the type of OpenGL ES application program interface.
3. The method of claim 2, wherein OpenGL ES instructions of a state setting and obtaining type and OpenGL ES instructions of an object generation and destruction type both belong to the synchronously executed OpenGL ES instructions; OpenGL ES instructions of a data upload type and OpenGL ES instructions of a rendering type both belong to the asynchronously executed OpenGL ES instructions.
4. The method of claim 2 or 3, wherein after classifying the plurality of OpenGL ES instructions as asynchronously executed OpenGL ES instructions and synchronously executed OpenGL ES instructions according to the type of OpenGL ES application program interface, the method further comprises:
and saving the description information of the OpenGL ES instruction synchronously executed to a data structure through the first thread.
5. The method of any one of claims 2-4, wherein after classifying the plurality of OpenGL ES instructions as asynchronously executing OpenGL ES instructions and synchronously executing OpenGL ES instructions according to a type of OpenGL ES Application Program Interface (API), the method further comprises:
and after obtaining, by the first thread, the description information of the synchronously executed OpenGL ES instruction from the data structure, sending the description information to the graphics processor, wherein the graphics processor is configured to complete graphics rendering according to the description information of the synchronously executed OpenGL ES instruction.
6. The method of any of claims 1-5, wherein processing the first event by a second thread results in a second task comprising:
and when the first event belongs to a preset first type, executing the step of processing the first event through a second thread to obtain a second task.
7. The method of any of claims 1-6, wherein processing the first event by a second thread results in a second task comprising:
and when the first event does not belong to a preset first type, and if the number of the first events which are accumulatively cached is greater than a first threshold value, executing the step of processing the first event through a second thread to obtain a second task.
8. The method of any one of claims 1-7, wherein processing the first event by a second thread results in a second task, comprising:
and when the first event does not belong to a preset first type and the number of the first events currently accumulated and cached is greater than a second threshold and is less than or equal to a first threshold, if the load of the GPU is less than the first load threshold, executing the step of processing the first event through a second thread to obtain a second task, wherein the second threshold is less than the first threshold.
9. The method according to any one of claims 1 to 8, wherein the obtaining of the second task job by the second thread processing comprises:
and when the first event does not belong to a preset first type and the number of the first events currently accumulated and cached is greater than a second threshold and less than or equal to a first threshold, caching the first event if the load of the GPU is greater than or equal to the first load threshold.
10. The method of any of claims 1-9, wherein the second task contains a plurality of first events, and wherein caching the second task into a command cache by a third thread comprises:
sequencing, by the third thread, the plurality of first events in the second task;
and caching the second task after the sequencing to a command cache region.
11. The method of claim 10, wherein said ordering, by said third thread, a plurality of first events in said second task comprises:
and ordering the plurality of first events in the second task by a third thread based on the pipeline states corresponding to the plurality of first events.
12. The method of claim 11, wherein first events of the plurality of first events corresponding to a same pipeline state are ordered together.
13. The method of any of claims 1-9, wherein the second task contains a plurality of first events, and wherein caching the second task into a command cache by a third thread comprises:
sequencing first events which are modified corresponding to the same variable in the plurality of first events through a third thread;
modifying the sequenced first event in the second task based on the variable;
and caching the modified second task to a command cache region.
14. The method of claim 13, wherein a first event of the plurality of events that is modified for a same variable is ordered together.
15. The method of any of claims 1-14, wherein processing the asynchronously executed OpenGL ES instruction into the first event by the first thread comprises:
obtaining description information of the asynchronously executed OpenGL ES instruction through the first thread;
and encapsulating the description information of the asynchronously executed OpenGL ES instruction and the asynchronously executed OpenGL ES instruction into a first event.
16. An apparatus for implementing graphics rendering based on Vulkan, comprising:
a context module, configured to process, by a first thread, an OpenGL ES instruction that is executed asynchronously into a first event, where the first event includes description information of the OpenGL ES instruction that is executed asynchronously;
the event management module is used for processing the first event through a second thread to obtain a second task (job), wherein the second task comprises one first event or a plurality of first events, and the plurality of first events are obtained by processing according to the same or different asynchronously executed OpenGL ES instructions;
the event packaging module is used for storing the second task to a command cache region through a third thread;
and the task scheduling module is used for sending the second task in the command cache area to a GPU (graphics processing Unit) through a Vulkan interface, wherein the GPU is used for finishing graphics rendering according to the second task.
17. The apparatus of claim 16, wherein the context module is further configured to classify the plurality of OpenGL ES instructions as asynchronously executed OpenGL ES instructions and synchronously executed OpenGL ES instructions according to a type of OpenGL ES application program interface API.
18. The apparatus of claim 17, wherein OpenGL ES instructions of a state setting and obtaining type and OpenGL ES instructions of an object generation and destruction type both belong to the synchronously executed OpenGL ES instructions; OpenGL ES instructions of a data upload type and OpenGL ES instructions of a rendering type both belong to the asynchronously executed OpenGL ES instructions.
19. The apparatus of claim 17 or 18, wherein the context module is further configured to save, by the first thread, description information of the OpenGL ES instruction executed synchronously to a data structure.
20. The apparatus of any one of claims 17-19, wherein the context module is further configured to send, after obtaining, by the first thread, the description information of the synchronously executed OpenGL ES instruction from a data structure, the description information to the graphics processor, and the graphics processor is configured to complete graphics rendering according to the description information of the synchronously executed OpenGL ES instruction.
21. The apparatus according to any one of claims 16 to 20, wherein the event management module is specifically configured to:
and when the first event belongs to a preset first type, executing the step of processing the first event by the second thread to obtain the second task.
22. The apparatus according to any of claims 16-21, wherein the event management module is specifically configured to:
and when the first event does not belong to a preset first type, if the number of the first events currently accumulated and cached is greater than a first threshold value, executing the step of processing the first event through a second thread to obtain a second task.
23. The apparatus according to any of claims 16-22, wherein the event management module is specifically configured to:
and when the first event does not belong to a preset first type and the number of the first events currently accumulated and cached is greater than a second threshold and is less than or equal to a first threshold, if the load of the GPU is less than the first load threshold, executing the step of processing the first event through a second thread to obtain a second task, wherein the second threshold is less than the first threshold.
24. The apparatus according to any one of claims 16 to 23, wherein the event management module is specifically configured to:
and when the first event does not belong to a preset first type and the number of the first events currently accumulated and cached is greater than a second threshold and less than or equal to a first threshold, caching the first event if the load of the GPU is greater than or equal to the first load threshold.
25. The apparatus according to any one of claims 16-24, wherein the event wrapper module is specifically configured to:
ordering, by the third thread, a plurality of first events in the second task;
and caching the second task after the sequencing to a command cache region.
26. The apparatus of claim 25, wherein the event wrapper module is specifically configured to order, by a third thread, the plurality of first events in the second task based on pipeline statuses corresponding to the plurality of first events.
27. The apparatus of claim 26, wherein first events of the plurality of first events that correspond to the same pipeline state are ordered together.
28. The apparatus according to any one of claims 16-24, wherein the event wrapper module is specifically configured to:
sequencing the first events, which are modified by corresponding to the same variable, of the plurality of first events in the second task through a third thread;
modifying the sequenced first event in the second task based on the variable;
and caching the second task which is modified after being sequenced into a command cache region.
29. The apparatus of claim 28, wherein first events of the plurality of first events that modify the same variable are ordered together.
30. The apparatus according to any of claims 16-29, wherein the context module is specifically configured to:
obtaining description information of the asynchronously executed OpenGL ES instruction through the first thread;
and encapsulating the description information of the asynchronously executed OpenGL ES instruction and the asynchronously executed OpenGL ES instruction into a first event.
31. An electronic device, comprising a processor and a memory; the processor is configured to execute the memory-stored computer instructions to cause the electronic device to implement the method of any of claims 1-15.
32. A computer-readable storage medium having stored therein computer instructions, which when executed by one or more processors, cause the one or more processors to implement the method of any one of claims 1-15.
33. A computer program product having stored therein computer instructions which, when executed by one or more processors, cause the one or more processors to carry out the method of any one of claims 1-15.
CN202011236910.1A 2020-11-06 2020-11-06 Vulkan-based method for realizing graphic rendering and related device Pending CN114528090A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011236910.1A CN114528090A (en) 2020-11-06 2020-11-06 Vulkan-based method for realizing graphic rendering and related device
PCT/CN2021/127780 WO2022095808A1 (en) 2020-11-06 2021-10-30 Method for implementing graphics rendering on basis of vulkan, and related apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011236910.1A CN114528090A (en) 2020-11-06 2020-11-06 Vulkan-based method for realizing graphic rendering and related device

Publications (1)

Publication Number Publication Date
CN114528090A true CN114528090A (en) 2022-05-24

Family

ID=81457521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011236910.1A Pending CN114528090A (en) 2020-11-06 2020-11-06 Vulkan-based method for realizing graphic rendering and related device

Country Status (2)

Country Link
CN (1) CN114528090A (en)
WO (1) WO2022095808A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116524104A (en) * 2023-07-03 2023-08-01 腾讯科技(深圳)有限公司 Rendering data processing method, device, equipment and storage medium
CN117437451A (en) * 2023-12-21 2024-01-23 芯瞳半导体技术(山东)有限公司 Image matching method, device, equipment and storage medium
CN117687771A (en) * 2023-07-24 2024-03-12 荣耀终端有限公司 Buffer allocation device, electronic equipment and storage medium
CN117992237A (en) * 2024-03-18 2024-05-07 麒麟软件有限公司 Rendering API forwarding method based on virgl graphic technology stack

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN117631930A (en) * 2022-09-01 2024-03-01 苏州浩辰软件股份有限公司 Method, system and storage medium for quick response of drawing
CN117724987B (en) * 2024-02-18 2024-05-17 北京麟卓信息科技有限公司 OpenGL hierarchical realization verification method based on texture conversion tracking

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
KR101869939B1 (en) * 2012-01-05 2018-06-21 삼성전자주식회사 Method and apparatus for graphic processing using multi-threading
CN107223264B (en) * 2016-12-26 2022-07-08 达闼机器人股份有限公司 Rendering method and device
US10579382B2 (en) * 2018-01-24 2020-03-03 Intel Corporation Method and apparatus for a scalable interrupt infrastructure
CN111158866A (en) * 2019-12-30 2020-05-15 珠海金山网络游戏科技有限公司 Engine system and rendering method thereof

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN116524104A (en) * 2023-07-03 2023-08-01 腾讯科技(深圳)有限公司 Rendering data processing method, device, equipment and storage medium
CN116524104B (en) * 2023-07-03 2023-10-03 腾讯科技(深圳)有限公司 Rendering data processing method, device, equipment and storage medium
CN117687771A (en) * 2023-07-24 2024-03-12 荣耀终端有限公司 Buffer allocation device, electronic equipment and storage medium
CN117437451A (en) * 2023-12-21 2024-01-23 芯瞳半导体技术(山东)有限公司 Image matching method, device, equipment and storage medium
CN117437451B (en) * 2023-12-21 2024-04-16 芯瞳半导体技术(山东)有限公司 Image matching method, device, equipment and storage medium
CN117992237A (en) * 2024-03-18 2024-05-07 麒麟软件有限公司 Rendering API forwarding method based on virgl graphic technology stack

Also Published As

Publication number Publication date
WO2022095808A1 (en) 2022-05-12

Similar Documents

Publication Publication Date Title
CN109643291B (en) Method and apparatus for efficient use of graphics processing resources in virtualized execution environments
WO2022095808A1 (en) Method for implementing graphics rendering on basis of vulkan, and related apparatus
US10664942B2 (en) Reconfigurable virtual graphics and compute processor pipeline
US11443405B2 (en) Mechanism to accelerate graphics workloads in a multi-core computing architecture
JP5735187B2 (en) Graphics processing unit with command processor
CN108701368B (en) More efficient ray tracing methods and apparatus for embodied geometries
TWI614685B (en) System for efficient graphics processing in a virtual execution environment
US10776156B2 (en) Thread priority mechanism
CN110352403B (en) Graphics processor register renaming mechanism
US8675002B1 (en) Efficient approach for a unified command buffer
US11550632B2 (en) Facilitating efficient communication and data processing across clusters of computing machines in heterogeneous computing environment
US20150348224A1 (en) Graphics Pipeline State Object And Model
CN108604185B (en) Method and apparatus for efficiently submitting workload to a high performance graphics subsystem
US20180033116A1 (en) Apparatus and method for software-agnostic multi-gpu processing
US20190004840A1 (en) Register partition and protection for virtualized processing device
US9830676B2 (en) Packet processing on graphics processing units using continuous threads
US10769751B2 (en) Single input multiple data processing mechanism
CN109564676B (en) On-die tessellation allocation
JP2020525914A (en) Firmware changes for virtualized devices
US11094032B2 (en) Out of order wave slot release for a terminated wave
US20240095083A1 (en) Parallel workload scheduling based on workload data coherence
Lorenzon et al. A novel multithreaded rendering system based on a deferred approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination