WO2023044877A1 - Rendering task processing method and apparatus, electronic device, and storage medium - Google Patents

Rendering task processing method and apparatus, electronic device, and storage medium

Info

Publication number
WO2023044877A1
Authority
WO
WIPO (PCT)
Prior art keywords
rendering
thread
task
subtasks
Prior art date
Application number
PCT/CN2021/120797
Other languages
English (en)
French (fr)
Inventor
武云潇
凌华彬
林顺
Original Assignee
厦门雅基软件有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 厦门雅基软件有限公司
Priority to PCT/CN2021/120797
Publication of WO2023044877A1

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/46: Multiprogramming arrangements
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00: 2D [Two Dimensional] image generation

Definitions

  • Embodiments of the present disclosure relate to the field of computer technology, and in particular to a rendering task processing method, device, electronic device, and storage medium.
  • A typical rendering frame usually includes multiple rendering tasks (Render Pass), and each rendering task consists of a series of rendering instructions (Draw Call). Differences between rendering tasks may include different rendering purposes, different write policies, and/or different read policies.
  • In modern graphics interfaces, the number of rendering task types should be as small as possible, and each rendering instruction in each rendering task should be recorded in a rendering command buffer object (Command Buffer). Typical rendering tasks are divided into forward rendering tasks, post-processing rendering tasks, and user interface (UI) rendering tasks.
  • Typical multi-thread optimization performs parallel processing on rendering tasks; specifically, it dispatches different rendering tasks to different threads for recording or reading.
  • However, forward rendering tasks contain most of the rendering instructions, so the parallelism of typical multi-thread optimization is relatively low and its parallel processing of rendering tasks is not very effective.
  • an embodiment of the present disclosure provides a rendering task processing method, wherein a rendering task includes multiple rendering instructions, and a main rendering instruction buffer object is used to buffer a rendering task, and the rendering task processing method includes:
  • the rendering thread splits the rendering task into multiple subtasks, and each subtask includes the same number of rendering instructions;
  • the rendering thread splits the main rendering instruction buffer object into multiple sub-rendering instruction buffer objects, and each sub-rendering instruction buffer object is used to buffer a subtask;
  • the rendering thread assigns multiple subtasks to multiple worker threads in one-to-one correspondence, so that multiple worker threads can record multiple subtasks into corresponding sub-rendering instruction buffer objects in parallel.
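The three steps above can be sketched as follows. This is a minimal illustrative Python sketch, not the patent's implementation: the draw-call strings, the `split_equally` helper, and the use of a thread pool stand in for the rendering thread, the sub-rendering instruction buffer objects, and the worker threads.

```python
from concurrent.futures import ThreadPoolExecutor

def split_equally(draw_calls, n_workers):
    """Split a rendering task's instructions into n_workers subtasks
    of (as close as possible to) equal size."""
    size = -(-len(draw_calls) // n_workers)  # ceiling division
    return [draw_calls[i:i + size] for i in range(0, len(draw_calls), size)]

def record_task(draw_calls, n_workers=4):
    subtasks = split_equally(draw_calls, n_workers)
    # One sub command buffer per subtask (one-to-one correspondence).
    sub_buffers = [[] for _ in subtasks]

    def record(i):
        # Worker i records its own subtask into its own sub buffer;
        # no locking is needed because the buffers are disjoint.
        sub_buffers[i].extend(subtasks[i])

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(record, range(len(subtasks))))
    return sub_buffers

bufs = record_task([f"draw_{i}" for i in range(100)], n_workers=4)
```

Because every subtask carries the same number of instructions, each worker finishes in roughly the same time, which is the load-balancing property the claim relies on.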
  • an embodiment of the present disclosure further provides a rendering task processing apparatus, where a rendering task includes multiple rendering instructions, and a main rendering instruction buffer object is used to buffer a rendering task, and the rendering task processing apparatus includes:
  • the first splitting module is configured so that, for one rendering task, the rendering thread splits the rendering task into multiple subtasks, and each subtask includes the same number of rendering instructions;
  • the second splitting module is configured so that the rendering thread splits the main rendering instruction buffer object into multiple sub-rendering instruction buffer objects, and each sub-rendering instruction buffer object is used to buffer one subtask;
  • the recording module is configured so that the rendering thread assigns multiple subtasks to multiple worker threads in one-to-one correspondence, so that multiple worker threads record multiple subtasks into corresponding sub-rendering instruction buffer objects in parallel.
  • an embodiment of the present disclosure also provides an electronic device, including:
  • a processor; and a memory for storing a computer program executable on the processor, where the processor, when executing the computer program, is configured to:
  • the rendering thread splits the rendering task into multiple subtasks, and each subtask includes the same number of rendering instructions;
  • the rendering thread splits the main rendering instruction buffer object into multiple sub-rendering instruction buffer objects, and each sub-rendering instruction buffer object is used to buffer a subtask;
  • the rendering thread assigns multiple subtasks to multiple worker threads in one-to-one correspondence, so that multiple worker threads record multiple subtasks into corresponding sub-rendering instruction buffer objects in parallel.
  • the embodiments of the present disclosure further provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the above rendering task processing method is implemented.
  • FIG. 1 is a schematic flowchart of a multi-thread rendering method provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of a rendering task processing method provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic flowchart of another rendering task processing method provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic block diagram of a rendering task processing device provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic block diagram of an electronic device provided by an embodiment of the present disclosure.
  • CPU: Central Processing Unit
  • OpenGL: Open Graphics Library
  • API: Application Programming Interface
  • In traditional cross-platform renderers, different hardware graphics interfaces are encapsulated into a set of unified interfaces through a graphics interface adaptation layer, which is referred to as the GFX (Graphics) adaptation layer or the RHI (Render Hardware Interface) adaptation layer.
  • the graphics interface adaptation layer provides a unified graphics interface for upper-level users, and adapts to different hardware graphics interfaces at the bottom layer.
  • Different hardware graphics interfaces include, for example, the low-level rendering application programming interface Metal, the cross-platform graphics application programming interface Vulkan, the subset of the OpenGL 3D graphics API for embedded systems (OpenGL ES), and the multimedia programming interface DirectX.
  • the embodiment of the present disclosure provides a multi-threaded framework design applied to a cross-platform renderer, so as to utilize the computing power of a multi-core CPU, reduce the waste of multi-core CPU computing resources, improve the overall operating performance of the renderer, and provide hardware graphics interface encapsulation to upper-level users.
  • At least one embodiment of the present disclosure provides a multi-thread rendering method in which the graphics interface adaptation layer of the renderer is split into a rendering layer and a device layer based on the proxy pattern (Proxy Pattern): the rendering layer is at least used to encapsulate the hardware graphics interface to obtain a corresponding encapsulated interface, and the device layer is at least used to provide the hardware graphics interface.
  • the split of the rendering layer and the device layer is transparent to the upper-layer users, and does not change the user's existing usage habits.
  • a rendering thread and a device thread are created by the renderer, so that the rendering thread realizes the function of the rendering layer, and the device thread realizes the function of the device layer.
  • the rendering thread and the device thread can perform lock-free data exchange, and complete message communication and data transmission through the producer-consumer mode.
  • FIG. 1 shows a multi-threaded rendering method provided by an embodiment of the present disclosure, which is applied to a renderer; the renderer creates a rendering thread and a device thread, both of which are resident threads.
  • The multi-thread rendering method may include, but is not limited to, steps 101 to 105.
  • Step 101: in response to an invocation of the encapsulated interface, the rendering thread creates a call message corresponding to the encapsulated interface.
  • The encapsulated interface includes, but is not limited to, a rendering queue agent interface (QueueAgent), a rendering command buffer agent interface (CommandBufferAgent), and a rendering device agent interface (DeviceAgent).
  • the call message corresponding to the encapsulation interface includes an execution function of the hardware graphics interface and at least one execution parameter of the execution function.
  • Calling the encapsulated interface does not directly execute it; instead, the rendering thread creates the call message corresponding to the encapsulated interface through a message queue (Message Queue).
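The idea that invoking an encapsulated interface only records a call message, rather than executing anything, can be sketched with a tiny proxy. The `DeviceAgent` class and the method names `create_buffer` and `submit` here are hypothetical placeholders; a real renderer would record an execution function of the hardware graphics interface plus its execution parameters.

```python
from collections import deque

class DeviceAgent:
    """Proxy-pattern sketch: any method call on the agent is captured and
    appended to a message queue as (function name, parameters) instead of
    being executed directly."""
    def __init__(self, message_queue):
        self._queue = message_queue

    def __getattr__(self, name):
        def stub(*args):
            self._queue.append((name, args))  # create the call message
        return stub

messages = deque()
agent = DeviceAgent(messages)
agent.create_buffer(1024)   # records ('create_buffer', (1024,))
agent.submit("frame_0")     # records ('submit', ('frame_0',))
```

The device thread can later drain `messages` and execute each recorded function against the real hardware graphics interface, which is the division of labor steps 104 and 105 describe.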
  • Step 102: the rendering thread records the call message into the message buffer queue.
  • The message buffer queue (Message Buffer) is a lock-free queue: the rendering thread is responsible for recording (that is, writing) call messages into it, and the device thread is responsible for reading call messages from it.
  • After the rendering thread creates the call message corresponding to the encapsulated interface, it appends the call message to the message buffer queue (Message Buffer).
  • Step 103: the rendering thread wakes up the device thread after recording a rendering frame into the message buffer queue.
  • the rendering frame includes at least one calling message.
  • After the rendering thread determines that all call messages of the current rendering frame have been recorded into the message buffer queue, it actively wakes up the device thread. Since the message buffer queue is lock-free, after waking the device thread the rendering thread proceeds directly to recording the next rendering frame without waiting for the device thread to finish reading the current one. Compared with the sequential recording and reading of rendering frames in the prior art, this embodiment can halve the processing time of a rendering frame, thereby improving the overall operating performance of the renderer.
  • After the rendering thread records the first rendering frame into the message buffer queue, it wakes up the device thread; after waking the device thread, it records the second rendering frame into the message buffer queue.
  • In this way, the device thread's reading of the first rendering frame and the rendering thread's recording of the second rendering frame proceed concurrently, and the device thread and the rendering thread can access the message buffer queue at the same time.
  • The rendering thread may notify the message buffer queue that the recording of a rendering frame is complete, and wake up the device thread after the notification is completed.
  • Step 104: after being woken up, the device thread reads call messages from the message buffer queue.
  • After the device thread is woken up, it reads the call messages sequentially from the message buffer queue, and after reading all the call messages in the queue it sleeps and waits to be woken again.
  • The message buffer queue includes a write pointer and a read pointer. After the device thread is woken, it reads call messages sequentially from the message buffer queue until its read pointer reaches the position of the rendering thread's write pointer; that is, after consuming all the call messages in the message buffer queue, the device thread goes to sleep and waits to be woken.
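The read/write-pointer discipline above can be sketched as follows, under a single-producer/single-consumer assumption: only the rendering thread advances the write pointer and only the device thread advances the read pointer. Note that Python's `threading.Event` internally uses a lock, so unlike the queue described in the patent this sketch is not truly lock-free; it only illustrates the pointer and wake/sleep behavior.

```python
import threading

class MessageBuffer:
    """SPSC sketch of the message buffer queue with explicit pointers."""
    def __init__(self):
        self._slots = []
        self._write = 0                 # advanced only by the rendering thread
        self._read = 0                  # advanced only by the device thread
        self._wake = threading.Event()

    def record(self, msg):              # rendering-thread side (step 102)
        self._slots.append(msg)
        self._write += 1

    def wake_device(self):              # called once a frame is recorded (step 103)
        self._wake.set()

    def drain(self):                    # device-thread side (step 104)
        self._wake.wait()               # sleep until woken
        self._wake.clear()
        consumed = []
        # Read until the read pointer catches up with the write pointer.
        while self._read < self._write:
            consumed.append(self._slots[self._read])
            self._read += 1
        return consumed

buf = MessageBuffer()
results = []
device = threading.Thread(target=lambda: results.extend(buf.drain()))
device.start()
for i in range(3):
    buf.record(f"call_{i}")             # rendering thread records a frame
buf.wake_device()                       # then wakes the device thread
device.join()
```

Because the two threads touch disjoint pointers, neither write blocks the other, which is what lets the rendering thread move on to the next frame immediately after the wake-up.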
  • Step 105: the device thread invokes the corresponding hardware graphics interface based on the read call message.
  • The device thread can determine a target execution function and at least one target execution parameter based on the read call message, where the target execution function is the execution function included in the read call message, and the at least one target execution parameter is the at least one execution parameter included in the read call message.
  • the device thread executes the target execution function based on at least one target execution parameter, so that the target execution function invokes a corresponding hardware graphics interface.
  • The hardware graphics interface includes, but is not limited to, a rendering queue interface (Queue), a rendering command buffer interface (CommandBuffer), and a rendering device interface (Device).
  • For example, if the call message was created by the rendering thread in response to an invocation of the rendering command buffer agent interface (CommandBufferAgent), then after the device thread reads the call message, it can determine the target execution function and at least one target execution parameter, and then execute the target execution function based on the at least one target execution parameter, so that the target execution function invokes the corresponding rendering command buffer interface (CommandBuffer).
  • In summary, the graphics interface adaptation layer of the renderer is split into a rendering layer and a device layer based on the proxy pattern (Proxy Pattern), and the renderer creates a rendering thread and a device thread, so that the rendering thread implements the function of the rendering layer and the device thread implements the function of the device layer.
  • In the embodiments of the present disclosure, parallel processing is performed on rendering instructions: a rendering task is split into subtasks each including the same number of rendering instructions, and the main rendering instruction buffer object is split into multiple sub-rendering instruction buffer objects.
  • Multiple worker threads record the subtasks in parallel into the corresponding sub-rendering instruction buffer objects, so that the recording (that is, writing) of call messages can be dispatched by the rendering thread to multiple worker threads, or the reading of call messages can be dispatched by the device thread to multiple worker threads.
  • one call message corresponds to one rendering task or multiple rendering tasks.
  • For example, in typical multi-thread optimization one task thread processes the forward rendering task while another task thread processes the UI rendering task; obviously, the former takes longer.
  • the embodiments of the present disclosure enable multiple worker threads to record multiple sub-tasks including the same number of rendering instructions into corresponding sub-rendering instruction buffer objects in parallel, which can improve the parallelism of multi-core CPUs in terms of processing time.
  • As described above, typical multi-thread optimization performs parallel processing on rendering tasks, and typical rendering tasks are divided into forward rendering tasks, post-processing rendering tasks, and user interface (UI) rendering tasks. Therefore, typical multi-thread optimization uses only three threads to process these three types of rendering tasks in parallel.
  • In the embodiments of the present disclosure, by contrast, multiple subtasks are assigned to multiple worker threads in one-to-one correspondence, and the multiple worker threads process the subtasks in parallel.
  • The number of split subtasks can be greater than the number of rendering task types, with each subtask assigned to one worker thread and multiple worker threads processing multiple subtasks in parallel.
  • Embodiments of the present disclosure can therefore use more than three threads for parallel processing; compared with typical multi-thread optimization, which uses only three threads in parallel, this improves the parallelism of the multi-core CPU in terms of the number of parallel threads.
  • FIG. 2 shows a rendering task processing method provided by an embodiment of the present disclosure, wherein a rendering task includes multiple rendering instructions and a primary rendering command buffer object (Primary Command Buffer) is used to buffer one rendering task. The rendering task processing method includes steps 201 to 203 as follows.
  • Step 201: for a rendering task, the rendering thread splits the rendering task into multiple subtasks, and each subtask includes the same number of rendering instructions.
  • the number of split subtasks may be greater than the number of types of rendering tasks.
  • Typical rendering tasks are divided into forward rendering tasks, post-processing rendering tasks, and user interface (UI) rendering tasks; that is, the number of rendering task types is 3, and the number of split subtasks may be greater than 3.
  • the rendering thread can determine the number of idle worker threads, and split the rendering task into multiple subtasks based on the number of idle worker threads, wherein the number of multiple subtasks is the same as the number of idle worker threads.
  • For example, if the number of cores of a multi-core CPU is 16, correspondingly 16 worker threads can be created; the number of subtasks then equals the number of idle worker threads and does not exceed 16.
  • the rendering thread splits the rendering task into multiple subtasks based on the number of cores of the multi-core CPU, and the number of multiple subtasks is the same as the number of cores of the multi-core CPU. For example, if the number of cores of the multi-core CPU is 16, the number of multiple subtasks is 16.
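The two splitting strategies above (match the number of idle worker threads when it is known, otherwise fall back to the CPU core count) can be sketched with a small helper; `subtask_count` is an illustrative name, not part of the disclosure.

```python
import os

def subtask_count(idle_workers=None):
    """Pick the number of subtasks for one rendering task."""
    # Strategy 1: match the number of idle worker threads when known.
    if idle_workers is not None:
        return idle_workers
    # Strategy 2: fall back to the multi-core CPU's core count
    # (assuming one worker thread per core, as in the 16-core example).
    return os.cpu_count() or 1
```

With 10 idle workers, `subtask_count(10)` yields 10 subtasks, matching the example in FIG. 3 below.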
  • Step 202: the rendering thread splits the main rendering command buffer object into multiple sub-rendering command buffer objects (Secondary Command Buffer), and each sub-rendering command buffer object is used to buffer one subtask.
  • each sub-rendering instruction buffer object has a split order
  • the recording or reading of each sub-rendering instruction buffer object is independent of each other, which facilitates parallel recording or parallel reading of each sub-rendering instruction buffer object.
  • the split number of sub-rendering instruction buffer objects is the same as the split number of subtasks, and the sub-rendering instruction buffer objects correspond to subtasks one by one.
  • Step 203: the rendering thread assigns the multiple subtasks to multiple worker threads in one-to-one correspondence, so that the multiple worker threads record the subtasks into the corresponding sub-rendering instruction buffer objects in parallel.
  • the worker thread is not a resident thread. After the worker thread finishes processing the corresponding subtasks, the worker thread can be released.
  • The rendering thread assigns multiple subtasks to multiple worker threads one by one. Since each subtask includes the same number of rendering instructions, the time consumed by the worker threads to process their subtasks in parallel is similar or the same. In terms of processing time, the parallelism of the multi-core CPU can be improved, and no worker thread needs to wait, or wait for a long time, for other worker threads after finishing its own subtask.
  • In addition, the embodiments of the present disclosure may use multiple threads (more than three) for parallel processing; compared with typical multi-thread optimization, which uses only three threads in parallel, this improves the parallelism of the multi-core CPU in terms of the number of parallel threads.
  • the rendering thread wakes up the device thread after determining that the recording of a rendering frame is completed; wherein the rendering frame includes at least one rendering task.
  • After the device thread is woken up, it reads rendering tasks sequentially.
  • For a rendering task, the device thread assigns the multiple sub-rendering instruction buffer objects corresponding to the rendering task to multiple worker threads one by one, so that the multiple worker threads can submit the subtasks in the sub-rendering instruction buffer objects to the graphics processor (Graphics Processing Unit, GPU) in parallel.
  • After the device thread determines that the submission of a rendering frame is completed, it notifies the graphics processor to perform rendering.
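The submission flow just described (per-task parallel submission of sub buffers, then a frame-level notification to the GPU) can be sketched as follows; the list-based `gpu_queue` and the `"render"` sentinel are stand-ins for a real GPU submission interface, which this sketch does not model.

```python
from concurrent.futures import ThreadPoolExecutor

def submit_frame(frame_tasks, gpu_queue):
    """For each rendering task in the frame, hand its sub command buffers to
    worker threads that submit them in parallel; after the whole frame has
    been submitted, notify the (simulated) GPU to render."""
    for sub_buffers in frame_tasks:          # tasks are read sequentially
        with ThreadPoolExecutor(max_workers=len(sub_buffers)) as pool:
            # one worker per sub buffer (one-to-one correspondence);
            # leaving the block waits for all submissions of this task
            list(pool.map(gpu_queue.append, sub_buffers))
    gpu_queue.append("render")               # frame complete: notify the GPU

gpu = []
submit_frame([["t0_b0", "t0_b1"], ["t1_b0", "t1_b1"]], gpu)
```

Submission order within a task is nondeterministic (the workers run concurrently), but tasks are submitted in order and the render notification always comes last.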
  • FIG. 3 shows another rendering task processing method provided by an embodiment of the present disclosure; the rendering task processing method includes the following steps 301 to 304.
  • Step 301: the rendering thread determines the number of idle worker threads.
  • the number of cores of a multi-core CPU is 16, and 6 worker threads are being used, so the rendering thread determines that the number of idle worker threads is 10.
  • Step 302: the rendering thread sequentially splits all rendering instructions included in a rendering task into multiple segments, where the number of segments is the same as the number of idle worker threads.
  • Each segment of rendering instructions is one subtask.
  • For example, if a rendering task includes 100 rendering instructions and the number of idle worker threads is 10, the rendering thread divides the 100 rendering instructions into 10 segments in sequence: instruction numbers 1 to 10 form the first segment, instruction numbers 11 to 20 form the second segment, and so on.
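The segmentation in this example can be reproduced with a short helper (illustrative only; it assumes the instruction count divides evenly among the idle workers, as in the 100/10 case here):

```python
def split_into_segments(num_instructions, num_idle_workers):
    """Number instructions from 1 and split them sequentially into as many
    equal segments as there are idle worker threads."""
    per_segment = num_instructions // num_idle_workers  # assumes even split
    return [list(range(i * per_segment + 1, (i + 1) * per_segment + 1))
            for i in range(num_idle_workers)]

segments = split_into_segments(100, 10)
```

With 100 instructions and 10 idle workers this yields segment 1 = instructions 1 to 10, segment 2 = instructions 11 to 20, and so on through instruction 100.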
  • step 303 the rendering thread splits the main rendering instruction buffer object into multiple sub-rendering instruction buffer objects.
  • Each sub-rendering instruction buffer object is used to buffer a subtask, that is, buffer 10 rendering instructions.
  • Step 304: the rendering thread assigns the multiple segments to the multiple idle worker threads in one-to-one correspondence, so that the multiple idle worker threads record the segments into the corresponding sub-rendering instruction buffer objects in parallel.
  • For example, the rendering thread assigns instruction numbers 1 to 10 to the first idle worker thread, instruction numbers 11 to 20 to the second idle worker thread, and so on; the 10 idle worker threads record the 10 segments in parallel into the corresponding sub-rendering instruction buffer objects.
  • FIG. 4 shows a rendering task processing device provided by an embodiment of the present disclosure, wherein a rendering task includes multiple rendering instructions and a main rendering instruction buffer object is used to buffer one rendering task. The rendering task processing device includes, but is not limited to, a first splitting module 41, a second splitting module 42, and a recording module 43.
  • the first splitting module 41 is configured such that, for a rendering task, the rendering thread splits the rendering task into multiple subtasks, and each subtask includes the same number of rendering instructions;
  • the second splitting module 42 is configured so that the rendering thread splits the main rendering instruction buffer object into multiple sub-rendering instruction buffer objects, and each sub-rendering instruction buffer object is used to buffer one subtask;
  • the recording module 43 is configured so that the rendering thread assigns multiple subtasks to multiple worker threads in one-to-one correspondence, so that multiple worker threads record multiple subtasks into corresponding sub-rendering instruction buffer objects in parallel.
  • The first splitting module 41 is further configured so that: the rendering thread determines the number of idle worker threads and splits the rendering task into multiple subtasks based on that number, the number of subtasks being the same as the number of idle worker threads; or, the rendering thread splits the rendering task into multiple subtasks based on the number of cores of the multi-core CPU, the number of subtasks being the same as the number of cores.
  • the rendering task processing device also includes the following modules not shown in FIG. 4:
  • the wake-up module is configured to wake up the device thread after the rendering thread determines that the recording of a rendering frame is completed; wherein, the rendering frame includes at least one rendering task.
  • the reading module is configured to sequentially read rendering tasks after the device thread is woken up.
  • the submission module is configured so that, for a rendering task, the device thread assigns the multiple sub-rendering instruction buffer objects corresponding to the rendering task to multiple worker threads one by one, so that the multiple worker threads can submit the subtasks in the sub-rendering instruction buffer objects to the GPU in parallel.
  • the rendering task processing device also includes the following modules not shown in FIG. 4:
  • the notification module is configured to notify the graphics processor to perform rendering after the device thread determines that submitting a rendering frame is completed.
  • FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • the electronic device can be a portable mobile device such as a smart phone, a notebook computer, or a tablet computer; it can also be a fixed device such as a desktop computer or a smart TV.
  • the electronic device includes: at least one processor 51 , at least one memory 52 and at least one communication interface 53 .
  • Various components in the electronic device are coupled together via a bus system 54 .
  • the communication interface 53 is used for information transmission with external devices. Understandably, the bus system 54 is used to realize connection communication between these components.
  • The bus system 54 also includes a power bus, a control bus, and a status signal bus; however, for clarity of illustration, the various buses are all labeled as the bus system 54 in FIG. 5.
  • the memory 52 in this embodiment may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories.
  • the memory 52 stores the following elements, executable modules or data structures, or a subset thereof, or an extended set thereof: an operating system and an application program.
  • the operating system includes various system programs, such as framework layer, core library layer, driver layer, etc., for realizing various basic tasks and processing hardware-based tasks.
  • the application program includes various application programs, such as a media player (Media Player), a browser (Browser), etc., and is used to implement various application tasks.
  • the memory 52 is used to store computer programs that can run on the processor 51 .
  • When the processor 51 executes the computer program, it is configured to:
  • the rendering thread splits the rendering task into multiple subtasks, and each subtask includes the same number of rendering instructions; the rendering thread splits the main rendering instruction buffer object into multiple sub-rendering instruction buffer objects, and each sub-rendering instruction buffer object is used to buffer one subtask; and the rendering thread assigns the multiple subtasks to multiple worker threads in one-to-one correspondence, so that the multiple worker threads can record the subtasks into the corresponding sub-rendering instruction buffer objects in parallel.
  • When the processor 51 executes the computer program, it is further configured to: the rendering thread determines the number of idle worker threads and splits the rendering task into multiple subtasks based on that number, the number of subtasks being the same as the number of idle worker threads; or, the rendering thread splits the rendering task into multiple subtasks based on the number of cores of the multi-core CPU, the number of subtasks being the same as the number of cores.
  • When the processor 51 executes the computer program, it is further configured to: wake up the device thread after the rendering thread determines that the recording of a rendering frame is completed, wherein the rendering frame includes at least one rendering task; after the device thread is woken up, read rendering tasks sequentially; and, for a rendering task, the device thread assigns the multiple sub-rendering instruction buffer objects corresponding to the rendering task to multiple worker threads one by one, so that the multiple worker threads can submit the subtasks in the sub-rendering instruction buffer objects to the graphics processor in parallel.
  • When the processor 51 executes the computer program, it is further configured to: after the device thread determines that the submission of a rendering frame is completed, notify the graphics processor to perform rendering.
  • the processor 51 may be an integrated circuit chip, which has a signal processing capability.
  • The processor 51 can be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • An embodiment of the present disclosure also proposes a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements:
  • the rendering thread splits the rendering task into multiple subtasks, each of which includes the same number of rendering instructions; the rendering thread splits the primary command buffer object into multiple secondary command buffer objects, each of which buffers one subtask; the rendering thread assigns the multiple subtasks to multiple worker threads in one-to-one correspondence, so that the multiple worker threads record the multiple subtasks into the corresponding secondary command buffer objects in parallel.
  • the computer readable storage medium is a non-transitory computer readable storage medium.
  • the rendering thread determines the number of idle worker threads and splits the rendering task into multiple subtasks based on that number, the number of subtasks being the same as the number of idle worker threads; or, the rendering thread splits the rendering task into multiple subtasks based on the number of cores of the multi-core CPU, the number of subtasks being the same as the number of cores of the multi-core CPU.
  • when executed by the processor, the computer program implements: waking the device thread after the rendering thread determines that recording of a rendering frame is complete, where the rendering frame includes at least one rendering task; the device thread, after being woken, reading the rendering tasks in order; and, for each rendering task, the device thread assigning the multiple secondary command buffer objects corresponding to the task to multiple worker threads in one-to-one correspondence, so that the multiple worker threads submit the subtasks in those buffer objects to the graphics processor in parallel.
  • when executed by the processor, the computer program further implements: the device thread notifying the graphics processor to perform rendering after the device thread determines that submission of a rendering frame is complete.
  • An embodiment of the present disclosure also proposes a computer program product, wherein the computer program product includes a computer program stored in a computer-readable storage medium; at least one processor of a computer reads and executes the computer program from the storage medium, causing the computer to perform the steps in the embodiments of the rendering task processing method described above.
  • program code for performing the operations of the embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages, such as Java and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Generation (AREA)

Abstract

The present disclosure provides a rendering task processing method and apparatus, an electronic device, and a storage medium. In at least one embodiment of the present disclosure, parallelization is applied to rendering instructions: one rendering task is split into subtasks that each include the same number of rendering instructions, and the primary command buffer object is split into multiple secondary command buffer objects, each of which buffers one subtask; the multiple subtasks are then assigned to multiple worker threads in one-to-one correspondence, so that the worker threads record the subtasks into the corresponding secondary command buffer objects in parallel, which improves the parallelism of the multi-core CPU in terms of processing time. In addition, the number of subtasks may be greater than the number of rendering task types, and since each subtask is assigned to its own worker thread and multiple worker threads process multiple subtasks in parallel, the parallelism of the multi-core CPU is also improved in terms of the number of parallel threads.

Description

Rendering task processing method and apparatus, electronic device, and storage medium. Technical Field
The embodiments of the present disclosure relate to the field of computer technology, and in particular to a rendering task processing method and apparatus, an electronic device, and a storage medium.
Background
A typical rendering frame usually includes multiple rendering tasks (render passes), and each rendering task consists of a series of rendering instructions (draw calls). Rendering tasks may differ in rendering purpose, write strategy, and/or read strategy, among other things.
In modern graphics interfaces, the number of rendering task types should be kept as small as possible, and the rendering instructions of each rendering task should be recorded into a single command buffer object. Rendering tasks are typically divided into forward rendering tasks, post-processing rendering tasks, and user interface (UI) rendering tasks.
Typical multi-threaded optimization parallelizes at the level of rendering tasks; specifically, different rendering tasks are dispatched to different threads for recording or reading. However, because there are few rendering task types and different task types include very different numbers of rendering instructions (for example, the forward rendering task contains the vast majority of rendering instructions), the parallelism of typical multi-threaded optimization is poor, and parallelizing at the rendering-task level is not very effective.
Summary of the Invention
In a first aspect, an embodiment of the present disclosure provides a rendering task processing method, in which one rendering task includes multiple rendering instructions and one primary command buffer object buffers one rendering task. The rendering task processing method includes:
for one rendering task, a render thread splits the rendering task into multiple subtasks, each subtask including the same number of rendering instructions;
the render thread splits the primary command buffer object into multiple secondary command buffer objects, each of which buffers one subtask;
the render thread assigns the multiple subtasks to multiple worker threads in one-to-one correspondence, so that the multiple worker threads record the multiple subtasks into the corresponding secondary command buffer objects in parallel.
In a second aspect, an embodiment of the present disclosure further provides a rendering task processing apparatus, in which one rendering task includes multiple rendering instructions and one primary command buffer object buffers one rendering task. The rendering task processing apparatus includes:
a first splitting module, configured so that, for one rendering task, a render thread splits the rendering task into multiple subtasks, each subtask including the same number of rendering instructions;
a second splitting module, configured so that the render thread splits the primary command buffer object into multiple secondary command buffer objects, each of which buffers one subtask;
a recording module, configured so that the render thread assigns the multiple subtasks to multiple worker threads in one-to-one correspondence, so that the multiple worker threads record the multiple subtasks into the corresponding secondary command buffer objects in parallel.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including:
a processor; and
a memory for storing a computer program executable on the processor, where the processor, when executing the computer program, is configured to:
for one rendering task, have a render thread split the rendering task into multiple subtasks, each subtask including the same number of rendering instructions;
have the render thread split a primary command buffer object into multiple secondary command buffer objects, each of which buffers one subtask;
have the render thread assign the multiple subtasks to multiple worker threads in one-to-one correspondence, so that the multiple worker threads record the multiple subtasks into the corresponding secondary command buffer objects in parallel.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the rendering task processing method described above.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present disclosure, and a person of ordinary skill in the art could derive other drawings from them.
Fig. 1 is a schematic flowchart of a multi-threaded rendering method provided by an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of a rendering task processing method provided by an embodiment of the present disclosure;
Fig. 3 is a schematic flowchart of another rendering task processing method provided by an embodiment of the present disclosure;
Fig. 4 is a schematic block diagram of a rendering task processing apparatus provided by an embodiment of the present disclosure;
Fig. 5 is a schematic block diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the above objectives, features, and advantages of the present disclosure clearer, the present disclosure is further described in detail below with reference to the drawings and embodiments. It should be understood that the described embodiments are only some, not all, of the embodiments of the present disclosure. The specific embodiments described here are used only to explain the present disclosure, not to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the described embodiments of the present disclosure fall within the scope of protection of the present disclosure.
It should be noted that relational terms such as "first" and "second" herein are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations.
With the development of smart hardware, multi-core CPUs (central processing units) have become common in mobile devices such as smartphones. However, many games are still built on rendering frameworks based on traditional graphics APIs (application programming interfaces) such as OpenGL (Open Graphics Library), which only support submitting rendering instructions sequentially and cannot submit them in parallel from multiple threads; as a result, the computing power of multi-core CPUs is not fully utilized.
In a conventional cross-platform renderer, a graphics interface adaptation layer wraps the different hardware graphics interfaces into a single unified interface; this layer is referred to as the GFX (graphics) adaptation layer or the RHI (render hardware interface) adaptation layer. The graphics interface adaptation layer exposes a unified graphics interface to upper-layer users and adapts it to the different underlying hardware graphics interfaces, such as the low-level Metal rendering API, the cross-platform Vulkan drawing API, OpenGL ES (OpenGL for Embedded Systems, a subset of the OpenGL 3D graphics API), and the DirectX multimedia programming interface.
Embodiments of the present disclosure provide a multi-threaded framework design for a cross-platform renderer, so as to exploit the computing power of multi-core CPUs, reduce wasted multi-core CPU computing resources, improve the overall runtime performance of the renderer, and still provide a hardware graphics interface wrapper to upper-layer users. At least one embodiment of the present disclosure provides a multi-threaded rendering method in which the renderer's graphics interface adaptation layer is split, based on the proxy pattern, into a render layer and a device layer. The render layer provides at least the hardware graphics interface wrapper that yields the corresponding wrapped interfaces, and the device layer provides at least the hardware graphics interfaces. Because the proxy pattern is used, the split into render layer and device layer is transparent to upper-layer users and does not change their existing usage habits. In addition, the renderer creates a render thread and a device thread, so that the render thread implements the functions of the render layer and the device thread implements the functions of the device layer. The render thread and the device thread can exchange data without locks, completing message communication and data transfer through the producer-consumer pattern.
Fig. 1 shows a multi-threaded rendering method provided by an embodiment of the present disclosure, applied in a renderer. The renderer creates a render thread and a device thread, both of which are resident threads. As shown in Fig. 1, the multi-threaded rendering method may include, but is not limited to, steps 101 to 105.
In step 101, the render thread responds to a call to a wrapped interface by creating the call message corresponding to that wrapped interface.
The wrapped interfaces include, but are not limited to, a render queue proxy interface (QueueAgent), a command buffer proxy interface (CommandBufferAgent), and a render device proxy interface (DeviceAgent).
The call message corresponding to a wrapped interface includes the execution function of one hardware graphics interface and at least one execution parameter of that function.
Calling a wrapped interface does not execute it directly; instead, the render thread creates the corresponding call message through a message queue.
In step 102, the render thread records the call message into the message buffer queue.
The message buffer queue (message buffer) is a lock-free queue. The render thread is responsible for recording (that is, writing) call messages into the message buffer queue, and the device thread is responsible for reading call messages from the message buffer queue.
After creating the call message corresponding to the wrapped interface, the render thread appends the call message to the message buffer.
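As a rough illustration of steps 101 and 102, the sketch below models a call message as a pair of an execution function and its parameters, which the render thread appends to a message buffer without executing it. All names here (`MessageBuffer`, `record`) are illustrative, and a plain Python deque stands in for the patent's lock-free queue:

```python
from collections import deque

class MessageBuffer:
    """Stand-in for the message buffer queue: the render thread appends
    call messages; the device thread will later read them in order."""
    def __init__(self):
        self.messages = deque()

    def record(self, execute_fn, *args):
        # A call message bundles one hardware-interface execution function
        # with its execution parameters; it is buffered, not executed.
        self.messages.append((execute_fn, args))

# Render-thread side: calling a wrapped (proxy) interface only records a message.
buffer = MessageBuffer()
executed = []
buffer.record(executed.append, "draw_call_1")
buffer.record(executed.append, "draw_call_2")

print(len(buffer.messages))  # 2 messages buffered
print(executed)              # [] -- nothing has executed yet
```

Deferring execution this way is what lets the render thread keep recording the next frame while the device thread drains the previous one.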
In step 103, after recording one rendering frame into the message buffer queue, the render thread wakes the device thread. A rendering frame includes at least one call message.
After determining that all call messages of the current rendering frame have been recorded into the message buffer queue, the render thread actively wakes the device thread. Because the message buffer queue is lock-free, the render thread proceeds directly to recording the next rendering frame after waking the device thread, without waiting for the device thread to read the current rendering frame. Compared with the prior art, in which rendering frames are recorded and read sequentially, this embodiment can halve the processing time of a rendering frame and thus improve the overall runtime performance of the renderer.
For example, after recording the first rendering frame into the message buffer queue, the render thread wakes the device thread and then records the second rendering frame into the message buffer queue. In this way, the device thread's reading of the first rendering frame is concurrent with the render thread's recording of the second rendering frame, and the device thread and the render thread can access the message buffer queue at the same time.
In some embodiments, after determining that all call messages of the current rendering frame have been recorded into the message buffer queue, the render thread may notify the message buffer queue that recording of a rendering frame is complete, and wake the device thread after this notification.
In step 104, after being woken, the device thread reads call messages from the message buffer queue.
After being woken, the device thread reads call messages from the message buffer queue in order, until it has read all call messages in the queue, and then waits to be woken again.
In some embodiments, the message buffer queue includes a write pointer and a read pointer. After being woken, the device thread reads call messages in order until its read pointer reaches the position of the render thread's write pointer; that is, after consuming all call messages in the message buffer queue, the device thread sleeps and waits to be woken.
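The wake-up and read/write-pointer behavior of steps 103 and 104 can be sketched as follows. This is a simplified model rather than the patent's implementation: a fixed list stands in for the ring storage, a `threading.Event` stands in for the wake-up signal, and all names are invented for illustration.

```python
import threading

class FrameRing:
    """The device thread consumes messages until its read pointer catches
    up with the render thread's write pointer, then sleeps until woken."""
    def __init__(self, capacity=1024):
        self.slots = [None] * capacity
        self.write = 0                    # advanced only by the render thread
        self.read = 0                     # advanced only by the device thread
        self.wakeup = threading.Event()

    def record(self, msg):                # render-thread side (step 102)
        self.slots[self.write % len(self.slots)] = msg
        self.write += 1

    def finish_frame(self):               # render-thread side (step 103)
        self.wakeup.set()                 # wake the device thread

    def consume_frame(self):              # device-thread side (step 104)
        self.wakeup.wait()
        self.wakeup.clear()
        consumed = []
        while self.read < self.write:     # until read pointer meets write pointer
            consumed.append(self.slots[self.read % len(self.slots)])
            self.read += 1
        return consumed                   # then the device thread sleeps again

ring = FrameRing()
results = []
device = threading.Thread(target=lambda: results.extend(ring.consume_frame()))
device.start()
for msg in ["cmd_a", "cmd_b", "cmd_c"]:   # record one frame's call messages
    ring.record(msg)
ring.finish_frame()
device.join()
print(results)  # ['cmd_a', 'cmd_b', 'cmd_c']
```

Because the producer only advances the write pointer and the consumer only advances the read pointer, neither side needs a lock around the slots themselves; the event only signals frame completion.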
In step 105, the device thread invokes the corresponding hardware graphics interface based on the call message it has read.
Because a call message includes the execution function of one hardware graphics interface and at least one execution parameter of that function, the device thread can determine from the call message it has read a target execution function and at least one target execution parameter, where the target execution function is the execution function included in the call message and the at least one target execution parameter is the at least one execution parameter included in the call message.
The device thread executes the target execution function with the at least one target execution parameter, so that the target execution function invokes the corresponding hardware graphics interface. The hardware graphics interfaces include, but are not limited to, a render queue interface (Queue), a command buffer interface (CommandBuffer), and a render device interface (Device).
For example, if a call message was created by the render thread in response to a call to the command buffer proxy interface (CommandBufferAgent), then after the device thread reads that call message, the device thread determines the target execution function and the at least one target execution parameter from it, and executes the target execution function with those parameters, so that the function invokes the corresponding command buffer interface (CommandBuffer).
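Because a call message already carries the target execution function and its parameters, the device-side dispatch of step 105 reduces to one function call. A minimal sketch, where `command_buffer_execute` is a hypothetical stand-in for a real hardware command-buffer interface:

```python
submitted = []

def command_buffer_execute(name):
    # Stand-in for the underlying CommandBuffer hardware interface.
    submitted.append(name)

# A call message as recorded by the render thread: (function, parameters).
call_message = (command_buffer_execute, ("draw_quad",))

# Device-thread side: unpack the target function and its parameters, then call.
target_fn, target_args = call_message
target_fn(*target_args)

print(submitted)  # ['draw_quad']
```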
The above embodiments of the multi-threaded rendering method split the renderer's graphics interface adaptation layer, based on the proxy pattern, into a render layer and a device layer, and have the renderer create a render thread and a device thread so that the render thread implements the functions of the render layer and the device thread implements the functions of the device layer. In modern graphics interfaces, such as Metal, Vulkan, OpenGL ES, and DirectX, the number of rendering task types should be kept as small as possible and the rendering instructions of each rendering task should be recorded into a single command buffer object, so typical multi-threaded optimization that parallelizes at the rendering-task level is not very effective.
Therefore, in at least one embodiment of the present disclosure, parallelization is applied not to rendering tasks but to rendering instructions. Specifically, one rendering task is split into subtasks that each include the same number of rendering instructions, and the primary command buffer object is split into multiple secondary command buffer objects, each of which buffers one subtask; the multiple subtasks are then assigned to multiple worker threads in one-to-one correspondence, so that the worker threads record the subtasks into the corresponding secondary command buffer objects in parallel. This makes it possible to dispatch the recording (that is, the writing) of call messages in the render thread to multiple worker threads, or to dispatch the reading of call messages in the device thread to multiple worker threads. One call message corresponds to one or more rendering tasks.
It can be seen that, in at least one embodiment of the present disclosure, because parallelization is applied to rendering instructions, multiple worker threads record multiple subtasks into the corresponding secondary command buffer objects in parallel, the recording and reading of the secondary command buffer objects do not depend on one another, and every subtask includes the same number of rendering instructions. By contrast, in typical multi-threaded optimization that parallelizes at the rendering-task level, the numbers of rendering instructions in different rendering task types differ greatly, which makes the parallelization ineffective. For example, a forward rendering task might contain 100 rendering instructions that draw 100 crates, while a UI rendering task might contain 2 rendering instructions that draw 2 buttons; if one task thread handles the forward rendering task and another task thread handles the UI rendering task, the former obviously takes much longer. The embodiments of the present disclosure have multiple worker threads record, in parallel, multiple subtasks that each include the same number of rendering instructions into the corresponding secondary command buffer objects, which improves the parallelism of the multi-core CPU in terms of processing time.
In addition, because typical multi-threaded optimization parallelizes at the rendering-task level, and rendering tasks are typically divided into forward rendering tasks, post-processing rendering tasks, and user interface (UI) rendering tasks, it uses only three threads to process these three kinds of tasks in parallel. In at least one embodiment of the present disclosure, however, the multiple subtasks are assigned to multiple worker threads in one-to-one correspondence and processed by them in parallel, and the number of subtasks may be greater than the number of rendering task types. The disclosed embodiments can therefore use more than three threads for parallel processing, which, compared with typical multi-threaded optimization using only three parallel threads, improves the parallelism of the multi-core CPU in terms of the number of parallel threads.
Fig. 2 shows a rendering task processing method provided by an embodiment of the present disclosure, in which one rendering task includes multiple rendering instructions and one primary command buffer object buffers one rendering task. The rendering task processing method includes the following steps 201 to 203.
In step 201, for one rendering task, the render thread splits the rendering task into multiple subtasks, each of which includes the same number of rendering instructions.
The number of subtasks may be greater than the number of rendering task types. For example, rendering tasks are typically divided into forward rendering tasks, post-processing rendering tasks, and user interface (UI) rendering tasks, that is, three task types, while the number of subtasks may be greater than three.
In some embodiments, the render thread may determine the number of idle worker threads and split the rendering task into multiple subtasks based on that number, the number of subtasks being the same as the number of idle worker threads. For example, for a 16-core multi-core CPU, 16 worker threads may be created, so the number of subtasks equals the number of idle worker threads and does not exceed 16.
In some embodiments, the render thread splits the rendering task into multiple subtasks based on the number of cores of the multi-core CPU, the number of subtasks being the same as the number of cores. For example, with a 16-core CPU, the number of subtasks is 16.
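The two splitting strategies above can be condensed into a small helper. `choose_subtask_count` and its parameter are illustrative names, not part of the disclosure; `os.cpu_count()` stands in for the core count of the multi-core CPU:

```python
import os

def choose_subtask_count(idle_workers=None):
    # Strategy 1: one subtask per idle worker thread, when that number is known.
    if idle_workers is not None:
        return idle_workers
    # Strategy 2: one subtask per CPU core.
    return os.cpu_count() or 1

print(choose_subtask_count(idle_workers=10))  # 10
```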
In step 202, the render thread splits the primary command buffer object into multiple secondary command buffer objects, each of which buffers one subtask.
Although the secondary command buffer objects have a splitting order, their recording and reading do not depend on one another, which enables parallel recording and parallel reading of the secondary command buffer objects. The number of secondary command buffer objects equals the number of subtasks, and the secondary command buffer objects correspond one-to-one with the subtasks.
In step 203, the render thread assigns the multiple subtasks to multiple worker threads in one-to-one correspondence, so that the multiple worker threads record the multiple subtasks into the corresponding secondary command buffer objects in parallel.
Unlike the render thread, a worker thread is not a resident thread: once a worker thread has processed its subtask, it can be released. The render thread assigns the multiple subtasks to multiple worker threads in one-to-one correspondence; because every subtask includes the same number of rendering instructions, the multiple worker threads take similar or identical amounts of time to process their subtasks, which improves the parallelism of the multi-core CPU in terms of processing time, and no worker thread finishes its subtask and then has to wait, or wait too long, for the other worker threads.
In addition, because the number of subtasks may be greater than the number of rendering task types and each subtask is assigned to its own worker thread, with multiple worker threads processing multiple subtasks in parallel, the disclosed embodiments can use more than three threads for parallel processing; compared with typical multi-threaded optimization using only three parallel threads, this improves the parallelism of the multi-core CPU in terms of the number of parallel threads.
In some embodiments, after the render thread determines that recording of a rendering frame is complete, it wakes the device thread, where the rendering frame includes at least one rendering task. After being woken, the device thread reads the rendering tasks in order. For each rendering task, the device thread assigns the multiple secondary command buffer objects corresponding to the rendering task to multiple worker threads in one-to-one correspondence, so that the multiple worker threads submit the subtasks in those secondary command buffer objects to the graphics processing unit (GPU) in parallel. In some embodiments, after the device thread determines that submission of a rendering frame is complete, it notifies the graphics processor to render.
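A rough sketch of this device-thread submission path follows, under the simplifying assumptions that a secondary command buffer is just a list of recorded instructions and that appending to `gpu_queue` stands in for submitting to the graphics processor (all names illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def submit_frame(render_tasks, gpu_queue):
    """For each rendering task (read in order), hand the task's secondary
    command buffers to worker threads, which submit them in parallel."""
    for secondary_buffers in render_tasks:
        with ThreadPoolExecutor(max_workers=len(secondary_buffers)) as pool:
            # list() forces all parallel submissions to complete.
            list(pool.map(gpu_queue.append, secondary_buffers))
    # After the whole frame is submitted, the GPU would be told to render.

gpu = []
frame = [
    [["dc1", "dc2"], ["dc3", "dc4"]],  # task 1: two secondary buffers
    [["dc5"]],                         # task 2: one secondary buffer
]
submit_frame(frame, gpu)
print(sorted(len(buf) for buf in gpu))  # [1, 2, 2] -- all three buffers arrived
```

Note that tasks are consumed in order (matching the device thread's sequential read), while the buffers within one task are submitted concurrently.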
Fig. 3 shows a rendering task processing method provided by an embodiment of the present disclosure, which includes the following steps 301 to 304.
In step 301, the render thread determines the number of idle worker threads.
For example, if the multi-core CPU has 16 cores and 6 worker threads are in use, the render thread determines that the number of idle worker threads is 10.
In step 302, the render thread splits all the rendering instructions of a rendering task, in order, into segments, the number of segments being the same as the number of idle worker threads. Each segment of rendering instructions is one subtask.
For example, if a rendering task includes 100 rendering instructions and the number of idle worker threads is 10, the render thread splits the 100 rendering instructions in order into 10 segments: instructions 1 to 10 form the first segment, instructions 11 to 20 the second segment, and so on.
In step 303, the render thread splits the primary command buffer object into multiple secondary command buffer objects. Each secondary command buffer object buffers one subtask, that is, 10 rendering instructions.
In step 304, the render thread assigns the instruction segments to the multiple idle worker threads in one-to-one correspondence, so that the idle worker threads record the segments into the corresponding secondary command buffer objects in parallel.
For example, the render thread assigns instructions 1 to 10 to the first idle worker thread, instructions 11 to 20 to the second idle worker thread, and so on; the 10 idle worker threads record the 10 instruction segments into the corresponding secondary command buffer objects in parallel.
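Steps 301 to 304 can be sketched end to end as follows, mirroring the 100-instruction example: the instructions are split in order into equal segments (assuming the instruction count divides evenly), and a thread pool stands in for the idle worker threads, each recording its segment into its own secondary buffer. All names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def split_task(instructions, idle_workers):
    # Step 302: split the instructions, in order, into equal segments.
    seg = len(instructions) // idle_workers
    return [instructions[i * seg:(i + 1) * seg] for i in range(idle_workers)]

def record_in_parallel(subtasks):
    # Step 303: one secondary command buffer per subtask.
    secondary_buffers = [[] for _ in subtasks]

    def record(i):
        # Step 304: each worker records its own segment, independently.
        secondary_buffers[i].extend(subtasks[i])

    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        list(pool.map(record, range(len(subtasks))))
    return secondary_buffers

draw_calls = list(range(1, 101))       # instruction numbers 1..100
subtasks = split_task(draw_calls, 10)  # step 301 found 10 idle workers
buffers = record_in_parallel(subtasks)

print(subtasks[0])   # first segment is instructions 1 to 10
print(len(buffers))  # 10 secondary buffers, 10 instructions each
```

Because every segment has the same length, the workers finish at roughly the same time, which is the load-balancing property the method relies on.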
Fig. 4 shows a rendering task processing apparatus provided by an embodiment of the present disclosure, in which one rendering task includes multiple rendering instructions and one primary command buffer object buffers one rendering task. The rendering task processing apparatus includes, but is not limited to, a first splitting module 41, a second splitting module 42, and a recording module 43.
The first splitting module 41 is configured so that, for one rendering task, the render thread splits the rendering task into multiple subtasks, each subtask including the same number of rendering instructions.
The second splitting module 42 is configured so that the render thread splits the primary command buffer object into multiple secondary command buffer objects, each of which buffers one subtask.
The recording module 43 is configured so that the render thread assigns the multiple subtasks to multiple worker threads in one-to-one correspondence, so that the multiple worker threads record the multiple subtasks into the corresponding secondary command buffer objects in parallel.
In some embodiments, the first splitting module 41 is configured so that: the render thread determines the number of idle worker threads and splits the rendering task into multiple subtasks based on that number, the number of subtasks being the same as the number of idle worker threads; or, the render thread splits the rendering task into multiple subtasks based on the number of cores of the multi-core central processing unit, the number of subtasks being the same as the number of cores.
In some embodiments, the rendering task processing apparatus further includes the following modules, not shown in Fig. 4:
a wake-up module, configured to wake the device thread after the render thread determines that recording of a rendering frame is complete, where the rendering frame includes at least one rendering task;
a reading module, configured so that the device thread, after being woken, reads the rendering tasks in order;
a submission module, configured so that, for one rendering task, the device thread assigns the multiple secondary command buffer objects corresponding to the rendering task to multiple worker threads in one-to-one correspondence, so that the multiple worker threads submit the subtasks in those secondary command buffer objects to the graphics processor in parallel.
In some embodiments, the rendering task processing apparatus further includes the following module, not shown in Fig. 4:
a notification module, configured so that the device thread, after determining that submission of a rendering frame is complete, notifies the graphics processor to render.
For the technical details of the rendering task processing apparatus disclosed in the above embodiments, refer to the embodiments of the rendering task processing method described above; to avoid repetition, they are not described again here.
Fig. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure. The electronic device may be a portable mobile device such as a smartphone, laptop, or tablet, or a stationary device such as a desktop computer or smart TV. As shown in Fig. 5, the electronic device includes at least one processor 51, at least one memory 52, and at least one communication interface 53. The components of the electronic device are coupled together by a bus system 54. The communication interface 53 is used for information transfer with external devices. It can be understood that the bus system 54 is used to implement connection and communication between these components; in addition to a data bus, the bus system 54 includes a power bus, a control bus, and a status signal bus. For clarity, however, all the buses are labeled as the bus system 54 in Fig. 5.
It can be understood that the memory 52 in this embodiment may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
In some implementations, the memory 52 stores the following elements (executable modules or data structures, or a subset or an extended set of them): an operating system and application programs.
The operating system contains various system programs, such as a framework layer, a core library layer, and a driver layer, for implementing various basic tasks and handling hardware-based tasks. The application programs contain various applications, such as a media player and a browser, for implementing various application tasks.
In some implementations, the memory 52 stores a computer program executable on the processor 51. When executing the computer program, the processor 51 is configured to:
for one rendering task, have the render thread split the rendering task into multiple subtasks, each subtask including the same number of rendering instructions; have the render thread split the primary command buffer object into multiple secondary command buffer objects, each of which buffers one subtask; and have the render thread assign the multiple subtasks to multiple worker threads in one-to-one correspondence, so that the multiple worker threads record the multiple subtasks into the corresponding secondary command buffer objects in parallel.
In some embodiments, when executing the computer program, the processor 51 is configured to: have the render thread determine the number of idle worker threads and split the rendering task into multiple subtasks based on that number, the number of subtasks being the same as the number of idle worker threads; or, have the render thread split the rendering task into multiple subtasks based on the number of cores of the multi-core central processing unit, the number of subtasks being the same as the number of cores.
In some embodiments, when executing the computer program, the processor 51 is configured to: wake the device thread after the render thread determines that recording of a rendering frame is complete, where the rendering frame includes at least one rendering task; have the device thread, after being woken, read the rendering tasks in order; and, for each rendering task, have the device thread assign the multiple secondary command buffer objects corresponding to the rendering task to multiple worker threads in one-to-one correspondence, so that the multiple worker threads submit the subtasks in those secondary command buffer objects to the graphics processor in parallel.
In some embodiments, when executing the computer program, the processor 51 is configured to: have the device thread notify the graphics processor to render after the device thread determines that submission of a rendering frame is complete.
The processor 51 may be an integrated circuit chip with signal processing capability. The processor 51 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
An embodiment of the present disclosure further proposes a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements:
for one rendering task, a render thread splitting the rendering task into multiple subtasks, each subtask including the same number of rendering instructions; the render thread splitting the primary command buffer object into multiple secondary command buffer objects, each of which buffers one subtask; and the render thread assigning the multiple subtasks to multiple worker threads in one-to-one correspondence, so that the multiple worker threads record the multiple subtasks into the corresponding secondary command buffer objects in parallel.
In some embodiments, the computer-readable storage medium is a non-transitory computer-readable storage medium.
In some embodiments, when executed by a processor, the computer program implements: the render thread determining the number of idle worker threads and splitting the rendering task into multiple subtasks based on that number, the number of subtasks being the same as the number of idle worker threads; or, the render thread splitting the rendering task into multiple subtasks based on the number of cores of the multi-core central processing unit, the number of subtasks being the same as the number of cores.
In some embodiments, when executed by a processor, the computer program implements: waking the device thread after the render thread determines that recording of a rendering frame is complete, where the rendering frame includes at least one rendering task; the device thread, after being woken, reading the rendering tasks in order; and, for each rendering task, the device thread assigning the multiple secondary command buffer objects corresponding to the rendering task to multiple worker threads in one-to-one correspondence, so that the multiple worker threads submit the subtasks in those secondary command buffer objects to the graphics processor in parallel.
In some embodiments, when executed by a processor, the computer program further implements: the device thread notifying the graphics processor to render after the device thread determines that submission of a rendering frame is complete.
An embodiment of the present disclosure further proposes a computer program product, wherein the computer program product includes a computer program stored in a computer-readable storage medium. At least one processor of a computer reads and executes the computer program from the storage medium, causing the computer to perform the steps of the embodiments of the rendering task processing method described above; to avoid repetition, they are not described again here.
The program code of the computer program product for performing the operations of the embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages, such as Java and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
It should be noted that, herein, the terms "comprise", "include", and any variants of them are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article, or apparatus that includes that element.
Although some embodiments described herein include certain features that are included in other embodiments rather than others, combinations of features of different embodiments are meant to be within the scope of the present disclosure and to form different embodiments. A person skilled in the art will understand that the descriptions of the individual embodiments each have their own emphasis; for parts not described in detail in one embodiment, refer to the relevant descriptions of the other embodiments.
Although the implementations of the present disclosure have been described with reference to the drawings, a person skilled in the art can make various modifications and variations without departing from the spirit and scope of the present disclosure, and all such modifications and variations fall within the scope defined by the appended claims.

Claims (13)

  1. A rendering task processing method, wherein one rendering task includes multiple rendering instructions and one primary command buffer object is used to buffer one rendering task, the method comprising:
    for one rendering task, splitting, by a render thread, the rendering task into multiple subtasks, each of the subtasks including the same number of rendering instructions;
    splitting, by the render thread, the primary command buffer object into multiple secondary command buffer objects, each of the secondary command buffer objects being used to buffer one of the subtasks;
    assigning, by the render thread, the multiple subtasks to multiple worker threads in one-to-one correspondence, so that the multiple worker threads record the multiple subtasks into the corresponding secondary command buffer objects in parallel.
  2. The method according to claim 1, wherein the splitting, by the render thread, of the rendering task into multiple subtasks comprises:
    determining, by the render thread, the number of idle worker threads, and splitting the rendering task into multiple subtasks based on the number of idle worker threads, the number of the multiple subtasks being the same as the number of idle worker threads;
    or, splitting, by the render thread, the rendering task into multiple subtasks based on the number of cores of a multi-core central processing unit, the number of the multiple subtasks being the same as the number of cores of the multi-core central processing unit.
  3. The method according to claim 1, wherein the method further comprises:
    waking a device thread after the render thread determines that recording of a rendering frame is complete, wherein the rendering frame includes at least one rendering task;
    reading, by the device thread after being woken, the rendering tasks in order;
    for one rendering task, assigning, by the device thread, the multiple secondary command buffer objects corresponding to the rendering task to multiple worker threads in one-to-one correspondence, so that the multiple worker threads submit the subtasks in the multiple secondary command buffer objects to a graphics processor in parallel.
  4. The method according to claim 3, wherein the method further comprises:
    notifying, by the device thread, the graphics processor to render after the device thread determines that submission of a rendering frame is complete.
  5. A rendering task processing apparatus, wherein one rendering task includes multiple rendering instructions and one primary command buffer object is used to buffer one rendering task, the apparatus comprising:
    a first splitting module, configured so that, for one rendering task, a render thread splits the rendering task into multiple subtasks, each of the subtasks including the same number of rendering instructions;
    a second splitting module, configured so that the render thread splits the primary command buffer object into multiple secondary command buffer objects, each of the secondary command buffer objects being used to buffer one of the subtasks;
    a recording module, configured so that the render thread assigns the multiple subtasks to multiple worker threads in one-to-one correspondence, so that the multiple worker threads record the multiple subtasks into the corresponding secondary command buffer objects in parallel.
  6. The apparatus according to claim 5, wherein the first splitting module is configured so that:
    the render thread determines the number of idle worker threads and splits the rendering task into multiple subtasks based on the number of idle worker threads, the number of the multiple subtasks being the same as the number of idle worker threads;
    or, the render thread splits the rendering task into multiple subtasks based on the number of cores of a multi-core central processing unit, the number of the multiple subtasks being the same as the number of cores of the multi-core central processing unit.
  7. The apparatus according to claim 5, wherein the apparatus further comprises:
    a wake-up module, configured to wake a device thread after the render thread determines that recording of a rendering frame is complete, wherein the rendering frame includes at least one rendering task;
    a reading module, configured so that the device thread, after being woken, reads the rendering tasks in order;
    a submission module, configured so that, for one rendering task, the device thread assigns the multiple secondary command buffer objects corresponding to the rendering task to multiple worker threads in one-to-one correspondence, so that the multiple worker threads submit the subtasks in the multiple secondary command buffer objects to a graphics processor in parallel.
  8. The apparatus according to claim 7, wherein the apparatus further comprises:
    a notification module, configured so that the device thread, after determining that submission of a rendering frame is complete, notifies the graphics processor to render.
  9. An electronic device, comprising:
    a processor; and
    a memory for storing a computer program executable on the processor, wherein the processor, when executing the computer program, is configured to:
    for one rendering task, have a render thread split the rendering task into multiple subtasks, each of the subtasks including the same number of rendering instructions;
    have the render thread split a primary command buffer object into multiple secondary command buffer objects, each of the secondary command buffer objects being used to buffer one of the subtasks;
    have the render thread assign the multiple subtasks to multiple worker threads in one-to-one correspondence, so that the multiple worker threads record the multiple subtasks into the corresponding secondary command buffer objects in parallel.
  10. The electronic device according to claim 9, wherein the processor is configured to:
    have the render thread determine the number of idle worker threads and split the rendering task into multiple subtasks based on the number of idle worker threads, the number of the multiple subtasks being the same as the number of idle worker threads;
    or, have the render thread split the rendering task into multiple subtasks based on the number of cores of a multi-core central processing unit, the number of the multiple subtasks being the same as the number of cores of the multi-core central processing unit.
  11. The electronic device according to claim 9, wherein the processor is further configured to:
    wake a device thread after the render thread determines that recording of a rendering frame is complete, wherein the rendering frame includes at least one rendering task;
    have the device thread, after being woken, read the rendering tasks in order;
    for one rendering task, have the device thread assign the multiple secondary command buffer objects corresponding to the rendering task to multiple worker threads in one-to-one correspondence, so that the multiple worker threads submit the subtasks in the multiple secondary command buffer objects to a graphics processor in parallel.
  12. The electronic device according to claim 11, wherein the processor is further configured to:
    have the device thread notify the graphics processor to render after the device thread determines that submission of a rendering frame is complete.
  13. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the rendering task processing method according to any one of claims 1 to 4.
PCT/CN2021/120797 2021-09-26 2021-09-26 Rendering task processing method and apparatus, electronic device, and storage medium WO2023044877A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/120797 WO2023044877A1 (zh) 2021-09-26 2021-09-26 一种渲染任务处理方法、装置、电子设备及存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/120797 WO2023044877A1 (zh) 2021-09-26 2021-09-26 一种渲染任务处理方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2023044877A1 true WO2023044877A1 (zh) 2023-03-30

Family

ID=85719872

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/120797 WO2023044877A1 (zh) 2021-09-26 2021-09-26 一种渲染任务处理方法、装置、电子设备及存储介质

Country Status (1)

Country Link
WO (1) WO2023044877A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116680086A (zh) * 2023-07-25 2023-09-01 联通沃音乐文化有限公司 Scheduling management system based on an offline rendering engine

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104052803A (zh) * 2014-06-09 2014-09-17 国家超级计算深圳中心(深圳云计算中心) Decentralized distributed rendering method and rendering system
CN109213607A (zh) * 2017-06-30 2019-01-15 武汉斗鱼网络科技有限公司 Multi-thread rendering method and apparatus
CN109389666A (zh) * 2018-09-29 2019-02-26 吉林动画学院 Distributed real-time rendering apparatus and method
CN110443880A (zh) * 2019-08-08 2019-11-12 Oppo广东移动通信有限公司 Image rendering method and apparatus, storage medium, and electronic device
US20190378236A1 (en) * 2018-06-08 2019-12-12 Honeywell International Inc. System and method for distributed processing of graphic server components
CN113220419A (zh) * 2021-05-17 2021-08-06 宁波汇盒信息技术有限公司 5G-based real-time cloud rendering simulation system and computer storage medium


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116680086A (zh) * 2023-07-25 2023-09-01 联通沃音乐文化有限公司 Scheduling management system based on an offline rendering engine
CN116680086B (zh) * 2023-07-25 2024-04-02 联通沃音乐文化有限公司 Scheduling management system based on an offline rendering engine

Similar Documents

Publication Publication Date Title
KR101707289B1 Buffer management for a graphics parallel processing unit
CN105579961B Data processing system and operating method, and hardware unit for a data processing system
JP6199477B2 System and method for using a hypervisor together with a guest operating system and virtual processors
CN106557367B Apparatus, method, and device for providing granular quality of service for computing resources
US7631309B2 (en) Methods and system for managing computational resources of a coprocessor in a computing system
US8310492B2 (en) Hardware-based scheduling of GPU work
US10592218B2 (en) Dynamic data and compute resource elasticity
US9658890B2 (en) Runtime agnostic representation of user code for execution with selected execution runtime
US10719970B2 (en) Low latency firmware command selection using a directed acyclic graph
US20090328058A1 (en) Protected mode scheduling of operations
US10585653B2 (en) Declarative programming model with a native programming language
KR102024283B1 Multithreaded computing
US8522254B2 (en) Programmable integrated processor blocks
AU2014268246A1 (en) Reverting tightly coupled threads in an over-scheduled system
US20210096921A1 (en) Execution Graph Acceleration
JP2017538212A Improved function callback mechanism between a central processing unit (CPU) and an auxiliary processor
WO2020063041A1 (zh) Scheduling method and apparatus for a multi-core processor, terminal, and storage medium
US20180239652A1 (en) Lightweight thread synchronization using shared memory state
WO2023044877A1 (zh) Rendering task processing method and apparatus, electronic device, and storage medium
US20130125131A1 (en) Multi-core processor system, thread control method, and computer product
WO2023044876A1 (zh) Multi-threaded rendering method and apparatus, electronic device, and storage medium
CN112445614A Thread data storage management method, computer device, and storage medium
CN113032154B Virtual CPU scheduling method and apparatus, electronic device, and storage medium
CN112114967B GPU resource reservation method based on service priority
US10776289B2 (en) I/O completion polling for low latency storage device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21957995

Country of ref document: EP

Kind code of ref document: A1