CN113342485A - Task scheduling method, device, graphics processor, computer system and storage medium - Google Patents


Info

Publication number
CN113342485A
CN113342485A
Authority
CN
China
Prior art keywords
task
shader
tasks
scheduling
stage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110542963.4A
Other languages
Chinese (zh)
Inventor
Inventor not announced
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongtian Xingxing Shanghai Technology Co ltd
Original Assignee
Zhongtian Xingxing Shanghai Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongtian Xingxing Shanghai Technology Co ltd filed Critical Zhongtian Xingxing Shanghai Technology Co ltd
Priority to CN202110542963.4A
Publication of CN113342485A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Generation (AREA)

Abstract

The application provides a task scheduling method, a device, a graphics processor, a computer system and a storage medium, wherein the task scheduling method includes the following steps: monitoring the input and output of each stage of shader in the rendering pipeline; acquiring the in-out balance information of each stage of shader based on the input and output of each stage of shader; and scheduling the tasks of each shader based on the in-out balance information. By taking into account the balance conditions downstream in the rendering pipeline, the computing resources and storage resources of the entire graphics processor can be utilized more effectively; on the other hand, unnecessary blocking of the pipeline is also reduced.

Description

Task scheduling method, device, graphics processor, computer system and storage medium
Technical Field
The present application relates to the field of integrated circuit design technologies, and in particular, to a task scheduling method and apparatus, a graphics processor, a computer system, and a storage medium.
Background
With the popularization of the unified rendering architecture (Unified Shader) in Graphics Processing Units (GPUs), the management and scheduling of tasks in the shader pipeline (Shader Pipeline) have become particularly important. Because an array of stream processors (Streaming Processors, SPs) can accommodate thousands of software threads or tens of hardware threads (Threads), and because of the diversity of shader pipelines, selecting an appropriate thread type from the many waiting tasks and allocating resources to it becomes extremely complicated.
Traditional shader scheduling is based on a push model. For example, in the rendering pipeline of a DirectX 11 program with tessellation, shading proceeds in the order of Vertex Shader, Hull Shader, Domain Shader, Geometry Shader, and Pixel Shader, and each shader processes the corresponding tasks in its own task queue; for example, the vertex shader processes vertex shading tasks. When tasks are allocated to the shaders, the order of the tasks follows the order of the shaders in the pipeline. As a result, resources such as the computing and storage resources allocated to downstream shaders are wasted and efficiency drops; moreover, the probability of pipeline blocking also increases, reducing efficiency further.
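The push-model allocation criticized here can be illustrated with a short sketch. This is not from the patent; the stage names follow the DirectX 11 example above, while the slot-based allocator and all function names are illustrative assumptions:

```python
from collections import deque

# Pipeline stages in fixed order, following the DirectX 11 example above.
STAGES = ["vertex", "hull", "domain", "geometry", "pixel"]

def push_model_schedule(queues, slots):
    """Allocate up to `slots` tasks, always preferring earlier (upstream)
    stages. This is the push model: downstream backlog is never consulted."""
    scheduled = []
    for stage in STAGES:                      # fixed upstream-first order
        q = queues.get(stage, deque())
        while q and len(scheduled) < slots:
            scheduled.append((stage, q.popleft()))
    return scheduled

queues = {"vertex": deque(["v0", "v1"]), "pixel": deque(["p0"])}
print(push_model_schedule(queues, 2))  # upstream vertex tasks win every slot
```

Even when the pixel stage is starved, upstream vertex tasks take every available slot, which is exactly the downstream waste the next sections address.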
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, it is an object of the present application to provide a task scheduling method, apparatus, graphics processor, computer system, and storage medium, thereby solving the problems of the prior art.
To achieve the above and other related objects, the present application provides a task scheduling method, including: monitoring the input and output of each stage of shader in the rendering pipeline; acquiring the in-out balance information of each stage of shader based on the input and output of each stage of shader; and scheduling the tasks of each shader based on the in-out balance information.
In some embodiments of the present application, scheduling the tasks of each shader based on the in-out balance information includes: when the in-out balance information indicates balance, setting task priorities from high to low according to the order of the tasks' queuing times; or, for tasks with the same queuing time, setting task priorities from high to low according to the front-to-back order of the shaders to which the tasks correspond.
In some embodiments of the present application, the task scheduling method further includes: monitoring the usage state information of the computing resources allocated to each stage of shader. Scheduling the tasks of each shader based on the in-out balance information then further includes: when the in-out balance information indicates balance, scheduling the tasks of each shader according to the matching relationship between the usage state of the computing resources and the task requirements; the matching relationship includes: the usage rate in the usage state of the computing resources is positively correlated with the latency in the task requirements, and/or the remaining amount in the usage state of the computing resources is positively correlated with the consumption in the task requirements.
In some embodiments of the present application, scheduling the tasks of each shader according to the matching relationship between the usage state of the computing resources and the task requirements includes: determining the task demand type of each stage of shader according to the usage state of the computing resources, wherein each task demand type corresponds to a combination of task latency and computing-resource usage state; and scheduling the corresponding tasks according to the task demand type of each stage of shader.
In some embodiments of the present application, the task demand type includes at least one of: a first task demand type that demands tasks whose latency is greater than or equal to a preset latency threshold and whose computing-resource usage is less than a preset resource-usage threshold; a second task demand type that demands tasks whose latency is greater than or equal to the preset latency threshold and whose computing-resource usage is greater than or equal to the preset resource-usage threshold; a third task demand type that demands tasks whose latency is below the preset latency threshold and whose computing-resource usage is below the preset resource-usage threshold; and a fourth task demand type that demands tasks whose latency is below the preset latency threshold and whose computing-resource usage is greater than or equal to the preset resource-usage threshold.
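The four demand types form a two-by-two grid over a latency threshold and a resource-usage threshold. The following is a minimal sketch, with purely hypothetical threshold values; the numeric type labels 1 through 4 simply mirror the ordering of the paragraph above:

```python
LATENCY_THRESHOLD = 8      # hypothetical preset latency threshold
USAGE_THRESHOLD = 0.75     # hypothetical preset resource-usage threshold

def demand_type(latency, usage):
    """Map a (task latency, computing-resource usage) pair to one of the
    four task demand types described above."""
    high_latency = latency >= LATENCY_THRESHOLD
    high_usage = usage >= USAGE_THRESHOLD
    if high_latency and not high_usage:
        return 1   # latency >= threshold, usage < threshold
    if high_latency and high_usage:
        return 2   # latency >= threshold, usage >= threshold
    if not high_latency and not high_usage:
        return 3   # latency < threshold, usage < threshold
    return 4       # latency < threshold, usage >= threshold
```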
To achieve the above and other related objects, the present application provides a task scheduling apparatus, including: a first monitoring module for monitoring the input and output of each stage of shader in the rendering pipeline; and a task scheduling module for acquiring the in-out balance information of each stage of shader based on the input and output of each stage of shader, and scheduling the tasks of each shader based on the in-out balance information.
In some embodiments of the present application, the task scheduling module includes a task priority setting module for setting task priorities from high to low according to the order of the tasks' queuing times when the in-out balance information indicates balance; or, when the in-out balance information indicates balance, setting task priorities from high to low according to the front-to-back order of the shaders corresponding to tasks with the same queuing time.
In some embodiments of the present application, the task scheduling apparatus further includes a second monitoring module for monitoring the usage state information of the computing resources allocated to each stage of shader; the task scheduling module is configured to schedule the tasks of each shader according to the matching relationship between the usage state of the computing resources and the task requirements when the in-out balance information indicates balance; the matching relationship includes: the usage rate in the usage state of the computing resources is positively correlated with the latency in the task requirements, and/or the remaining amount in the usage state of the computing resources is positively correlated with the consumption in the task requirements.
In some embodiments of the present application, the task scheduling module includes: a demand-type obtaining module for determining the task demand type of each stage of shader according to the usage state of the computing resources, wherein each task demand type corresponds to a combination of task latency and computing-resource usage state; and a demand-type scheduling module for scheduling the tasks that match the task demand type of the corresponding shader at each stage.
In some embodiments of the present application, the task demand type includes at least one of: a first task demand type that demands tasks whose latency is greater than or equal to a preset latency threshold and whose computing-resource usage is less than a preset resource-usage threshold; a second task demand type that demands tasks whose latency is greater than or equal to the preset latency threshold and whose computing-resource usage is greater than or equal to the preset resource-usage threshold; a third task demand type that demands tasks whose latency is below the preset latency threshold and whose computing-resource usage is below the preset resource-usage threshold; and a fourth task demand type that demands tasks whose latency is below the preset latency threshold and whose computing-resource usage is greater than or equal to the preset resource-usage threshold.
To achieve the above and other related objects, the present application provides a graphics processor, including: computing resources; a monitoring unit for monitoring the input and output of each stage of shader; a task scheduling unit for acquiring the in-out balance information of each stage of shader based on the input and output of each stage of shader, and scheduling the tasks of each shader based on the in-out balance information; and a thread scheduler for allocating the computing resources corresponding to the scheduled tasks.
In some embodiments of the present application, the graphics processor includes a storage resource comprising buffers that link the output and input between the stages of shaders; the monitoring unit is further configured to monitor the usage state information of the computing resources allocated to each stage of shader; the task scheduling unit is further configured to schedule the tasks of each shader according to the matching relationship between the usage state of the computing resources and the task requirements when the in-out balance information indicates balance; the matching relationship includes: the usage rate in the usage state of the computing resources is positively correlated with the latency in the task requirements, and/or the remaining amount in the usage state of the computing resources is positively correlated with the consumption in the task requirements.
To achieve the above and other related objects, the present application provides a computer system including the graphics processor described above.
To achieve the above and other related objects, the present application provides a computer-readable storage medium storing program instructions which, when executed, perform any of the task scheduling methods described above.
In summary, the present application provides a task scheduling method, a task scheduling apparatus, a graphics processor, a computer system, and a storage medium, wherein the task scheduling method includes: monitoring the input and output of each stage of shader in the rendering pipeline; acquiring the in-out balance information of each stage of shader based on the input and output of each stage of shader; and scheduling the tasks of each shader based on the in-out balance information. By taking into account the balance conditions downstream in the rendering pipeline, the computing resources and storage resources of the entire graphics processor can be utilized more effectively; on the other hand, unnecessary blocking of the pipeline is also reduced.
Drawings
FIG. 1 is a block diagram of an exemplary computer system.
FIG. 2A is a block diagram of an example rendering pipeline.
Fig. 2B is a schematic diagram of a rendering pipeline in another example.
Fig. 2C is a schematic structural diagram of a rendering pipeline provided with monitors, based on the example of Fig. 2A, in an embodiment of the present application.
Fig. 3 shows a flowchart of a task scheduling method in an embodiment of the present application.
Fig. 4 shows a schematic structural diagram of monitoring the input and output of each stage of shader in the rendering pipeline through a monitor in the embodiment of the present application.
Fig. 5 is a flowchart illustrating task scheduling according to the in-out balance information of the shaders in the rendering pipeline in an embodiment of the present application.
Fig. 6 is a schematic structural diagram of a graphics processor according to an embodiment of the present application.
Fig. 7 is a block diagram of a task scheduling device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application is provided by way of specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. The present application is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present application. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings so that those skilled in the art to which the present application pertains can easily carry out the present application. The present application may be embodied in many different forms and is not limited to the embodiments described herein.
In order to clearly explain the present application, components that are not related to the description are omitted, and the same reference numerals are given to the same or similar components throughout the specification.
Throughout the specification, when a device is referred to as being "connected" to another device, this includes not only the case of being "directly connected" but also the case of being "indirectly connected" with another element interposed therebetween. In addition, when a device "includes" a certain component, unless otherwise stated, the device does not exclude other components, but may include other components.
When a device is said to be "on" another device, this may be directly on the other device, but may also be accompanied by other devices in between. When a device is said to be "directly on" another device, there are no other devices in between.
Although the terms first, second, etc. may be used herein to describe various elements in some instances, these elements should not be limited by these terms. These terms are only used to distinguish one element from another; for example, a first signal interface and a second signal interface. Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition will occur only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some way.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the singular forms "a", "an" and "the" include plural forms as long as the words do not expressly indicate a contrary meaning. The term "comprises/comprising" when used in this specification is taken to specify the presence of stated features, regions, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of other features, regions, integers, steps, operations, elements, and/or components.
Terms representing relative spatial terms such as "lower", "upper", and the like may be used to more readily describe one element's relationship to another element as illustrated in the figures. Such terms are intended to include not only the meanings indicated in the drawings, but also other meanings or operations of the device in use. For example, if the device in the figures is turned over, elements described as "below" other elements would then be oriented "above" the other elements. Thus, the exemplary terms "under" and "beneath" all include above and below. The device may be rotated 90 or other angles and the terminology representing relative space is also to be interpreted accordingly.
Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. Terms defined in commonly used dictionaries are to be interpreted as having meanings consistent with those in the related technical literature and the content of the present disclosure, and must not be interpreted in an excessively idealized or formal sense unless expressly so defined.
As shown in fig. 1, a schematic diagram of an exemplary computer system 100 is shown.
In a specific example, the computer system 100 may be implemented in, for example, a server, a smart phone, a tablet computer, a notebook computer, a desktop computer, a set-top box, an e-reader, a smart watch, a smart band, or a network system formed by network connection of a plurality of electronic devices.
The architecture of the possible computer system 100 generally includes a host processor 101 (e.g., a CPU) and a graphics processor 102 (GPU). The main processor 101 is configured with a system Memory 103 (RAM) and a Hard Disk Drive (HDD), and the graphics processor 102 is configured with a display Memory 104 (VRAM).
In some embodiments, the main processor 101 and the graphics processor 102 may be integrated into a single processor chip, the main processor 101 is implemented as at least one processor core in the processor chip, and the graphics processor 102 is implemented as another processor core in the processor chip, and is coupled to the main processor 101 through internal circuitry of the chip; a portion of system memory 103 may be used as display memory 104.
Or, the main processor 101 and the graphics processor 102 are respectively located on different chip carriers, for example, the graphics processor 102 is located on a display card, the main processor 101 is packaged in a processor chip, and the main processor 101 and the display card are connected to a computer Motherboard (Motherboard) and are communicatively connected through a line (e.g., PCIe) on the Motherboard; alternatively, the computer system 100 may include at least two electronic devices communicatively coupled (e.g., remotely via a network) with the host processor 101 being located on one of the electronic devices and the graphics processor 102 being located on the other electronic device, and data may be transferred between the host processor 101 and the graphics processor 102 via the communications.
In some scenarios, an application executed by the host processor 101 may have display requirements and may invoke a driver of the graphics processor 102 via a graphics API to issue one or more commands, such as graphics rendering commands, to the graphics processor 102 for rendering one or more graphics primitives into a displayable graphics image, thereby instructing the graphics processor 102 to perform the related graphics operation tasks and finally display the rendered graphics data on a display. The graphics primitive data required for rendering may be loaded from the hard disk into the system memory 103, and further loaded by the graphics processor 102 from the system memory 103 into the display memory 104. A graphics primitive may be defined, for example, as a triangle, rectangle, triangle fan, or triangle strip. The primitive definition may include a vertex specification that specifies one or more vertices associated with the primitive to be rendered; the vertex specification may include position coordinates for each vertex and, in some cases, other attributes associated with the vertex, such as color attributes, normal vectors, and texture coordinates. The primitive definition may also include primitive type information (e.g., triangle, rectangle, triangle fan, triangle strip), scaling information, rotation information, and the like.
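As a rough illustration of the primitive definition just described, the following sketch models a vertex specification plus primitive type, scaling, and rotation information. The field names and default values are assumptions for illustration, not the patent's data layout:

```python
from dataclasses import dataclass

@dataclass
class Vertex:
    position: tuple                   # location coordinates
    color: tuple = (1.0, 1.0, 1.0)    # optional color attribute
    normal: tuple = (0.0, 0.0, 1.0)   # optional normal vector
    uv: tuple = (0.0, 0.0)            # optional texture coordinates

@dataclass
class PrimitiveDefinition:
    primitive_type: str               # e.g. "triangle", "triangle_strip"
    vertices: list                    # the vertex specification
    scale: float = 1.0                # scaling information
    rotation_deg: float = 0.0         # rotation information

tri = PrimitiveDefinition(
    "triangle",
    [Vertex((0, 0, 0)), Vertex((1, 0, 0)), Vertex((0, 1, 0))],
)
```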
In some examples, the application may be a graphics application, an operating system, a portable graphics application, a computer-aided design program for engineering or artistic applications, a video game application, or an application that uses 2D or 3D graphics, among other applications.
Shaders may be implemented by programs running on the GPU, such as the various shaders provided in different versions of DirectX. Shaders can be classified by purpose into different types, such as Vertex Shaders, Hull Shaders, Domain Shaders, Geometry Shaders, Pixel Shaders, Mesh Shaders, Amplification Shaders, Ray Generation Shaders, Intersection Shaders, Callable Shaders, Hit/Miss Shaders, and the like.
To implement the functions of shader programs, unified programmable Stream Processors (SPs) are used in the GPU hardware to carry out the various shader operations, such as vertex shading, pixel shading, and geometry shading. Furthermore, corresponding to computing resources such as the SPs, the GPU hardware also provides corresponding storage resources, such as a cache (Cache) or the display memory 104 outside the cache.
As shown in fig. 2A, a schematic diagram of a rendering pipeline in an example is shown.
This example shows a typical rendering pipeline for a DirectX 11 application with tessellation, in which rendering tasks are performed sequentially by the shader stages: the Vertex Shader 201, the Hull Shader 202, the Domain Shader 203, the Geometry Shader 204, and the Pixel Shader 205.
The vertex shader 201 is used for performing coordinate-system transformation and shading operations on vertices. The hull shader 202 and the domain shader 203 belong to the tessellation stage: the hull shader 202 operates on the vertices of a patch (a primitive type) and specifies how much geometry should be generated from the patch, while the domain shader 203 uses the tessellation coordinates to place the generated vertices and sends them either to the rasterizer or to the geometry shader 204 for further processing. The geometry shader 204 may be used to perform per-primitive shading operations or to generate more primitives; the pixel shader 205 is used to shade each pixel.
A task queue can be provided for each shader, in which the tasks of that shader are placed in order. When tasks are allocated to the shaders of each stage, they are processed according to the order of the shaders in the pipeline; that is, in Fig. 2A, tasks are allocated in order from the vertex shader 201 to the pixel shader 205.
As shown in fig. 2B, a schematic diagram of a rendering pipeline in yet another example is shown.
The tasks of a typical DX12 mesh shading pipeline are distributed in order from the amplification shader 211 and the mesh shader 212 to the pixel shader 213.
Typically, the hardware circuitry of a GPU includes computing resources and storage resources that support its operation. The computing resources include one or more Streaming Multiprocessors (SMs), which are the core components of the GPU hardware circuitry. Each SM may adopt a Single-Instruction Multiple-Thread (SIMT) architecture and contains multiple Stream Processors (SPs) together with certain storage resources (e.g., registers and caches). An SM may further include a thread scheduler that schedules threads in units of warps, where each warp contains 32 threads (Threads) that execute the same instruction at the same time. In some examples, each thread may correspond to one (or more) SPs.
The commands generated by an application's graphics display requirements define the shading operations corresponding to each shader, thereby generating the shaders' tasks. As described above, when the push-model task allocation is adopted in the rendering pipeline, the processing situation of the downstream shaders is not considered, which wastes the computing and storage resources allocated to those downstream shaders, reduces computational efficiency, and easily causes pipeline blocking.
In view of the above problems, the embodiments of the present application provide corresponding solutions.
Fig. 3 is a schematic flow chart showing a task scheduling method in the embodiment of the present application.
The task scheduling method comprises the following steps:
step S301: inputs and outputs of shaders at respective stages in a rendering pipeline are monitored.
Fig. 2C is a block diagram illustrating the monitoring principle of step S301 in an embodiment of the present application. In Fig. 2C, taking the rendering pipeline of Fig. 2A as an example, monitors A to D are disposed between every two adjacent shaders among the vertex shader 201, hull shader 202, domain shader 203, geometry shader 204, and pixel shader 205 to collect the input and output of each stage of shader. The output of a former-stage shader is the same as the input of the latter-stage shader.
In an embodiment, a buffer (e.g., a FIFO or other buffer) is disposed between adjacent shader stages to link them: the output of the former stage is placed into the buffer and read by the latter stage as its input. Accordingly, each monitor can monitor the input and output of each stage of shader by reading the data in the buffer.
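The buffer-plus-monitor arrangement can be sketched as follows. This is an illustrative software model only; the patent describes hardware, and all class and method names here are invented:

```python
from collections import deque

class StageBuffer:
    """FIFO linking two adjacent shader stages: the former stage writes
    (push), the latter stage reads (pop)."""
    def __init__(self):
        self._fifo = deque()
        self.pushed = 0   # total outputs produced by the former stage
        self.popped = 0   # total inputs consumed by the latter stage

    def push(self, item):
        self._fifo.append(item)
        self.pushed += 1

    def pop(self):
        self.popped += 1
        return self._fifo.popleft()

class Monitor:
    """Observes one buffer, exposing the former stage's output count and
    the latter stage's input count, as steps S301/S302 require."""
    def __init__(self, buffer):
        self.buffer = buffer

    def counts(self):
        return self.buffer.pushed, self.buffer.popped
```

For example, after the former stage pushes two items and the latter stage pops one, the monitor reads counts of (2, 1), which is exactly the raw data the in-out balance information is derived from.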
Step S302: and acquiring the in-out balance information of each stage of shader based on the input and the output of each stage of shader.
The in-out balance information is used to indicate the balance condition between a former-stage shader and a latter-stage shader, for example, between the output of the vertex shader stage and the output of the hull shader stage in the example of Fig. 4.
In some embodiments, a set of threshold parameters may be set for each stage of shader to evaluate whether the in-out balance information indicates a balanced or unbalanced state. In one possible example, the threshold parameters may relate to the difference between the output quantity of the previous-stage shader (which corresponds to the input quantity of the current stage) and the output quantity of the current-stage shader: when the difference is higher than an upper threshold (Max), an unbalanced state may be determined; when the difference is lower than a lower threshold (Min, such as 0), a balanced state may be determined. Further, since imbalance can occur in two directions (input greater than output, or output greater than input), a set of threshold parameters may be illustratively represented as (-Max, Min, Max).
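Under these assumptions, the threshold test could look like the following sketch. The intermediate "within tolerance" region and the string return labels are this sketch's own conventions, not the patent's, and the default thresholds are hypothetical:

```python
def balance_state(prev_output, cur_output, min_thresh=0, max_thresh=4):
    """Classify the in-out balance of a stage from the difference between
    the previous stage's output (this stage's input) and this stage's
    output, using the illustrative (-Max, Min, Max) threshold triple."""
    diff = prev_output - cur_output
    if diff > max_thresh:
        return "imbalance: input exceeds output"
    if diff < -max_thresh:
        return "imbalance: output exceeds input"
    if abs(diff) <= min_thresh:
        return "balanced"
    return "within tolerance"   # between Min and Max: no action assumed
```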
Step S303: and scheduling tasks of all shaders based on all the access balance information.
In some embodiments, imbalance is eliminated by adjusting the priorities of the tasks corresponding to each stage of shader. For example, if a stage of shader has more inputs than outputs, the task priorities are adjusted to reduce its inputs and/or increase its outputs, so that the unbalanced shader finally reaches a balanced state. Specifically, the priority of the task type corresponding to that stage of shader can be lowered, indicating that task allocation to that stage needs to be reduced; or the priority of the task type corresponding to that stage can be raised, indicating that task allocation to that stage needs to be increased.
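A toy version of this priority adjustment follows. The sign convention (positive difference means input exceeds output), the one-step adjustment size, and "larger number = higher priority" are all assumptions of this sketch:

```python
def adjust_priorities(priorities, stage, diff):
    """Nudge per-stage task priorities toward balance: if a stage's input
    exceeds its output (diff > 0), demote its task type so fewer tasks are
    allocated to it; if its output exceeds its input (diff < 0), promote
    it so more tasks are allocated. Larger number = higher priority."""
    p = dict(priorities)          # leave the input mapping untouched
    if diff > 0:
        p[stage] -= 1             # reduce task allocation to this stage
    elif diff < 0:
        p[stage] += 1             # increase task allocation to this stage
    return p
```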
The priority of a task corresponds to its priority for computing and storage resources. Accordingly, the task scheduling unit may send an instruction to the thread scheduler of the GPU, so that the thread scheduler allocates the computing and storage resources of each thread to different tasks in order of task priority, thereby implementing task scheduling. In an alternative example, the thread scheduler may be implemented as a hardware unit.
In some embodiments, when all shader stages in the rendering pipeline are in an in-out balanced condition (i.e., the input and output of the whole rendering pipeline are balanced), task priorities may be set from high to low according to the queuing time of the tasks, earliest first. For tasks with the same queuing time, priorities from high to low are set according to the front-to-back order of the shader stages corresponding to the tasks.
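The two ordering rules above (queuing time first, pipeline stage order as tie-breaker) can be sketched as follows; the Task fields are assumed names for illustration:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    queue_time: int  # tick at which the task entered the queue
    stage: int       # shader stage index (0 = vertex, 1 = hull, ...)

def priority_order(tasks):
    """Earlier queue time ranks higher; ties go to earlier pipeline stages."""
    return sorted(tasks, key=lambda t: (t.queue_time, t.stage))
```

For example, a vertex-shader task and a hull-shader task queued at the same tick are ordered vertex first, while a later pixel-shader task ranks last regardless of stage.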
In some embodiments, if every shader stage requires a high task priority, processing may proceed with the initial task priority of each stage.
To minimize idle waste of computing and storage resources, in some embodiments the use of computing resources may also be optimized through task scheduling; optimizing the use of computing resources in turn optimizes the use of the storage resources coordinated with them.
Fig. 4 is a schematic flow chart of task scheduling according to the in-out balance information of the shaders in the rendering pipeline, provided in an embodiment of the present application.
In this embodiment, the process specifically includes:
Step S401: monitor the usage state information of the computing resources allocated to each shader stage.
Illustratively, the usage state information may include the utilization rate of the computing resources, the amount of remaining resources, and the like. This step may be performed in real time to obtain the current usage state information of the computing resources at each moment.
Step S402: when the in-out balance information indicates balance, schedule the tasks of each shader according to the matching relationship between the usage state of the computing resources and the task requirements.
The matching relationship comprises: the utilization rate in the usage state of the computing resources is positively correlated with the latency of the task demanded, and/or the remaining amount in the usage state of the computing resources is positively correlated with the resource consumption of the task demanded.
For example, taking the utilization rate of the computing resources as a dimension: when the utilization rate becomes a bottleneck limiting computing efficiency, it is suitable to allocate high-latency tasks, so that the task latency is covered by the computation latency; that is, the utilization rate of the computing resources is positively correlated with the latency of the allocated tasks. When the utilization rate is low, i.e., the idle rate is high, task latency may become the bottleneck, and it is suitable to fill the idle computing resources with low-latency tasks.
Taking the remaining amount of the computing resources as a dimension: when the remaining amount is sufficient, it is suitable to allocate tasks with high resource consumption; when the remaining amount is low, it is suitable to allocate tasks with low resource consumption. That is, the remaining amount of the computing resources is positively correlated with the resource consumption of the allocated tasks.
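The two positive correlations can be sketched as a simple selection rule; the threshold values, function name, and tuple layout are assumptions for this example:

```python
def pick_task(tasks, utilization: float, remaining: float,
              util_thresh: float = 0.75, remain_thresh: float = 0.25):
    """Pick the task whose latency/consumption best matches resource state.

    tasks: list of (name, latency, consumption) tuples.
    """
    # High utilization favors high-latency tasks (their latency hides
    # behind computation); ample remaining resources favor tasks with
    # high resource consumption.
    want_high_latency = utilization >= util_thresh
    want_high_consumption = remaining >= remain_thresh

    def score(task):
        _, latency, consumption = task
        s = latency if want_high_latency else -latency
        s += consumption if want_high_consumption else -consumption
        return s

    return max(tasks, key=score)
```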
Thus, in some embodiments, as shown in fig. 5, step S402 may further include:
Step S501: determine the task demand type of each shader stage according to the usage state of the computing resources; each task demand type corresponds to a combination of task latency and computing resource usage;
in some embodiments, the task requirement type includes at least one of:
a first task demand type, which demands tasks with a latency higher than or equal to a preset latency threshold and a computing resource usage lower than a preset resource usage threshold, i.e., a high-latency low-resource-usage (HDLR) type;
a second task demand type, which demands tasks with a latency higher than or equal to the preset latency threshold and a computing resource usage higher than or equal to the preset resource usage threshold, i.e., a high-latency high-resource-usage (HDHR) type;
a third task demand type, which demands tasks with a latency lower than the preset latency threshold and a computing resource usage lower than the preset resource usage threshold, i.e., a low-latency low-resource-usage (LDLR) type;
a fourth task demand type, which demands tasks with a latency lower than the preset latency threshold and a computing resource usage higher than or equal to the preset resource usage threshold, i.e., a low-latency high-resource-usage (LDHR) type.
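The four demand types can be sketched as a classification over the two thresholds. The acronyms here follow the stated latency/usage conditions; the concrete threshold values are illustrative assumptions:

```python
LATENCY_THRESH = 100  # preset latency threshold (assumed units)
USAGE_THRESH = 0.5    # preset resource-usage threshold (assumed fraction)

def demand_type(latency: float, usage: float) -> str:
    """Map a task's latency and resource usage to one of the four types."""
    high_latency = latency >= LATENCY_THRESH
    high_usage = usage >= USAGE_THRESH
    if high_latency and not high_usage:
        return "HDLR"  # first type: high latency, low resource usage
    if high_latency and high_usage:
        return "HDHR"  # second type: high latency, high resource usage
    if not high_latency and not high_usage:
        return "LDLR"  # third type: low latency, low resource usage
    return "LDHR"      # fourth type: low latency, high resource usage
```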
Step S502: schedule the corresponding tasks according to the task demand type of each shader stage.
For example, if at the present moment the utilization rate of the computing resources allocated to a certain shader is high and the remaining amount is low, the task demand type is the first task demand type, i.e., the high-latency low-resource-usage (HDLR) type, and a high-latency, low-resource-usage task may be selected and scheduled for that shader; specifically, the selected task may be set to a high priority. Furthermore, combining the scheduling rules of the previous embodiment, when the in-out of the whole rendering pipeline is balanced, task priorities are set in chronological order, so that the task to be scheduled can be uniquely determined. Illustratively, when scheduling the selected task onto the computing resources of the corresponding shader, the selected task may be directly filled (i.e., input) into a computing unit of the computing resources (e.g., the aforementioned SP, SM, etc.), or a high priority may be set for the selected task in a task queue.
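Combining both rules, a hedged sketch of uniquely selecting a task by first filtering on demand type and then falling back to queuing order; the dictionary field names are assumptions for this example:

```python
def select_task(tasks, required_type: str):
    """Filter by demand type, then break ties by earliest queuing time."""
    candidates = [t for t in tasks if t["type"] == required_type]
    if not candidates:
        return None  # nothing of this type is queued
    return min(candidates, key=lambda t: t["queue_time"])
```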
Fig. 6 is a schematic diagram of a graphics processor according to an embodiment of the present disclosure.
In this embodiment, the graphics processor 600 includes: computing resources 601, a thread scheduler 602, a monitoring unit 603, and a task scheduling unit 604.
Illustratively, the computing resources 601 may include: a computing unit group 611 (such as the aforementioned SPs or SMs).
Illustratively, the graphics processor 600 may also include storage resources 605. Optionally, the storage resources 605 include: a register set 651, a shared memory 652, a buffer 653, and the like. The shared memory 652 is used for communication between threads; the buffer 653 may be used to connect the outputs and inputs between the shader stages in the rendering pipeline.
Illustratively, the register set 651, the computing unit group 611, the shared memory 652, the thread scheduler 602, the buffer 653, and the like may be implemented by hardware circuits. The task scheduling unit 604 and the monitoring unit 603 may be implemented by hardware circuits, or implemented by the GPU running program instructions.
In this embodiment, the monitoring unit 603 may include the monitor shown in fig. 4 to monitor the input and output of each shader stage, and may also include other types of monitors for monitoring the usage state information of the computing resources 601.
Specifically, the monitoring unit 603 obtains the inputs and outputs between the shader stages from the data in the buffer 653 and provides them to the task scheduling unit 604, which derives the in-out balance information of the shader stages to determine whether they are balanced and may set task priorities for scheduling; the thread scheduler 602 then allocates the computing resources 601 according to the set task priorities. Possibly, the computing resources 601 may be shared by the tasks of all shader stages, and are allocated to different tasks by the task scheduling unit 604 and the thread scheduler 602 in order of the corresponding task priorities.
The monitoring unit 603 may also determine the task demand type of the corresponding shader by monitoring the usage state information (such as utilization rate, remaining amount, and the like) of the computing resources 601, and perform the corresponding task priority setting through the task scheduling unit 604 to schedule tasks that match the task demand type for that shader, so as to adapt to the usage state of the computing unit group 611 (that is, the latency of the scheduled task is positively correlated with the utilization rate of the computing resources 601, and the resource consumption of the task is positively correlated with the remaining amount of the computing resources 601), thereby improving computing efficiency.
Fig. 7 is a block diagram of a task scheduling apparatus provided in an embodiment of the present application. For its implementation, reference may be made to the previous embodiments; technical details are not repeated here.
The task scheduling device comprises:
a first monitoring module 701, configured to monitor inputs and outputs of shaders in each stage in a rendering pipeline;
a task scheduling module 702, configured to obtain the in-out balance information of each shader stage based on the input and output of each shader stage, and to schedule the tasks of each shader based on the in-out balance information.
In some embodiments, the task scheduling module 702 includes: a task priority setting module 721, configured to set task priorities from high to low according to the queuing time of the tasks when the in-out balance information indicates balance; or, when the in-out balance information indicates balance, to set task priorities from high to low according to the front-to-back order of the shader stages corresponding to tasks with the same queuing time.
In some embodiments, the task scheduling apparatus further includes: a second monitoring module 703, configured to monitor the usage state information of the computing resources allocated to each shader stage; the task scheduling module 702 is configured to schedule the tasks of each shader according to the matching relationship between the usage state of the computing resources and the task requirements when the in-out balance information indicates balance; wherein the matching relationship comprises: the utilization rate in the usage state of the computing resources is positively correlated with the latency of the task demanded, and/or the remaining amount in the usage state of the computing resources is positively correlated with the resource consumption of the task demanded.
In some embodiments, the task scheduling module 702 includes:
a demand type obtaining module 722, configured to determine the task demand type of each shader stage according to the usage state of the computing resources; wherein each task demand type corresponds to a combination of task latency and computing resource usage;
a demand type scheduling module 723, configured to schedule tasks that match the task demand type of the corresponding shader stage.
In some embodiments of the present application, the task requirement type includes at least one of:
a first task demand type that demands tasks having a latency greater than or equal to a preset latency threshold and a computational resource usage less than a preset resource usage threshold;
a second task demand type that demands tasks having a latency greater than or equal to a preset latency threshold and a computational resource usage greater than or equal to a preset resource usage threshold;
a third task demand type that demands tasks having a latency below a preset latency threshold and a computational resource usage below a preset resource usage threshold;
a fourth task demand type that demands tasks having a latency below a preset latency threshold and a computational resource usage above or equal to the preset resource usage threshold.
It should be understood that the disclosed embodiment of the task scheduling apparatus (e.g., fig. 7) is merely illustrative. For example, the division into modules is merely a logical division; in an actual implementation there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections between devices or modules through interfaces, and may be electrical or take other forms.
The modules described as separate parts may or may not be physically separate, and parts shown as modules may or may not be physical modules; they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In the embodiment of the task scheduling apparatus, the functional modules may be integrated into one module, each module may exist alone physically, or two or more modules may be integrated into one module. An integrated module may be implemented in hardware, or as a functional module combining hardware and software. For example, each functional module in the task scheduling apparatus may be implemented entirely by hardware circuits in the GPU; alternatively, some functional modules may be implemented by the GPU running program instructions while others are implemented by hardware circuits. For instance, the task scheduling module may be either a software module or a hardware circuit module.
In some embodiments, the present application may also provide a computer-readable storage medium storing program instructions; when the program instructions are executed by a GPU, any of the task scheduling methods described above is performed.
The technical solutions of the present application, or the portions thereof that contribute to the prior art, may be embodied in the form of a software product stored in a memory, including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a ROM, a RAM, a removable hard disk, a magnetic disk, or an optical disk.
In summary, the present application provides a task scheduling method, a task scheduling apparatus, a graphics processor, a computer system, and a storage medium. The task scheduling method includes: monitoring the input and output of each shader stage in the rendering pipeline; acquiring the in-out balance information of each shader stage based on the input and output of each shader stage; and scheduling the tasks of each shader based on the in-out balance information. By taking the balance between upstream and downstream stages of the rendering pipeline into account, the computing and storage resources of the whole graphics processor can be used more effectively, and unnecessary blocking of the pipeline is also reduced.
The above embodiments merely illustrate the principles and effects of the present application and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present application. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed in the present application shall be covered by the claims of the present application.

Claims (14)

1. A method for task scheduling, comprising:
monitoring the input and output of each stage of shader in the rendering pipeline;
acquiring the in-out balance information of each stage of shader based on the input and the output of each stage of shader;
and scheduling tasks of all shaders based on all the access balance information.
2. The method of claim 1, wherein the scheduling tasks for each shader based on each of the access balance information comprises:
when the access balance information indicates balance, setting task priorities from high to low according to the sequence of the queuing times of the tasks; or, for tasks with the same queuing time, setting task priorities from high to low according to the front-to-back order of the shader stages corresponding to the tasks.
3. The task scheduling method according to claim 1, further comprising: monitoring the use state information of the computing resources distributed by each stage of shader;
the scheduling tasks for each shader based on each of the access balance information further comprises:
when the access balance information indicates balance, scheduling the tasks of each shader according to the matching relationship between the usage state of the computing resources and the task requirements; wherein the matching relationship comprises: the utilization rate in the usage state of the computing resources is positively correlated with the latency of the task demanded, and/or the remaining amount in the usage state of the computing resources is positively correlated with the resource consumption of the task demanded.
4. The task scheduling method according to claim 3, wherein the scheduling the tasks of each shader according to the matching relationship between the usage state of the computing resources and the task requirements comprises:
determining the task demand type of each stage of shader according to the use state of the computing resource; wherein each of the task demand types corresponds to a combination of task latency and a state of use of computing resources;
and scheduling the corresponding tasks according to the task demand types of the shaders at all levels.
5. The task scheduling method of claim 4, wherein the task demand type comprises at least one of:
a first task demand type that demands tasks having a latency greater than or equal to a preset latency threshold and a computational resource usage less than a preset resource usage threshold;
a second task demand type that demands tasks having a latency greater than or equal to a preset latency threshold and a computational resource usage greater than or equal to a preset resource usage threshold;
a third task demand type that demands tasks having a latency below a preset latency threshold and a computational resource usage below a preset resource usage threshold;
a fourth task demand type that demands tasks having a latency below a preset latency threshold and a computational resource usage above or equal to the preset resource usage threshold.
6. A task scheduling apparatus, comprising:
the first monitoring module is used for monitoring the input and the output of each stage of shader in the rendering pipeline;
and the task scheduling module is used for acquiring the access balance information of each stage of shader based on the input and the output of each stage of shader and scheduling the task of each shader based on each access balance information.
7. The task scheduling apparatus of claim 6, wherein the task scheduling module comprises:
a task priority setting module, configured to set task priorities from high to low according to the sequence of the queuing times of the tasks when the access balance information indicates balance; or, when the access balance information indicates balance, to set task priorities from high to low according to the front-to-back order of the shader stages corresponding to tasks with the same queuing time.
8. The task scheduling apparatus of claim 6, further comprising:
a second monitoring module, configured to monitor the usage state information of the computing resources allocated to each shader stage;
the task scheduling module is configured to schedule the tasks of each shader according to the matching relationship between the usage state of the computing resources and the task requirements when the access balance information indicates balance; wherein the matching relationship comprises: the utilization rate in the usage state of the computing resources is positively correlated with the latency of the task demanded, and/or the remaining amount in the usage state of the computing resources is positively correlated with the resource consumption of the task demanded.
9. The task scheduling apparatus of claim 8, wherein the task scheduling module comprises:
the demand type obtaining module is used for determining the task demand type of each stage of shader according to the use state of the computing resource; wherein each of the task demand types corresponds to a combination of task latency and a state of use of computing resources;
a demand type scheduling module, configured to schedule tasks that match the task demand type of the corresponding shader stage.
10. The task scheduling apparatus of claim 9, wherein the task demand types include at least one of:
a first task demand type that demands tasks having a latency greater than or equal to a preset latency threshold and a computational resource usage less than a preset resource usage threshold;
a second task demand type that demands tasks having a latency greater than or equal to a preset latency threshold and a computational resource usage greater than or equal to a preset resource usage threshold;
a third task demand type that demands tasks having a latency below a preset latency threshold and a computational resource usage below a preset resource usage threshold;
a fourth task demand type that demands tasks having a latency below a preset latency threshold and a computational resource usage above or equal to the preset resource usage threshold.
11. A graphics processor, comprising:
computing resources;
the monitoring unit is used for monitoring the input and the output of each stage of shader;
the task scheduling unit is used for acquiring the access balance information of each stage of shader based on the input and the output of each stage of shader and scheduling the task of each shader based on each access balance information;
and the thread scheduler is used for allocating computing resources corresponding to the scheduled tasks.
12. The graphics processor of claim 11, further comprising: storage resources including a buffer for connecting the outputs and inputs between the shader stages;
the monitoring unit is further configured to monitor the usage state information of the computing resources allocated to each shader stage;
the task scheduling unit is further configured to schedule the tasks of each shader according to the matching relationship between the usage state of the computing resources and the task requirements when the access balance information indicates balance; wherein the matching relationship comprises: the utilization rate in the usage state of the computing resources is positively correlated with the latency of the task demanded, and/or the remaining amount in the usage state of the computing resources is positively correlated with the resource consumption of the task demanded.
13. A computer system comprising a graphics processor as claimed in claim 11 or 12.
14. A computer-readable storage medium having stored thereon program instructions; the program instructions, when executed, perform a method of task scheduling as claimed in any one of claims 1 to 5.
CN202110542963.4A 2021-05-19 2021-05-19 Task scheduling method, device, graphics processor, computer system and storage medium Pending CN113342485A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110542963.4A CN113342485A (en) 2021-05-19 2021-05-19 Task scheduling method, device, graphics processor, computer system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110542963.4A CN113342485A (en) 2021-05-19 2021-05-19 Task scheduling method, device, graphics processor, computer system and storage medium

Publications (1)

Publication Number Publication Date
CN113342485A true CN113342485A (en) 2021-09-03

Family

ID=77469261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110542963.4A Pending CN113342485A (en) 2021-05-19 2021-05-19 Task scheduling method, device, graphics processor, computer system and storage medium

Country Status (1)

Country Link
CN (1) CN113342485A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114925139A (en) * 2022-07-21 2022-08-19 沐曦科技(成都)有限公司 Method and device for hierarchically synchronizing data chains and electronic equipment
CN115562469A (en) * 2022-12-07 2023-01-03 深流微智能科技(深圳)有限公司 Power consumption management method and device, image processor and storage medium
CN116188243A (en) * 2023-03-02 2023-05-30 格兰菲智能科技有限公司 Graphics rendering pipeline management method and graphics processor

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070091089A1 (en) * 2005-10-14 2007-04-26 Via Technologies, Inc. System and method for dynamically load balancing multiple shader stages in a shared pool of processing units
US20130185728A1 (en) * 2012-01-18 2013-07-18 Karim M. Abdalla Scheduling and execution of compute tasks
WO2015192627A1 (en) * 2014-06-17 2015-12-23 Huawei Technologies Co., Ltd. Service scheduling method, apparatus, and system
US20180165786A1 (en) * 2016-12-13 2018-06-14 Qualcomm Incorporated Resource sharing on shader processor of gpu
US20180197271A1 (en) * 2017-01-12 2018-07-12 Imagination Technologies Limited Graphics processing units and methods using cost indications for sets of tiles of a rendering space
CN111080761A (en) * 2019-12-27 2020-04-28 西安芯瞳半导体技术有限公司 Method and device for scheduling rendering tasks and computer storage medium
CN112189215A (en) * 2018-05-30 2021-01-05 超威半导体公司 Compiler assist techniques for implementing memory usage reduction in a graphics pipeline


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SANJAYA K. PANDA: "Load balanced task scheduling for cloud computing: a probabilistic approach", Knowledge and Information Systems *
HE YANXIANG; ZHANG JUN; SHEN FANFAN; JIANG NAN; LI QING'AN; LIU ZIJUN: "A Survey of Thread Scheduling Optimization Methods for General-Purpose Graphics Processors", Chinese Journal of Computers, no. 09
LIU JIAN: "Research and Implementation of an Embedded Multi-core GPU Rendering Pipeline", China Masters' Theses Full-text Database, no. 02
LIU ZIJUN; HE YANXIANG; ZHANG JUN; LI QING'AN; SHEN FANFAN: "A Behavior-Aware Memory Scheduling Strategy for GPGPU", Computer Engineering and Science, no. 06

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114925139A (en) * 2022-07-21 2022-08-19 沐曦科技(成都)有限公司 Method and device for hierarchically synchronizing data chains and electronic equipment
CN115562469A (en) * 2022-12-07 2023-01-03 深流微智能科技(深圳)有限公司 Power consumption management method and device, image processor and storage medium
CN115562469B (en) * 2022-12-07 2023-03-07 深流微智能科技(深圳)有限公司 Power consumption management method and device, image processor and storage medium
CN116188243A (en) * 2023-03-02 2023-05-30 格兰菲智能科技有限公司 Graphics rendering pipeline management method and graphics processor
CN116188243B (en) * 2023-03-02 2024-09-06 格兰菲智能科技股份有限公司 Graphics rendering pipeline management method and graphics processor

Similar Documents

Publication Publication Date Title
US10217183B2 (en) System, method, and computer program product for simultaneous execution of compute and graphics workloads
US8854381B2 (en) Processing unit that enables asynchronous task dispatch
KR101563098B1 (en) Graphics processing unit with command processor
US8074224B1 (en) Managing state information for a multi-threaded processor
US9286119B2 (en) System, method, and computer program product for management of dependency between tasks
CN113342485A (en) Task scheduling method, device, graphics processor, computer system and storage medium
US20120256922A1 (en) Multithreaded Processor and Method for Realizing Functions of Central Processing Unit and Graphics Processing Unit
US20100123717A1 (en) Dynamic Scheduling in a Graphics Processor
US9659399B2 (en) System, method, and computer program product for passing attribute structures between shader stages in a graphics pipeline
US10565670B2 (en) Graphics processor register renaming mechanism
US7747842B1 (en) Configurable output buffer ganging for a parallel processor
US20080198166A1 (en) Multi-threads vertex shader, graphics processing unit, and flow control method
CN110728616A (en) Tile allocation for processing cores within a graphics processing unit
US20170069054A1 (en) Facilitating efficient scheduling of graphics workloads at computing devices
KR102006584B1 (en) Dynamic switching between rate depth testing and convex depth testing
US10410311B2 (en) Method and apparatus for efficient submission of workload to a high performance graphics sub-system
CN114972607B (en) Data transmission method, device and medium for accelerating image display
US8363059B2 (en) Rendering processing apparatus, parallel processing apparatus, and exclusive control method
CN108140233A (en) Graphics processing unit with block of pixels level granularity is seized
JP2009505301A (en) Scalable parallel pipelined floating point unit for vector processing
US7876329B2 (en) Systems and methods for managing texture data in a computer
US9171525B2 (en) Graphics processing unit with a texture return buffer and a texture queue
US9214008B2 (en) Shader program attribute storage
CN113467959A (en) Method, device and medium for determining task complexity applied to GPU
US9477480B2 (en) System and processor for implementing interruptible batches of instructions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination