CN114880082A - Multithreading beam warp dynamic scheduling system and method based on sampling state - Google Patents


Info

Publication number
CN114880082A
Authority
CN
China
Prior art keywords
warp
sampling
sequencing
thread block
module
Legal status
Granted
Application number
CN202210278744.4A
Other languages
Chinese (zh)
Other versions
CN114880082B (en)
Inventor
贾世伟
张玉明
Current Assignee
Xidian University
Original Assignee
Xidian University
Application filed by Xidian University
Priority to CN202210278744.4A
Priority claimed from CN202210278744.4A
Publication of CN114880082A
Application granted
Publication of CN114880082B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806: Task transfer initiation or dispatching
    • G06F9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00: General purpose image data processing
    • G06T1/20: Processor architectures; Processor configuration, e.g. pipelining
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00: Indexing scheme relating to G06F9/00
    • G06F2209/48: Indexing scheme relating to G06F9/48
    • G06F2209/484: Precedence
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00: Indexing scheme relating to G06F9/00
    • G06F2209/48: Indexing scheme relating to G06F9/48
    • G06F2209/486: Scheduler internals
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Slot Machines And Peripheral Devices (AREA)

Abstract

The invention discloses a warp (thread bundle) dynamic scheduling system and method based on a sampling state. The system comprises a warp sorting method decision module, a round-robin warp sorting module, a sampling-thread-block-based warp sorting module and a warp scheduling module. The sampling-thread-block-based warp sorting module comprises a sampling thread block judgment unit, a first sampling thread block warp sorting unit, a first non-sampling thread block warp sorting unit, a first warp ID lookup unit, a second sampling thread block warp sorting unit and a second non-sampling thread block warp sorting unit. The invention provides a new strategy for dynamic warp scheduling based on the sampling state of a sampling thread block. It improves the ability to hide stall latency, makes dynamically adjusted module functions practical, and brings performance benefits when such dynamic adjustment takes effect.

Description

Multithreading beam warp dynamic scheduling system and method based on sampling state
Technical Field
The invention belongs to the technical field of graphics processing unit (GPU) architecture, and particularly relates to a warp dynamic scheduling system and method based on a sampling state.
Background
A warp scheduling policy sorts the warps that are currently eligible for scheduling according to some ordering rule; one or more schedulers then select warps from that order to issue and execute. The quality of the scheduling policy is closely tied to the execution performance of a graphics processing unit (GPU).
Published research includes warp scheduling policies such as Loose Round Robin (LRR), Greedy-Then-Oldest (GTO) and two-level (TL) scheduling, all of which are static policies independent of the running program. The most commonly used warp scheduling policy is round-robin scheduling: it takes the ID of the warp issued in the previous cycle as input, finds the warp that immediately follows it, and orders the remaining warps in sequence with that warp at the head. Because round-robin scheduling keeps all warps at a similar execution progress, it has two drawbacks. First, when all warps issue long-latency global-memory load instructions in roughly the same period, the large number of memory access instructions compete for limited memory resources, which can cause memory stalls and degrade cache performance. Second, by the time a sampled warp finishes and its sampling metrics are fed back to the GPU's control logic, all the other warp tasks are also nearly finished. For these reasons, this policy cannot achieve good GPU execution efficiency.
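The baseline round-robin rule described above can be sketched in a few lines. This is a minimal illustration, assuming warps are numbered contiguously from 0 to n-1; the function name `round_robin_order` is hypothetical and not taken from the patent.

```python
def round_robin_order(last_issued: int, num_warps: int) -> list[int]:
    """Order all warps starting with the one right after last_issued.

    Assumes warp IDs 0..num_warps-1; the order wraps around, so the
    previously issued warp ends up last.
    """
    start = (last_issued + 1) % num_warps
    return [(start + k) % num_warps for k in range(num_warps)]


# Six warps, warp2 issued in the previous cycle.
print(round_robin_order(2, 6))  # [3, 4, 5, 0, 1, 2]
```

Every warp gets the same rotating priority, which is why all warps tend to advance at a similar pace.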
Disclosure of Invention
To solve the problems in the prior art, the invention provides a warp dynamic scheduling system and method based on a sampling state. The technical problem is solved by the following technical solution:
in a first aspect, an embodiment of the invention provides a warp dynamic scheduling system based on a sampling state, comprising a warp sorting method decision module, a round-robin warp sorting module, a sampling-thread-block-based warp sorting module and a warp scheduling module, wherein:
the warp sorting method decision module determines, according to the sampling state of a preselected sampling thread block, whether warp sorting is performed by the round-robin warp sorting module or by the sampling-thread-block-based warp sorting module, and passes the most recently executed warp ID to the chosen module; the sampling state indicates the execution status of the warps in the sampling thread block;
the round-robin warp sorting module is connected to the warp sorting method decision module and the warp scheduling module; it determines, from the most recently executed warp ID, the warp immediately following that warp, orders the warps of all thread blocks in the system in sequence with that warp at the head, and outputs the warp sorting result to the warp scheduling module;
the sampling-thread-block-based warp sorting module comprises a sampling thread block judgment unit, a first sampling thread block warp sorting unit, a first non-sampling thread block warp sorting unit, a first warp ID lookup unit, a second sampling thread block warp sorting unit and a second non-sampling thread block warp sorting unit, wherein:
the sampling thread block judgment unit is connected to the warp sorting method decision module and judges whether the warp with the most recently executed warp ID belongs to the sampling thread block; if so, it passes the most recently executed warp ID to the first sampling thread block warp sorting unit, and if not, to the first warp ID lookup unit;
the first sampling thread block warp sorting unit is connected to the sampling thread block judgment unit; it determines, from the most recently executed warp ID, the next warp within the sampling thread block and orders the warps of that sampling thread block in sequence with that next warp at the head, obtaining a first warp sorting result, which it passes to the first non-sampling thread block warp sorting unit;
the first non-sampling thread block warp sorting unit is connected to the first sampling thread block warp sorting unit and the warp scheduling module; it orders the remaining warps outside the sampling thread block in sequence, with the first warp of the thread block immediately following the sampling thread block at the head, obtaining a second warp sorting result; it appends the second warp sorting result after the first and outputs the combined warp sorting result to the warp scheduling module;
the first warp ID lookup unit is connected to the sampling thread block judgment unit; it finds the first warp ID belonging to the sampling thread block and passes the found warp ID and the most recently executed warp ID to the second sampling thread block warp sorting unit;
the second sampling thread block warp sorting unit is connected to the first warp ID lookup unit; with the warp of the found warp ID at the head, it orders the warps of the sampling thread block in sequence, obtaining a third warp sorting result, and passes the third warp sorting result and the most recently executed warp ID to the second non-sampling thread block warp sorting unit;
the second non-sampling thread block warp sorting unit is connected to the second sampling thread block warp sorting unit and the warp scheduling module; it determines, from the most recently executed warp ID, the warp immediately following that warp, orders the remaining warps outside the sampling thread block in sequence with that warp at the head, obtaining a fourth warp sorting result, appends the fourth warp sorting result after the third, and outputs the combined warp sorting result to the warp scheduling module.
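The six units of the sampling-thread-block-based warp sorting module can be sketched as a single function. This is a minimal sketch, not the patent's implementation: it assumes thread blocks are given as lists of warp IDs in block order, and that the "next thread block" after the last block wraps around to block 0. The names `rotate_after` and `sampling_priority_order` are hypothetical.

```python
def rotate_after(seq: list[int], item: int) -> list[int]:
    """Rotate seq so it starts right after `item`, which ends up last."""
    i = seq.index(item)
    return seq[i + 1:] + seq[:i + 1]


def sampling_priority_order(blocks: list[list[int]], sampling_block: int,
                            last_issued: int) -> list[int]:
    """Warp order that always puts the sampling thread block first.

    blocks[b] lists the warp IDs of thread block b in ascending order.
    """
    sampled = blocks[sampling_block]
    n = len(blocks)
    # Non-sampling warps, starting with the first warp of the thread block
    # after the sampling block (assumed to wrap around to block 0).
    others = [w for k in range(1, n)
              for w in blocks[(sampling_block + k) % n]]
    if last_issued in sampled:
        # Judgment unit: the last warp belongs to the sampling thread block.
        first = rotate_after(sampled, last_issued)   # first sorting result
        second = others                              # second sorting result
        return first + second
    # First warp ID lookup unit: sampled[0] is the block's first warp.
    third = list(sampled)                            # third sorting result
    fourth = rotate_after(others, last_issued)       # fourth sorting result
    return third + fourth


blocks = [[0, 1], [2, 3], [4, 5]]
print(sampling_priority_order(blocks, 1, 2))  # [3, 2, 4, 5, 0, 1]
print(sampling_priority_order(blocks, 1, 4))  # [2, 3, 5, 0, 1, 4]
```

The two calls reproduce the worked examples in the detailed description (six warps, thread block 1 as the sampling thread block). Rotating `others` cyclically gives the same head as "the next non-sampling warp after the most recently executed warp" because `others` is itself a rotation of the ID order.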
In an embodiment of the invention, the system further comprises an issue module, connected to the warp sorting method decision module, which tracks the task execution status of the warps of all thread blocks in the system, outputs the corresponding sampling states, and passes all sampling states and the most recently executed warp ID to the warp sorting method decision module;
the sampling states comprise a first sampling state, indicating that the warp tasks in a thread block have finished executing, and a second sampling state, indicating that the warp tasks in the thread block have not finished executing.
In an embodiment of the invention, the warp sorting method decision module determines, for the preselected sampling thread block, whether to use the round-robin warp sorting module or the sampling-thread-block-based warp sorting module as follows:
it selects the sampling state of the sampling thread block from the sampling states provided by the issue module and checks which state it is:
when it is the first sampling state, the warp sorting method decision module uses the round-robin warp sorting module for warp sorting;
when it is the second sampling state, the warp sorting method decision module uses the sampling-thread-block-based warp sorting module for warp sorting.
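The issue-module statistics and the decision rule above can be sketched as follows. This is an illustration under stated assumptions: "finished" is taken to mean that every warp of the thread block has completed its task, per-warp completion is modeled as a boolean, and the returned strings merely stand in for selecting a module. `SamplingState`, `sampling_states` and `choose_sorting_method` are hypothetical names.

```python
from enum import Enum


class SamplingState(Enum):
    FIRST = 1   # all warp tasks of the thread block have finished
    SECOND = 2  # at least one warp task of the thread block is unfinished


def sampling_states(warp_done: dict[int, bool],
                    blocks: list[list[int]]) -> list[SamplingState]:
    """Issue-module statistics: one sampling state per thread block."""
    return [SamplingState.FIRST if all(warp_done[w] for w in block)
            else SamplingState.SECOND
            for block in blocks]


def choose_sorting_method(states: list[SamplingState], sampling_block: int) -> str:
    """Decision module: only the sampling thread block's state matters."""
    if states[sampling_block] is SamplingState.FIRST:
        return "round_robin"
    return "sampling_block_priority"


blocks = [[0, 1], [2, 3], [4, 5]]
done = {0: False, 1: False, 2: True, 3: False, 4: False, 5: False}
print(choose_sorting_method(sampling_states(done, blocks), 1))  # sampling_block_priority
done[3] = True
print(choose_sorting_method(sampling_states(done, blocks), 1))  # round_robin
```

Once the sampling thread block has finished, the scheduler falls back to plain round-robin for the remaining warps.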
In an embodiment of the invention, the warp scheduling module comprises a plurality of schedulers; each scheduler checks, in order, whether each warp in the warp sorting result it receives can be issued for execution, and at the first warp that can be issued it stops checking and passes that warp to the issue module;
the issue module is also connected to the warp scheduling module; it receives the warp issued by the warp scheduling module and uses it as the most recently executed warp in the next round of sorting.
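One scheduler's selection step and the feedback to the next sorting round can be sketched as below. This assumes an issuability test is available as a predicate (for example, "not waiting on a load"); the behavior when no warp can issue (keep the previous ID) is an assumption, since the text does not specify it. `pick_warp` and `next_last_executed` are hypothetical names.

```python
from typing import Callable, Optional


def pick_warp(order: list[int], can_issue: Callable[[int], bool]) -> Optional[int]:
    """One scheduler: walk the sorted warps and stop at the first issuable one."""
    for warp in order:
        if can_issue(warp):
            return warp          # stop checking; this warp is issued
    return None                  # nothing can issue this cycle


def next_last_executed(previous: int, issued: Optional[int]) -> int:
    """Feedback to the next sorting round (assumed: keep the old ID on a bubble)."""
    return previous if issued is None else issued


stalled = {2, 3}                                 # e.g. both wait on loads
order = [3, 2, 4, 5, 0, 1]
issued = pick_warp(order, lambda w: w not in stalled)
print(issued)                                    # 4
print(next_last_executed(2, issued))             # 4
```

Because the scan stops at the first hit, a stalled sampling-block warp simply lets the next warp in the order issue, which is how the stall latency gets hidden.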
In an embodiment of the invention, the system further comprises a sampling thread block selection module, connected to the sampling thread block judgment unit, which selects in advance any one of the thread blocks of the system as the sampling thread block.
In a second aspect, an embodiment of the invention provides a warp dynamic scheduling method based on a sampling state, comprising:
determining a warp sorting method according to the sampling state of a preselected sampling thread block, the warp sorting method being either a round-robin warp sorting method or a sampling-thread-block-based warp sorting method, where the sampling state indicates the execution status of the warps in the sampling thread block;
performing warp sorting with the determined method and using the warp sorting result to schedule warps dynamically, wherein:
when the determined method is the round-robin warp sorting method, warp sorting comprises: determining, from the most recently executed warp ID, the warp immediately following that warp, and ordering the warps of all thread blocks in the system in sequence with that warp at the head;
when the determined method is the sampling-thread-block-based warp sorting method, warp sorting comprises:
judging whether the warp with the most recently executed warp ID belongs to the sampling thread block. If it does: determine, from the most recently executed warp ID, the next warp within the sampling thread block; order the warps of the sampling thread block in sequence with that warp at the head to obtain a first warp sorting result; order the remaining warps outside the sampling thread block in sequence, with the first warp of the thread block immediately following the sampling thread block at the head, to obtain a second warp sorting result; and append the second warp sorting result after the first. If it does not: find the first warp ID belonging to the sampling thread block; order the warps of the sampling thread block in sequence with that warp at the head to obtain a third warp sorting result; determine, from the most recently executed warp ID, the warp immediately following it, and order the remaining warps outside the sampling thread block in sequence with that warp at the head to obtain a fourth warp sorting result; and append the fourth warp sorting result after the third.
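The two branches of the sampling-thread-block-based method can be written compactly. The notation is introduced here for illustration only: $S=(s_0,\dots,s_{k-1})$ lists the sampling thread block's warps in order, $R=(r_0,\dots,r_{m-1})$ lists the other warps starting from the first warp of the thread block after the sampling thread block (assumed to wrap around), $w$ is the most recently executed warp, $\Vert$ denotes concatenation, and $\rho_j$ rotates a sequence so that it starts right after its $j$-th element.

```latex
\rho_j(x_0,\dots,x_{n-1}) = (x_{j+1},\dots,x_{n-1},\,x_0,\dots,x_j)

\mathrm{order}(w) =
\begin{cases}
  \rho_i(S)\;\Vert\;R, & w = s_i \in S \quad\text{(first and second results)}\\
  S\;\Vert\;\rho_j(R), & w = r_j \in R \quad\text{(third and fourth results)}
\end{cases}
```

For six warps with $S=(2,3)$ and $R=(4,5,0,1)$: $w=2=s_0$ gives $(3,2)\Vert(4,5,0,1)$, and $w=4=r_0$ gives $(2,3)\Vert(5,0,1,4)$, matching the worked examples in the detailed description.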
In an embodiment of the invention, the method further comprises:
tracking the task execution status of the warps of all thread blocks in the system and outputting the corresponding sampling states;
the sampling states comprise a first sampling state, indicating that the warp tasks in a thread block have finished executing, and a second sampling state, indicating that the warp tasks in the thread block have not finished executing.
In an embodiment of the invention, determining the warp sorting method according to the sampling state of the preselected sampling thread block comprises:
selecting the sampling state of the sampling thread block from all the tracked sampling states and checking which state it is:
when it is the first sampling state, the round-robin warp sorting method is used;
when it is the second sampling state, the sampling-thread-block-based warp sorting method is used.
In an embodiment of the invention, selecting the sampling thread block comprises:
selecting in advance any one of the thread blocks of the system as the sampling thread block.
In an embodiment of the invention, the method further comprises:
each scheduler receiving a warp sorting result;
checking, in order, whether each warp in the warp sorting result can be issued for execution; at the first warp that can be issued, the corresponding scheduler stops checking and issues that warp, which becomes the most recently executed warp in the next round of warp sorting.
The beneficial effects of the invention are as follows:
the invention provides a warp dynamic scheduling system based on a sampling state, together with a new strategy for dynamic warp scheduling driven by the sampling state of a sampling thread block. Specifically, the sampling state of the sampling thread block first selects between two dynamic warp sorting strategies: round-robin warp sorting and sampling-thread-block-based warp sorting. Within the sampling-thread-block-based strategy, the sampling thread block judgment unit further selects between two sorting strategies that both give priority to the sampling thread block: one carried out by the first sampling thread block warp sorting unit and the first non-sampling thread block warp sorting unit, and one carried out by the first warp ID lookup unit, the second sampling thread block warp sorting unit and the second non-sampling thread block warp sorting unit. The invention can therefore apply three different dynamic warp sorting strategies and passes the warp sorting result to the warp scheduling module, effectively implementing a dynamic warp scheduling policy based on the sampling state.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Drawings
FIG. 1 is a schematic structural diagram of a warp dynamic scheduling system based on a sampling state according to an embodiment of the invention;
FIG. 2 is a schematic structural diagram of another warp dynamic scheduling system based on a sampling state according to an embodiment of the invention;
FIG. 3 is a schematic flowchart of a warp dynamic scheduling method based on a sampling state according to an embodiment of the invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but the embodiments of the present invention are not limited thereto.
To achieve better GPU execution performance, embodiments of the invention provide a warp dynamic scheduling system and method based on a sampling state.
In a first aspect, referring to FIG. 1, an embodiment of the invention provides a warp dynamic scheduling system based on a sampling state, comprising a warp sorting method decision module, a round-robin warp sorting module, a sampling-thread-block-based warp sorting module and a warp scheduling module, wherein:
the warp sorting method decision module determines, according to the sampling state of a preselected sampling thread block, whether warp sorting is performed by the round-robin warp sorting module or by the sampling-thread-block-based warp sorting module, and passes the most recently executed warp ID to the chosen module; the sampling state indicates the execution status of the warps in the sampling thread block;
the round-robin warp sorting module is connected to the warp sorting method decision module and the warp scheduling module; it determines, from the most recently executed warp ID, the warp immediately following that warp, orders the warps of all thread blocks in the system in sequence with that warp at the head, and outputs the warp sorting result to the warp scheduling module;
the sampling-thread-block-based warp sorting module comprises a sampling thread block judgment unit, a first sampling thread block warp sorting unit, a first non-sampling thread block warp sorting unit, a first warp ID lookup unit, a second sampling thread block warp sorting unit and a second non-sampling thread block warp sorting unit, wherein:
the sampling thread block judgment unit is connected to the warp sorting method decision module and judges whether the warp with the most recently executed warp ID belongs to the sampling thread block; if so, it passes the most recently executed warp ID to the first sampling thread block warp sorting unit, and if not, to the first warp ID lookup unit;
the first sampling thread block warp sorting unit is connected to the sampling thread block judgment unit; it determines, from the most recently executed warp ID, the next warp within the sampling thread block and orders the warps of that sampling thread block in sequence with that next warp at the head, obtaining a first warp sorting result, which it passes to the first non-sampling thread block warp sorting unit;
the first non-sampling thread block warp sorting unit is connected to the first sampling thread block warp sorting unit and the warp scheduling module; it orders the remaining warps outside the sampling thread block in sequence, with the first warp of the thread block immediately following the sampling thread block at the head, obtaining a second warp sorting result; it appends the second warp sorting result after the first and outputs the combined warp sorting result to the warp scheduling module;
the first warp ID lookup unit is connected to the sampling thread block judgment unit; it finds the first warp ID belonging to the sampling thread block and passes the found warp ID and the most recently executed warp ID to the second sampling thread block warp sorting unit;
the second sampling thread block warp sorting unit is connected to the first warp ID lookup unit; with the warp of the found warp ID at the head, it orders the warps of the sampling thread block in sequence, obtaining a third warp sorting result, and passes the third warp sorting result and the most recently executed warp ID to the second non-sampling thread block warp sorting unit;
the second non-sampling thread block warp sorting unit is connected to the second sampling thread block warp sorting unit and the warp scheduling module; it determines, from the most recently executed warp ID, the warp immediately following that warp, orders the remaining warps outside the sampling thread block in sequence with that warp at the head, obtaining a fourth warp sorting result, appends the fourth warp sorting result after the third, and outputs the combined warp sorting result to the warp scheduling module.
The conventional round-robin warp scheduling policy statically gives all warps in the system the same priority, so all warps progress at a similar pace. This causes two problems. 1. When all warps issue long-latency global-memory load instructions in roughly the same period, the large number of memory access instructions occupy the limited memory resources, causing memory stalls and degrading cache performance. 2. In some GPU module designs (for example, a cache bypass system), metrics such as the cache hit rate and the number of memory access instructions must be sampled dynamically and fed back to the GPU's control logic to adjust the behavior of certain modules, and the sampling period is usually the time one warp or one thread block takes to complete all of its tasks. Because round-robin scheduling keeps every warp at a similar pace, by the time the sampled warp finishes and the sampled metrics are fed back to the control logic, the tasks of all the other warps are also nearly complete, leaving little remaining work that can benefit from the adjustment.
To address these problems, an embodiment of the invention designs a warp dynamic scheduling policy based on the sampling state. As shown in FIG. 1, a sampling-thread-block-based warp sorting policy is added on top of the conventional round-robin warp sorting policy. Under this sampling-thread-block-first sorting idea, only some of the warps are always executed preferentially. There are two output modes: the warp sorting result produced jointly by the first sampling thread block warp sorting unit and the first non-sampling thread block warp sorting unit, and the warp sorting result produced jointly by the first warp ID lookup unit, the second sampling thread block warp sorting unit and the second non-sampling thread block warp sorting unit. The two modes differ only in which warp of the sampling thread block is placed first; the sorting idea is the same. Referring to FIG. 2, a sampling thread block selection module is provided, connected to the sampling thread block judgment unit, which selects in advance any one of the thread blocks of the system as the sampling thread block. For example, if the system contains thread blocks 0, 1, 2, …, N, where N is an integer, then, because thread block 0 exists in every configuration, thread block 0 can be selected directly as the sampling thread block for generality; alternatively, any one of thread blocks 0 to N can be selected as the sampling thread block at run time according to the actual system conditions.
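The sampling thread block selection and the first warp ID lookup can be sketched with a simple warp-to-block layout. This assumes (hypothetically) that each thread block owns a contiguous range of warp IDs with the same number of warps per block; `build_blocks`, `select_sampling_block` and `first_warp_id` are illustrative names, not from the patent.

```python
def build_blocks(num_blocks: int, warps_per_block: int) -> list[list[int]]:
    """Thread block b owns warps b*wpb .. (b+1)*wpb-1 (assumed contiguous layout)."""
    return [list(range(b * warps_per_block, (b + 1) * warps_per_block))
            for b in range(num_blocks)]


def select_sampling_block(num_blocks: int, preferred: int = 0) -> int:
    """Sampling thread block selection: block 0 by default, since it always exists."""
    if not 0 <= preferred < num_blocks:
        raise ValueError(f"thread block {preferred} does not exist")
    return preferred


def first_warp_id(blocks: list[list[int]], block: int) -> int:
    """What the first warp ID lookup unit returns for a given thread block."""
    return blocks[block][0]


blocks = build_blocks(3, 2)
print(blocks)                              # [[0, 1], [2, 3], [4, 5]]
sb = select_sampling_block(len(blocks), preferred=1)
print(sb, first_warp_id(blocks, sb))       # 1 2
```

Defaulting to block 0 mirrors the generality argument above; passing `preferred` models choosing another block according to the actual system conditions.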
For example, suppose the system has six warps, warp0 to warp5, where warp0 and warp1 belong to thread block 0, warp2 and warp3 belong to thread block 1, warp4 and warp5 belong to thread block 2, and thread block 1 has been preselected as the sampling thread block:
suppose the warp issued in the previous cycle is warp2, i.e. the most recently executed warp is warp2. The sampling thread block judgment unit determines that warp2 belongs to the sampling thread block, so this cycle's warp sorting result is produced by the first sampling thread block warp sorting unit and the first non-sampling thread block warp sorting unit. The first sampling thread block warp sorting unit determines that the warp after warp2 is warp3 and orders all warps of the sampling thread block with warp3 at the head, giving the first warp sorting result: warp3, warp2. The first non-sampling thread block warp sorting unit takes the first warp of the thread block after the sampling thread block (thread block 1), namely warp4 of thread block 2, as the head and orders the remaining warps outside the sampling thread block in sequence, giving the second warp sorting result: warp4, warp5, warp0, warp1. Appending the second result after the first gives the final warp sorting result: warp3, warp2, warp4, warp5, warp0, warp1.
Suppose instead that the warp issued in the previous cycle is warp4, i.e. the most recently executed warp is warp4. The sampling thread block judgment unit determines that warp4 does not belong to thread block 1, so the first warp ID lookup unit finds that the first warp of thread block 1 is warp2, and this cycle's warp sorting result is produced by the second sampling thread block warp sorting unit and the second non-sampling thread block warp sorting unit. The second sampling thread block warp sorting unit orders all warps of the sampling thread block with warp2 at the head, giving the third warp sorting result: warp2, warp3. The second non-sampling thread block warp sorting unit determines that the warp after the most recently executed warp4 is warp5 and orders the remaining warps outside the sampling thread block in sequence with warp5 at the head, giving the fourth warp sorting result: warp5, warp0, warp1, warp4. Appending the fourth result after the third gives the final warp sorting result: warp2, warp3, warp5, warp0, warp1, warp4.
As the examples show, whichever branch is taken, the warps of the sampling thread block (thread block 1) are always at the front of the warp sorting result under the sampling-thread-block-first policy, i.e. they are given the highest issue and execution priority. During execution, warp2 and warp3 therefore progress at a similar pace to each other, ahead of warp0, warp1, warp4 and warp5. The sampling-thread-block-based warp sorting policy of this embodiment brings two advantages: 1. when warp2 and warp3 stall on long-latency memory access instructions, warp0, warp1, warp4 and warp5 are at a different stage of execution and do not reach their own long-latency memory access instructions at the same time, so issuing them hides the stall latency, improving the ability to hide stall latency; 2. when warp2 and warp3 finish, sampling ends and the module function adjustment is fed back, a large amount of remaining warp work (warp0, warp1, warp4 and warp5) is still to be executed, which makes dynamically adjusted module functions worthwhile and brings performance benefits.
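The second advantage can be illustrated with a toy simulation. This is a hypothetical model, not data from the patent: one scheduler, one instruction issued per cycle, no memory stalls, every warp running the same number of instructions, and the cycle before the first one assumed to have issued the last warp. Because stalls are not modeled, it says nothing about the first advantage (latency hiding). `simulate` and the helper names are illustrative.

```python
def rotate_after(seq, item):
    i = seq.index(item)
    return seq[i + 1:] + seq[:i + 1]


def round_robin_order(last, num_warps):
    return [(last + 1 + k) % num_warps for k in range(num_warps)]


def sampling_priority_order(blocks, sb, last):
    sampled = blocks[sb]
    others = [w for k in range(1, len(blocks))
              for w in blocks[(sb + k) % len(blocks)]]
    if last in sampled:
        return rotate_after(sampled, last) + others
    return list(sampled) + rotate_after(others, last)


def simulate(policy, blocks, sb, insts_per_warp):
    """Return (cycle when the sampling block finishes, total cycles).

    Toy model: one scheduler, one instruction per cycle, no stalls.
    """
    num_warps = sum(len(b) for b in blocks)
    left = {w: insts_per_warp for b in blocks for w in b}
    last = num_warps - 1          # assume the previous issue was the last warp
    cycle, sample_done = 0, None
    while any(left.values()):
        sampling_running = any(left[w] for w in blocks[sb])  # second sampling state
        if policy == "sampling" and sampling_running:
            order = sampling_priority_order(blocks, sb, last)
        else:
            order = round_robin_order(last, num_warps)
        last = next(w for w in order if left[w] > 0)  # first issuable warp
        left[last] -= 1
        cycle += 1
        if sample_done is None and not any(left[w] for w in blocks[sb]):
            sample_done = cycle
    return sample_done, cycle


blocks = [[0, 1], [2, 3], [4, 5]]
print(simulate("round_robin", blocks, 1, 4))  # (22, 24)
print(simulate("sampling", blocks, 1, 4))     # (8, 24)
```

Under these assumptions the sampling thread block finishes after 8 of 24 cycles with sampling-block priority versus 22 of 24 with round-robin, so two thirds of the work is still pending when the sampled metrics become available. The numbers come from the toy model only.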
Further, referring again to fig. 2, the embodiment of the present invention further includes an issue module connected to the warp ordering method decision module. The issue module counts the task execution status of the warps of all thread blocks in the system, outputs the corresponding sampling states, and inputs all sampling states together with the most recently executed warp ID into the warp ordering method decision module. The sampling states comprise a first sampling state, indicating that the warp tasks in the thread block have completed execution, and a second sampling state, indicating that they have not.
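A minimal Python sketch of the statistics the issue module might collect is given below. It assumes that task status is reduced to a set of finished warp IDs; the `SamplingState` enum and the `count_sampling_states` function are hypothetical names, not part of the embodiment.

```python
from enum import Enum
from typing import Dict, Iterable


class SamplingState(Enum):
    FIRST = 1   # every warp task of the thread block has completed
    SECOND = 2  # at least one warp task of the thread block is unfinished


def count_sampling_states(
    block_of: Dict[int, int], finished: Iterable[int]
) -> Dict[int, SamplingState]:
    """Derive one sampling state per thread block from finished warp IDs."""
    done = set(finished)
    states: Dict[int, SamplingState] = {}
    for warp, block in block_of.items():
        if warp not in done:
            # One unfinished warp is enough to put the block in SECOND.
            states[block] = SamplingState.SECOND
        else:
            # Keep SECOND if another warp already set it; otherwise FIRST.
            states.setdefault(block, SamplingState.FIRST)
    return states


blocks = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
print(count_sampling_states(blocks, finished=[2, 3, 0]))
# block 0: SECOND, block 1: FIRST, block 2: SECOND
```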
Therefore, in the embodiment of the invention, before warp ordering is performed, the issue module collects statistics on the execution of the warp tasks of all thread blocks in the system and then sends the sampling states of all thread blocks to the subsequent module for processing. The statistics need not be collected in the issue module; the same task can equally be performed in the warp ordering method decision module.
Further, in the embodiment of the present invention, the warp ordering method decision module determines, according to the preselected sampling thread block, whether warp ordering is performed by the round-robin warp ordering module or by the sampling-thread-block-based warp ordering module, as follows:
selecting, from the sampling states input by the issue module, the sampling state corresponding to the sampling thread block, and determining which state it is: when it is the first sampling state, the warp ordering method decision module selects the round-robin warp ordering module to perform warp ordering; when it is the second sampling state, the warp ordering method decision module selects the sampling-thread-block-based warp ordering module to perform warp ordering.
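The decision rule reduces to a single comparison. In the following Python sketch, the string labels for the two ordering modules and the function name are illustrative assumptions; only the state of the preselected sampling thread block is consulted, and the states of the other thread blocks are ignored, as described above.

```python
from enum import Enum
from typing import Dict


class SamplingState(Enum):
    FIRST = 1   # sampling thread block: all warp tasks completed
    SECOND = 2  # sampling thread block: warp tasks still unfinished


def choose_ordering_module(
    states: Dict[int, SamplingState], sampling_block: int
) -> str:
    """Pick the ordering module from the sampling thread block's state only."""
    if states[sampling_block] is SamplingState.FIRST:
        return "round_robin"
    return "sampling_block_first"


states = {0: SamplingState.SECOND, 1: SamplingState.FIRST}
print(choose_ordering_module(states, sampling_block=1))  # round_robin
print(choose_ordering_module(states, sampling_block=0))  # sampling_block_first
```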
In the embodiment of the present invention, the warp ordering method is chosen according to the sampling state of the sampling thread block, i.e., according to whether all warps in the sampling thread block have finished. In the first sampling state, the round-robin warp ordering module is used. The round-robin policy takes the warp issued in the previous cycle, i.e., the most recently executed warp, as input, finds the warp that follows it, and orders the remaining warps in sequence starting from that warp. For example, suppose the system has six warps, warp0, warp1, warp2, warp3, warp4 and warp5, and warp3 was issued in the previous cycle, i.e., it is the most recently executed warp. The warp following warp3, warp4, becomes the head and the warps of all thread blocks in the system are ordered in sequence, so under the round-robin policy the warp ordering result for this cycle is: warp4, warp5, warp0, warp1, warp2, warp3.
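The round-robin ordering is a rotation of the warp list. The Python sketch below models warps as integer IDs in a list; the function name is an assumption, and the example call reproduces the six-warp case from the text.

```python
from typing import List


def round_robin_order(warps: List[int], last_warp: int) -> List[int]:
    """Start at the warp after last_warp and wrap around the full warp list."""
    start = (warps.index(last_warp) + 1) % len(warps)
    return warps[start:] + warps[:start]


print(round_robin_order([0, 1, 2, 3, 4, 5], last_warp=3))
# [4, 5, 0, 1, 2, 3]
```

The most recently executed warp always lands at the end of the result, so it has the lowest priority in the next round.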
In the second sampling state, warp ordering is performed by the sampling-thread-block-based warp ordering module, which has been described in detail above and is not repeated here.
Further, referring again to fig. 2, the warp scheduling module in the embodiment of the present invention includes a plurality of schedulers. Each scheduler checks, in order, whether each warp in the warp ordering result it receives can be issued for execution; as soon as one can, that scheduler stops checking and passes the warp to the issue module. The issue module is also connected to the warp scheduling module; it receives the warp issued by the warp scheduling module and uses it as the most recently executed warp in the next round of ordering.
In the embodiment of the present invention, each scheduler in the warp scheduling module receives a warp ordering result from the first non-sampling thread block warp ordering unit, the second non-sampling thread block warp ordering unit, or the round-robin warp ordering module. Each scheduler first checks whether the first warp in the ordering result can be issued for execution; if so, no subsequent warp is checked, and the warp scheduling module sends that warp to the issue module, where it becomes the most recently executed warp in the next round of ordering. Hence, the closer a warp is to the front of the ordering result, the greater its chance of being issued.
For example, if the round-robin warp ordering module produces the ordering result warp4, warp5, warp0, warp1, warp2, warp3, each scheduler receives this result and checks whether warp4 can be issued; if so, it does not check warp5, warp0, warp1, warp2 or warp3, and sends warp4 directly to the issue module as the most recently executed warp for the next round of ordering. To let the thread blocks finish releasing their tasks at close times and to shorten the waiting period during task release, once the warp tasks of the sampling thread block have completed, the embodiment of the invention switches back to the round-robin ordering policy for the warps of all thread blocks in the system.
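The per-scheduler selection step can be sketched in Python as a first-match scan. The readiness predicate `can_issue` is a hypothetical stand-in for whatever dependency and resource checks the hardware performs, and only a single scheduler is modeled; the returned warp is what the issue module would record as the most recently executed warp for the next round.

```python
from typing import Callable, List, Optional


def pick_warp(
    order: List[int], can_issue: Callable[[int], bool]
) -> Optional[int]:
    """Return the first issuable warp in the ordering result, or None."""
    for warp in order:
        if can_issue(warp):
            # Stop at the first hit; later warps are not examined this cycle.
            return warp
    return None  # no warp can be issued this cycle


order = [4, 5, 0, 1, 2, 3]
last = pick_warp(order, lambda w: True)       # 4, as in the example
stalled = {4, 5}                              # hypothetical: warps waiting on memory
print(last, pick_warp(order, lambda w: w not in stalled))  # 4 0
```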
Warp ordering results from the first or second non-sampling thread block warp ordering unit are scheduled in the same way as those from the round-robin warp ordering module, so the details are not repeated here. Because the sampling thread block and the non-sampling thread blocks are ordered by separate modules, when the function of a module is adjusted after sampling, the non-sampled thread blocks still have a large number of warp tasks left to execute. This makes the dynamic adjustment of module functions effective and avoids the task-release waiting period that arises with thread-block-based scheduling.
The embodiment of the invention provides a sampling-state-based dynamic warp scheduling system, i.e., a new dynamic warp scheduling strategy driven by the sampling state of a sampling thread block. Specifically, the embodiment first selects between two dynamic warp ordering strategies according to the sampling state of the sampling thread block: a round-robin warp ordering strategy and a sampling-thread-block-based warp ordering strategy. For the latter, the sampling thread block judgment unit further selects between two sampling-thread-block-first ordering strategies: one using the first sampling thread block warp ordering unit and the first non-sampling thread block warp ordering unit, and one using the first warp ID lookup unit, the second sampling thread block warp ordering unit and the second non-sampling thread block warp ordering unit. The embodiment can therefore apply three different dynamic warp ordering strategies and feeds the warp ordering result to the warp scheduling module, thereby implementing sampling-state-based dynamic warp scheduling.
In a second aspect, referring to fig. 3, an embodiment of the present invention provides a sampling-state-based dynamic warp scheduling method, comprising:
determining a warp ordering method according to the sampling state of a preselected sampling thread block, the warp ordering method being either a round-robin warp ordering method or a sampling-thread-block-based warp ordering method, where the sampling state represents the execution status of the warps of the sampling thread block;
performing warp ordering with the determined warp ordering method and using the warp ordering result to schedule warps dynamically, wherein,
when the determined warp ordering method is the round-robin warp ordering method, warp ordering comprises: determining, from the most recently executed warp ID, the warp that follows it, and ordering the warps of all thread blocks in the system in sequence with that warp as the head;
when the determined warp ordering method is the sampling-thread-block-based warp ordering method, warp ordering comprises:
judging whether the warp corresponding to the most recently executed warp ID belongs to the sampling thread block. If yes: determining, from the most recently executed warp ID, the warp that follows the most recently executed warp; ordering the remaining warps of the same sampling thread block in sequence with that warp as the head to obtain a first warp ordering result; ordering the remaining warps outside the sampling thread block in sequence with the first warp of the thread block following the sampling thread block as the head to obtain a second warp ordering result; and appending the second warp ordering result after the first. If not: looking up the first warp ID belonging to the sampling thread block; ordering the remaining warps of the sampling thread block in sequence with the found warp as the head to obtain a third warp ordering result; determining, from the most recently executed warp ID, the warp that follows the most recently executed warp; ordering the remaining warps outside the sampling thread block in sequence with that warp as the head to obtain a fourth warp ordering result; and appending the fourth warp ordering result after the third.
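The first branch (most recently executed warp inside the sampling thread block) can be sketched in Python as follows. The text does not say what happens when the most recently executed warp is the last warp of the sampling thread block; the sketch assumes the order wraps around within that block. It also assumes that thread blocks are numbered in the same order as their warps, so the "next thread block" is the one with the next higher ID, wrapping to the lowest ID. The function name and the `block_of` map are hypothetical.

```python
from typing import Dict, List


def order_sampling_first_inside(
    warps: List[int],
    block_of: Dict[int, int],
    sampling_block: int,
    last_warp: int,
) -> List[int]:
    """Sampling-block-first ordering when last_warp is in the sampling block."""
    sampled = [w for w in warps if block_of[w] == sampling_block]
    # First result: rotate the sampling block so it starts at the warp after
    # last_warp (assumed to wrap around inside the block).
    i = (sampled.index(last_warp) + 1) % len(sampled)
    first = sampled[i:] + sampled[:i]
    # Second result: begin with the thread block after the sampling block,
    # wrap to the lowest block IDs, and skip the sampling block itself.
    after = [w for w in warps if block_of[w] > sampling_block]
    before = [w for w in warps if block_of[w] < sampling_block]
    return first + after + before


blocks = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}  # illustrative layout
print(order_sampling_first_inside(list(range(6)), blocks, 1, last_warp=2))
# [3, 2, 4, 5, 0, 1]
```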
Further, the method of the embodiment of the present invention includes:
counting the task execution status of the warps of all thread blocks in the system and outputting the corresponding sampling states, the sampling states comprising a first sampling state, indicating that the warp tasks in the thread block have completed execution, and a second sampling state, indicating that they have not.
Further, in the embodiment of the present invention, determining the warp ordering method according to the sampling state of the preselected sampling thread block comprises:
selecting, from all the counted sampling states, the sampling state corresponding to the sampling thread block, and determining which state it is: when it is the first sampling state, the determined warp ordering method is the round-robin warp ordering method; when it is the second sampling state, the determined warp ordering method is the sampling-thread-block-based warp ordering method.
Further, in the embodiment of the present invention, selecting the sampling thread block comprises:
selecting in advance any one thread block from all thread blocks of the system as the sampling thread block.
Further, the method of the embodiment of the present invention includes:
each scheduler receives a warp ordering result, checks in order whether each warp in it can be issued for execution, and, as soon as one can, stops checking and issues that warp, which becomes the most recently executed warp in the next round of warp ordering.
Since the method embodiment is essentially similar to the system embodiment, it is described briefly; for the relevant details, refer to the corresponding parts of the system embodiment.
The embodiment of the invention provides a sampling-state-based dynamic warp scheduling method, i.e., a new dynamic warp scheduling strategy driven by the sampling state of a sampling thread block. Specifically, it first selects between two dynamic warp ordering strategies according to the sampling state of the sampling thread block: a round-robin warp ordering strategy and a sampling-thread-block-based warp ordering strategy. For the latter, it further distinguishes two sampling-thread-block-first strategies. When the most recently executed warp belongs to the sampling thread block, the sampling thread block is ordered starting from the warp that follows the most recently executed warp, and the non-sampling thread blocks are ordered starting from the first warp of the thread block following the sampling thread block. Otherwise, the sampling thread block is ordered starting from its first warp (the found warp ID), and the non-sampling thread blocks are ordered starting from the warp that follows the most recently executed warp.
The embodiment of the invention can therefore apply three different dynamic warp ordering strategies and feeds the warp ordering result to the warp scheduler, thereby implementing sampling-state-based dynamic warp scheduling. Because the sampling thread block and the non-sampling thread blocks are ordered under different policies, they progress at different paces and do not reach long-latency memory access instructions at the same time, which increases the capability to hide stall latency. Moreover, when some functions need to be adjusted after sampling finishes and the result is fed back, a large number of warp tasks still remain to be executed under the other, non-sampling ordering policy, which makes the dynamic adjustment of functions effective and yields a performance advantage.
In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
While the present application has been described in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed application, from a review of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
The foregoing is a further detailed description of the invention in connection with specific preferred embodiments and it is not intended to limit the invention to the specific embodiments described. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (10)

1. A sampling-state-based dynamic warp scheduling system, comprising: a warp ordering method decision module, a round-robin warp ordering module, a sampling-thread-block-based warp ordering module and a warp scheduling module, wherein,
the warp ordering method decision module is configured to determine, according to the sampling state of a preselected sampling thread block, whether warp ordering is performed by the round-robin warp ordering module or by the sampling-thread-block-based warp ordering module, and to input the most recently executed warp ID into the selected module; the sampling state represents the execution status of the warps of the sampling thread block;
the round-robin warp ordering module is connected to the warp ordering method decision module and the warp scheduling module, and is configured to determine, from the most recently executed warp ID, the warp that follows it, to order the warps of all thread blocks in the system in sequence with that warp as the head, and to input the warp ordering result to the warp scheduling module;
the sampling-thread-block-based warp ordering module comprises: a sampling thread block judgment unit, a first sampling thread block warp ordering unit, a first non-sampling thread block warp ordering unit, a first warp ID lookup unit, a second sampling thread block warp ordering unit and a second non-sampling thread block warp ordering unit, wherein,
the sampling thread block judgment unit is connected to the warp ordering method decision module and is configured to judge whether the warp corresponding to the most recently executed warp ID belongs to the sampling thread block; if so, the most recently executed warp ID is input into the first sampling thread block warp ordering unit, and if not, into the first warp ID lookup unit;
the first sampling thread block warp ordering unit is connected to the sampling thread block judgment unit and is configured to determine, from the most recently executed warp ID, the warp that follows the most recently executed warp, to order the remaining warps of the same sampling thread block in sequence with that warp as the head to obtain a first warp ordering result, and to input the first warp ordering result to the first non-sampling thread block warp ordering unit;
the first non-sampling thread block warp ordering unit is connected to the first sampling thread block warp ordering unit and the warp scheduling module, and is configured to order the remaining warps outside the sampling thread block in sequence with the first warp of the thread block following the sampling thread block as the head to obtain a second warp ordering result, to append the second warp ordering result after the first warp ordering result, and to input the resulting warp ordering result to the warp scheduling module;
the first warp ID lookup unit is connected to the sampling thread block judgment unit and is configured to look up the first warp ID belonging to the sampling thread block, and to input the found warp ID and the most recently executed warp ID into the second sampling thread block warp ordering unit;
the second sampling thread block warp ordering unit is connected to the first warp ID lookup unit and is configured to order the remaining warps of the sampling thread block in sequence with the warp of the found warp ID as the head to obtain a third warp ordering result, and to input the third warp ordering result and the most recently executed warp ID into the second non-sampling thread block warp ordering unit;
the second non-sampling thread block warp ordering unit is connected to the second sampling thread block warp ordering unit and the warp scheduling module, and is configured to determine, from the most recently executed warp ID, the warp that follows the most recently executed warp, to order the remaining warps outside the sampling thread block in sequence with that warp as the head to obtain a fourth warp ordering result, to append the fourth warp ordering result after the third warp ordering result, and to input the resulting warp ordering result to the warp scheduling module.
2. The sampling-state-based dynamic warp scheduling system according to claim 1, further comprising an issue module connected to the warp ordering method decision module, configured to count the task execution status of the warps of all thread blocks in the system, to output the corresponding sampling states, and to input all sampling states and the most recently executed warp ID into the warp ordering method decision module;
the sampling states comprise a first sampling state, indicating that the warp tasks in the thread block have completed execution, and a second sampling state, indicating that they have not.
3. The sampling-state-based dynamic warp scheduling system according to claim 2, wherein the warp ordering method decision module determining, according to the preselected sampling thread block, whether warp ordering is performed by the round-robin warp ordering module or by the sampling-thread-block-based warp ordering module comprises:
selecting, from the sampling states input by the issue module, the sampling state corresponding to the sampling thread block, and determining which state it is:
when the sampling state is the first sampling state, the warp ordering method decision module selects the round-robin warp ordering module to perform warp ordering;
when the sampling state is the second sampling state, the warp ordering method decision module selects the sampling-thread-block-based warp ordering module to perform warp ordering.
4. The sampling-state-based dynamic warp scheduling system according to claim 2, wherein the warp scheduling module comprises a plurality of schedulers, each configured to check in order whether each warp in the warp ordering result it receives can be issued for execution and, as soon as one can, to stop checking and input that warp to the issue module;
the issue module is further connected to the warp scheduling module and is further configured to receive the warp issued by the warp scheduling module and to use it as the most recently executed warp in the next round of ordering.
5. The sampling-state-based dynamic warp scheduling system according to claim 1, further comprising a sampling thread block selection module connected to the sampling thread block judgment unit and configured to select in advance any one thread block from all thread blocks of the system as the sampling thread block.
6. A sampling-state-based dynamic warp scheduling method, characterized by comprising the following steps:
determining a warp ordering method according to the sampling state of a preselected sampling thread block, the warp ordering method being either a round-robin warp ordering method or a sampling-thread-block-based warp ordering method, where the sampling state represents the execution status of the warps of the sampling thread block;
performing warp ordering with the determined warp ordering method and using the warp ordering result to schedule warps dynamically, wherein,
when the determined warp ordering method is the round-robin warp ordering method, warp ordering comprises: determining, from the most recently executed warp ID, the warp that follows it, and ordering the warps of all thread blocks in the system in sequence with that warp as the head;
when the determined warp ordering method is the sampling-thread-block-based warp ordering method, warp ordering comprises:
judging whether the warp corresponding to the most recently executed warp ID belongs to the sampling thread block. If yes: determining, from the most recently executed warp ID, the warp that follows the most recently executed warp; ordering the remaining warps of the same sampling thread block in sequence with that warp as the head to obtain a first warp ordering result; ordering the remaining warps outside the sampling thread block in sequence with the first warp of the thread block following the sampling thread block as the head to obtain a second warp ordering result; and appending the second warp ordering result after the first. If not: looking up the first warp ID belonging to the sampling thread block; ordering the remaining warps of the sampling thread block in sequence with the found warp as the head to obtain a third warp ordering result; determining, from the most recently executed warp ID, the warp that follows the most recently executed warp; ordering the remaining warps outside the sampling thread block in sequence with that warp as the head to obtain a fourth warp ordering result; and appending the fourth warp ordering result after the third.
7. The sampling-state-based dynamic warp scheduling method according to claim 6, further comprising:
counting the task execution status of the warps of all thread blocks in the system and outputting the corresponding sampling states;
the sampling states comprise a first sampling state, indicating that the warp tasks in the thread block have completed execution, and a second sampling state, indicating that they have not.
8. The sampling-state-based dynamic warp scheduling method according to claim 7, wherein determining the warp ordering method according to the sampling state of the preselected sampling thread block comprises:
selecting, from all the counted sampling states, the sampling state corresponding to the sampling thread block, and determining which state it is:
when the sampling state is the first sampling state, the determined warp ordering method is the round-robin warp ordering method;
when the sampling state is the second sampling state, the determined warp ordering method is the sampling-thread-block-based warp ordering method.
9. The method according to claim 6, wherein selecting the sampling thread block comprises:
selecting in advance any one thread block from all thread blocks of the system as the sampling thread block.
10. The sampling-state-based dynamic warp scheduling method according to claim 6, further comprising:
each scheduler receives a warp ordering result;
checking in order whether each warp in the warp ordering result can be issued for execution and, as soon as one can, stopping the corresponding scheduler's checks, issuing that warp, and using it as the most recently executed warp in the next round of warp ordering.
CN202210278744.4A 2022-03-21 Multithreading beam warp dynamic scheduling system and method based on sampling state Active CN114880082B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210278744.4A CN114880082B (en) 2022-03-21 Multithreading beam warp dynamic scheduling system and method based on sampling state

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210278744.4A CN114880082B (en) 2022-03-21 Multithreading beam warp dynamic scheduling system and method based on sampling state

Publications (2)

Publication Number Publication Date
CN114880082A 2022-08-09
CN114880082B 2024-06-04


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106502632A (en) * 2016-10-28 2017-03-15 Wuhan University A kind of GPU parallel particle swarm optimization methods based on self-adaptive thread beam
CN106648545A (en) * 2016-01-18 2017-05-10 Tianjin University Register file structure used for branch processing in GPU
CN109814989A (en) * 2018-12-12 2019-05-28 Xi'an Aeronautics Computing Technique Research Institute, AVIC A kind of preferential unified dyeing graphics processor warp dispatching device of classification
CN109886407A (en) * 2019-02-27 2019-06-14 Shanghai SenseTime Intelligent Technology Co., Ltd. Data processing method, device, electronic equipment and computer readable storage medium
WO2019165428A1 (en) * 2018-02-26 2019-08-29 Zytek Communications Corporation Thread scheduling on simt architectures with busy-wait synchronization
CN112000846A (en) * 2020-08-19 2020-11-27 Northeastern University Method for grouping LSM tree indexes based on GPU
CN113157418A (en) * 2021-04-25 2021-07-23 Tencent Technology (Shenzhen) Co., Ltd. Server resource allocation method and device, storage medium and electronic equipment
CN116244072A (en) * 2023-01-16 2023-06-09 Xidian University GPGPU micro-architecture system for fence synchronization

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SHIWEI JIA: "A Survey of GPGPU Parallel Processing Architecture Performance Optimization", 2021 IEEE/ACIS 20th International Fall Conference on Computer and Information Science (ICIS Fall), 15 October 2021 (2021-10-15), pages 75-82, XP034040164, DOI: 10.1109/ICISFall51598.2021.9627400 *
Zhang Jun (张军): "Research on Performance Optimization Methods for General-Purpose Graphics Processors Based on Thread Scheduling", China Doctoral Dissertations Full-text Database, Information Science and Technology, no. 2019, 15 June 2019 (2019-06-15), pages 137 - 2 *
Jia Shiwei (贾世伟): "A GPGPU cache bypass system for 2D and 3D convolution", Journal of Xidian University, vol. 50, no. 02, 11 January 2023 (2023-01-11), pages 92-100 *

Similar Documents

Publication Publication Date Title
US7559062B2 (en) Intelligent scheduler for multi-level exhaustive scheduling
US20100161941A1 (en) Method and system for improved flash controller commands selection
CN107992359B (en) Task scheduling method for cost perception in cloud environment
CN1188794C (en) Coprocessor with multiple logic interfaces
US6981260B2 (en) Apparatus for minimizing lock contention in a multiple processor system with multiple run queues when determining the threads priorities
CN113010302B (en) Multi-task scheduling method and system under quantum-classical hybrid architecture and quantum computer system architecture
CN102981807B (en) Graphics processing unit (GPU) program optimization method based on compute unified device architecture (CUDA) parallel environment
CN110489217A (en) Task scheduling method and system
CN106020954A (en) Thread management method and device
CN106227507A (en) Computing system and controller thereof
CN109062604A (en) Instruction issue method and device for mixed execution of scalar and vector instructions
CN110457238A (en) Method for mitigating stalls when GPU memory access requests and instructions access the cache
CN108509280A (en) Locality-aware scheduling method for distributed computing clusters based on a push model
CN109656733A (en) Method and apparatus for intelligently scheduling multiple OCR recognition engines
CN110515713A (en) Task scheduling method, device and computer storage medium
CN115237568A (en) Mixed-weight task scheduling method and system for heterogeneous edge devices
CN106155794A (en) Event dispatching method and device for multithreaded systems
CN114880082A (en) Multithreading beam warp dynamic scheduling system and method based on sampling state
CN114880082B (en) Multithreading beam warp dynamic scheduling system and method based on sampling state
CN110084507A (en) Classification-aware scientific workflow scheduling optimization method in a cloud computing environment
CN1234069C (en) Method and equipment for supporting multithreading with two-level thread states under a high-speed clock
CN112346836A (en) Method and device for preempting shared computing resource, user equipment and storage medium
US11061724B2 (en) Programmable hardware scheduler for digital processing systems
CN105653243A (en) Method for distributing tasks by general purpose graphic processing unit in multi-task concurrent execution manner
CN109257280A (en) Micro-engine and method for processing messages thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant