CN116225705A - Resource allocation circuit, method and device, task scheduler and chip - Google Patents

Resource allocation circuit, method and device, task scheduler and chip

Info

Publication number
CN116225705A
Authority
CN
China
Prior art keywords
task
resource allocation
execution unit
execution units
resource
Prior art date
Legal status
Pending
Application number
CN202310184155.4A
Other languages
Chinese (zh)
Inventor
徐德前
夏晓旭
张启荣
王文强
(Inventor requesting name not to be published)
Current Assignee
Shanghai Power Tensors Intelligent Technology Co Ltd
Original Assignee
Shanghai Power Tensors Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Power Tensors Intelligent Technology Co Ltd filed Critical Shanghai Power Tensors Intelligent Technology Co Ltd
Priority to CN202310184155.4A
Publication of CN116225705A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the disclosure provides a resource allocation circuit, a method and a device, a task scheduler and a chip, wherein the resource allocation circuit comprises: a resource allocation register, a resource counter, a judging circuit and a resource scheduler; the resource allocation register is used for recording the enabling state of each execution unit in the plurality of execution units; the resource counter is used for carrying out real-time statistics on information of available execution units in the plurality of execution units; the judging circuit is used for setting the enabling state of at least one execution unit in the resource allocation register according to the statistical result; the resource scheduler is used for distributing target execution units for tasks to be scheduled according to the enabling states of all the execution units.

Description

Resource allocation circuit, method and device, task scheduler and chip
Technical Field
The disclosure relates to the technical field of chips, and in particular relates to a resource allocation circuit, a method and a device, a task scheduler and a chip.
Background
In the related art, tasks are typically scheduled to be executed on execution units. With the development of technology, the number of execution units keeps increasing. However, the related art generally schedules tasks in a serial manner, and this scheduling manner results in a relatively low resource utilization rate of the execution units.
Disclosure of Invention
In a first aspect, embodiments of the present disclosure provide a resource allocation circuit, the resource allocation circuit comprising: a resource allocation register, a resource counter, a judging circuit and a resource scheduler; the resource allocation register is used for recording the enabling state of each execution unit in the plurality of execution units; the resource counter is used for carrying out real-time statistics on information of available execution units in the plurality of execution units; the judging circuit is used for setting the enabling state of at least one execution unit in the resource allocation register according to the statistical result; the resource scheduler is used for distributing target execution units for tasks to be scheduled according to the enabling states of all the execution units.
In some embodiments, the resource allocation circuit further comprises a task transmitting circuit, which is connected to each execution unit through a transmission line corresponding to that execution unit and is used for enabling the transmission line corresponding to the target execution unit so as to transmit the task to the target execution unit.
In some embodiments, the resource allocation circuitry further comprises a configuration unit for storing configuration information; the judging circuit is used for: and determining the target execution unit from the available execution units based on the configuration information and the statistical result, and setting the enabling state of the target execution unit in the resource allocation register to be an enabled state.
In some embodiments, the configuration information includes: the first configuration information is used for determining the scheduling modes of the execution units; and/or second configuration information, configured to determine a constraint condition corresponding to the task, where the constraint condition is used to constrain the task-schedulable execution unit; and/or third configuration information for determining a power consumption level, different power consumption levels corresponding to different upper limits of the number of execution units in the enabled state.
In some embodiments, the constraints include at least any one of: the number of target execution units to which each task is scheduled; the number of clusters in which the target execution units to which each task is scheduled are located, wherein one cluster comprises one or more execution units; the number of target execution units to which a plurality of tasks of the same user are scheduled; the number of clusters in which the target execution units to which the tasks of the same user are scheduled are located.
In some embodiments, the resource allocation circuit further comprises a task analysis circuit, which is used for analyzing the task to obtain the configuration information and transmitting the analyzed task to the resource scheduler.
In some embodiments, the resource allocation circuit further comprises a task buffer, which is used for buffering the task to be scheduled and sending the buffered task to the resource scheduler.
In some embodiments, the number of task buffers is greater than 1, each task buffer for buffering one or more tasks to be scheduled; the number of the resource allocation registers and the number of the resource schedulers are both greater than 1; each task buffer corresponds to one resource allocation register and one resource scheduler, and each resource scheduler corresponding to a different task buffer allocates target execution units for tasks to be scheduled in parallel.
In some embodiments, the resource allocation circuit further comprises a task distribution circuit, which is used for distributing the task to be scheduled to the task buffer.
In some embodiments, the plurality of execution units are divided into at least two clusters; the resource counter includes: a global resource counter for recording usage information of each of the at least two clusters; and a secondary resource counter for recording usage information of each execution unit in each cluster.
In some embodiments, the resource scheduler is configured to obtain tasks to be scheduled from a plurality of task buffers; the number of the secondary resource counters is larger than 1, and each secondary resource counter corresponds to one task buffer.
In some embodiments, the plurality of execution units are divided into at least two clusters; the resource allocation register includes: the first-level resource allocation register is used for recording the enabling state of each cluster in the at least two clusters; and a secondary resource allocation register for recording the enabling state of each execution unit in each cluster.
In some embodiments, the task includes a plurality of subtasks; the judging circuit is used for determining a target execution unit of each subtask from the available execution units and setting the enabling state of the target execution unit of each subtask in the resource allocation register to be an enabled state; the resource scheduler is configured to allocate a target execution unit for each subtask according to an enabling state of the target execution unit of each subtask.
In some embodiments, the determining circuit is further configured to: acquiring completion identification information sent by the target execution unit, wherein the completion identification information is used for indicating that the task scheduled to the target execution unit is processed and completed; and setting the resource allocation register according to the completion identification information.
In a second aspect, embodiments of the present disclosure provide a task scheduler, the task scheduler comprising: task buffer and resource allocation circuit; the task buffer is used for buffering the task to be scheduled and sending the buffered task to the resource allocation circuit; the resource allocation circuit is used for carrying out real-time statistics on information of available execution units in the plurality of execution units, allocating a target execution unit for the task from the available execution units according to a statistical result, and scheduling the task to be scheduled to the target execution unit.
In a third aspect, embodiments of the present disclosure provide a chip including a resource allocation circuit as described in any embodiment of the present disclosure, or including a task scheduler as described in any embodiment of the present disclosure.
In a fourth aspect, embodiments of the present disclosure provide an electronic device, including a chip according to any one of the embodiments of the present disclosure.
In a fifth aspect, an embodiment of the present disclosure provides a resource allocation method, which is applied to the judging circuit in the resource allocation circuit according to any one embodiment of the present disclosure; the method comprises the following steps: acquiring real-time statistical results of information of available execution units in a plurality of execution units; and setting the enabling state of each execution unit according to the real-time statistical result, so that the resource scheduler allocates a target execution unit for the task to be scheduled according to the enabling state of each execution unit.
In a sixth aspect, an embodiment of the present disclosure provides a resource allocation device, which is applied to the judging circuit in the resource allocation circuit according to any one embodiment of the present disclosure; the device comprises: the acquisition module is used for acquiring real-time statistical results of information of available execution units in the plurality of execution units; and the setting module is used for setting the enabling state of each execution unit according to the real-time statistical result so that the resource scheduler allocates a target execution unit for the task to be scheduled according to the enabling state of each execution unit.
In a seventh aspect, embodiments of the present disclosure provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described in any of the embodiments of the present disclosure.
According to the embodiment of the disclosure, the enabling states of all execution units are recorded through the resource allocation register, and the information of the available execution units is counted in real time through the resource counter, so that the judging circuit can set the enabling states in the resource allocation register according to real-time counting results, and the resource scheduler can allocate target execution units for tasks to be scheduled according to the enabling states in the resource allocation register. Because the real-time statistical result can reflect the idle condition of each execution unit, the method sets the enabling state according to the statistical result and performs task scheduling, and can distribute tasks to the idle execution units, thereby fully utilizing the resources of each execution unit and reducing the idle of the execution units.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the technical aspects of the disclosure.
Fig. 1 is a schematic diagram of a task allocation method in the related art.
Fig. 2 and 3 are schematic diagrams of resource allocation circuitry according to embodiments of the present disclosure, respectively.
FIG. 4 is a schematic diagram of a task scheduler of an embodiment of the present disclosure.
Fig. 5 is a flowchart of a resource allocation method of an embodiment of the present disclosure.
Fig. 6 is a block diagram of a resource allocation apparatus of an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality.
It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
In order to better understand the technical solutions in the embodiments of the present disclosure and make the above objects, features and advantages of the embodiments of the present disclosure more comprehensible, the technical solutions in the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings.
In the related art, each time a batch of tasks to be scheduled is received, the batch is scheduled onto execution units, and the next batch is not scheduled until the current batch has finished executing. That is, different batches of tasks are typically scheduled onto execution units serially. However, the execution times of tasks in the same batch may differ, so that after some of the tasks have finished, the execution units allocated to those tasks sit idle until the whole batch completes. This phenomenon, known as task boundary waste, results in a lower resource utilization rate of the execution units.
As shown in fig. 1, it is assumed that the execution units include CU1, CU2, and CU3, shown as blocks in the figure. When an execution unit is colored other than white, it has been assigned to a task, and different colors (black, dark gray, and light gray) indicate different tasks; when an execution unit is white, it is not assigned to any task. At time t1, three tasks are respectively distributed to CU1, CU2, and CU3, so the utilization rate of the execution units reaches 100%. At time t2, the task assigned to CU3 has completed, so CU3 is idle. At time t3, the task assigned to CU2 has also completed, so both CU2 and CU3 are idle. At time t4, the tasks assigned to CU1, CU2, and CU3 have all completed, and only then can the next batch of tasks be distributed to CU1, CU2, and CU3, bringing the utilization rate of the execution units back to 100%. It can be seen that at times t2 and t3, because the tasks of the same batch have not all finished, some execution units are idle, resulting in a lower resource utilization rate of the execution units.
Based on this, the disclosed embodiments provide a resource allocation circuit, see fig. 2 and 3, comprising:
a resource allocation register 201, a resource counter 202, a judgment circuit 203, and a resource scheduler 204;
the resource allocation register 201 is configured to record an enable state of each execution unit of the plurality of execution units;
the resource counter 202 is configured to perform real-time statistics on information of available execution units in the plurality of execution units;
the judging circuit 203 is configured to set an enable state of at least one execution unit in the resource allocation register 201 according to the statistics result;
the resource scheduler 204 is configured to allocate a target execution unit to a task (kernel) to be scheduled according to an enabling state of each execution unit.
In the above embodiments, an execution unit may be a computing core in a graphics processing unit (GPU) or an accelerator card, or a processing core in a central processing unit (CPU), or the like. Further, the plurality of execution units may be divided into at least two clusters, each cluster including one or more execution units, and the numbers of execution units in different clusters may be the same or different.
The resource allocation register 201 may record the enable status of the various execution units. The enable state of an execution unit is used to characterize whether the execution unit is enabled. The enabled states may include enabled states that indicate that the execution unit has been enabled and disabled states that indicate that the execution unit has not been enabled. In case the plurality of execution units are divided into at least two clusters, the resource allocation register 201 may record the enable state of each cluster as well as the enable state of the respective execution units within the same cluster. Wherein the enable state of a cluster is used to characterize whether an enabled execution unit is included in the cluster. If the enabling state of a cluster is the enabled state, the cluster comprises at least one enabled execution unit; if the enabled state of a cluster is not enabled, it indicates that the cluster does not include enabled execution units.
In some embodiments, the resource allocation register 201 includes a primary resource allocation register for recording an enable state of each of the at least two clusters; and a secondary resource allocation register for recording the enable state of each execution unit in each cluster. By dividing the execution units into clusters, the number of resource allocation registers 201 needed to record the enable states can be reduced, facilitating hardware implementation. Let m be the number of clusters, with each cluster comprising n execution units, i.e. the total number of execution units is m × n, and assume each task fixedly occupies one cluster. If the clusters are not partitioned, a total of m × n resource allocation registers 201 are required. After the clusters are divided, however, only m primary resource allocation registers and n secondary resource allocation registers are needed to record the enable states of the m × n execution units.
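To make the two-level register organization concrete, the following is a minimal software sketch of a primary (per-cluster) enable register and a single secondary (per-unit) enable register; the cluster count, unit count, register widths, and member names are illustrative assumptions rather than values fixed by the disclosure.

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative model of the two-level resource allocation register.
// Assumed sizes: m = 8 clusters, n = 16 execution units per cluster, so
// m + n = 24 enable bits cover m*n = 128 execution units, following the
// assumption above that each task fixedly occupies a single cluster.
constexpr int kClusters = 8;          // m
constexpr int kUnitsPerCluster = 16;  // n

struct ResourceAllocRegs {
    uint8_t  primary = 0;       // one enable bit per cluster (m bits)
    uint16_t secondary = 0;     // one enable bit per unit inside the task's cluster (n bits)
    int      task_cluster = -1; // the cluster fixedly occupied by the current task

    void enable_unit(int cluster, int unit) {
        task_cluster = cluster;
        secondary |= uint16_t(1u << unit);
        primary   |= uint8_t(1u << cluster);       // cluster now holds an enabled unit
    }
    void disable_unit(int unit) {
        secondary &= uint16_t(~(1u << unit));
        if (secondary == 0 && task_cluster >= 0)   // no enabled unit left in that cluster
            primary &= uint8_t(~(1u << task_cluster));
    }
    bool cluster_enabled(int cluster) const { return (primary >> cluster) & 1u; }
    bool unit_enabled(int unit) const { return (secondary >> unit) & 1u; }
};

int main() {
    ResourceAllocRegs regs;
    regs.enable_unit(2, 5);     // unit 5 of cluster 2 becomes the target execution unit
    std::printf("cluster 2 enabled: %d, unit 5 enabled: %d\n",
                regs.cluster_enabled(2), regs.unit_enabled(5));
    regs.disable_unit(5);
    std::printf("cluster 2 enabled after release: %d\n", regs.cluster_enabled(2));
    return 0;
}
```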
The enabled execution units may be assigned one or more tasks. In some embodiments, such a task may be one of a plurality of subtasks included in a larger task. Each execution unit has a finite upper limit on the number of tasks that can be allocated to it, and the total number of tasks allocated to an execution unit is less than or equal to that upper limit. The upper limits of different execution units may or may not be equal, and the upper limit of each execution unit may be recorded in the determination circuit 203.
Each execution unit may be assigned one or more tasks and executes the tasks assigned to it. Different execution units may execute their assigned tasks in parallel. If the total number of tasks allocated to an execution unit reaches the upper limit of the number of tasks that can be allocated to it, the execution unit is in an unavailable state and is also called an unavailable execution unit; if the total number of tasks assigned to an execution unit has not reached that upper limit, the execution unit is in an available state and is also referred to as an available execution unit.
Assuming a total number of execution units of N (e.g., 128), the resource counter 202 may count usage information for available execution units of the N execution units (i.e., execution units that may be assigned to one or more tasks). The usage information of one execution unit may include identification information of the execution unit for uniquely distinguishing the respective execution units; the number of tasks that the execution unit has been assigned to, and/or the occupancy status of the execution unit may also be included. Wherein the occupied state includes the above-mentioned available state and unavailable state.
In an embodiment in which the plurality of execution units are divided into at least two clusters, the resource counter 202 comprises a global resource counter for recording usage information of each of the at least two clusters; and a secondary resource counter for recording usage information of each execution unit in each cluster. Taking the example that the usage information is an occupied state, the occupied state of a cluster may be used to indicate whether an execution unit in an available state is included in the cluster. In the case that a cluster does not include an execution unit in an available state, the occupied state of the cluster is an unavailable state; in case a cluster comprises at least one execution unit in an available state, the occupied state of the cluster is the available state. In the case that the number of execution units in an available state in one cluster increases by 1, the count value corresponding to the cluster in the global resource counter may be incremented by 1; in the case where the number of execution units in an available state in one cluster is reduced by 1, the count value corresponding to the cluster in the global resource counter may be reduced by 1. If the count value corresponding to a certain cluster in the global resource counter is equal to 0, the occupied state of the cluster is an unavailable state; if the count value corresponding to a certain cluster in the global resource counter is greater than 0, the occupied state of the cluster is an available state.
Each time a task is scheduled to an execution unit, the count value corresponding to the execution unit in the secondary resource counter can be increased by 1; each time an execution unit completes a task, the count value corresponding to the execution unit in the secondary resource counter may be decremented by 1. In some embodiments, after scheduling the task on the execution unit, the execution unit may send an enable signal to the secondary resource counter to cause the secondary resource counter to increment the count value by 1; after the execution unit performs the task, the execution unit may send completion identification information to the secondary resource counter to cause the secondary resource counter to decrement the count value by 1. If the count value corresponding to a certain execution unit in the secondary resource counter reaches the upper limit of the task number which can be allocated by the execution unit, the occupied state of the execution unit is an unavailable state; if the count value corresponding to a certain execution unit in the secondary resource counter is smaller than the upper limit of the task number which can be allocated by the execution unit, the occupied state of the execution unit is an available state.
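The counting behaviour of the global and secondary resource counters described in the preceding two paragraphs can be sketched as follows; the cluster count, unit count, per-unit task cap, and interface names are assumptions for illustration only.

```cpp
#include <array>
#include <cstdio>

// Illustrative model of the global (per-cluster) and secondary (per-unit) resource counters.
// Assumptions: 4 clusters of 4 units, and every unit can hold at most 2 tasks.
constexpr int kClusters = 4;
constexpr int kUnitsPerCluster = 4;
constexpr int kTaskCapPerUnit = 2;

struct ResourceCounters {
    // global counter: number of available (not fully occupied) units per cluster
    std::array<int, kClusters> global{};
    // secondary counter: number of tasks currently assigned to each unit
    std::array<std::array<int, kUnitsPerCluster>, kClusters> secondary{};

    ResourceCounters() { global.fill(kUnitsPerCluster); }  // everything starts available

    bool unit_available(int c, int u) const { return secondary[c][u] < kTaskCapPerUnit; }
    bool cluster_available(int c) const { return global[c] > 0; }

    // called when a task is scheduled onto unit (c, u)
    void on_schedule(int c, int u) {
        ++secondary[c][u];
        if (secondary[c][u] == kTaskCapPerUnit)  // unit just became unavailable
            --global[c];
    }
    // called when unit (c, u) reports completion identification information
    void on_complete(int c, int u) {
        if (secondary[c][u] == kTaskCapPerUnit)  // unit becomes available again
            ++global[c];
        --secondary[c][u];
    }
};

int main() {
    ResourceCounters rc;
    rc.on_schedule(0, 0);
    rc.on_schedule(0, 0);                        // unit (0,0) reaches its cap
    std::printf("unit(0,0) available: %d, cluster 0 count: %d\n",
                rc.unit_available(0, 0), rc.global[0]);
    rc.on_complete(0, 0);
    std::printf("unit(0,0) available: %d, cluster 0 count: %d\n",
                rc.unit_available(0, 0), rc.global[0]);
    return 0;
}
```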
In some embodiments, the resource allocation circuit further includes a task buffer 205, configured to buffer tasks to be scheduled, and send the buffered tasks to the resource scheduler 204. Wherein the number of task buffers 205 is greater than or equal to 1, each task buffer 205 is configured to buffer one or more tasks to be scheduled. In the case where one task buffer 205 is used to buffer multiple tasks, the delay introduced by task switching within the same task buffer 205 (i.e., the delay between the time a task in the task buffer 205 is sent out and the time the next task is stored in the task buffer 205) can be reduced.
In the case where the number of task buffers 205 is greater than 1, the number of the resource allocation registers 201 and the number of the resource schedulers 204 are both greater than 1; each task buffer 205 corresponds to one resource allocation register 201 and one resource scheduler 204, and the respective resource schedulers 204 corresponding to the different task buffers 205 allocate target execution units for the tasks to be scheduled in parallel. For example, assuming that the number of task buffers 205 is 8, the number of resource allocation registers 201 and resource schedulers 204 are also 8, where the resource allocation registers 201 and resource schedulers 204 corresponding to a first task buffer 205 may be used to allocate tasks buffered in the first task buffer 205 to target execution units, the resource allocation registers 201 and resource schedulers 204 corresponding to a second task buffer 205 may be used to allocate tasks buffered in the second task buffer 205 to target execution units, and so on. The 8 resource schedulers 204 may operate in parallel such that tasks in different task buffers 205 are allocated to respective target execution units in parallel. Further, the resource allocation circuit further comprises a task distribution circuit 206 for distributing the tasks to be scheduled to the task buffer 205. The task distribution circuit 206 may distribute the respective tasks that do not have a dependency relationship to different task buffers 205 such that the tasks that do not have a dependency relationship are distributed to the respective target execution units in parallel.
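As a rough illustration of the task distribution circuit 206, the sketch below spreads dependency-free tasks across several task buffers so that the per-buffer resource schedulers can work on them in parallel; the buffer count follows the example above, while the even spreading rule and the names are assumptions.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Sketch of the task distribution step: tasks that have no dependency
// relationship are spread across the task buffers so that the per-buffer
// resource schedulers can allocate them to target execution units in parallel.
struct TaskBuffer { std::vector<std::string> tasks; };

void distribute(const std::vector<std::string>& independent_tasks,
                std::vector<TaskBuffer>& buffers) {
    for (std::size_t i = 0; i < independent_tasks.size(); ++i)
        buffers[i % buffers.size()].tasks.push_back(independent_tasks[i]);  // spread evenly
}

int main() {
    std::vector<TaskBuffer> buffers(8);               // 8 task buffers, as in the example
    distribute({"kernel A", "kernel B", "kernel C"}, buffers);
    std::printf("buffer 0 holds %zu task(s)\n", buffers[0].tasks.size());
    return 0;
}
```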
In embodiments where the number of task buffers 205 is greater than 1, the resource scheduler 204 may obtain tasks to be scheduled from a plurality of task buffers 205. On this basis, the number of secondary resource counters may be greater than 1, each secondary resource counter corresponding to one task buffer 205.
The decision circuit 203 may set the enable status of at least one execution unit in the resource allocation register 201 based on the statistics of the resource counter 202. For example, after the resource allocation circuitry receives a task to be scheduled, the determination circuit 203 may set the enable state of at least one available execution unit to the enabled state. For another example, the determining circuit 203 may obtain the completion identification information sent by the target execution unit and set the resource allocation register 201 according to the completion identification information. The completion identification information is used for indicating that the task scheduled to the target execution unit has been processed and completed. If the execution unit has sent completion identification information, indicating that the execution unit can be assigned at least one task, the determination circuit 203 can set the enable state of the execution unit to the enabled state. For another example, after the task to be scheduled is assigned to a certain execution unit, if the total number of tasks assigned to that execution unit reaches the upper limit of the number of tasks assignable to it, the execution unit is in an unavailable state, and thus the determination circuit 203 may set the enable state of the execution unit to a disabled state.
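The three update rules just described (reacting to a newly received task, to completion identification information, and to a unit reaching its task cap) could be modeled roughly as follows; the interfaces and the simple "enable the first available unit" choice are assumptions, not the disclosed policy.

```cpp
#include <cstddef>
#include <vector>

// Sketch of the judging circuit's update rules (interfaces are illustrative).
struct EnableRegister {                // the resource allocation register
    std::vector<bool> enabled;
    explicit EnableRegister(int n) : enabled(n, false) {}
};

struct Counters {                      // the resource counter (per-unit task counts)
    std::vector<int> tasks;
    std::vector<int> cap;
    bool available(int u) const { return tasks[u] < cap[u]; }
};

struct JudgingCircuit {
    EnableRegister& regs;
    Counters& cnt;

    // a new task to be scheduled arrived: enable one available unit (simplest policy)
    void on_new_task() {
        for (std::size_t u = 0; u < cnt.tasks.size(); ++u)
            if (cnt.available(static_cast<int>(u))) { regs.enabled[u] = true; break; }
    }
    // the target unit reported completion identification information
    void on_completion(int u) {
        --cnt.tasks[u];
        regs.enabled[u] = true;        // the unit can take at least one more task
    }
    // a task was just dispatched onto unit u
    void on_dispatched(int u) {
        ++cnt.tasks[u];
        if (!cnt.available(u))         // cap reached: mark the unit as not enabled
            regs.enabled[u] = false;
    }
};

int main() {
    EnableRegister regs(4);
    Counters cnt{{0, 0, 0, 0}, {1, 1, 1, 1}};
    JudgingCircuit jc{regs, cnt};
    jc.on_new_task();      // enables unit 0
    jc.on_dispatched(0);   // unit 0 hits its cap of 1 and is disabled again
    jc.on_completion(0);   // completion re-enables unit 0
    return 0;
}
```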
The same task may be assigned to one or more available execution units, and the decision circuit 203 may determine, based on a predetermined policy, which execution unit or units have their enable state set. The policy for setting the enable state is exemplified below.
In some embodiments, the policy may include setting the number of execution units in the enabled state based on a preset power consumption level, where a higher power consumption level corresponds to lower system power consumption. For example, at a higher power consumption level, the enable states of a smaller number of available execution units may be set to the enabled state, so that the same task is allocated to fewer available execution units; at a lower power consumption level, the enable states of a larger number of available execution units may be set to the enabled state, so that the same task is allocated to more available execution units.
In other embodiments, the policy may include setting the enable state of the execution unit based on a set instruction sent by the user. The user may set the enabled state of a particular available execution unit to an enabled state by sending a set instruction.
In other embodiments, the policy may include setting the enabling state of the execution units according to information of the task to be scheduled. For example, the information of the task includes the type of the task, different types of tasks may be fixedly allocated to different execution units, and in case that a certain type of task is received, the enabled state of the available execution units corresponding to the type may be set to the enabled state.
In other embodiments, the policy may include setting the enable state of the execution units according to a scheduling manner. For example, in the case where the scheduling manner is the Round-Robin manner, assuming the number (index) of the available execution unit most recently set to the enabled state is k, then this time one or more available execution units whose numbers are greater than k and closest to k may be set to the enabled state; a minimal sketch of this selection is given after the list of strategies below.
In addition to the several strategies listed above, other strategies may be employed to set the enable state of the various execution units, which are not listed here.
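As an illustration of the Round-Robin strategy referred to above, the following minimal sketch picks the next target execution unit starting from the number after the unit enabled last time; the wrap-around behaviour and the availability test are assumptions.

```cpp
#include <cstdio>
#include <vector>

// Round-Robin pick of the next target execution unit, as described above.
// 'available' marks units that have not reached their task cap; 'last' is the
// number (index) of the unit enabled in the previous round, or -1 initially.
int round_robin_pick(const std::vector<bool>& available, int last) {
    const int n = static_cast<int>(available.size());
    for (int step = 1; step <= n; ++step) {
        int candidate = (last + step) % n;   // numbers greater than last, closest first
        if (available[candidate]) return candidate;
    }
    return -1;                               // no available execution unit right now
}

int main() {
    std::vector<bool> avail{true, false, true, true};
    int last = 0;
    int next = round_robin_pick(avail, last);  // skips unit 1 (unavailable), picks 2
    std::printf("next target execution unit: %d\n", next);
    return 0;
}
```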
In some embodiments, the resource allocation circuitry further comprises a configuration unit 207 for storing configuration information. The configuration unit 207 may include one or more configuration registers. The determining circuit 203 may determine the target execution unit from the available execution units based on the configuration information and the statistics, and set the enabled state of the target execution unit in the resource allocation register 201 to an enabled state. Wherein the configuration information may be carried in the task. Further, the resource allocation circuit further includes a task parsing circuit 209, configured to parse the task to obtain the configuration information. The task parsing circuit 209 may store the parsed configuration information in the configuration unit 207, and may also send the parsed task to the resource scheduler 204.
The configuration information may be used to determine policies for setting the enable state, with different configuration information corresponding to different policies. In some embodiments, the configuration information may include first configuration information for determining a scheduling manner of the plurality of execution units. Different first configuration information corresponds to different scheduling modes. For example, the first configuration information "00" may correspond to a Round-Robin manner. In this way, when the first configuration information read from the configuration unit 207 is "00", the determination circuit 203 may determine the target execution unit using Round-Robin and set the enable state of the target execution unit to the enabled state.
In other embodiments, the configuration information includes second configuration information, which is used to determine a constraint condition corresponding to the task, where the constraint condition is used to constrain the execution units to which the task can be scheduled. Constraining the execution units to which the task can be scheduled includes constraining the number and/or the numbering of those execution units. Different second configuration information corresponds to different numbers and/or different numberings of schedulable execution units. Taking as an example the case where different second configuration information corresponds to different numbers of schedulable execution units, the number of schedulable execution units corresponding to the second configuration information "10" may be 5; upon reading the second configuration information "10" from the configuration unit 207, the determining circuit 203 may determine 5 available execution units as target execution units and set their enable states to the enabled state. The number of schedulable execution units corresponding to each piece of configuration information may be preset.
In particular, the constraints may include, but are not limited to, any of the following (a small illustrative check of these constraints is sketched after the list):
(1) The number of target execution units for which each of the tasks is scheduled. In particular, the constraint may be that the number of target execution units per task scheduled is less than or equal to a preset first number threshold. By setting the constraint condition, the situation that a single task occupies excessive resources and other tasks cannot be executed can be avoided. In the case where a task includes a plurality of subtasks, the constraint may be that the total number of target execution units for executing the respective subtasks included in the same task is less than or equal to a preset first number threshold.
(2) And the number of clusters where each task is scheduled by the target execution unit, wherein one cluster comprises one or more execution units. Specifically, the constraint condition may be that the number of clusters in which the target execution units scheduled for each task are located is less than or equal to a preset second number threshold. By setting the constraint condition, the execution units for executing the same task can be prevented from being excessively dispersed. In the case that a task includes a plurality of subtasks, the constraint condition may be that the total number of clusters where the target execution units of each subtask scheduled in the same task are located is less than or equal to a preset second number threshold.
(3) The number of target execution units for which a plurality of the tasks of the same user are scheduled. Specifically, the constraint condition may be that the number of target execution units for which a plurality of tasks of the same user are scheduled is less than or equal to a preset third number threshold. By setting the constraint condition, the situation that tasks of the same user occupy excessive resources and other users cannot execute can be avoided. In the case that a task includes a plurality of subtasks, the constraint condition may be that the total number of target execution units to which each subtask included in the task of the same user is scheduled is less than or equal to a preset third number threshold.
(4) The number of clusters in which the target execution units of the tasks of the same user are scheduled. Specifically, the constraint condition may be that the number of clusters where the target execution units where the tasks of the same user are scheduled are located is smaller than or equal to a preset fourth number threshold. By setting the constraint condition, the execution units for executing the tasks of the same user can be prevented from being excessively distributed. In the case that a task includes a plurality of subtasks, the constraint condition may be that the total number of clusters where the target execution units of the respective subtasks are scheduled in the task of the same user is smaller than or equal to a preset fourth number threshold.
In addition to the above constraints, one or more other constraints may be set according to actual needs, which are not listed here.
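A small sketch of how the four constraint checks listed above might be evaluated for a candidate allocation is given below; the thresholds, the bookkeeping structures, and the function names are illustrative assumptions.

```cpp
#include <set>
#include <utility>

// Sketch of the four constraint checks listed above.
struct Thresholds {
    int units_per_task;     // (1) target execution units per task
    int clusters_per_task;  // (2) clusters per task
    int units_per_user;     // (3) target execution units across one user's tasks
    int clusters_per_user;  // (4) clusters across one user's tasks
};

struct Usage {
    std::set<std::pair<int, int>> task_units, user_units;  // (cluster, unit) pairs
    std::set<int> task_clusters, user_clusters;
};

// Would adding unit (cluster, unit) as a target execution unit still satisfy
// every configured constraint for this task and this user?
bool allocation_allowed(const Usage& u, const Thresholds& t, int cluster, int unit) {
    Usage next = u;
    next.task_units.insert({cluster, unit});
    next.user_units.insert({cluster, unit});
    next.task_clusters.insert(cluster);
    next.user_clusters.insert(cluster);
    return static_cast<int>(next.task_units.size())    <= t.units_per_task
        && static_cast<int>(next.task_clusters.size()) <= t.clusters_per_task
        && static_cast<int>(next.user_units.size())    <= t.units_per_user
        && static_cast<int>(next.user_clusters.size()) <= t.clusters_per_user;
}

int main() {
    Thresholds t{4, 2, 8, 4};   // hypothetical first to fourth number thresholds
    Usage u;                    // nothing allocated yet
    return allocation_allowed(u, t, 0, 0) ? 0 : 1;
}
```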
In other embodiments, the configuration information includes third configuration information for determining a power consumption level, different power consumption levels corresponding to different upper limits of the number of execution units in the enabled state. For example, the third configuration information "1" corresponds to a higher power consumption level at which the upper limit of the number of execution units in the enabled state is 3; the third configuration information "0" corresponds to a lower power consumption level at which the upper limit of the number of execution units in the enabled state is 5. In this way, the determination circuit 203 can set the number of execution units in the enabled state to a number not exceeding the corresponding upper limit of the number, respectively, when reading the third configuration information as "0" or "1" from the configuration unit 207.
In addition to the first configuration information, the second configuration information, and the third configuration information described above, other configuration information may be included in the configuration unit 207, which is not listed here.
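For illustration, the following sketch decodes the three kinds of configuration information into a scheduling mode, a schedulable-unit count, and an enabled-unit cap; the field widths and encodings mirror the examples quoted above ("00", "10", "1"/"0") but are otherwise assumptions.

```cpp
#include <cstdint>

// Sketch of decoding the three kinds of configuration information carried in a task.
enum class SchedulingMode { RoundRobin, Other };

struct DecodedConfig {
    SchedulingMode mode;    // from the first configuration information
    int schedulable_units;  // from the second configuration information
    int enabled_unit_cap;   // from the third configuration information (power level)
};

DecodedConfig decode(uint8_t first, uint8_t second, uint8_t third) {
    DecodedConfig cfg{};
    cfg.mode = (first == 0b00) ? SchedulingMode::RoundRobin : SchedulingMode::Other;
    cfg.schedulable_units = (second == 0b10) ? 5 : 1;  // "10" -> 5 units (example above)
    cfg.enabled_unit_cap  = (third == 1) ? 3 : 5;      // higher power level -> cap of 3
    return cfg;
}

int main() {
    DecodedConfig cfg = decode(0b00, 0b10, 1);
    return (cfg.mode == SchedulingMode::RoundRobin && cfg.schedulable_units == 5) ? 0 : 1;
}
```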
After setting the enabling states of the execution units, the resource scheduler 204 may allocate a target execution unit for the task to be scheduled according to the enabling states of the respective execution units. Specifically, the resource scheduler 204 may determine each execution unit in the enabled state as a target execution unit and assign the task to be scheduled to the target execution unit. Wherein a task may comprise a plurality of sub-tasks, and different sub-tasks may be assigned to different target execution units. The granularity of the dividing subtasks can be adjusted according to the processing capability of the execution unit, for example, the subtasks can be divided with the granularity of a thread block (block), that is, each subtask is executed by one thread block; alternatively, subtasks may be divided at the granularity of threads (threads), i.e., each subtask is performed by one thread. Subtasks may also be divided according to other granularities, which are not listed here.
In the case that a task includes a plurality of sub-tasks, the judging circuit 203 may determine a target execution unit of each sub-task from the available execution units and set an enable state of the target execution unit of each sub-task in the resource allocation register 201 to an enabled state. The resource scheduler 204 may allocate a target execution unit for each sub-task based on the enable status of the target execution unit for each sub-task. For example, assuming that one task includes a subtask 1 and a subtask 2, and that the target execution unit of the subtask 1 is the execution unit 1 and the target execution unit of the subtask 2 is the execution unit 2, the determination circuit 203 may set the enabled states of the execution unit 1 and the execution unit 2 to the enabled state, so that the resource scheduler 204 may allocate the subtask 1 to the execution unit 1 and the subtask 2 to the execution unit 2.
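A minimal sketch of the resource scheduler step described above, dispatching each subtask to an execution unit whose enable state has been set, is shown below; the one-subtask-per-enabled-unit pairing rule is an assumption.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Sketch of the resource scheduler step: each subtask is dispatched to an
// execution unit whose enable state was set by the judging circuit.
struct Dispatch { std::string subtask; int unit; };

std::vector<Dispatch> schedule(const std::vector<std::string>& subtasks,
                               const std::vector<bool>& enabled) {
    std::vector<Dispatch> out;
    std::size_t s = 0;
    for (std::size_t u = 0; u < enabled.size() && s < subtasks.size(); ++u)
        if (enabled[u]) out.push_back({subtasks[s++], static_cast<int>(u)});
    return out;
}

int main() {
    // mirrors the example above: subtask 1 -> execution unit 1, subtask 2 -> execution unit 2
    std::vector<std::string> subtasks{"subtask 1", "subtask 2"};
    std::vector<bool> enabled{false, true, true, false};   // units 1 and 2 enabled
    for (const Dispatch& d : schedule(subtasks, enabled))
        std::printf("%s -> execution unit %d\n", d.subtask.c_str(), d.unit);
    return 0;
}
```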
Further, the resource allocation circuit may further include a task transmitting circuit 208, connected to each execution unit through a transmission line corresponding to each execution unit, for enabling the transmission line corresponding to the target execution unit to send the task to the target execution unit. For example, in the case where the target execution unit includes the execution unit 1 and the execution unit 2, the task transmission circuit 208 may enable the transmission line corresponding to the execution unit 1 and the transmission line corresponding to the execution unit 2 to transmit the task onto the execution unit 1 and the execution unit 2.
According to the embodiment of the disclosure, the enabling states of all execution units are recorded through the resource allocation register, and the information of the available execution units is counted in real time through the resource counter, so that the judging circuit can set the enabling states in the resource allocation register according to real-time counting results, and the resource scheduler can allocate target execution units for tasks to be scheduled according to the enabling states in the resource allocation register. Because the real-time statistical result can reflect the idle condition of each execution unit, the method sets the enabling state according to the statistical result and performs task scheduling, and can distribute tasks to the idle execution units, thereby fully utilizing the resources of each execution unit and reducing the idle of the execution units.
For example, after a batch of tasks has been distributed to the execution units, once some of those tasks finish executing, the resource counter reflects in real time that the execution units which executed them are idle. The judging circuit can therefore set the enable states of those execution units to the enabled state as soon as they become idle, and the resource scheduler can immediately distribute new tasks to them. As a result, there is no need to wait for the whole batch to finish before new tasks are distributed, which reduces the boundary waste phenomenon and improves the resource utilization rate of the execution units.
Referring to fig. 4, the embodiment of the present disclosure further provides a task scheduler including:
a task buffer 401 and a resource allocation circuit 402;
the task buffer 401 is configured to buffer a task to be scheduled, and send the buffered task to the resource allocation circuit 402;
the resource allocation circuit 402 is configured to perform real-time statistics on information of available execution units in the multiple execution units, allocate a target execution unit for the task from the available execution units according to a statistical result, and schedule the task to be scheduled to the target execution unit.
The number of task buffers 401 may be greater than or equal to 1. The tasks in the task buffers 401 may be issued to the resource allocation circuit 402 in parallel and allocated to corresponding computing cores by the resource allocation circuit 402. The number of task buffers 401 and the number of computing cores may be adjusted parametrically. Taking 8 task buffers 401 as an example, each task buffer 401 may buffer multiple tasks (to reduce the delay introduced by task switching within the same buffer), and the 8 task buffers 401 support parallel scheduling of at most 8 tasks. A task may be split into multiple subtasks, and the granularity of a single subtask may be adjusted according to the capabilities of the computing cores; the disclosure describes the splitting of subtasks at thread block (block) granularity and assumes 128 computing cores.
The resource allocation circuit 402 monitors the idle states (i.e., the occupied states in the foregoing embodiments) of 128 computing cores in real time, and completes the occupation and release of the computing cores, supporting the dynamic allocation of the computing cores to a plurality of tasks simultaneously. The resource allocation circuitry 402 may allocate a portion of the compute cores for each task, enabling the parallel allocation of multiple tasks. The specific structure and operation principle of the resource allocation circuit 402 can be referred to the foregoing embodiments of the resource allocation circuit, and will not be described herein.
In some embodiments, the task scheduler further includes task transmitting circuits 403, and the number of task transmitting circuits 403 may be greater than or equal to 1. The number of task transmitting circuits 403 may be equal to the number of task buffers 401; each task transmitting circuit 403 corresponds to one task buffer 401 and is configured to transmit the tasks in that task buffer 401 to the corresponding target execution units (for example, computing cores, shown as core 1, core 2, …, core N). Each task transmitting circuit 403 is directly connected to all the computing cores, and the transmission paths are managed by the resource allocation circuit 402, so that dynamic enabling of the transmission paths is realized.
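The per-buffer transmit stage can be pictured with the sketch below, in which each task transmitting circuit keeps one enable bit per transmission line and only the line to the chosen target core is enabled; the buffer and core counts follow the example in the text, and the interface is an assumption.

```cpp
#include <bitset>
#include <cstdio>

// Sketch of the task transmitting stage: each task buffer has its own transmit
// circuit wired to all compute cores, and the resource allocation circuit
// enables only the line(s) leading to the chosen target core(s).
constexpr int kBuffers = 8;
constexpr int kCores = 128;

struct TransmitCircuit {
    std::bitset<kCores> line_enable;                  // one transmission line per core
    void send(int core) { line_enable.set(core); }    // enable the line to the target core
    void release(int core) { line_enable.reset(core); }
};

int main() {
    TransmitCircuit tx[kBuffers];
    tx[0].send(17);   // buffer 0 dispatches its task to compute core 17
    tx[3].send(17);   // buffer 3 may target the same core if it is still available
    std::printf("lines enabled on circuit 0: %zu\n", tx[0].line_enable.count());
    return 0;
}
```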
Each computing core receives subtasks and performs the computation; after completing the computation, the computing core returns its completion state, which is sent to the global resource counter through the bus interface, and the resource allocation circuit 402 updates the primary/secondary resource allocation registers and the secondary resource counter. The resource allocation circuit 402 repeats these functions until all tasks are completed.
On the basis of dynamic allocation of the computing cores, constraint conditions of tasks can be set, including cluster constraint conditions (used for constraining the clusters to which the tasks are allocated) and computing core constraint conditions (used for constraining the computing cores to which the tasks are allocated), and each execution unit is dynamically allocated to a new task under the constraint of the constraint conditions. The constraint condition can be obtained based on configuration information, the configuration information can be carried in a task, and the task is analyzed by a task analysis circuit. The configuration information can be stored in a configuration register, and the judging circuit jointly generates enabling signals required by the resource scheduler according to the configuration register value, the global resource counter, the primary/secondary resource allocation register and the secondary resource counter. The cluster number and the computation core number generated by the resource scheduler satisfy the constraint conditions.
According to the embodiments of the disclosure, parallel scheduling of tasks is realized in hardware: resources can be dynamically allocated to multiple tasks at the hardware level, and computing cores are dynamically occupied and released. Because the mix of task sizes is unpredictable, static allocation cannot avoid leaving some computing cores idle, whereas dynamic allocation avoids task boundary waste and allows computing cores to be handed over between tasks seamlessly, thereby improving the resource utilization rate of the computing cores and reducing the execution delay of multiple tasks.
Since the number of computing cores occupied by each task can be constrained by constraints, more complex task mix scenarios can be accommodated. For example, some resident subtasks may be assigned fewer computing cores to reduce situations where the resident subtasks occupy too many computing cores for a long period of time, resulting in other subtasks not being able to execute. For example, more computing cores can be allocated to subtasks with higher real-time requirements, so that the processing real-time performance of the computing cores is improved.
The embodiment of the disclosure also provides a chip, which comprises the resource allocation circuit of any embodiment of the disclosure or the task scheduler of any embodiment of the disclosure.
The embodiment of the disclosure also provides electronic equipment, which comprises the chip of any embodiment of the disclosure.
As shown in fig. 5, the embodiment of the present disclosure further provides a resource allocation method, which is applied to the judging circuit in the resource allocation circuit according to any one embodiment of the present disclosure; the method comprises the following steps:
step 501: acquiring real-time statistical results of information of available execution units in a plurality of execution units;
step 502: and setting the enabling state of each execution unit according to the real-time statistical result, so that the resource scheduler allocates a target execution unit for the task to be scheduled according to the enabling state of each execution unit.
The specific steps executed by the judging circuit in the embodiments of the present disclosure are detailed in the foregoing embodiments of the resource allocation circuit, and are not repeated here.
Referring to fig. 6, an embodiment of the present disclosure further provides a resource allocation device, which is applied to the judging circuit in the resource allocation circuit according to any one embodiment of the present disclosure; the device comprises:
an obtaining module 601, configured to obtain a real-time statistical result of information of an available execution unit of the plurality of execution units;
the setting module 602 is configured to set an enabling state of each execution unit according to the real-time statistics result, so that the resource scheduler allocates a target execution unit to a task to be scheduled according to the enabling state of each execution unit.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated for brevity.
The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of the previous embodiments.
Computer readable media, including both persistent and non-persistent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
From the foregoing description of embodiments, it will be apparent to those skilled in the art that the present embodiments may be implemented in software plus a necessary general purpose hardware platform. Based on such understanding, the technical solutions of the embodiments of the present specification may be embodied in essence or what contributes to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present specification.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer apparatus or entity, or by an article of manufacture having some function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points. The apparatus embodiments described above are merely illustrative, in which the modules illustrated as separate components may or may not be physically separate, and the functions of the modules may be implemented in the same piece or pieces of software and/or hardware when implementing the embodiments of the present disclosure. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The foregoing is merely a specific implementation of the embodiments of this disclosure, and it should be noted that, for a person skilled in the art, several improvements and modifications may be made without departing from the principles of the embodiments of this disclosure, and these improvements and modifications should also be considered as protective scope of the embodiments of this disclosure.

Claims (15)

1. A resource allocation circuit, the resource allocation circuit comprising:
a resource allocation register, a resource counter, a judging circuit and a resource scheduler;
the resource allocation register is used for recording the enabling state of each execution unit in the plurality of execution units;
the resource counter is used for carrying out real-time statistics on information of available execution units in the plurality of execution units;
the judging circuit is used for setting the enabling state of at least one execution unit in the resource allocation register according to the statistical result;
the resource scheduler is used for allocating a target execution unit for a task to be scheduled according to the enabling states of the execution units.
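By way of illustration only, the cooperation of the four components recited in claim 1 can be pictured as a minimal software model. The Python sketch below is an assumption-laden analogue, not the claimed hardware: the names ResourceAllocationModel, enable, busy, judge, and schedule are all invented for this example.

```python
# Minimal behavioral sketch of claim 1 (illustrative only; all names are assumptions).
class ResourceAllocationModel:
    def __init__(self, num_units: int):
        self.enable = [False] * num_units   # resource allocation register: enable state per execution unit
        self.busy = [False] * num_units     # occupancy observed by the resource counter

    def count_available(self):
        """Resource counter: real-time statistics of the available execution units."""
        return [i for i, in_use in enumerate(self.busy) if not in_use]

    def judge(self, needed: int = 1):
        """Judging circuit: set the enable state of at least one unit based on the statistics."""
        for i in self.count_available()[:needed]:
            self.enable[i] = True

    def schedule(self):
        """Resource scheduler: allocate a target execution unit according to the enable states."""
        for i, enabled in enumerate(self.enable):
            if enabled:
                self.enable[i] = False
                self.busy[i] = True
                return i        # index of the target execution unit
        return None             # no enabled unit; the task waits


# Example: enable one free unit, then dispatch a task to it.
model = ResourceAllocationModel(num_units=4)
model.judge(needed=1)
target = model.schedule()       # -> 0
```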
2. The resource allocation circuit according to claim 1, wherein the resource allocation circuit further comprises:
a task transmitting circuit, connected with each execution unit through a transmission line corresponding to that execution unit, and used for enabling the transmission line corresponding to the target execution unit so as to transmit the task to the target execution unit;
and/or,
the resource allocation circuit further comprises a configuration unit for storing configuration information; the judging circuit is used for:
determining the target execution unit from the available execution units based on the configuration information and the statistical result, and setting the enabling state of the target execution unit in the resource allocation register to be an enabled state.
3. The resource allocation circuit according to claim 2, wherein the configuration information comprises:
first configuration information, used for determining a scheduling mode of the execution units; and/or
second configuration information, used for determining constraint conditions corresponding to the tasks, wherein the constraint conditions are used for constraining the execution units to which a task can be scheduled; and/or
third configuration information, used for determining a power consumption level, wherein different power consumption levels correspond to different upper limits on the number of execution units in the enabled state.
4. The resource allocation circuit according to claim 3, wherein the constraint conditions comprise at least one of:
the number of target execution units to which each of the tasks is scheduled;
the number of clusters in which the target execution units of each task are located, wherein one cluster comprises one or more execution units;
the number of target execution units to which a plurality of the tasks of the same user are scheduled;
the number of clusters in which the target execution units of the plurality of tasks of the same user are located.
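Purely as an illustration of how the three kinds of configuration information and the constraint conditions of claims 3-4 might be represented in software, the sketch below uses assumed field names (schedule_mode, max_units_per_task, and so on) and an assumed power-level-to-cap mapping; none of these values or names come from the patent itself.

```python
# Illustrative representation of the configuration information (claims 3-4);
# every field name and the power-level mapping below are assumptions.
from dataclasses import dataclass

@dataclass
class AllocationConfig:
    schedule_mode: str          # first configuration information: scheduling mode of the execution units
    max_units_per_task: int     # constraint: target execution units per task
    max_clusters_per_task: int  # constraint: clusters a single task may span
    max_units_per_user: int     # constraint: target units across tasks of one user
    power_level: int            # third configuration information: power consumption level

# Assumed mapping: power level -> upper limit on execution units in the enabled state.
POWER_LEVEL_ENABLE_CAP = {0: 2, 1: 4, 2: 8}

def enable_limit(cfg: AllocationConfig) -> int:
    """Different power consumption levels correspond to different upper limits."""
    return POWER_LEVEL_ENABLE_CAP[cfg.power_level]

def within_constraints(cfg: AllocationConfig, units_for_task: int, clusters_for_task: int) -> bool:
    """Check a proposed allocation against the per-task constraint conditions."""
    return (units_for_task <= cfg.max_units_per_task
            and clusters_for_task <= cfg.max_clusters_per_task)

cfg = AllocationConfig("round_robin", 2, 1, 4, 1)
print(enable_limit(cfg), within_constraints(cfg, units_for_task=2, clusters_for_task=1))  # 4 True
```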
5. The resource allocation circuit according to any one of claims 1 to 4, wherein,
the resource allocation circuit further comprises a configuration unit for storing configuration information; the judging circuit is used for: determining the target execution unit from the available execution units based on the configuration information and the statistical result, and setting the enabling state of the target execution unit in the resource allocation register to be an enabled state; and the resource allocation circuit further comprises a task analysis circuit for analyzing the task to obtain the configuration information and sending the analyzed task to the resource scheduler;
and/or
the resource allocation circuit further comprises a task buffer for buffering the task to be scheduled and sending the buffered task to the resource scheduler;
and/or
the resource allocation circuit further comprises a task distribution circuit for distributing the task to be scheduled to the task buffer.
6. The resource allocation circuit according to claim 5, wherein, in the case where the resource allocation circuit comprises task buffers, the number of task buffers is greater than 1, and each task buffer is used for buffering one or more tasks to be scheduled;
the number of resource allocation registers and the number of resource schedulers are both greater than 1; each task buffer corresponds to one resource allocation register and one resource scheduler, and the resource schedulers corresponding to different task buffers allocate target execution units for tasks to be scheduled in parallel.
7. The resource allocation circuit according to any one of claims 1 to 6, wherein the plurality of execution units are divided into at least two clusters; the resource counter includes:
a global resource counter for recording usage information of each of the at least two clusters; and
a secondary resource counter for recording usage information of each execution unit in each cluster.
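The two-level counting of claim 7 can be pictured as follows; the cluster names, the busy map, and the function names global_resource_counter and secondary_resource_counter are illustrative assumptions only.

```python
# Illustrative two-level counting (claim 7): a per-cluster global view plus a
# per-unit view inside each cluster (data layout is an assumption).
busy = {
    "cluster0": [False, True, False, False],   # True = execution unit in use
    "cluster1": [True, True, False, False],
}

def global_resource_counter():
    """Usage information of each cluster: number of free execution units per cluster."""
    return {cluster: units.count(False) for cluster, units in busy.items()}

def secondary_resource_counter(cluster: str):
    """Usage information of each execution unit within one cluster (True = available)."""
    return {index: not in_use for index, in_use in enumerate(busy[cluster])}

print(global_resource_counter())               # {'cluster0': 3, 'cluster1': 2}
print(secondary_resource_counter("cluster1"))  # {0: False, 1: False, 2: True, 3: True}
```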
8. The resource allocation circuit according to claim 7, wherein the resource scheduler is configured to obtain tasks to be scheduled from a plurality of task buffers;
The number of secondary resource counters is greater than 1, and each secondary resource counter corresponds to one task buffer.
9. The resource allocation circuit according to any one of claims 1 to 8, wherein,
the plurality of execution units are divided into at least two clusters; the resource allocation register includes:
a first-level resource allocation register for recording the enabling state of each cluster in the at least two clusters; and
a second-level resource allocation register for recording the enabling state of each execution unit in each cluster;
and/or
The task includes a plurality of subtasks;
the judging circuit is used for determining a target execution unit of each subtask from the available execution units and setting the enabling state of the target execution unit of each subtask in the resource allocation register to be an enabled state;
the resource scheduler is used for allocating a target execution unit for each subtask according to the enabling state of the target execution unit of each subtask;
and/or
The judging circuit is further used for:
acquiring completion identification information sent by the target execution unit, wherein the completion identification information is used for indicating that the task scheduled to the target execution unit has been processed; and
setting the resource allocation register according to the completion identification information.
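A possible software analogue of the completion path recited above: when a target execution unit reports that its task is finished, the corresponding allocation state is cleared so the unit can be re-enabled later. The dictionary layout and the names scheduled and on_completion are assumptions for illustration.

```python
# Illustrative completion handling (claim 9); unit and task identifiers are assumptions.
scheduled = {0: "task_a", 1: None, 2: "task_b"}   # execution unit -> task it is running (None = idle)

def on_completion(unit_id: int, task_id: str) -> bool:
    """React to completion identification information from a target execution unit:
    the task has been processed, so the allocation state for that unit is cleared."""
    if scheduled.get(unit_id) == task_id:
        scheduled[unit_id] = None                 # unit becomes available for re-enabling
        return True
    return False                                  # stale or unknown completion report

on_completion(0, "task_a")
print(scheduled)   # {0: None, 1: None, 2: 'task_b'}
```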
10. A task scheduler, the task scheduler comprising:
task buffer and resource allocation circuit;
the task buffer is used for buffering the task to be scheduled and sending the buffered task to the resource allocation circuit;
the resource allocation circuit is used for carrying out real-time statistics on information of available execution units in the plurality of execution units, allocating a target execution unit for the task from the available execution units according to a statistical result, and scheduling the task to be scheduled to the target execution unit.
11. A chip comprising the resource allocation circuitry of any one of claims 1 to 9 or comprising the task scheduler of claim 10.
12. An electronic device comprising the chip of claim 11.
13. A resource allocation method, characterized in that the method is applied to the judging circuit in the resource allocation circuit according to any one of claims 1 to 9; the method comprises the following steps:
acquiring real-time statistical results of information of available execution units in a plurality of execution units;
And setting the enabling state of each execution unit according to the real-time statistical result, so that the resource scheduler allocates a target execution unit for the task to be scheduled according to the enabling state of each execution unit.
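The two steps of the method can be read as two plain functions, sketched below under stated assumptions: the list-based busy map, the default cap of four enabled units, and the names get_statistics and set_enable_states are invented for this example.

```python
# Illustrative two-step reading of the method of claim 13 (all names are assumptions).
def get_statistics(unit_busy: list) -> list:
    """Step 1: obtain real-time statistics of the available execution units."""
    return [i for i, in_use in enumerate(unit_busy) if not in_use]

def set_enable_states(num_units: int, available: list, cap: int = 4) -> list:
    """Step 2: set the enable state of each execution unit so that the resource
    scheduler can allocate a target execution unit for the task to be scheduled."""
    enabled = [False] * num_units
    for i in available[:cap]:
        enabled[i] = True
    return enabled

unit_busy = [True, False, False, True]
print(set_enable_states(len(unit_busy), get_statistics(unit_busy)))  # [False, True, True, False]
```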
14. A resource allocation apparatus, characterized in that the apparatus is applied to the judging circuit in the resource allocation circuit according to any one of claims 1 to 9; the apparatus comprises:
the acquisition module is used for acquiring real-time statistical results of information of available execution units in the plurality of execution units;
and the setting module is used for setting the enabling state of each execution unit according to the real-time statistical result so that the resource scheduler allocates a target execution unit for the task to be scheduled according to the enabling state of each execution unit.
15. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method of claim 13.
CN202310184155.4A 2023-02-28 2023-02-28 Resource allocation circuit, method and device, task scheduler and chip Pending CN116225705A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310184155.4A CN116225705A (en) 2023-02-28 2023-02-28 Resource allocation circuit, method and device, task scheduler and chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310184155.4A CN116225705A (en) 2023-02-28 2023-02-28 Resource allocation circuit, method and device, task scheduler and chip

Publications (1)

Publication Number Publication Date
CN116225705A true CN116225705A (en) 2023-06-06

Family

ID=86571007

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310184155.4A Pending CN116225705A (en) 2023-02-28 2023-02-28 Resource allocation circuit, method and device, task scheduler and chip

Country Status (1)

Country Link
CN (1) CN116225705A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination