CN106155811B - Resource service device, resource scheduling method and device - Google Patents

Resource service device, resource scheduling method and device

Info

Publication number
CN106155811B
CN106155811B · CN201510208923.0A · CN201510208923A
Authority
CN
China
Prior art keywords
gpu
resource
graphics processing
scheduling
processing device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510208923.0A
Other languages
Chinese (zh)
Other versions
CN106155811A (en)
Inventor
孔建钢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201510208923.0A (patent CN106155811B)
Priority to PCT/CN2016/079865 (patent WO2016173450A1)
Publication of CN106155811A
Application granted
Publication of CN106155811B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements

Abstract

The embodiments of the present application disclose a graphics processing apparatus. The apparatus maps at least one GPU multi-process proxy server (GPU-MPS), which acts as an agent for scheduling the apparatus; one client of the GPU-MPS can schedule at least one logical unit, and one task process corresponds to one client of the GPU-MPS. The maximum number of logical units the graphics processing apparatus can contain is M × N × K, where M is the number of logical units schedulable by one client of the GPU-MPS, N is the maximum number of clients contained in one GPU-MPS, and K is the number of GPU-MPSs mapped by the graphics processing apparatus. With the method and apparatus, the utilization of GPU resources can be improved while the overhead of establishing and switching GPU contexts is saved. The application also discloses a resource service apparatus, a resource scheduling method, and a resource scheduling apparatus.

Description

Resource service device, resource scheduling method and device
Technical Field
The present application relates to the field of computer applications, and in particular, to a graphics processing apparatus, a resource service apparatus, a resource scheduling method, and a resource scheduling apparatus.
Background
As graphics processing becomes increasingly important in modern computers, a core processor dedicated to graphics processing is required; the Graphics Processing Unit (GPU) is such a device. Meanwhile, thanks to its powerful computing capability, the GPU is increasingly popular for general-purpose computing (GPGPU) and is used in various high-performance computing clusters.
Currently, in existing GPU cluster technology, there are two main methods for scheduling GPU resources when processing a job submitted by a user. In the first, the resource scheduler schedules one GPU (e.g., one GPU card) to only one user's job. In the second, the resource scheduler schedules one GPU to the jobs of multiple users simultaneously.
In the course of implementing the present application, the inventors found at least the following problems in the prior art. In the first scheduling method, one GPU is exclusively occupied by a single user's job, which is often unable to fully utilize the GPU's resources, so GPU resource utilization may be low. In the second scheduling method, one GPU is shared by the jobs of multiple users, who together are more likely to fully utilize it, so GPU resource utilization improves to a certain extent.
However, although the second method improves GPU resource utilization, when the jobs of multiple users share one GPU, the number of simultaneously running task processes may be large. Because the GPU must establish a GPU context for each process, a very large number of GPU contexts may be created on the GPU, with frequent switching among them. Establishing and switching GPU contexts imposes a huge overhead on GPU resources, causing the problem of excessive GPU sharing.
Disclosure of Invention
To solve the above technical problems, embodiments of the present application provide a graphics processing apparatus, a resource service apparatus, a resource scheduling method, and a resource scheduling apparatus, so that the overhead of establishing and switching GPU contexts is saved while GPU resource utilization is improved, and the problem of excessive GPU sharing is avoided as far as possible.
The embodiment of the application discloses the following technical scheme:
A graphics processing device in which a logical unit is the smallest graphics processor (GPU) resource scheduling unit. The graphics processing device maps at least one GPU multi-process proxy server (GPU-MPS), which is an agent for scheduling the graphics processing device; one client of the GPU-MPS can schedule at least one logical unit, one task process corresponds to one client of the GPU-MPS, and the maximum number of logical units the graphics processing device can contain is M × N × K;
wherein M is the number of logical units schedulable by one client of the GPU-MPS, N is the maximum number of clients contained in one GPU-MPS, K is the number of GPU-MPSs mapped by the graphics processing device, and M, N, and K are all positive integers.
Preferably, one client of the GPU-MPS may schedule one logical unit.
Preferably, the graphics processing apparatus maps one GPU multiprocess proxy server.
Preferably, the graphics processing apparatus includes M × N × K logic units.
A resource service apparatus, comprising at least one graphics processing apparatus as described in any one of the above, a monitoring unit, and a first communication unit, wherein:
the monitoring unit is configured to monitor, when a monitoring cycle arrives, the number of logical units remaining in the graphics processing apparatus in the current cycle;
the first communication unit is configured to send the monitored data to a monitoring node in the cluster, so that the monitoring node atomically updates a preset resource dynamic table with the monitored data when an update cycle arrives;
wherein the resource dynamic table contains at least the number of logical units remaining in the graphics processing apparatus.
Preferably, the resource service device is a slave node in a cluster.
Preferably, the resource dynamic table further includes an actual usage rate of the graphics processing apparatus; the monitoring unit is further configured to monitor an actual usage rate of the local graphics processing apparatus in the current period when the monitoring period arrives.
A resource scheduling method applied to any one of the above resource service apparatuses, the method comprising:
receiving a scheduling request for scheduling graphics processor (GPU) resources for a target job, wherein the scheduling request indicates the number of logical units to be scheduled;
in response to the scheduling request, searching a preset resource dynamic table for graphics processing apparatuses whose number of remaining logical units is not zero, and scheduling logical units for the target job from the found apparatuses according to the number indicated by the scheduling request;
wherein the resource dynamic table contains at least the number of logical units remaining in the graphics processing apparatus.
Preferably, the resource dynamic table further contains the actual utilization of the graphics processing apparatus;
and the step of searching, in response to the scheduling request, a preset resource dynamic table for graphics processing apparatuses whose number of remaining logical units is not zero, and scheduling logical units for the target job from the found apparatuses according to the number indicated by the scheduling request, is specifically:
in response to the scheduling request, searching the preset resource dynamic table for graphics processing apparatuses whose actual utilization is less than or equal to a preset maximum threshold and whose number of remaining logical units is not zero, and scheduling logical units for the target job from the found apparatuses according to the number indicated by the scheduling request.
Preferably, the resource dynamic table further contains the operating state of each resource service apparatus in the resource server cluster and the operating state of each graphics processing apparatus in the resource service apparatus; the method further comprises:
when an update cycle arrives, atomically updating, in the resource dynamic table, the operating state of the resource service apparatus and the operating state of the graphics processing apparatus, wherein the operating state is either active or inactive.
A resource scheduling apparatus applied to any one of the above resource service apparatuses, comprising:
a second communication unit, configured to receive a scheduling request for scheduling GPU resources for a target job, wherein the scheduling request indicates the number of logical units to be scheduled;
a response unit, configured to, in response to the scheduling request, search a preset resource dynamic table for graphics processing apparatuses whose number of remaining logical units is not zero, and schedule logical units for the target job from the found apparatuses according to the number indicated by the scheduling request;
wherein the resource dynamic table contains at least the number of logical units remaining in the graphics processing apparatus.
Preferably, the resource dynamic table further contains the actual utilization of the graphics processing apparatus;
and the response unit is specifically configured to, in response to the scheduling request, search the preset resource dynamic table for graphics processing apparatuses whose actual utilization is less than or equal to a preset maximum threshold and whose number of remaining logical units is not zero, and schedule logical units for the target job from the found apparatuses according to the number indicated by the scheduling request.
Preferably, the resource dynamic table further contains the operating state of each resource service apparatus in the resource server cluster and the operating state of each graphics processing apparatus in the resource service apparatus; the apparatus further comprises:
an updating unit, configured to, when an update cycle arrives, atomically update, in the resource dynamic table, the operating state of the resource service apparatus and the operating state of the graphics processing apparatus, wherein the operating state is either active or inactive.
As can be seen from the above embodiments, compared with the prior art, the present application has the following advantages:
Because the logical unit is the smallest GPU resource scheduling unit, different logical units in one graphics processing apparatus can be scheduled to different task processes, so that the jobs of different users jointly occupy the same graphics processing apparatus, which ensures the utilization of its GPU resources. Meanwhile, by using the GPU-MPS technique, a task process becomes a client of a GPU-MPS, so the GPU-MPS can manage task processes the way it manages clients. Since all clients of one GPU-MPS share a single GPU context, the multiple task processes acting as clients of one GPU multi-process proxy server need only share one GPU context.
In addition, when the resources are scheduled, the logic units are scheduled based on the actual utilization rate of each GPU, and the problem of GPU excessive sharing can be avoided.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
FIG. 1 schematically shows a block diagram of a graphics processing apparatus according to an embodiment of the present application;
FIG. 2 schematically shows a block diagram of another graphics processing apparatus according to an embodiment of the present application;
FIG. 3 schematically shows a block diagram of another graphics processing apparatus according to an embodiment of the present application;
FIG. 4 schematically shows a block diagram of another graphics processing apparatus according to an embodiment of the present application;
FIG. 5 schematically shows a block diagram of a resource serving apparatus according to an embodiment of the present application;
FIG. 6 schematically illustrates an exemplary application scenario in which embodiments according to the present application may be implemented;
fig. 7 is a block diagram schematically illustrating a structure of a resource scheduling apparatus according to an embodiment of the present application;
fig. 8 schematically shows a flowchart of a resource scheduling method according to an embodiment of the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, embodiments accompanying the present application are described in detail below.
A job submitted by a user is composed of multiple tasks, and each task is completed by one task process. Thus, scheduling GPU resources for a user's job actually means scheduling GPU resources for all the task processes that complete the job.
Referring to FIG. 1, which schematically shows a block diagram of a graphics processing apparatus according to an embodiment of the present application: in the graphics processing apparatus 10, the logical unit 11 is the smallest GPU resource scheduling unit. The apparatus maps one GPU multi-process proxy server (GPU-MPS) 20, which is the agent for scheduling the graphics processing apparatus 10 and contains a maximum of 16 clients. One client of the GPU-MPS 20 can schedule one logical unit 11, one task process corresponds to one client of the GPU-MPS 20, and the apparatus can contain a maximum of 16 logical units.
It can be understood that, because the logical unit is the smallest GPU resource scheduling unit, different logical units in one graphics processing apparatus can be scheduled to different task processes, so that the jobs of different users jointly occupy the same apparatus, which ensures the utilization of its GPU resources. Meanwhile, by using the GPU-MPS technique, a task process is made a client of a GPU-MPS, so the GPU-MPS can manage task processes the way it manages clients. Since all clients of one GPU-MPS share a single GPU context, the multiple task processes acting as its clients need only share one GPU context. For example, when a graphics processing apparatus maps one GPU-MPS, all task processes scheduled onto that apparatus need only share a single GPU context instead of each establishing its own, which reduces the number of GPU contexts and ultimately saves the overhead of establishing and switching them.
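As a rough illustration (not a claim from the patent itself), the context-count saving amounts to simple arithmetic: without GPU-MPS each task process needs its own GPU context, while with GPU-MPS all client processes of one proxy server share a single context.

```python
def gpu_context_count(task_processes: int, mapped_mps: int, use_mps: bool) -> int:
    """Number of GPU contexts a device must establish.

    Illustrative model: without MPS, every task process gets its own
    context; with MPS, all clients of one proxy server share one
    context, so the count equals the number of mapped GPU-MPS instances.
    """
    return mapped_mps if use_mps else task_processes

# 16 task processes on a device mapping one GPU-MPS:
assert gpu_context_count(16, 1, use_mps=False) == 16  # one context per process
assert gpu_context_count(16, 1, use_mps=True) == 1    # one shared context
```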
In addition, when configuring logical units for the graphics processing apparatus 10, any number of logical units from 1 to 16 (inclusive) may be configured.
Besides one logical unit, a client of the GPU-MPS 20 may schedule multiple logical units, e.g., 2, 3, or even more. For example, when one client of the GPU-MPS 20 can schedule two logical units and the graphics processing apparatus 10 still maps one GPU-MPS 20, the apparatus can contain a maximum of 32 logical units, as shown in FIG. 2. It can be seen that, with the number of GPU-MPSs mapped by the graphics processing apparatus 10 held constant, the maximum number of logical units the apparatus can contain is proportional to the number of logical units schedulable by one client of the GPU-MPS 20.
In addition, the graphics processing apparatus 10 may map one GPU-MPS 20 or several, e.g., 2, 3, or even more. For example, when the graphics processing apparatus 10 maps two GPU-MPSs 20 and one client of each GPU-MPS 20 can schedule one logical unit, the apparatus can contain a maximum of 32 logical units, as shown in FIG. 3. It can be seen that, with the number of logical units schedulable by one client of the GPU-MPS 20 held constant, the maximum number of logical units the graphics processing apparatus 10 can contain is proportional to the number of GPU-MPSs 20 it maps.
That is, the maximum number of logical units the graphics processing apparatus 10 can contain is proportional both to the number of logical units schedulable by one client of the GPU-MPS 20 and to the number of GPU-MPSs 20 mapped by the apparatus. For example, when the graphics processing apparatus 10 maps two GPU-MPSs 20 and one client of each can schedule two logical units, the apparatus can contain a maximum of 64 logical units, as shown in FIG. 4.
Thus, the maximum number of logical units the graphics processing device 10 can contain is M × N × K, where M is the number of logical units schedulable by one client of the GPU-MPS, N is the maximum number of clients one GPU-MPS contains, K is the number of GPU-MPSs mapped by the graphics processing device, and M, N, and K are all positive integers.
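The capacity relation above can be sketched as a small helper (an illustration only; the function name and checks are not part of the patent), with the configurations of FIGS. 1 through 4 as examples:

```python
def max_logical_units(m: int, n: int, k: int) -> int:
    """Maximum logical units a graphics processing device can contain.

    m: logical units schedulable by one GPU-MPS client
    n: maximum clients contained in one GPU-MPS
    k: GPU-MPS instances mapped by the device
    """
    if min(m, n, k) < 1:
        raise ValueError("M, N and K must all be positive integers")
    return m * n * k

assert max_logical_units(1, 16, 1) == 16  # FIG. 1
assert max_logical_units(2, 16, 1) == 32  # FIG. 2
assert max_logical_units(1, 16, 2) == 32  # FIG. 3
assert max_logical_units(2, 16, 2) == 64  # FIG. 4
```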
When configuring the logical units in the graphics processing apparatus 10, the logical units may be configured within the maximum number of logical units that the graphics processing apparatus 10 can include.
In a preferred embodiment of the present application, the graphics processing apparatus 10 includes M × N × K logic units.
In another preferred embodiment of the present application, one client of the GPU-MPS can schedule one logical unit and the graphics processing apparatus 10 maps one GPU-MPS 20. It will be appreciated that, in this preferred embodiment, the maximum number of logical units the graphics processing apparatus can contain equals the maximum number of clients a GPU-MPS contains.
In addition, the graphics processing apparatus 10 is physically a single graphics processor.
Besides the graphics processing apparatus, an embodiment of the present application further provides a resource service apparatus. Referring to FIG. 5, which schematically shows the structure of a resource service apparatus according to an embodiment of the present application, the resource service apparatus 50 includes at least one graphics processing apparatus 51 (e.g., two graphics processing apparatuses 511 and 512), a monitoring unit 52, and a first communication unit 53. The graphics processing apparatus 511 maps the GPU-MPS 611, whose clients can invoke logical units in the graphics processing apparatus 511; the graphics processing apparatus 512 maps the GPU-MPS 612, whose clients can invoke logical units in the graphics processing apparatus 512; and a task process can be a client of either the GPU-MPS 611 or the GPU-MPS 612.
The monitoring unit 52 is configured to monitor, when a monitoring cycle arrives, the number of logical units remaining in the graphics processing apparatus in the current cycle;
the first communication unit 53 is configured to send the monitored data to a monitoring node in the cluster, so that the monitoring node atomically updates a preset resource dynamic table with the monitored data when an update cycle arrives;
wherein the resource dynamic table contains at least the number of logical units remaining in the graphics processing apparatus.
Once a logical unit is used, a corresponding PIPE file is generated under a specified path on the resource server, so the monitoring unit can determine the number of remaining logical units by monitoring the number of PIPE files under that path.
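A minimal sketch of this monitoring idea, under the assumption that each logical unit in use leaves one PIPE file under the watched directory (the directory layout and file names here are hypothetical, not specified by the patent):

```python
import os
import tempfile

def remaining_logical_units(pipe_dir: str, capacity: int) -> int:
    """Remaining = capacity minus the number of PIPE marker files,
    assuming one marker file per logical unit currently in use."""
    in_use = len(os.listdir(pipe_dir))
    return max(capacity - in_use, 0)

# Example: a device with 16 logical units, 3 currently in use.
with tempfile.TemporaryDirectory() as d:
    for i in range(3):
        open(os.path.join(d, f"unit-{i}.pipe"), "w").close()
    print(remaining_logical_units(d, 16))  # prints 13
```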
It will be appreciated that this update operation can also support offline scheduling (i.e., using GPU resources locally and directly rather than scheduling them through a unified scheduler), since each slave node in the cluster dynamically updates the number of logical units remaining in its local GPUs.
It should be noted that the structure of the resource service apparatus shown in FIG. 5 is only an example; a greater number of graphics processing apparatuses may be provided. Moreover, the application does not limit the number of GPU-MPSs mapped by each graphics processing apparatus, the number of logical units one GPU-MPS client can invoke, or the number of logical units contained in each graphics processing apparatus.
In a preferred embodiment of the present application, the resource service apparatus 50 is, in physical form, a resource server.
In another preferred embodiment of the application, the resource server may be a slave node in a cluster.
For example, referring to FIG. 6, which schematically illustrates an exemplary application scenario in which embodiments of the present application may be implemented: a cluster includes a plurality of slave nodes 10 (for convenience of description and illustration, only one slave node is shown in FIG. 6), a monitoring node 20, and a monitoring node 30. The slave node 10 is a resource server containing multiple graphics processing units; only two GPUs, GPU-0 and GPU-1, are shown in FIG. 6, each containing 16 logical units. MPS-0 is the agent for scheduling GPU-0, MPS-1 is the agent for scheduling GPU-1, and each has 16 clients. One client of MPS-0 can schedule one logical unit of GPU-0, one client of MPS-1 can schedule one logical unit of GPU-1, and a task process in a user job can be a client of either MPS-0 or MPS-1.
For example, when a logical unit in GPU-0 is dispatched to a task process in a user job, the task process will connect to the proxy of GPU-0 to which the logical unit belongs, i.e., to MPS-0.
The monitoring node 30 includes a job management device 31 and a resource scheduling device 32. The job management device 31 receives, from the cluster client 60, a request 61 to allocate GPU resources for a target user job; the request 61 indicates the number of logical units to be scheduled. The job management device 31 forwards the request to the resource scheduling device 32.
As shown in the structural block diagram of the resource scheduling apparatus in FIG. 7, the resource scheduling device 32 includes a second communication unit 321 and a response unit 322. The second communication unit 321 is configured to receive the scheduling request 61 for scheduling GPU resources for a target job; the response unit 322 is configured to, in response to the scheduling request, search a preset resource dynamic table for graphics processing devices whose number of remaining logical units is not zero, and schedule logical units for the target job from the found devices according to the number indicated by the scheduling request; the resource dynamic table contains at least the number of logical units remaining in the graphics processing device.
In the present application, the resource scheduling device 32 may schedule logical units using any existing scheduling method, e.g., first-fit, best-fit, backfill, or CFS scheduling.
The resource scheduling device 32 generates a resource dynamic table, and the slave node 10 dynamically updates in it the number of logical units remaining in GPU-0 and GPU-1, so that the resource scheduling device 32 can schedule resources according to each GPU's remaining logical units. The remaining logical units are those not yet scheduled to any task process.
Of course, if the cluster contains other slave nodes, the resource dynamic table is also dynamically maintained by those slave nodes and further contains the number of logical units remaining in each GPU located on them. That is, the resource dynamic table contains the number of logical units remaining in the GPUs of all slave nodes.
In addition, the resource dynamic table may further contain the identifiers of all slave nodes and of all GPUs in each slave node, so that the location of each logical unit can be determined. For example, as shown in FIG. 6, the resource dynamic table contains the identifier of the slave node 10 (e.g., its global number in the cluster), the identifiers of GPU-0 and GPU-1 contained in the slave node 10, and the number of logical units remaining in GPU-0 and GPU-1.
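As a hedged sketch, the resource dynamic table described above might be represented as a nested mapping keyed by slave-node and GPU identifiers (all field and node names here are illustrative, not from the patent):

```python
# Hypothetical in-memory form of the resource dynamic table.
resource_table = {
    "slave-10": {                 # slave node identifier (global number)
        "GPU-0": {"remaining": 16},
        "GPU-1": {"remaining": 16},
    },
}

def update_remaining(table, node, gpu, remaining):
    """Per-cycle update of one row; a real cluster would perform this
    atomically (e.g., under a lock) so readers never see a torn row."""
    table[node][gpu]["remaining"] = remaining

update_remaining(resource_table, "slave-10", "GPU-0", 12)
assert resource_table["slave-10"]["GPU-0"]["remaining"] == 12
```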
In addition, since the GPU resources a job actually uses are likely to exceed the GPU resources it requested, the resources actually used on one GPU may exceed the resources scheduled on it. This also easily creates the problem of excessive GPU sharing when resources in one GPU are scheduled to different jobs.
Therefore, to avoid excessive GPU sharing, the actual utilization of each GPU can also be maintained in the resource dynamic table, so that the resource scheduling device schedules the logical units in each GPU according to its actual utilization. That is, the resource dynamic table contains the identifiers of all slave nodes in the cluster, the identifiers of all GPUs in each slave node, the number of logical units remaining in each GPU, and each GPU's actual utilization.
In a preferred embodiment of the present application, the resource dynamic table further contains the actual utilization of GPU-0 and GPU-1, and in the slave node 10 the monitoring unit is further configured to monitor the actual utilization of GPU-0 and GPU-1 in the current cycle when the monitoring cycle arrives.
Correspondingly, for the monitoring node 30, the response unit 322 in the resource scheduling device 32 is specifically configured to, in response to the scheduling request, search a preset resource dynamic table for graphics processing devices whose actual utilization is less than or equal to a preset maximum threshold and whose number of remaining logical units is not zero, and schedule logical units for the target job from the found devices according to the number indicated by the scheduling request.
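A sketch of this lookup under both criteria — actual utilization at or below a threshold and a non-zero remaining count — followed by a simple first-fit allocation (the threshold value, field names, and the first-fit choice are assumptions for illustration, not mandated by the patent):

```python
def find_candidates(table, max_util=0.9):
    """Yield (node, gpu, remaining) for GPUs whose actual utilization is
    at or below max_util and whose remaining logical units are non-zero."""
    for node, gpus in table.items():
        for gpu, row in gpus.items():
            if row["utilization"] <= max_util and row["remaining"] > 0:
                yield node, gpu, row["remaining"]

def schedule(table, requested, max_util=0.9):
    """First-fit allocation of the requested number of logical units
    across candidate devices; fails if the request cannot be satisfied."""
    plan, need = [], requested
    for node, gpu, remaining in find_candidates(table, max_util):
        take = min(need, remaining)
        plan.append((node, gpu, take))
        need -= take
        if need == 0:
            return plan
    raise RuntimeError("not enough logical units available")

table = {"slave-10": {
    "GPU-0": {"remaining": 4,  "utilization": 0.95},  # over the threshold
    "GPU-1": {"remaining": 10, "utilization": 0.30},
}}
print(schedule(table, 6))  # [('slave-10', 'GPU-1', 6)]
```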
In another preferred embodiment of the present application, the resource dynamic table may further contain the operating state of each resource service device in the resource server cluster and the operating state and usage state of each graphics processing device, dynamically updated by the resource scheduling device; the resource scheduling device 32 then further includes:
an updating unit, configured to, when an update cycle arrives, atomically update, in the resource dynamic table, the operating state of the resource service device and the operating and usage states of the graphics processing device, wherein the operating state is either active or inactive and the usage state includes the logical-unit usage amount and the overall utilization.
For example, when a slave node or GPU is removed or fails, its operating state is changed from active to inactive; when a new slave node or GPU is added, its operating state is set to active.
In this application, the updating unit 323 may initialize the resource dynamic table when the cluster is initialized, and may update it during job migration, for example when a task must be migrated for QoS reasons or when a job migration fails. In addition, the updating unit may update the number of logical units remaining in each GPU in the resource dynamic table according to scheduling responses.
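The per-response bookkeeping mentioned above can be sketched as follows (the table layout and function names are illustrative, not from the patent): after a successful scheduling response, the update unit decrements the remaining counts of the affected GPUs.

```python
def apply_scheduling_response(table, plan):
    """Decrement remaining logical-unit counts for each (node, gpu, taken)
    entry of a scheduling response; a sketch of the update unit's role."""
    for node, gpu, taken in plan:
        row = table[node][gpu]
        if taken > row["remaining"]:
            raise ValueError("response exceeds remaining logical units")
        row["remaining"] -= taken

table = {"slave-10": {"GPU-0": {"remaining": 16}}}
apply_scheduling_response(table, [("slave-10", "GPU-0", 3)])
assert table["slave-10"]["GPU-0"]["remaining"] == 13
```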
Corresponding to the resource scheduling device, the embodiment of the application also provides a resource scheduling method. Referring to fig. 8, fig. 8 schematically shows a flowchart of a resource scheduling method according to an embodiment of the present application, which may be performed by the resource scheduling apparatus 32, and the method may include, for example:
step 801: and receiving a scheduling request for scheduling GPU resources of the graphic processor for the target operation, wherein the number of the logic units requiring scheduling is indicated in the scheduling request.
Step 802: and responding to the scheduling request, searching the graphics processing devices with the number of the remaining logic units not equal to zero from a preset resource dynamic table, and scheduling the logic units for the target job from the searched graphics processing devices according to the number indicated by the scheduling request.
Wherein the resource dynamic table at least contains the number of the logic units left in the graphic processing device.
In a preferred embodiment of the present application, the resource dynamic table further contains the actual utilization of the graphics processing device, and step 802 is specifically:
in response to the scheduling request, search the preset resource dynamic table for graphics processing devices whose actual utilization is less than or equal to a preset maximum threshold and whose number of remaining logical units is not zero, and schedule logical units for the target job from the found devices according to the number indicated by the scheduling request.
In another preferred embodiment of the present application, the resource dynamic table further contains the operating state of each resource service device in the resource server cluster and the operating state of each graphics processing device; the method may further comprise: when an update cycle arrives, atomically updating, in the resource dynamic table, the operating state of the resource service device and the operating state of the graphics processing device, wherein the operating state is either active or inactive.
As can be seen from the above embodiments, compared with the prior art, the present application has the following advantages:
Because the logical unit is the minimum GPU resource scheduling unit, different logical units in one graphics processing apparatus can be scheduled to different task processes, so that different user jobs jointly occupy the same graphics processing apparatus, which improves the utilization of GPU resources in the graphics processing apparatus. Meanwhile, by means of the GPU-MPS technology, each task process becomes a client of the GPU-MPS, so that the GPU-MPS can manage a task process in the same way it manages a client. Since all clients of one GPU-MPS share a single GPU context, the multiple task processes acting as its clients need only share one GPU context.
In addition, when resources are scheduled, logical units are allocated based on the actual usage rate of each GPU, which avoids the problem of GPU over-sharing.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when the actual implementation is performed, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not performed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of a software functional unit.
It should be noted that, as will be understood by those skilled in the art, all or part of the processes in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The graphics processing apparatus, the resource service apparatus, and the resource scheduling method and apparatus provided in the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help in understanding the method and its core idea. Meanwhile, a person skilled in the art may, following the idea of the present application, make changes to the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims (11)

1. A resource servicing apparatus comprising at least one graphics processing apparatus, a monitoring unit, and a first communication unit, wherein,
in the graphics processing apparatus, a logical unit is the smallest graphics processing unit (GPU) resource scheduling unit, the graphics processing apparatus maps at least one GPU multi-process proxy server (GPU-MPS), the GPU-MPS is a proxy for scheduling the graphics processing apparatus, one client of the GPU-MPS can schedule at least one logical unit, one task process is one client of the GPU-MPS, and the maximum number of logical units that the graphics processing apparatus can contain is M multiplied by N multiplied by K;
wherein M is the number of logical units that one client of the GPU-MPS can schedule, N is the maximum number of clients contained in the GPU-MPS, K is the number of GPU-MPSs mapped by the graphics processing apparatus, and M, N, and K are all non-zero positive integers;
the monitoring unit is configured to monitor, when a monitoring period arrives, the number of logical units remaining in the graphics processing apparatus in the current period;
the first communication unit is configured to send the monitored data to a monitoring node in the cluster, so that the monitoring node atomically updates a preset resource dynamic table with the monitored data when an update cycle arrives;
wherein the resource dynamic table contains at least the number of logical units remaining in the graphics processing apparatus.
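The capacity bound in claim 1 is a simple product: with M logical units per client, at most N clients per GPU-MPS, and K GPU-MPSs per apparatus, the apparatus can contain at most M x N x K logical units. A tiny numeric illustration, where the concrete values of M, N, and K are assumptions and not taken from the patent:

```python
# Illustrative check of the capacity bound from claim 1: the maximum number
# of logical units in one graphics processing apparatus is M x N x K.
# The concrete values below are assumptions.

M = 2   # logical units schedulable by one GPU-MPS client
N = 16  # maximum number of clients per GPU-MPS
K = 1   # number of GPU-MPSs mapped by the graphics processing apparatus

max_logical_units = M * N * K
print(max_logical_units)  # 32
```

Claims 4 and 5 cover the special case M = 1, K = 1, where the bound collapses to N, the maximum client count of the single GPU-MPS.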
2. The resource service apparatus of claim 1, wherein the resource service apparatus is a slave node in a cluster.
3. The resource servicing device of claim 1, wherein the resource dynamic table further comprises an actual usage rate of a graphics processing device; the monitoring unit is further configured to monitor an actual usage rate of the local graphics processing apparatus in the current period when the monitoring period arrives.
4. The resource servicing device of claim 1, wherein one client of the GPU-MPS can schedule one logical unit.
5. The resource servicing device of claim 1, wherein the graphics processing device maps one GPU multiprocess proxy server.
6. A resource scheduling method applied to the resource service apparatus according to any one of claims 1 to 5, the method comprising:
receiving a scheduling request for scheduling graphics processing unit (GPU) resources for a target job, wherein the scheduling request indicates the number of logical units to be scheduled;
in response to the scheduling request, searching a preset resource dynamic table for graphics processing apparatuses whose number of remaining logical units is not zero, and scheduling logical units for the target job from the found graphics processing apparatuses according to the number indicated in the scheduling request;
wherein the resource dynamic table contains at least the number of logical units remaining in the graphics processing apparatus.
7. The method of claim 6, wherein the resource dynamic table further comprises an actual usage rate of the graphics processing apparatus;
the searching, in response to the scheduling request, a preset resource dynamic table for graphics processing apparatuses whose number of remaining logical units is not zero, and scheduling logical units for the target job from the found graphics processing apparatuses according to the number indicated in the scheduling request is:
in response to the scheduling request, searching the preset resource dynamic table for graphics processing apparatuses whose actual usage rate is less than or equal to a preset maximum threshold and whose number of remaining logical units is not zero, and scheduling logical units for the target job from the found graphics processing apparatuses according to the number indicated in the scheduling request.
8. The method of claim 7, wherein the resource dynamic table further comprises a working state of a resource service apparatus in the resource server cluster and a working state of a graphics processing apparatus in the resource service apparatus; the method further comprises:
when an update period arrives, atomically updating the working state of the resource service apparatus and the working state of the graphics processing apparatus in the resource dynamic table, wherein the working state is either working or not working.
9. A resource scheduling apparatus, applied to the resource service apparatus according to any one of claims 1 to 5, comprising:
a second communication unit, configured to receive a scheduling request for scheduling GPU resources for a target job, wherein the scheduling request indicates the number of logical units to be scheduled;
a response unit, configured to, in response to the scheduling request, search a preset resource dynamic table for graphics processing apparatuses whose number of remaining logical units is not zero, and schedule logical units for the target job from the found graphics processing apparatuses according to the number indicated in the scheduling request;
wherein the resource dynamic table contains at least the number of logical units remaining in the graphics processing apparatus.
10. The apparatus of claim 9, wherein the resource dynamic table further comprises an actual usage rate of the graphics processing apparatus;
the response unit is specifically configured to, in response to the scheduling request, search the preset resource dynamic table for graphics processing apparatuses whose actual usage rate is less than or equal to a preset maximum threshold and whose number of remaining logical units is not zero, and schedule logical units for the target job from the found graphics processing apparatuses according to the number indicated in the scheduling request.
11. The apparatus of claim 10, wherein the resource dynamic table further comprises a working state of a resource service apparatus in the resource server cluster and a working state of a graphics processing apparatus in the resource service apparatus; the apparatus further comprises:
an updating unit, configured to atomically update, when an update period arrives, the working state of the resource service apparatus and the working state of the graphics processing apparatus in the resource dynamic table, wherein the working state is either working or not working.
CN201510208923.0A 2015-04-28 2015-04-28 Resource service device, resource scheduling method and device Active CN106155811B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510208923.0A CN106155811B (en) 2015-04-28 2015-04-28 Resource service device, resource scheduling method and device
PCT/CN2016/079865 WO2016173450A1 (en) 2015-04-28 2016-04-21 Graphic processing device, resource service device, resource scheduling method and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510208923.0A CN106155811B (en) 2015-04-28 2015-04-28 Resource service device, resource scheduling method and device

Publications (2)

Publication Number Publication Date
CN106155811A CN106155811A (en) 2016-11-23
CN106155811B true CN106155811B (en) 2020-01-07

Family

ID=57198136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510208923.0A Active CN106155811B (en) 2015-04-28 2015-04-28 Resource service device, resource scheduling method and device

Country Status (2)

Country Link
CN (1) CN106155811B (en)
WO (1) WO2016173450A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106686352B (en) * 2016-12-23 2019-06-07 北京大学 The real-time processing method of the multi-path video data of more GPU platforms
CN107688495B (en) * 2017-06-22 2020-11-03 平安科技(深圳)有限公司 Method and apparatus for scheduling processors
CN107544845B (en) * 2017-06-26 2020-08-11 新华三大数据技术有限公司 GPU resource scheduling method and device
CN107247629A (en) * 2017-07-04 2017-10-13 北京百度网讯科技有限公司 Cloud computing system and cloud computing method and device for controlling server
CN107329834A (en) * 2017-07-04 2017-11-07 北京百度网讯科技有限公司 Method and apparatus for performing calculating task
CN109936604B (en) * 2017-12-18 2022-07-26 北京图森智途科技有限公司 Resource scheduling method, device and system
CN112559164A (en) * 2019-09-25 2021-03-26 中兴通讯股份有限公司 Resource sharing method and device
CN110795249A (en) * 2019-10-30 2020-02-14 亚信科技(中国)有限公司 GPU resource scheduling method and device based on MESOS containerized platform
WO2021142614A1 (en) * 2020-01-14 2021-07-22 华为技术有限公司 Chip state determining method and device, and cluster resource scheduling method and device
CN111400051B (en) * 2020-03-31 2023-10-27 京东方科技集团股份有限公司 Resource scheduling method, device and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541640A (en) * 2011-12-28 2012-07-04 厦门市美亚柏科信息股份有限公司 Cluster GPU (graphic processing unit) resource scheduling system and method
CN102959517A (en) * 2010-06-10 2013-03-06 Otoy公司 Allocation of gpu resources accross multiple clients
CN104541247A (en) * 2012-08-07 2015-04-22 超威半导体公司 System and method for tuning a cloud computing system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7673304B2 (en) * 2003-02-18 2010-03-02 Microsoft Corporation Multithreaded kernel for graphics processing unit
CN101403983B (en) * 2008-11-25 2010-10-13 北京航空航天大学 Resource monitoring method and system for multi-core processor based on virtual machine
US20120188259A1 (en) * 2010-12-13 2012-07-26 Advanced Micro Devices, Inc. Mechanisms for Enabling Task Scheduling
US8370283B2 (en) * 2010-12-15 2013-02-05 Scienergy, Inc. Predicting energy consumption
US11386257B2 (en) * 2012-10-15 2022-07-12 Amaze Software, Inc. Efficient manipulation of surfaces in multi-dimensional space using energy agents
CN104407920B (en) * 2014-12-23 2018-02-09 浪潮(北京)电子信息产业有限公司 A kind of data processing method and system based on interprocess communication

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102959517A (en) * 2010-06-10 2013-03-06 Otoy公司 Allocation of gpu resources accross multiple clients
CN102541640A (en) * 2011-12-28 2012-07-04 厦门市美亚柏科信息股份有限公司 Cluster GPU (graphic processing unit) resource scheduling system and method
CN104541247A (en) * 2012-08-07 2015-04-22 超威半导体公司 System and method for tuning a cloud computing system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Multi-GPU performance of incompressible flow computation by lattice Boltzmann method on GPU cluster; Wang Xian et al; Parallel Computing; 20110930; vol. 37, no. 9; 521-535 *
A dynamic task mapping strategy for GPU clusters; Chen Qingkui et al; Computer Engineering; 20120930; vol. 38, no. 17; 268-271 *
Design and implementation of a general parallel rendering system based on a GPU cluster; Zhang Qinfei; Wanfang Dissertation Database; 20131008; full text *

Also Published As

Publication number Publication date
WO2016173450A1 (en) 2016-11-03
CN106155811A (en) 2016-11-23

Similar Documents

Publication Publication Date Title
CN106155811B (en) Resource service device, resource scheduling method and device
US9906589B2 (en) Shared management service
US9576332B1 (en) Systems and methods for remote graphics processing unit service
WO2021098182A1 (en) Resource management method and apparatus, electronic device and storage medium
CN107015972B (en) Method, device and system for migrating machine room services
CN112527520A (en) Method and device for deploying message middleware
CN112306636B (en) Cloud rendering platform and intelligent scheduling method thereof
CN109218356B (en) Method and apparatus for managing stateful applications on a server
CN113849312A (en) Data processing task allocation method and device, electronic equipment and storage medium
CN102523109A (en) Resource state updating method, management client ends, and server
US10235223B2 (en) High-performance computing framework for cloud computing environments
CN110389825B (en) Method, apparatus and computer program product for managing dedicated processing resources
CN114675964A (en) Distributed scheduling method, system and medium based on Federal decision tree model training
CN115686875A (en) Method, apparatus and program product for transferring data between multiple processes
CN109697114B (en) Method and machine for application migration
WO2019018474A1 (en) Scalable statistics and analytics mechanisms in cloud networking
CN107528871B (en) Data analysis in storage systems
CN105653347B (en) A kind of server, method for managing resource and virtual machine manager
US11656914B2 (en) Anticipating future resource consumption based on user sessions
CN111614702B (en) Edge calculation method and edge calculation system
CN116700933A (en) Heterogeneous computing power federation-oriented multi-cluster job scheduling system and method
CN113703906A (en) Data processing method, device and system
US10148503B1 (en) Mechanism for dynamic delivery of network configuration states to protocol heads
CN115361382B (en) Data processing method, device, equipment and storage medium based on data group
CN109510877B (en) Method and device for maintaining dynamic resource group and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant