WO2016173450A1 - Graphic processing device, resource service device, resource scheduling method and device thereof - Google Patents

Graphic processing device, resource service device, resource scheduling method and device thereof

Info

Publication number
WO2016173450A1
WO2016173450A1 (PCT/CN2016/079865, CN2016079865W)
Authority
WO
WIPO (PCT)
Prior art keywords
processing device
graphics processing
gpu
resource
scheduling
Prior art date
Application number
PCT/CN2016/079865
Other languages
French (fr)
Chinese (zh)
Inventor
孔建钢 (Kong Jiangang)
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Publication of WO2016173450A1 publication Critical patent/WO2016173450A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements

Definitions

  • the present application relates to the field of computer applications, and in particular, to a graphics processing device, a resource service device, a resource scheduling method, and a device.
  • GPU graphics processing unit
  • one scheduling method is that the resource scheduler schedules one GPU (for example, one GPU card) to only one user's job.
  • the other scheduling method is that the resource scheduler schedules one GPU to the jobs of multiple users at the same time.
  • the inventors of the present application found that the prior art has at least the following problems: in the first scheduling method, because one GPU is exclusively occupied by a single user's job, and a single user's job is often unable to make full use of the GPU's resources, GPU resource utilization tends to be low. In the second scheduling method, because one GPU is shared by the jobs of multiple users, and multiple users are more likely to fully utilize the GPU's resources, the utilization of GPU resources is improved to some extent.
  • although the second scheduling method can improve the utilization of GPU resources, when the jobs of multiple users share one GPU, the number of processes those jobs open at the same time may be very large. The GPU must establish a GPU context for each process, so the number of GPU contexts built on the GPU may also be very large, and the GPU has to switch among this large number of contexts. Establishing and switching GPU contexts incurs a huge overhead on GPU resources, which leads to the problem of excessive GPU sharing.
  • the embodiments of the present application provide a graphics processing device, a resource service device, and a resource scheduling method and device, so as to improve the utilization of GPU resources while also saving the overhead of establishing and switching GPU contexts, and, further, to avoid the problem of excessive GPU sharing as far as possible.
  • a graphics processing device in which a logical unit is the minimum graphics processor (GPU) resource scheduling unit; the graphics processing device maps at least one GPU multi-process proxy server (GPU-MPS), the GPU-MPS being the agent that schedules the graphics processing device; one client of the GPU-MPS can schedule at least one logical unit, one task process is one client of the GPU-MPS, and the maximum number of logical units that the graphics processing device can contain is M × N × K;
  • M is the number of logical units that can be scheduled by one client of the GPU-MPS;
  • N is the maximum number of clients contained in one GPU-MPS;
  • K is the number of GPU-MPSs mapped by the graphics processing device;
  • M, N, and K are all non-zero positive integers.
  • one client of the GPU-MPS can schedule one logical unit.
  • the graphics processing device maps a GPU multi-process proxy server.
  • the graphics processing device comprises M × N × K logical units.
  • a resource service device comprising at least one graphics processing device according to any one of the above, a monitoring unit, and a first communication unit, wherein
  • a monitoring unit configured to monitor the number of remaining logic units in the graphics processing device during the current period when the monitoring period arrives;
  • a first communication unit configured to send the monitored data to a monitoring node in the cluster, so that the monitoring node atomically updates a preset resource dynamic table with the monitored data when the update period arrives;
  • the resource dynamic table includes at least the number of logical units remaining in the graphics processing device.
  • the resource service device is a slave node in the cluster.
  • the resource dynamic table further includes an actual usage rate of the graphics processing device; and the monitoring unit is further configured to monitor an actual usage rate of the local graphics processing device in the current period when the monitoring period arrives.
  • a resource scheduling method which is applied to the resource service device according to any one of the preceding claims, the method comprising:
  • the resource dynamic table includes at least the number of logical units remaining in the graphics processing device.
  • the resource dynamic table further includes an actual usage rate of the graphics processing device
  • scheduling a logical unit for the target job from the found graphics processing device is performed as follows:
  • the resource dynamic table further includes an operation state of the resource service device in the resource server cluster and an operation state of the graphics processing device in the resource service device; the method further includes:
  • when the update period arrives, atomically updating the working state of the resource service device and the working state of the graphics processing device in the resource dynamic table, the working state including working and not working.
  • a resource scheduling apparatus, applied to the resource service device according to any one of the above, comprising:
  • a second communication unit configured to receive a scheduling request for scheduling a GPU resource of the graphics processor for the target job, where the number of logical units requesting scheduling is indicated in the scheduling request;
  • a response unit configured to, in response to the scheduling request, search a preset resource dynamic table for a graphics processing device whose number of remaining logical units is not zero, and schedule logical units for the target job from the found graphics processing device according to the quantity indicated by the scheduling request;
  • the resource dynamic table includes at least the number of logical units remaining in the graphics processing device.
  • the resource dynamic table further includes an actual usage rate of the graphics processing device
  • the response unit is specifically configured to, in response to the scheduling request, search a preset resource dynamic table for a graphics processing device whose actual usage rate is less than or equal to a preset maximum threshold and whose number of remaining logical units is not zero, and schedule logical units for the target job from the found graphics processing device according to the quantity indicated by the scheduling request.
  • the resource dynamic table further includes an operation status of the resource service device in the resource server cluster and an operation status of the graphics processing device in the resource service device; the device further includes:
  • an update unit configured to, when the update period arrives, atomically update the working state of the resource service device and the working state of the graphics processing device in the resource dynamic table, the working state including working and not working.
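  • as a concrete illustration of the response unit's lookup, the following minimal Python sketch filters a resource dynamic table by actual usage rate and remaining logical units; the table layout, field names, and threshold value are assumptions made for illustration, not part of the claimed implementation.

```python
# Hypothetical sketch of the response unit's lookup; not the claimed implementation.
# Each entry of the resource dynamic table is assumed to describe one graphics
# processing device (one GPU) on one slave node.

MAX_USAGE_THRESHOLD = 0.8  # assumed preset maximum threshold

resource_dynamic_table = [
    {"node": "slave-10", "gpu": "GPU-0", "remaining_units": 5, "usage": 0.62},
    {"node": "slave-10", "gpu": "GPU-1", "remaining_units": 0, "usage": 0.95},
]

def schedule(requested_units: int):
    """Return the (node, gpu) granted the requested logical units, or None."""
    for entry in resource_dynamic_table:
        if (entry["usage"] <= MAX_USAGE_THRESHOLD
                and entry["remaining_units"] >= requested_units):
            entry["remaining_units"] -= requested_units  # reserve the scheduled units
            return entry["node"], entry["gpu"]
    return None  # no GPU currently satisfies the request
```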
  • because the logical unit is the smallest GPU resource scheduling unit, different logical units in one graphics processing device can be scheduled to different task processes, so that different user jobs share the same graphics processing device, which ensures the utilization of the GPU resources in the graphics processing device.
  • at the same time, this application uses GPU-MPS technology so that a task process becomes a client of a GPU-MPS, which allows the GPU-MPS to manage task processes in the same way that it manages its clients. Because all clients of one GPU-MPS share one GPU context, in one GPU multi-process proxy server the multiple task processes that act as its clients only need to share a single GPU context.
  • in addition, during resource scheduling, the logical units are scheduled based on the actual usage rate of each GPU, which also avoids the problem of excessive GPU sharing.
  • FIG. 1 schematically shows a block diagram of a graphics processing apparatus according to an embodiment of the present application
  • FIG. 2 schematically shows a block diagram of another graphics processing apparatus according to an embodiment of the present application
  • FIG. 3 schematically shows a block diagram of another graphics processing apparatus according to an embodiment of the present application.
  • FIG. 4 schematically shows a block diagram of another graphics processing apparatus according to an embodiment of the present application.
  • FIG. 5 schematically shows a structural diagram of a resource service device according to an embodiment of the present application
  • FIG. 6 schematically illustrates an exemplary application scenario in which an embodiment of the present application may be implemented
  • FIG. 7 is a block diagram showing a structure of a resource scheduling apparatus according to an embodiment of the present application.
  • FIG. 8 schematically illustrates a flow chart of a resource scheduling method in accordance with an embodiment of the present application.
  • a job submitted by a user is composed of multiple tasks, and one task is completed by a task process. Therefore, scheduling GPU resources for a user's job is actually scheduling GPU resources for all task processes that complete the job.
  • FIG. 1 schematically shows the structure of a graphics processing apparatus according to an embodiment of the present application;
  • in the graphics processing device 10, the logical unit 11 is the minimum GPU resource scheduling unit;
  • the graphics processing device maps one GPU multi-process proxy server (GPU-MPS) 20, the GPU-MPS 20 has a maximum of 16 clients, and the GPU-MPS 20 is the agent that schedules the graphics processing device 10;
  • one client of the GPU-MPS 20 can schedule one logical unit 11, and one task process is one client of the GPU-MPS 20;
  • accordingly, the graphics processing device can contain a maximum of 16 logical units.
  • at the same time, this application uses GPU-MPS technology so that a task process becomes a client of a GPU-MPS, which allows the GPU-MPS to manage task processes in the same way that it manages clients. Because all clients of one GPU-MPS share one GPU context, in one GPU-MPS the multiple task processes that act as its clients only need to share a single GPU context.
  • a graphics processing device maps a GPU-MPS
  • the number of logical units can be arbitrarily configured between 1 and 16.
  • in addition to scheduling one logical unit, one client of the GPU-MPS 20 can schedule multiple logical units, such as 2, 3, or even more. For example, when one client of the GPU-MPS 20 can schedule two logical units and the graphics processing device 10 still maps one GPU-MPS 20, the maximum number of logical units that the graphics processing device 10 can contain is 32, as shown in FIG. 2. It can be seen that, when the number of GPU-MPSs 20 mapped by the graphics processing device 10 is fixed, the maximum number of logical units that the graphics processing device 10 can contain is related, and proportional, to the number of logical units schedulable by one client of the GPU-MPS 20.
  • the graphics processing apparatus 10 may map only one GPU-MPS 20, or may map multiple GPU-MPSs 20, such as two, three, or even more GPU-MPSs 20.
  • for example, when the graphics processing device 10 maps two GPU-MPSs 20 and one client of the GPU-MPS 20 can schedule one logical unit, the maximum number of logical units that the graphics processing device can contain is 32, as shown in FIG. 3. It can be seen that, when the number of logical units schedulable by one client of the GPU-MPS 20 is fixed, the maximum number of logical units that the graphics processing device 10 can contain is related, and proportional, to the number of GPU-MPSs 20 mapped by the graphics processing device 10.
  • that is, the maximum number of logical units that the graphics processing device 10 can contain is related, and proportional, both to the number of logical units schedulable by one client of the GPU-MPS 20 and to the number of GPU-MPSs 20 mapped by the graphics processing device 10. For example, when the graphics processing device 10 maps two GPU-MPSs 20 and one client of the GPU-MPS 20 can schedule two logical units, the maximum number of logical units that the graphics processing device can contain is 64, as shown in FIG. 4.
  • therefore, for the graphics processing device 10, the maximum number of logical units it can contain is M × N × K, where M is the number of logical units schedulable by one client of the GPU-MPS, N is the maximum number of clients contained in one GPU-MPS, K is the number of GPU-MPSs mapped by the graphics processing device, and M, N, and K are all non-zero positive integers.
  • when configuring the logical units in the graphics processing device 10, any number within the maximum number of logical units that the graphics processing device 10 can contain may be configured.
  • the graphics processing apparatus 10 includes M × N × K logical units.
  • one client of the GPU-MPS can schedule one logical unit, and the graphics processing device 10 maps one GPU-MPS 20. It will be appreciated that in this preferred embodiment, a graphics processing device includes a maximum number of logical units equal to the maximum number of clients included in a GPU-MPS.
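  • the capacity relation above can be expressed as a one-line calculation; the sketch below is illustrative only, with the concrete values of M, N, and K taken from the examples of FIG. 1 through FIG. 4.

```python
def max_logical_units(m: int, n: int, k: int) -> int:
    """Maximum number of logical units a graphics processing device can contain.

    m: logical units schedulable by one client of the GPU-MPS
    n: maximum number of clients contained in one GPU-MPS
    k: number of GPU-MPSs mapped by the graphics processing device
    """
    assert m > 0 and n > 0 and k > 0  # M, N, and K are non-zero positive integers
    return m * n * k

print(max_logical_units(1, 16, 1))  # FIG. 1: 16
print(max_logical_units(2, 16, 1))  # FIG. 2: 32
print(max_logical_units(1, 16, 2))  # FIG. 3: 32
print(max_logical_units(2, 16, 2))  # FIG. 4: 64
```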
  • the graphics processing device 10 is physically a graphics processor.
  • FIG. 5 is a schematic structural diagram of a resource service apparatus according to an embodiment of the present application, where the resource service apparatus 50 includes at least one graphics processing apparatus 51 (for example, two graphics processing apparatuses 511 and 512), a monitoring unit 52, and a first communication unit 53.
  • the graphics processing device 511 is mapped to the GPU-MPS 611.
  • One client of the GPU-MPS 611 can call a logic unit in the graphics processing device 511.
  • the graphics processing device 512 is mapped to the GPU-MPS 612, and one client of the GPU-MPS 612 can call one logical unit in the graphics processing device 512.
  • a task process can be a client of the GPU-MPS 611 or a client of the GPU-MPS 612.
  • the monitoring unit 52 is configured to monitor the number of remaining logical units in the graphics processing device in the current period when the monitoring period arrives;
  • the first communication unit 53 is configured to send the monitored data to the monitoring node in the cluster, so that the monitoring node atomically updates a preset resource dynamic table with the monitored data when the update period arrives;
  • the resource dynamic table includes at least the number of logical units remaining in the graphics processing device.
  • a PIPE file is generated under a specified path on the resource server for each logical unit; once a logical unit is used, its corresponding PIPE file is generated. Therefore, the monitoring unit only needs to monitor the number of PIPE files under that path to determine the number of remaining logical units.
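  • as an illustration of this monitoring approach, the sketch below counts PIPE files under an assumed path to derive the number of remaining logical units; the path and the total-unit figure are placeholders, not values defined by the application.

```python
import os

PIPE_DIR = "/var/run/gpu-mps/pipes"   # assumed path; the application does not fix it
TOTAL_LOGICAL_UNITS = 16              # e.g. one GPU-MPS with 16 clients, one unit each

def remaining_logical_units() -> int:
    """Infer remaining units from the PIPE files created for units already in use."""
    try:
        used = len(os.listdir(PIPE_DIR))
    except FileNotFoundError:
        used = 0  # no PIPE file yet means no logical unit has been used
    return max(TOTAL_LOGICAL_UNITS - used, 0)
```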
  • it can be understood that, when each slave node in the cluster dynamically updates the number of logical units remaining in its local GPUs, this update operation can also support offline scheduling, that is, GPU resources being used directly on the local node without going through the unified scheduler.
  • it should be noted that the structure of the resource service device shown in FIG. 5 is only an example; it may also contain a larger number of graphics processing devices.
  • the present application also does not limit the number of GPU-MPSs mapped by each graphics processing device, the number of logical units that one client of GPU-MPS can call, and the number of logical units that each graphics processing device includes.
  • the resource service device 50 is physically a resource server.
  • the resource server can be a slave node in the cluster.
  • FIG. 6 schematically illustrates an exemplary application scenario in which embodiments of the present application may be implemented.
  • in a cluster, there are multiple slave nodes 10 (for ease of description and display, only one slave node is shown), one monitoring node 20, and one monitoring node 30.
  • the slave node 10 is a resource server and contains multiple graphics processors (GPUs); only two GPUs are shown in the figure, namely GPU-0 and GPU-1, and each GPU contains 16 logical units;
  • MPS-0 is the agent that schedules GPU-0;
  • MPS-1 is the agent that schedules GPU-1;
  • MPS-0 and MPS-1 each have 16 clients; one client of MPS-0 can schedule one logical unit of GPU-0, and one client of MPS-1 can schedule one logical unit of GPU-1;
  • one task process in a user job can be either a client of MPS-0 or a client of MPS-1.
  • for example, when a logical unit in GPU-0 is scheduled to a task process of a user job, that task process connects to the agent of GPU-0 to which the logical unit belongs, that is, it connects to MPS-0.
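  • in an NVIDIA MPS deployment, pointing a task process at the MPS daemon serving a particular GPU is typically done through the CUDA_MPS_PIPE_DIRECTORY environment variable; the sketch below illustrates that mechanism, with the per-GPU pipe directories and the launched command being assumptions rather than values specified by this application.

```python
import os
import subprocess

# Assumed per-GPU MPS control-pipe directories; the application does not define these paths.
MPS_PIPE_DIRS = {
    "GPU-0": "/tmp/mps-gpu0",
    "GPU-1": "/tmp/mps-gpu1",
}

def launch_task_on(gpu: str, command) -> subprocess.Popen:
    """Start a task process as a client of the MPS agent serving the given GPU."""
    env = dict(os.environ)
    env["CUDA_MPS_PIPE_DIRECTORY"] = MPS_PIPE_DIRS[gpu]  # connect to MPS-0 or MPS-1
    return subprocess.Popen(command, env=env)

# Example: the task process becomes a client of MPS-0 and shares GPU-0's single context.
launch_task_on("GPU-0", ["python", "train_task.py"])
```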
  • the monitoring node 30 includes a job management device 31 and a resource scheduling device 32.
  • the job management device 31 first receives a request 61, sent by the cluster client 60, to allocate GPU resources for a target user job; the request 61 indicates the number of logical units whose scheduling is requested.
  • the job management device 31 forwards the request to the resource scheduling device 32.
  • the resource scheduling apparatus 32 includes a second communication unit 321 and a response unit 322, wherein the second communication unit 321 is configured to receive a scheduling request for scheduling a graphics processor GPU resource for a target job.
  • the response unit 322 is configured to, in response to the scheduling request, search a preset resource dynamic table for a graphics processing device whose number of remaining logical units is not zero, and schedule logical units for the target job from the found graphics processing device according to the quantity indicated by the scheduling request; the resource dynamic table contains at least the number of logical units remaining in each graphics processing device.
  • the resource scheduling device 32 may schedule logical units using any scheduling method in the prior art, for example First Fit, Best Fit, Backfill, or CFS scheduling.
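  • as one concrete possibility, a First Fit policy takes the first GPU in the resource dynamic table that can satisfy the request, while Best Fit picks the tightest fit among the candidates; the sketch below is a hedged illustration of those two policies over the same assumed table layout used earlier, not a description of how the scheduler of this application is actually implemented.

```python
def first_fit(table, requested_units: int):
    """First Fit: return the first (node, gpu) whose remaining units cover the request."""
    for entry in table:
        if entry["remaining_units"] >= requested_units:
            return entry["node"], entry["gpu"]
    return None

def best_fit(table, requested_units: int):
    """Best Fit: among GPUs that can satisfy the request, pick the one with the least slack."""
    candidates = [e for e in table if e["remaining_units"] >= requested_units]
    if not candidates:
        return None
    best = min(candidates, key=lambda e: e["remaining_units"] - requested_units)
    return best["node"], best["gpu"]
```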
  • the resource scheduling device 32 generates a resource dynamic table, and the slave node 10 dynamically updates on that table the number of logical units remaining in GPU-0 and GPU-1, so that the resource scheduling device 32 can perform resource scheduling according to the remaining logical units of each GPU.
  • the remaining logical units are the logical units that have not been scheduled to any task process.
  • if the cluster also includes other slave nodes, the resource dynamic table is dynamically maintained by those slave nodes as well, and it further contains the number of logical units remaining in each GPU on those nodes. That is, the resource dynamic table contains the number of logical units remaining in the GPUs of all slave nodes.
  • in addition, the resource dynamic table may further contain the identifiers of all slave nodes and the identifiers of all GPUs in each slave node, in order to determine the location of each logical unit.
  • for example, the resource dynamic table contains the identifier of the slave node 10 (for example, the global number of the slave node 10 in the cluster), the identifiers of GPU-0 and GPU-1 contained in the slave node 10, and the number of logical units remaining in GPU-0 and GPU-1.
  • considering that the GPU resources a job actually uses may be larger than the GPU resources it requested, a GPU's actually used resources may be larger than its scheduled resources.
  • when resources in such a GPU are scheduled to different jobs, the problem of over-sharing the GPU easily arises.
  • therefore, to avoid over-sharing the GPU, the actual usage rate of each GPU may also be maintained in the resource dynamic table, so that the resource scheduling apparatus schedules the logical units in each GPU according to its actual usage rate. That is, the resource dynamic table contains the identifiers of all slave nodes in the cluster, the identifiers of all GPUs in each slave node, the number of logical units remaining in each GPU, and the actual usage rate of each GPU.
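  • a minimal sketch of such a table is shown below; the field names and example values are assumptions chosen for illustration, and a real implementation could equally keep this state in a database or a cluster-wide key-value store.

```python
from dataclasses import dataclass

@dataclass
class GpuEntry:
    """One row of the resource dynamic table: one GPU on one slave node."""
    node_id: str          # global number of the slave node in the cluster
    gpu_id: str           # identifier of the GPU within the slave node
    remaining_units: int  # logical units not yet scheduled to any task process
    usage: float          # actual usage rate of the GPU, 0.0 .. 1.0
    working: bool = True  # working / not working state of the GPU

resource_dynamic_table = [
    GpuEntry("slave-10", "GPU-0", remaining_units=12, usage=0.35),
    GpuEntry("slave-10", "GPU-1", remaining_units=0, usage=0.91),
]
```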
  • in a preferred embodiment of the present application, the resource dynamic table further contains the actual usage rate of GPU-0 and GPU-1;
  • in the slave node 10, the monitoring unit is further configured to monitor, when the monitoring period arrives, the actual usage rate of GPU-0 and GPU-1 in the current period.
  • correspondingly, the response unit 322 in the resource scheduling device 32 is configured to, in response to the scheduling request, search the preset resource dynamic table for a graphics processing device whose actual usage rate is less than or equal to a preset maximum threshold and whose number of remaining logical units is not zero, and schedule logical units for the target job from the found graphics processing device according to the quantity indicated by the scheduling request.
  • the resource dynamic table may further contain the working state of each resource service device in the resource server cluster and the working state of the graphics processing devices in each resource service device, maintained by the resource scheduling device through:
  • an update unit configured to, when the update period arrives, atomically update the working state of the resource service device and the working state and use state of the graphics processing device in the resource dynamic table, where the working state includes working and not working, and the use state includes logical-unit usage and overall utilization.
  • for example, when a slave node or a GPU fails, its working state changes from working to not working;
  • when a new slave node or a new GPU is added, its working state is set to working.
  • in addition, the update unit 323 may initialize the resource dynamic table when the cluster is initialized, and may also update the resource dynamic table when a task needs to be migrated because of a failure or for QoS reasons during job migration. The update unit may further update the number of logical units remaining in each GPU in the resource dynamic table according to the resource scheduling response.
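  • because the resource dynamic table is shared between the update unit and concurrent scheduling requests, its updates need to be atomic; the following sketch guards the table with a lock, which is only one of several ways (compare-and-swap, a transactional store) such atomic updates could be realized.

```python
import threading

table_lock = threading.Lock()

def atomic_update(table, node_id, gpu_id, **changes):
    """Atomically apply field changes (e.g. working=False) to one table entry."""
    with table_lock:  # no scheduler thread observes a half-applied update
        for entry in table:
            if entry["node"] == node_id and entry["gpu"] == gpu_id:
                entry.update(changes)
                return True
        return False

# Example: a GPU fails, so its working state changes from working to not working.
# atomic_update(resource_dynamic_table, "slave-10", "GPU-1", working=False)
```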
  • FIG. 8 is a flowchart of a resource scheduling method according to an embodiment of the present application.
  • the method may be performed by the resource scheduling device 32.
  • the method may include, for example:
  • Step 801: Receive a scheduling request for scheduling graphics processor (GPU) resources for a target job, where the scheduling request indicates the number of logical units whose scheduling is requested.
  • Step 802: In response to the scheduling request, search a preset resource dynamic table for a graphics processing device whose number of remaining logical units is not zero, and schedule logical units for the target job from the found graphics processing device according to the quantity indicated by the scheduling request.
  • the resource dynamic table includes at least the number of logical units remaining in the graphics processing device.
  • the resource dynamic table further includes an actual usage rate of the graphics processing device; in that case, step 802 is: in response to the scheduling request, searching the preset resource dynamic table for a graphics processing device whose actual usage rate is less than or equal to a preset maximum threshold and whose number of remaining logical units is not zero, and scheduling logical units for the target job from the found graphics processing device according to the quantity indicated by the scheduling request.
  • the resource dynamic table further includes the working state of the resource service devices in the resource server cluster and the working state of the graphics processing devices in the resource service devices; the method may further include: when the update period arrives, atomically updating the working state of the resource service devices and the working state of the graphics processing devices in the resource dynamic table, the working state including working and not working.
  • at the same time, this application uses GPU-MPS technology so that a task process becomes a client of a GPU-MPS, which allows the GPU-MPS to manage task processes in the same way that it manages clients. Because all clients of one GPU-MPS share one GPU context, in one GPU-MPS the multiple task processes that act as its clients only need to share a single GPU context.
  • in addition, during resource scheduling, the logical units are scheduled based on the actual usage rate of each GPU, which also avoids the problem of excessive GPU sharing.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of units is only a division by logical function; in actual implementation there may be other ways of division.
  • for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware, or in the form of a software functional unit.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Disclosed in an embodiment of this application is a graphics processing device in which a logical unit is the minimum GPU resource scheduling unit. The graphics processing device maps at least one GPU multi-process proxy server (GPU-MPS), and the GPU-MPS is the agent that schedules the graphics processing device. One client of the GPU-MPS can schedule at least one logical unit, and one task process corresponds to one client of the GPU-MPS. The maximum number of logical units that the graphics processing device can contain is M×N×K, where M is the number of logical units that can be scheduled by one client of the GPU-MPS, N is the maximum number of clients contained in one GPU-MPS, and K is the number of GPU-MPSs mapped by the graphics processing device. With this application, the utilization of GPU resources is improved and, at the same time, the overhead of establishing and switching GPU contexts for the graphics processing device is saved. Also disclosed in this application are a resource service device and a resource scheduling method and device.

Description

Graphics processing device, resource service device, resource scheduling method and device
The present application claims priority to Chinese Patent Application No. 201510208923.0, entitled "Graphics processing device, resource service device, resource scheduling method and device", filed on April 28, 2015, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of computer applications, and in particular to a graphics processing device, a resource service device, and a resource scheduling method and device.
Background
Since graphics processing is more and more important in modern computers, a core processor dedicated to graphics processing is needed, and the graphics processing unit (GPU, Graphics Processing Unit) is a device dedicated to graphics processing. At the same time, using the GPU's powerful computing capability for general-purpose computing (GPGPU, General Purpose GPU) is also increasingly popular in all kinds of high-performance computing clusters.
At present, in existing GPU cluster technology, when jobs submitted by users are processed, there are mainly two methods of scheduling GPU resources. In one scheduling method, the resource scheduler schedules one GPU (for example, one GPU card) to only one user's job. In the other scheduling method, the resource scheduler schedules one GPU to the jobs of multiple users at the same time.
In the process of implementing the present application, the inventors found that the prior art has at least the following problems: in the first scheduling method, because one GPU is exclusively occupied by a single user's job, and a single user's job is often unable to make full use of the GPU's resources, GPU resource utilization tends to be low. In the second scheduling method, because one GPU is shared by the jobs of multiple users, and multiple users are more likely to fully utilize the GPU's resources, the utilization of GPU resources is improved to some extent.
Although the second scheduling method can improve the utilization of GPU resources, when the jobs of multiple users share one GPU, the number of processes those jobs open at the same time may be very large. The GPU must establish a GPU context for each process, so the number of GPU contexts built on the GPU may be very large, and the GPU also has to switch among this large number of contexts. Establishing and switching GPU contexts incurs a huge overhead on GPU resources, which leads to the problem of excessive GPU sharing.
Summary of the invention
In order to solve the above technical problem, the embodiments of the present application provide a graphics processing device, a resource service device, and a resource scheduling method and device, so as to improve the utilization of GPU resources while also saving the overhead of establishing and switching GPU contexts, and further to avoid the problem of excessive GPU sharing as far as possible.
The embodiments of the present application disclose the following technical solutions:
A graphics processing device, in which a logical unit is the minimum graphics processor (GPU) resource scheduling unit; the graphics processing device maps at least one GPU multi-process proxy server (GPU-MPS), the GPU-MPS being the agent that schedules the graphics processing device; one client of the GPU-MPS can schedule at least one logical unit, one task process is one client of the GPU-MPS, and the maximum number of logical units that the graphics processing device can contain is M × N × K;
where M is the number of logical units that can be scheduled by one client of the GPU-MPS, N is the maximum number of clients contained in one GPU-MPS, K is the number of GPU-MPSs mapped by the graphics processing device, and M, N, and K are all non-zero positive integers.
Preferably, one client of the GPU-MPS can schedule one logical unit.
Preferably, the graphics processing device maps one GPU multi-process proxy server.
Preferably, the graphics processing device contains M × N × K logical units.
A resource service device, comprising at least one graphics processing device according to any one of the above, a monitoring unit, and a first communication unit, wherein
the monitoring unit is configured to monitor, when the monitoring period arrives, the number of logical units remaining in the graphics processing device in the current period;
the first communication unit is configured to send the monitored data to a monitoring node in the cluster, so that the monitoring node atomically updates a preset resource dynamic table with the monitored data when the update period arrives;
where the resource dynamic table contains at least the number of logical units remaining in the graphics processing device.
Preferably, the resource service device is a slave node in the cluster.
Preferably, the resource dynamic table further contains the actual usage rate of the graphics processing device, and the monitoring unit is further configured to monitor, when the monitoring period arrives, the actual usage rate of the local graphics processing device in the current period.
A resource scheduling method, applied to the resource service device according to any one of the above, the method comprising:
receiving a scheduling request for scheduling graphics processor (GPU) resources for a target job, the scheduling request indicating the number of logical units whose scheduling is requested;
in response to the scheduling request, searching a preset resource dynamic table for a graphics processing device whose number of remaining logical units is not zero, and scheduling logical units for the target job from the found graphics processing device according to the quantity indicated by the scheduling request;
where the resource dynamic table contains at least the number of logical units remaining in the graphics processing device.
Preferably, the resource dynamic table further contains the actual usage rate of the graphics processing device;
in that case, the step of searching the preset resource dynamic table for a graphics processing device whose number of remaining logical units is not zero in response to the scheduling request, and scheduling logical units for the target job from the found graphics processing device according to the quantity indicated by the scheduling request, is:
in response to the scheduling request, searching the preset resource dynamic table for a graphics processing device whose actual usage rate is less than or equal to a preset maximum threshold and whose number of remaining logical units is not zero, and scheduling logical units for the target job from the found graphics processing device according to the quantity indicated by the scheduling request.
Preferably, the resource dynamic table further contains the working state of the resource service devices in the resource server cluster and the working state of the graphics processing devices in the resource service devices; the method further comprises:
when the update period arrives, atomically updating the working state of the resource service devices and the working state of the graphics processing devices in the resource dynamic table, the working state including working and not working.
A resource scheduling apparatus, applied to the resource service device according to any one of the above, comprising:
a second communication unit, configured to receive a scheduling request for scheduling graphics processor (GPU) resources for a target job, the scheduling request indicating the number of logical units whose scheduling is requested;
a response unit, configured to, in response to the scheduling request, search a preset resource dynamic table for a graphics processing device whose number of remaining logical units is not zero, and schedule logical units for the target job from the found graphics processing device according to the quantity indicated by the scheduling request;
where the resource dynamic table contains at least the number of logical units remaining in the graphics processing device.
Preferably, the resource dynamic table further contains the actual usage rate of the graphics processing device;
the response unit is specifically configured to, in response to the scheduling request, search the preset resource dynamic table for a graphics processing device whose actual usage rate is less than or equal to a preset maximum threshold and whose number of remaining logical units is not zero, and schedule logical units for the target job from the found graphics processing device according to the quantity indicated by the scheduling request.
Preferably, the resource dynamic table further contains the working state of the resource service devices in the resource server cluster and the working state of the graphics processing devices in the resource service devices; the apparatus further comprises:
an update unit, configured to, when the update period arrives, atomically update the working state of the resource service devices and the working state of the graphics processing devices in the resource dynamic table, the working state including working and not working.
It can be seen from the above embodiments that, compared with the prior art, the present application has the following advantages:
Because the logical unit is the smallest GPU resource scheduling unit, different logical units in one graphics processing device can be scheduled to different task processes, so that different user jobs share the same graphics processing device, which ensures the utilization of GPU resources in the graphics processing device. At the same time, the present application uses GPU-MPS technology so that a task process becomes a client of a GPU-MPS; in this way, the GPU-MPS can manage task processes in the same way that it manages clients. Because all clients of one GPU-MPS share one GPU context, in one GPU multi-process proxy server the multiple task processes that act as its clients only need to share a single GPU context.
In addition, during resource scheduling, the logical units are scheduled based on the actual usage rate of each GPU, which also avoids the problem of excessive GPU sharing.
Brief description of the drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 schematically shows a structural diagram of a graphics processing apparatus according to an embodiment of the present application;
FIG. 2 schematically shows a structural diagram of another graphics processing apparatus according to an embodiment of the present application;
FIG. 3 schematically shows a structural diagram of another graphics processing apparatus according to an embodiment of the present application;
FIG. 4 schematically shows a structural diagram of another graphics processing apparatus according to an embodiment of the present application;
FIG. 5 schematically shows a structural diagram of a resource service device according to an embodiment of the present application;
FIG. 6 schematically shows an exemplary application scenario in which embodiments of the present application may be implemented;
FIG. 7 schematically shows a structural block diagram of a resource scheduling apparatus according to an embodiment of the present application;
FIG. 8 schematically shows a flow chart of a resource scheduling method according to an embodiment of the present application.
Detailed description
To make the above objects, features, and advantages of the present application more obvious and understandable, the embodiments of the present application are described in detail below with reference to the accompanying drawings.
A job submitted by a user is composed of multiple tasks, and one task is completed by one task process. Therefore, scheduling GPU resources for a user's job is actually scheduling GPU resources for all the task processes that complete that job.
Referring to FIG. 1, FIG. 1 schematically shows the structure of a graphics processing apparatus according to an embodiment of the present application. In the graphics processing device 10, the logical unit 11 is the minimum GPU resource scheduling unit; the graphics processing device maps one GPU multi-process proxy server (GPU-MPS, Graphics Processing Unit-Multiple Process Server) 20; the GPU-MPS 20 has a maximum of 16 clients and is the agent that schedules the graphics processing device 10; one client of the GPU-MPS 20 can schedule one logical unit 11, one task process is one client of the GPU-MPS 20, and the graphics processing device can therefore contain a maximum of 16 logical units.
It can be understood that, because the logical unit is the smallest GPU resource scheduling unit, different logical units in one graphics processing device can be scheduled to different task processes, so that different user jobs share the same graphics processing device, which ensures the utilization of the GPU resources in the graphics processing device. At the same time, the present application uses GPU-MPS technology so that a task process becomes a client of a GPU-MPS; in this way, the GPU-MPS can manage task processes in the same way that it manages clients. Because all clients of one GPU-MPS share one GPU context, in one GPU-MPS the multiple task processes that act as its clients only need to share a single GPU context. For example, when one graphics processing device maps one GPU-MPS, all the task processes that schedule that graphics processing device only need to share one GPU context instead of each establishing its own, which reduces the number of GPU contexts and ultimately saves the overhead of establishing and switching GPU contexts.
In addition, when configuring logical units for the graphics processing device 10, the number of logical units can be configured arbitrarily between 1 and 16 (inclusive).
Besides scheduling one logical unit, one client of the GPU-MPS 20 can schedule multiple logical units, such as 2, 3, or even more. For example, when one client of the GPU-MPS 20 can schedule two logical units and the graphics processing device 10 still maps one GPU-MPS 20, the maximum number of logical units that the graphics processing device 10 can contain is 32, as shown in FIG. 2. It can be seen that, when the number of GPU-MPSs 20 mapped by the graphics processing device 10 is fixed, the maximum number of logical units that the graphics processing device 10 can contain is related, and proportional, to the number of logical units schedulable by one client of the GPU-MPS 20.
In addition, the graphics processing device 10 may map only one GPU-MPS 20, or may map multiple GPU-MPSs 20, such as 2, 3, or even more. For example, when the graphics processing device 10 maps two GPU-MPSs 20 and one client of the GPU-MPS 20 can schedule one logical unit, the maximum number of logical units that the graphics processing device can contain is 32, as shown in FIG. 3. It can be seen that, when the number of logical units schedulable by one client of the GPU-MPS 20 is fixed, the maximum number of logical units that the graphics processing device 10 can contain is related, and proportional, to the number of GPU-MPSs 20 mapped by the graphics processing device 10.
That is, the maximum number of logical units that the graphics processing device 10 can contain is related, and proportional, both to the number of logical units schedulable by one client of the GPU-MPS 20 and to the number of GPU-MPSs 20 mapped by the graphics processing device 10. For example, when the graphics processing device 10 maps two GPU-MPSs 20 and one client of the GPU-MPS 20 can schedule two logical units, the maximum number of logical units that the graphics processing device can contain is 64, as shown in FIG. 4.
Therefore, for the graphics processing device 10, the maximum number of logical units it can contain is M × N × K, where M is the number of logical units schedulable by one client of the GPU-MPS, N is the maximum number of clients contained in one GPU-MPS, K is the number of GPU-MPSs mapped by the graphics processing device, and M, N, and K are all non-zero positive integers.
When configuring the logical units in the graphics processing device 10, any number within the maximum number of logical units that the graphics processing device 10 can contain may be configured.
In a preferred embodiment of the present application, the graphics processing device 10 contains M × N × K logical units.
In another preferred embodiment of the present application, one client of the GPU-MPS can schedule one logical unit, and the graphics processing device 10 maps one GPU-MPS 20. It can be understood that, in this preferred embodiment, the maximum number of logical units contained in one graphics processing device equals the maximum number of clients contained in one GPU-MPS.
In addition, it should be noted that the graphics processing device 10 is, in physical form, a graphics processor.
In addition to the graphics processing device, the embodiments of the present application also provide a resource service device. Referring to FIG. 5, FIG. 5 schematically shows the structure of a resource service device according to an embodiment of the present application, where the resource service device 50 includes at least one graphics processing device 51 (for example, two graphics processing devices 511 and 512), a monitoring unit 52, and a first communication unit 53. The graphics processing device 511 is mapped to the GPU-MPS 611, and one client of the GPU-MPS 611 can call one logical unit in the graphics processing device 511; the graphics processing device 512 is mapped to the GPU-MPS 612, and one client of the GPU-MPS 612 can call one logical unit in the graphics processing device 512; a task process can be a client of the GPU-MPS 611 or a client of the GPU-MPS 612.
The monitoring unit 52 is configured to monitor, when the monitoring period arrives, the number of logical units remaining in the graphics processing device in the current period;
the first communication unit 53 is configured to send the monitored data to the monitoring node in the cluster, so that the monitoring node atomically updates a preset resource dynamic table with the monitored data when the update period arrives;
where the resource dynamic table contains at least the number of logical units remaining in the graphics processing device.
A PIPE file is generated under a specified path on the resource server for each logical unit; once a logical unit is used, its corresponding PIPE file is generated. Therefore, the monitoring unit only needs to monitor the number of PIPE files under that path to determine the number of remaining logical units.
It can be understood that, when each slave node in the cluster dynamically updates the number of logical units remaining in its local GPUs, this update operation can also support offline scheduling, that is, GPU resources being used directly on the local node without going through the unified scheduler.
It should be noted that the structure of the resource service device shown in FIG. 5 is only an example; it may also contain a larger number of graphics processing devices. Moreover, the present application does not limit the number of GPU-MPSs mapped by each graphics processing device, the number of logical units that one client of a GPU-MPS can call, or the number of logical units that each graphics processing device contains.
In a preferred embodiment of the present application, the resource service device 50 is, in physical form, a resource server.
In another preferred embodiment of the application, the resource server can be a slave node in the cluster.
例如,请参阅图6,图6示意性地示出根据本申请的实施方式可以在其中实施的示例性应用场景。其中,在一个集群中,包含有多个从节点10(为了方便描述和展示,图1中仅示出了一个从节点)、一个监控节点20和一个监控节点30。从节点10即为一个资源服务器,在从节点10中包含有多个图形处理器(GPU),图1中仅示出了两个GPU,即,GPU-0和GPU-1,每一个GPU各包含16个逻辑单元,MPS-0为调度GPU-0的代理,MPS-1为调度GPU-1的代理,并且,MPS-0和MPS-1各具有16个客户端,MPS-0的一个客户端可调度GPU-0的一个逻辑单元,MPS-1的一个客户端可调度GPU-1的一个逻辑单元,用户作业中的一个任务进程既可以为MPS-0的一个客户端,也可以为MPS-1的一个客户端。For example, please refer to FIG. 6, which schematically illustrates an exemplary application scenario in which embodiments may be implemented in accordance with embodiments of the present application. Among them, in a cluster, there are a plurality of slave nodes 10 (only one slave node is shown in FIG. 1 for convenience of description and display), one monitoring node 20 and one monitoring node 30. The slave node 10 is a resource server, and the slave node 10 includes a plurality of graphics processors (GPUs). Only two GPUs are shown in FIG. 1, namely, GPU-0 and GPU-1, and each GPU is used. Contains 16 logical units, MPS-0 is the agent that schedules GPU-0, MPS-1 is the agent that schedules GPU-1, and MPS-0 and MPS-1 each have 16 clients, one customer of MPS-0 The terminal can schedule one logical unit of GPU-0. One client of MPS-1 can schedule one logical unit of GPU-1. One task process in the user job can be either a client of MPS-0 or an MPS. -1 for a client.
例如,当将GPU-0中的一个逻辑单元调度给某一个用户作业中的某一个任务进程时,该任务进程会连接到该逻辑单元所属的GPU-0的代理上,即,连接到MPS-0上。For example, when a logical unit in GPU-0 is scheduled to a task process in a user job, the task process is connected to the agent of GPU-0 to which the logical unit belongs, that is, connected to MPS- 0 on.
监控节点30包括作业管理装置31和资源调度装置32,作业管理装置31先接收到集群客户端60发送的为目标用户作业分配GPU资源的请求61,在该请求61中指示有请求调度的逻辑单元的数量。作业管理装置31将该请求转发给资源调度装置32。The monitoring node 30 includes a job management device 31 and a resource scheduling device 32. The job management device 31 first receives a request 61 sent by the cluster client 60 to allocate GPU resources for the target user job, and in the request 61, a logical unit requesting scheduling is indicated. quantity. The job management device 31 forwards the request to the resource scheduling device 32.
如图7所示的资源调度装置的结构框图,资源调度装置32包括第二通信单元321和响应单元322,其中,第二通信单元321用于接收为目标作业调度图形处理器GPU资源的调度请求61;响应单元322用于响应于所述调度请求,从预设的资源动态表中查找剩余的逻辑单元的数量不为零的图形处理装置,并按照所述调度请求指示的数量,从查找到的图形处理装置中为所述目标作业调度逻辑单元;其中,所述资源动态表至少包含图 形处理装置中剩余的逻辑单元的数量。As shown in the structural block diagram of the resource scheduling apparatus shown in FIG. 7, the resource scheduling apparatus 32 includes a second communication unit 321 and a response unit 322, wherein the second communication unit 321 is configured to receive a scheduling request for scheduling a graphics processor GPU resource for a target job. The response unit 322 is configured to: in response to the scheduling request, search for a graphics processing device whose number of remaining logical units is not zero from a preset resource dynamic table, and according to the quantity indicated by the scheduling request, from the search to a logic processing unit for the target job in the graphics processing device; wherein the resource dynamic table includes at least a graph The number of logical units remaining in the processing device.
In the present application, the resource scheduling device 32 may schedule logical units using any existing scheduling method, for example, First-fit scheduling, Best-fit scheduling, Backfill scheduling, or CFS scheduling.
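As a hedged illustration of two of the policies named above, the sketch below shows First-fit and Best-fit selection over per-GPU counts of remaining logical units; the application itself does not prescribe any particular algorithm, and the function names and data layout are assumptions.

```python
from typing import Dict, Optional

def first_fit(remaining: Dict[str, int], requested: int) -> Optional[str]:
    """Return the first graphics processing device with enough remaining logical units."""
    for gpu_id, free in remaining.items():
        if free >= requested:
            return gpu_id
    return None

def best_fit(remaining: Dict[str, int], requested: int) -> Optional[str]:
    """Return the device whose remaining capacity exceeds the request by the least."""
    candidates = [(free - requested, gpu_id)
                  for gpu_id, free in remaining.items() if free >= requested]
    return min(candidates)[1] if candidates else None

# Example: GPU-0 has 5 free units, GPU-1 has 3; a request asks for 2 units.
remaining_units = {"GPU-0": 5, "GPU-1": 3}
print(first_fit(remaining_units, 2))  # -> "GPU-0"
print(best_fit(remaining_units, 2))   # -> "GPU-1" (tightest fit)
```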
The resource scheduling device 32 generates a resource dynamic table, and the slave node 10 dynamically updates the number of logical units remaining in GPU-0 and GPU-1 in that table, so that the resource scheduling device 32 can perform resource scheduling according to the logical units remaining in each GPU. A remaining logical unit is a logical unit that has not been scheduled to any task process.
Of course, if the cluster further includes other slave nodes, the resource dynamic table is also dynamically maintained by those slave nodes, and the table further includes the number of logical units remaining in each GPU on the other slave nodes. In other words, the resource dynamic table contains the number of logical units remaining in the GPUs on all slave nodes.
In addition, the resource dynamic table may further include the identifiers of all slave nodes and the identifiers of all GPUs in each slave node, so that the location of each logical unit can be determined. For example, the resource dynamic table shown in FIG. 6 includes the identifier of the slave node 10 (for example, the global number of the slave node 10 in the cluster), the identifiers of GPU-0 and GPU-1 included in the slave node 10, and the number of logical units remaining in GPU-0 and GPU-1.
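One possible in-memory shape of such a table, shown here only as an assumption for illustration, is a nested mapping keyed by the slave-node identifier and the GPU identifier:

```python
# Resource dynamic table for the FIG. 6 scenario: slave node 10 hosts GPU-0 and GPU-1,
# each with 16 logical units, none of which has been scheduled yet.
resource_dynamic_table = {
    10: {                          # global number of the slave node in the cluster
        "GPU-0": {"remaining_units": 16},
        "GPU-1": {"remaining_units": 16},
    },
}

def remaining_units(table, node_id, gpu_id) -> int:
    """Look up how many logical units of one GPU are still unscheduled."""
    return table[node_id][gpu_id]["remaining_units"]

def consume_units(table, node_id, gpu_id, count) -> None:
    """Record that `count` logical units were scheduled to a task process."""
    entry = table[node_id][gpu_id]
    if entry["remaining_units"] < count:
        raise ValueError("not enough remaining logical units")
    entry["remaining_units"] -= count
```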
In addition, considering that the GPU resources actually used by a job are very likely to exceed the GPU resources it requests, the resources actually used on a given GPU may be larger than the resources scheduled on it. When resources of that GPU are scheduled to different jobs, the problem of over-sharing the GPU therefore arises easily.
Therefore, to avoid the problem of over-sharing a GPU, the actual usage rate of each GPU may also be maintained in the resource dynamic table, so that the resource scheduling device schedules the logical units in each GPU according to the actual usage rate of each GPU. That is, the resource dynamic table contains the identifiers of all slave nodes in the cluster, the identifiers of all GPUs in each slave node, the number of logical units remaining in each GPU, and the actual usage rate of each GPU.
In a preferred embodiment of the present application, the resource dynamic table further contains the actual usage rates of GPU-0 and GPU-1. In the slave node 10, the monitoring unit 11 is further configured to monitor the actual usage rates of GPU-0 and GPU-1 in the current period when the monitoring period arrives.
Correspondingly, in the monitoring node 30, the response unit 322 of the resource scheduling device 32 is specifically configured to, in response to the scheduling request, search the preset resource dynamic table for a graphics processing device whose actual usage rate is less than or equal to a preset maximum threshold and whose number of remaining logical units is not zero, and schedule logical units for the target job from the found graphics processing device according to the number indicated by the scheduling request.
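A minimal sketch of this refined lookup is given below, assuming an 80% maximum-usage threshold and a tuple-based table layout; both are illustrative choices, and the extra check that a single GPU can cover the whole request is an added assumption beyond the literal wording above.

```python
from typing import List, Tuple

# Each entry: (node_id, gpu_id, remaining logical units, actual usage rate in percent)
TableEntry = Tuple[int, str, int, float]

def find_schedulable_gpus(table: List[TableEntry],
                          requested_units: int,
                          max_usage: float = 80.0) -> List[TableEntry]:
    """Keep only GPUs that are not over-shared and still have unscheduled logical units."""
    return [e for e in table
            if e[3] <= max_usage and e[2] > 0 and e[2] >= requested_units]

table = [
    (10, "GPU-0", 4, 92.0),   # busy: filtered out by the usage-rate threshold
    (10, "GPU-1", 7, 35.0),   # eligible
]
print(find_schedulable_gpus(table, requested_units=2))  # -> [(10, 'GPU-1', 7, 35.0)]
```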
In another preferred embodiment of the present application, the resource dynamic table may further contain the working states of the resource service devices in the resource server cluster and the working states of the graphics processing devices in the resource service devices, and the table is dynamically updated by the resource scheduling device. The resource scheduling device 32 then further includes:
an update unit, configured to, when the update period arrives, atomically update, in the resource dynamic table, the working states of the resource service devices and the working and usage states of the graphics processing devices, where the working state includes working and not working, and the usage state includes the usage amount of logical units and the overall utilization.
For example, when a slave node or a GPU is removed, or when a slave node or a GPU fails, its working state changes from working to not working; when a new slave node or a new GPU is added, its working state is set to working.
In the present application, the update unit 323 may initialize the resource dynamic table when the cluster is initialized; alternatively, when a task needs to be migrated during job migration because the migration failed or for QoS reasons, the update unit 323 may update the resource dynamic table. In addition, the update unit may further update the number of logical units remaining in each GPU in the resource dynamic table according to a resource scheduling response.
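How an atomic update of the table might look is sketched below; the lock-based approach and the report format are assumptions, since the application only requires that the update be applied atomically.

```python
import threading

class ResourceDynamicTable:
    """Thread-safe wrapper so monitoring reports and scheduling see a consistent view."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = {}   # node_id -> {"working": bool, "gpus": {gpu_id: {...}}}

    def atomic_update(self, node_id, node_working, gpu_reports):
        """Apply one monitoring report (work states, remaining units, usage) in one step."""
        with self._lock:
            node = self._state.setdefault(node_id, {"working": True, "gpus": {}})
            node["working"] = node_working
            for gpu_id, report in gpu_reports.items():
                node["gpus"][gpu_id] = dict(report)

    def snapshot(self):
        """Return a consistent copy for the scheduler to read."""
        with self._lock:
            return {n: {"working": v["working"], "gpus": dict(v["gpus"])}
                    for n, v in self._state.items()}

# An update period might drive calls such as:
table = ResourceDynamicTable()
table.atomic_update(10, True, {
    "GPU-0": {"working": True, "remaining_units": 12, "usage": 41.5},
    "GPU-1": {"working": True, "remaining_units": 16, "usage": 3.0},
})
```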
Corresponding to the resource scheduling device described above, an embodiment of the present application further provides a resource scheduling method. Referring to FIG. 8, FIG. 8 schematically shows a flowchart of a resource scheduling method according to an embodiment of the present application. The method may be performed by the resource scheduling device 32 and may include, for example:
Step 801: receiving a scheduling request for scheduling graphics processor (GPU) resources for a target job, where the scheduling request indicates the number of logical units requested to be scheduled.
Step 802: in response to the scheduling request, searching a preset resource dynamic table for a graphics processing device whose number of remaining logical units is not zero, and scheduling logical units for the target job from the found graphics processing device according to the number indicated by the scheduling request.
The resource dynamic table includes at least the number of logical units remaining in each graphics processing device.
In a preferred embodiment of the present application, the resource dynamic table further contains the actual usage rate of each graphics processing device; step 802 is then:
in response to the scheduling request, searching the preset resource dynamic table for a graphics processing device whose actual usage rate is less than or equal to a preset maximum threshold and whose number of remaining logical units is not zero, and scheduling logical units for the target job from the found graphics processing device according to the number indicated by the scheduling request.
In another preferred embodiment of the present application, the resource dynamic table further contains the working states of the resource service devices in the resource server cluster and the working states of the graphics processing devices in the resource service devices; the method may then further include: when the update period arrives, atomically updating, in the resource dynamic table, the working states of the resource service devices and the working states of the graphics processing devices, where the working state includes working and not working.
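As a rough end-to-end illustration of steps 801 and 802 (not the claimed implementation), a hypothetical request structure and scheduling routine might look as follows:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class SchedulingRequest:          # step 801: the request indicates how many units are wanted
    job_id: str
    requested_units: int

def schedule(request: SchedulingRequest,
             table: Dict[Tuple[int, str], int]) -> Optional[Tuple[int, str, int]]:
    """Step 802: find one GPU whose remaining logical units are not zero (and, in this
    sketch, cover the request), then deduct the scheduled units from the table."""
    for (node_id, gpu_id), free in table.items():
        if free >= request.requested_units > 0:
            table[(node_id, gpu_id)] = free - request.requested_units
            return (node_id, gpu_id, request.requested_units)
    return None   # no single device can satisfy the request in this simple sketch

table = {(10, "GPU-0"): 1, (10, "GPU-1"): 3}
print(schedule(SchedulingRequest("job-42", 2), table))
# -> (10, 'GPU-1', 2); the table is now {(10, 'GPU-0'): 1, (10, 'GPU-1'): 1}
```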
As can be seen from the above embodiments, compared with the prior art, the present application has the following advantages:
Since the logical unit is the smallest GPU resource scheduling unit, different logical units in one graphics processing device can be scheduled to different task processes, so that jobs of different users jointly occupy the same graphics processing device, which ensures the utilization of the GPU resources in the graphics processing device. Meanwhile, the present application uses the GPU-MPS technology to make a task process a client of a GPU-MPS, so that the GPU-MPS can manage task processes in the same way that it manages clients. Since all clients of one GPU-MPS share one GPU context, the multiple task processes serving as clients of one GPU-MPS need to share only one GPU context.
In addition, during resource scheduling, scheduling logical units based on the actual usage rate of each GPU also avoids the problem of over-sharing a GPU.
A person skilled in the art can clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the systems, devices and units described above, and details are not described herein again.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices and methods may be implemented in other manners. For example, the device embodiments described above are merely illustrative; the division of the units is merely a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software functional unit.
It should be noted that a person of ordinary skill in the art may understand that all or part of the processes of the methods in the foregoing embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and when executed, the program may include the processes of the embodiments of the foregoing methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The graphics processing device, resource service device, resource scheduling method and device provided by the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help understand the method of the present application and its core idea. Meanwhile, a person of ordinary skill in the art may make changes to the specific implementations and the application scope according to the idea of the present application. In conclusion, the content of this specification should not be construed as a limitation on the present application.

Claims (13)

1. A graphics processing device, wherein, in the graphics processing device, a logical unit is the smallest graphics processor (GPU) resource scheduling unit; the graphics processing device maps at least one GPU multi-process proxy server (GPU-MPS); the GPU-MPS is an agent for scheduling the graphics processing device; one client of the GPU-MPS may schedule at least one of the logical units; one task process is one client of the GPU-MPS; and the maximum number of logical units that the graphics processing device may include is M×N×K;
wherein M is the number of logical units schedulable by one client of the GPU-MPS, N is the maximum number of clients included in one GPU-MPS, K is the number of GPU-MPSs mapped by the graphics processing device, and M, N and K are all non-zero positive integers.
2. The graphics processing device according to claim 1, wherein one client of the GPU-MPS may schedule one logical unit.
3. The graphics processing device according to claim 1 or 2, wherein the graphics processing device maps one GPU multi-process proxy server.
4. The graphics processing device according to claim 1, wherein the graphics processing device includes M×N×K logical units.
5. A resource service device, comprising at least one graphics processing device according to any one of claims 1 to 4, a monitoring unit and a first communication unit, wherein:
the monitoring unit is configured to, when a monitoring period arrives, monitor the number of logical units remaining in the graphics processing device in the current period;
the first communication unit is configured to send the monitored data to a monitoring node in a cluster, so that the monitoring node atomically updates a preset resource dynamic table with the monitored data when an update period arrives;
wherein the resource dynamic table includes at least the number of logical units remaining in the graphics processing device.
6. The resource service device according to claim 5, wherein the resource service device is a slave node in a cluster.
7. The resource service device according to claim 5, wherein the resource dynamic table further includes an actual usage rate of the graphics processing device; and the monitoring unit is further configured to, when the monitoring period arrives, monitor the actual usage rate of the local graphics processing device in the current period.
8. A resource scheduling method, applied to the resource service device according to any one of claims 5 to 7, the method comprising:
receiving a scheduling request for scheduling graphics processor (GPU) resources for a target job, the scheduling request indicating the number of logical units requested to be scheduled;
in response to the scheduling request, searching a preset resource dynamic table for a graphics processing device whose number of remaining logical units is not zero, and scheduling logical units for the target job from the found graphics processing device according to the number indicated by the scheduling request;
wherein the resource dynamic table includes at least the number of logical units remaining in the graphics processing device.
9. The method according to claim 8, wherein the resource dynamic table further includes an actual usage rate of the graphics processing device; and
the step of, in response to the scheduling request, searching a preset resource dynamic table for a graphics processing device whose number of remaining logical units is not zero, and scheduling logical units for the target job from the found graphics processing device according to the number indicated by the scheduling request is:
in response to the scheduling request, searching the preset resource dynamic table for a graphics processing device whose actual usage rate is less than or equal to a preset maximum threshold and whose number of remaining logical units is not zero, and scheduling logical units for the target job from the found graphics processing device according to the number indicated by the scheduling request.
10. The method according to claim 8 or 9, wherein the resource dynamic table further includes working states of resource service devices in a resource server cluster and working states of graphics processing devices in the resource service devices; and the method further comprises:
when an update period arrives, atomically updating, in the resource dynamic table, the working states of the resource service devices and the working states of the graphics processing devices, the working states including working and not working.
11. A resource scheduling device, applied to the resource service device according to any one of claims 5 to 7, comprising:
a second communication unit, configured to receive a scheduling request for scheduling graphics processor (GPU) resources for a target job, the scheduling request indicating the number of logical units requested to be scheduled; and
a response unit, configured to, in response to the scheduling request, search a preset resource dynamic table for a graphics processing device whose number of remaining logical units is not zero, and schedule logical units for the target job from the found graphics processing device according to the number indicated by the scheduling request;
wherein the resource dynamic table includes at least the number of logical units remaining in the graphics processing device.
12. The device according to claim 11, wherein the resource dynamic table further includes an actual usage rate of the graphics processing device; and
the response unit is specifically configured to, in response to the scheduling request, search the preset resource dynamic table for a graphics processing device whose actual usage rate is less than or equal to a preset maximum threshold and whose number of remaining logical units is not zero, and schedule logical units for the target job from the found graphics processing device according to the number indicated by the scheduling request.
13. The device according to claim 11 or 12, wherein the resource dynamic table further includes working states of resource service devices in a resource server cluster and working states of graphics processing devices in the resource service devices; and the device further comprises:
an update unit, configured to, when an update period arrives, atomically update, in the resource dynamic table, the working states of the resource service devices and the working states of the graphics processing devices, the working states including working and not working.
PCT/CN2016/079865 2015-04-28 2016-04-21 Graphic processing device, resource service device, resource scheduling method and device thereof WO2016173450A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510208923.0A CN106155811B (en) 2015-04-28 2015-04-28 Resource service device, resource scheduling method and device
CN201510208923.0 2015-04-28

Publications (1)

Publication Number Publication Date
WO2016173450A1 true WO2016173450A1 (en) 2016-11-03

Family

ID=57198136

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/079865 WO2016173450A1 (en) 2015-04-28 2016-04-21 Graphic processing device, resource service device, resource scheduling method and device thereof

Country Status (2)

Country Link
CN (1) CN106155811B (en)
WO (1) WO2016173450A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107544845A (en) * 2017-06-26 2018-01-05 新华三大数据技术有限公司 GPU resource dispatching method and device

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106686352B (en) * 2016-12-23 2019-06-07 北京大学 The real-time processing method of the multi-path video data of more GPU platforms
CN107688495B (en) * 2017-06-22 2020-11-03 平安科技(深圳)有限公司 Method and apparatus for scheduling processors
CN107247629A (en) * 2017-07-04 2017-10-13 北京百度网讯科技有限公司 Cloud computing system and cloud computing method and device for controlling server
CN107329834A (en) * 2017-07-04 2017-11-07 北京百度网讯科技有限公司 Method and apparatus for performing calculating task
CN109936604B (en) * 2017-12-18 2022-07-26 北京图森智途科技有限公司 Resource scheduling method, device and system
CN112559164A (en) * 2019-09-25 2021-03-26 中兴通讯股份有限公司 Resource sharing method and device
CN110795249A (en) * 2019-10-30 2020-02-14 亚信科技(中国)有限公司 GPU resource scheduling method and device based on MESOS containerized platform
WO2021142614A1 (en) * 2020-01-14 2021-07-22 华为技术有限公司 Chip state determining method and device, and cluster resource scheduling method and device
CN111400051B (en) * 2020-03-31 2023-10-27 京东方科技集团股份有限公司 Resource scheduling method, device and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1538296A (en) * 2003-02-18 2004-10-20 Multithreaded kernal for graphics processing unit
CN101403983A (en) * 2008-11-25 2009-04-08 北京航空航天大学 Resource monitoring method and system for multi-core processor based on virtual machine
CN104407920A (en) * 2014-12-23 2015-03-11 浪潮(北京)电子信息产业有限公司 Data processing method and system based on inter-process communication

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8803892B2 (en) * 2010-06-10 2014-08-12 Otoy, Inc. Allocation of GPU resources across multiple clients
US20120188259A1 (en) * 2010-12-13 2012-07-26 Advanced Micro Devices, Inc. Mechanisms for Enabling Task Scheduling
US8370283B2 (en) * 2010-12-15 2013-02-05 Scienergy, Inc. Predicting energy consumption
CN102541640B (en) * 2011-12-28 2014-10-29 厦门市美亚柏科信息股份有限公司 Cluster GPU (graphic processing unit) resource scheduling system and method
KR20150043377A (en) * 2012-08-07 2015-04-22 어드밴스드 마이크로 디바이시즈, 인코포레이티드 System and method for tuning a cloud computing system
WO2014062730A1 (en) * 2012-10-15 2014-04-24 Famous Industries, Inc. Efficient manipulation of surfaces in multi-dimensional space using energy agents

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1538296A (en) * 2003-02-18 2004-10-20 Multithreaded kernal for graphics processing unit
CN101403983A (en) * 2008-11-25 2009-04-08 北京航空航天大学 Resource monitoring method and system for multi-core processor based on virtual machine
CN104407920A (en) * 2014-12-23 2015-03-11 浪潮(北京)电子信息产业有限公司 Data processing method and system based on inter-process communication

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107544845A (en) * 2017-06-26 2018-01-05 新华三大数据技术有限公司 GPU resource dispatching method and device

Also Published As

Publication number Publication date
CN106155811B (en) 2020-01-07
CN106155811A (en) 2016-11-23

Similar Documents

Publication Publication Date Title
WO2016173450A1 (en) Graphic processing device, resource service device, resource scheduling method and device thereof
CN109582425B (en) GPU service redirection system and method based on cloud and terminal GPU fusion
JP2020024722A (en) Session idle optimization for streaming server
US8584136B2 (en) Context-aware request dispatching in clustered environments
EP3200393B1 (en) Method and device for virtual network function management
WO2018006676A1 (en) Acceleration resource processing method and apparatus and network function virtualization system
WO2021098182A1 (en) Resource management method and apparatus, electronic device and storage medium
WO2018191849A1 (en) Cloud management platform, virtual machine management method and system thereof
US11411799B2 (en) Scalable statistics and analytics mechanisms in cloud networking
CN107015972B (en) Method, device and system for migrating machine room services
Arthi et al. Energy aware cloud service provisioning approach for green computing environment
WO2020024978A1 (en) Device, method, apparatus, and readable storage medium for virtual machine migration
CN109960579B (en) Method and device for adjusting service container
CN109697114B (en) Method and machine for application migration
CN104158833A (en) Method for constructing intelligent desktop system
CN110855739A (en) Container technology-based remote and heterogeneous resource unified management method and system
CN112764909B (en) Sharing method and system based on cloud architecture workstation
CN105653347B (en) A kind of server, method for managing resource and virtual machine manager
US11656914B2 (en) Anticipating future resource consumption based on user sessions
CN107528871A (en) Data analysis in storage system
CN115225645B (en) Service updating method, device, system and storage medium
CN115378937B (en) Distributed concurrency method, device, equipment and readable storage medium for tasks
CN111614702A (en) Edge calculation method and edge calculation system
CN104144176A (en) Method for connecting intelligent desktop system clients and servers
CN113703906A (en) Data processing method, device and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16785881

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16785881

Country of ref document: EP

Kind code of ref document: A1