CN107193650B - Method and device for scheduling display card resources in distributed cluster - Google Patents


Info

Publication number: CN107193650B (application CN201710250265.0A)
Authority: CN (China)
Prior art keywords: PCI, graphics cards, bus
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN107193650A (en)
Inventor: 李远策
Current and original assignee: Beijing Qihoo Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Priority date: assumed (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Application CN201710250265.0A filed by Beijing Qihoo Technology Co Ltd; publication of application CN107193650A followed by grant and publication of CN107193650B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5038: Allocation of resources considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06F 2209/00: Indexing scheme relating to G06F 9/00
    • G06F 2209/50: Indexing scheme relating to G06F 9/50
    • G06F 2209/5021: Priority

Abstract

The invention discloses a method and a device for scheduling graphics card resources in a distributed cluster. The method comprises the following steps: acquiring the graphics card resources in the distributed cluster, and recording the number of available graphics cards on each PCI-E bus in a graphics card resource scheduling table; receiving a submitted job, the job carrying the number of graphics cards it requests; and searching the graphics card resource scheduling table, and, when the number of available graphics cards on one PCI-E bus satisfies the number of graphics cards requested by the job, selecting from that PCI-E bus a matching number of graphics cards as the graphics card resources allocated to the job. This scheme ensures, as far as possible, that each submitted job is executed by graphics cards that need no communication across PCI-E buses, avoiding the inefficiency such cross-bus communication causes. It greatly improves the efficiency of deep learning jobs and other job types that demand substantial graphics card resources, offers fine scheduling granularity, and meets the needs of distributed clusters.

Description

Method and device for scheduling display card resources in distributed cluster
Technical Field
The invention relates to the field of computer technology, and in particular to a method and a device for scheduling graphics card resources in a distributed cluster.
Background
Distributed clusters use many kinds of resource managers and resource schedulers, such as Kubernetes (k8s), Mesos and YARN. None of them, however, schedules graphics card resources well; for computation tasks that place heavy demands on graphics cards, such as deep learning, the quality of the allocated graphics card resources greatly affects performance.
Disclosure of Invention
In view of the above, the present invention has been made to provide a method and an apparatus for scheduling graphics card resources in a distributed cluster that overcome, or at least partially solve, the above problems.
According to one aspect of the present invention, there is provided a method for scheduling graphics card resources in a distributed cluster, comprising:
acquiring the graphics card resources in the distributed cluster, and recording the number of available graphics cards on each PCI-E bus in a graphics card resource scheduling table;
receiving a submitted job, the job carrying the number of graphics cards it requests;
and searching the graphics card resource scheduling table, and, when the number of available graphics cards on one PCI-E bus satisfies the number of graphics cards requested by the job, selecting from that PCI-E bus a matching number of graphics cards as the graphics card resources allocated to the job.
Optionally, acquiring the graphics card resources in the distributed cluster comprises:
reading, from the PCI-E buses of each computing device deployed in the distributed cluster, the graphics card resources on that device.
Optionally, recording the number of available graphics cards on each PCI-E bus in the graphics card resource scheduling table comprises:
recording the IDs of the available graphics cards on each PCI-E bus in an open linked list, sorted by the number of available graphics cards on each bus.
Optionally, the sort order is ascending, and searching the graphics card resource scheduling table comprises:
traversing the open linked list with a depth-first algorithm to judge whether the number of available graphics cards on each PCI-E bus satisfies the number of graphics cards requested by the job.
Optionally, when no single PCI-E bus has enough available graphics cards to satisfy the job's request, the open linked list is traversed again with the depth-first algorithm, and graphics cards matching the requested number are selected from multiple PCI-E buses as the graphics card resources allocated to the job.
Optionally, re-traversing the open linked list with the depth-first algorithm and selecting graphics cards from multiple PCI-E buses comprises:
allocating to the job all available graphics cards on the first PCI-E bus found, then judging whether the number of available graphics cards on the next bus satisfies the remainder of the job's request; if so, selecting from that bus a number of graphics cards matching the remainder as resources allocated to the job; if not, allocating all available graphics cards on that bus to the job and repeating the check on the next bus, until the remainder of the request is satisfied.
Optionally, the method further comprises:
deleting all available graphics cards allocated to the job from the open linked list, and re-sorting the open linked list;
and/or
modifying the open linked list according to released graphics card resources, and re-sorting the open linked list.
According to another aspect of the present invention, there is provided an apparatus for scheduling graphics card resources in a distributed cluster, comprising:
a recording unit, adapted to acquire the graphics card resources in the distributed cluster and to record the number of available graphics cards on each PCI-E bus in a graphics card resource scheduling table;
and a scheduling unit, adapted to receive a submitted job carrying the number of graphics cards it requests, to search the graphics card resource scheduling table, and, when the number of available graphics cards on one PCI-E bus satisfies the request, to select from that bus a matching number of graphics cards as the graphics card resources allocated to the job.
Optionally, the recording unit is adapted to read, from the PCI-E buses of each computing device deployed in the distributed cluster, the graphics card resources on that device.
Optionally, the recording unit is adapted to record the IDs of the available graphics cards on each PCI-E bus in an open linked list, sorted by the number of available graphics cards on each bus.
Optionally, the recording unit sorts the open linked list in ascending order;
and the scheduling unit is adapted to traverse the open linked list with a depth-first algorithm, judging whether the number of available graphics cards on each PCI-E bus satisfies the number of graphics cards requested by the job.
Optionally, the scheduling unit is further adapted to, when no single PCI-E bus has enough available graphics cards to satisfy the job's request, traverse the open linked list again with the depth-first algorithm and select graphics cards matching the requested number from multiple PCI-E buses as the graphics card resources allocated to the job.
Optionally, the scheduling unit is adapted to allocate to the job all available graphics cards on the first PCI-E bus found, then to judge whether the number of available graphics cards on the next bus satisfies the remainder of the job's request; if so, to select from that bus a number of graphics cards matching the remainder as resources allocated to the job; if not, to allocate all available graphics cards on that bus to the job and repeat the check on the next bus, until the remainder of the request is satisfied.
Optionally, the recording unit is adapted to delete all available graphics cards allocated to the job from the open linked list and re-sort it; and/or to modify the open linked list according to released graphics card resources and re-sort it.
According to the technical solution of the present invention, once the graphics card resources in the distributed cluster have been acquired, the number of available graphics cards on each PCI-E bus is recorded in a graphics card resource scheduling table; when a job arrives carrying the number of graphics cards it requests, the scheduling table is searched, a PCI-E bus that can satisfy the request is selected from it, and the corresponding number of cards on that bus is allocated to the job. This scheme ensures, as far as possible, that each submitted job is executed by graphics cards that need no communication across PCI-E buses, avoiding the inefficiency such communication causes; it greatly improves the efficiency of deep learning jobs and other job types that demand substantial graphics card resources, offers fine scheduling granularity, and meets the needs of distributed clusters.
The foregoing is only an overview of the technical solutions of the present invention. To make the technical means of the invention clearer, and the above and other objects, features and advantages more readily understandable, embodiments of the invention are described below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
Fig. 1 is a flowchart illustrating a method for scheduling graphics card resources in a distributed cluster according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram illustrating an apparatus for scheduling graphics card resources in a distributed cluster according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 is a flowchart illustrating a method for scheduling graphics card resources in a distributed cluster according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step S110, acquiring the graphics card resources in the distributed cluster, and recording the number of available graphics cards on each PCI-E bus in a graphics card resource scheduling table.
The PCI-E (PCI Express) bus is a relatively recent bus standard; in most computing devices, peripherals such as graphics cards and network cards are attached to a PCI-E bus.
Step S120, receiving a submitted job, the job carrying the number of graphics cards it requests.
Step S130, searching the graphics card resource scheduling table, and, when the number of available graphics cards on one PCI-E bus satisfies the number requested by the job, selecting from that bus a matching number of graphics cards as the graphics card resources allocated to the job.
In practice it has been found that efficiency drops sharply when the multiple graphics cards assigned to a job must communicate across PCI-E buses, and is comparatively high when they all sit on the same PCI-E bus. This embodiment is proposed to avoid such cross-bus communication.
As the method of Fig. 1 shows, once the graphics card resources in the distributed cluster have been acquired, the number of available graphics cards on each PCI-E bus is recorded in a graphics card resource scheduling table; when a job arrives carrying the number of graphics cards it requests, the scheduling table is searched, a PCI-E bus that can satisfy the request is selected from it, and the corresponding number of cards on that bus is allocated to the job. This scheme ensures, as far as possible, that each submitted job is executed by graphics cards that need no communication across PCI-E buses, avoiding the inefficiency such communication causes; it greatly improves the efficiency of deep learning jobs and other job types that demand substantial graphics card resources, offers fine scheduling granularity, and meets the needs of distributed clusters.
In an embodiment of the present invention, acquiring the graphics card resources in the distributed cluster comprises: reading, from the PCI-E buses of each computing device deployed in the cluster, the graphics card resources on that device.
For example, all devices on the PCI-E buses can be listed with the lspci command, and the graphics cards screened out of the listing.
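For illustration only, a minimal Python sketch of that screening step (the `lspci` line format shown and the choice to key cards by the leading PCI bus number are assumptions; real PCI-E topology discovery is more involved than this):

```python
import re
from collections import defaultdict

def parse_lspci(lspci_output):
    """Group GPU devices by PCI bus number from `lspci` text output.

    Assumes lines like '02:00.0 VGA compatible controller: ...' or
    '03:00.0 3D controller: ...', and treats the leading bus number as
    the bus a card hangs off (a simplification of real topology).
    """
    buses = defaultdict(list)
    pattern = re.compile(
        r'([0-9a-f]+):([0-9a-f]+\.[0-9])\s+(VGA compatible controller|3D controller)')
    for line in lspci_output.splitlines():
        m = pattern.match(line)
        if m:
            # Keep the full slot ID (e.g. '02:00.0') under its bus number.
            buses[m.group(1)].append(line.split()[0])
    return dict(buses)

sample = """\
00:1f.3 Audio device: Intel Corporation Device
02:00.0 VGA compatible controller: NVIDIA Corporation Device
02:00.1 Audio device: NVIDIA Corporation Device
03:00.0 3D controller: NVIDIA Corporation Device
"""
print(parse_lspci(sample))  # {'02': ['02:00.0'], '03': ['03:00.0']}
```

Non-GPU devices (audio functions, network cards) fall through the controller-class filter, leaving only the graphics cards grouped per bus.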
In an embodiment of the present invention, recording the number of available graphics cards on each PCI-E bus in the graphics card resource scheduling table comprises: recording the IDs of the available graphics cards on each PCI-E bus in an open linked list, sorted by the number of available graphics cards on each bus.
For example: PCI-E0 [GPU0, GPU1], PCI-E1 [GPU2, GPU3], and so on. Graphics cards on the same PCI-E bus have high affinity with one another, e.g. GPU0 and GPU1. This yields the graphics card resource scheduling table; the remaining work is to assign high-affinity cards to each job. In an embodiment of the present invention, the sort order is ascending, and searching the scheduling table comprises traversing the open linked list with a depth-first algorithm to judge whether the number of available graphics cards on each PCI-E bus satisfies the number of graphics cards requested by the job.
With this method, if a job needs one graphics card, the cards on both PCI-E0 and PCI-E1 obviously satisfy the condition; in the ordering above, GPU0 on PCI-E0, the first bus examined, would be allocated to the job.
Now consider: PCI-E0 [GPU0], PCI-E1 [GPU1, GPU2, GPU3], with a job that needs two graphics cards. The card on PCI-E0 is not allocated to the job; GPU1 and GPU2 on PCI-E1 are allocated instead. The depth-first algorithm saves time and quickly finds cards that meet the job's demand. The remaining problem is that this only serves jobs with modest card counts: when no single PCI-E bus has enough available graphics cards to satisfy the request, the job cannot be processed.
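The ascending sort and depth-first first-fit search just described can be sketched as follows (the list-of-tuples table layout and the function names are illustrative, not taken from the patent):

```python
def build_schedule_table(buses):
    """Graphics card resource scheduling table: (bus, [card IDs]) entries,
    sorted ascending by the number of available cards on each bus."""
    return sorted(buses.items(), key=lambda kv: len(kv[1]))

def first_fit(table, wanted):
    """Depth-first walk of the ascending table: return `wanted` cards from
    the first single bus that can satisfy the request, or None if no
    single bus has enough available cards."""
    for bus, cards in table:
        if len(cards) >= wanted:
            return cards[:wanted]
    return None

table = build_schedule_table({'PCI-E0': ['GPU0'],
                              'PCI-E1': ['GPU1', 'GPU2', 'GPU3']})
print(first_fit(table, 2))  # ['GPU1', 'GPU2'] -- the single card on PCI-E0 is skipped
```

Because the table is sorted ascending, a request that fits on a small bus never consumes cards from a larger one, which keeps large same-bus groups intact for bigger jobs.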
Therefore, in an embodiment of the present invention, when no single PCI-E bus has enough available graphics cards to satisfy the job's request, the open linked list is traversed again with the depth-first algorithm, and graphics cards matching the requested number are selected from multiple PCI-E buses as the graphics card resources allocated to the job. This solves the problem.
A second traversal, however, introduces a new question. Suppose a job needs four graphics cards, and the available cards are GPU0 on PCI-E0, GPU1 on PCI-E1, GPU2 and GPU3 on PCI-E2, and GPU4 and GPU5 on PCI-E3. Is the combination of PCI-E2 and PCI-E3 better, or that of PCI-E0, PCI-E1 and PCI-E2?
Since both options force the graphics cards to communicate across buses anyway, the combination of PCI-E0, PCI-E1 and PCI-E2 is chosen in order to reduce leftover fragments. To implement this selection, in an embodiment of the present invention, re-traversing the open linked list with the depth-first algorithm and selecting graphics cards from multiple PCI-E buses comprises: allocating to the job all available graphics cards on the first PCI-E bus found, then judging whether the number of available graphics cards on the next bus satisfies the remainder of the job's request; if so, selecting from that bus a number of graphics cards matching the remainder as resources allocated to the job; if not, allocating all available graphics cards on that bus to the job and repeating the check on the next bus, until the remainder of the request is satisfied.
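A sketch of this fallback, which walks the ascending table and drains the smallest buses first so that fewer fragments remain (again, the names and data layout are assumptions, not the patent's own code):

```python
def multi_bus_allocate(table, wanted):
    """Fallback for requests no single bus can satisfy: traverse the
    ascending (bus, [card IDs]) table again, take every available card
    on each bus until the remainder fits, then top up with just the
    remainder from the next bus."""
    allocated = []
    for bus, cards in table:
        need = wanted - len(allocated)
        if need <= 0:
            break
        allocated.extend(cards[:need])  # takes all of `cards` when len(cards) < need
    return allocated if len(allocated) == wanted else None

# The four-bus example from the text: a job needs 4 cards.
table = [('PCI-E0', ['GPU0']), ('PCI-E1', ['GPU1']),
         ('PCI-E2', ['GPU2', 'GPU3']), ('PCI-E3', ['GPU4', 'GPU5'])]
print(multi_bus_allocate(table, 4))  # ['GPU0', 'GPU1', 'GPU2', 'GPU3']
```

On the worked example this yields the PCI-E0 + PCI-E1 + PCI-E2 combination the text prefers, leaving the two-card group on PCI-E3 whole for a later job.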
To keep scheduling accurate, in an embodiment of the present invention, all available graphics cards allocated to the job are deleted from the open linked list and the list is re-sorted; and/or the open linked list is modified according to released graphics card resources and re-sorted. This guarantees the correct operation of the scheduling algorithm described above.
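The bookkeeping on allocation and release might look like this (illustrative only; a real scheduler would also need locking around the shared table, which is omitted here):

```python
def commit_allocation(table, allocated):
    """Delete the cards just allocated to a job from every bus entry,
    then re-sort the table ascending by available-card count."""
    taken = set(allocated)
    pruned = [(bus, [c for c in cards if c not in taken]) for bus, cards in table]
    return sorted(pruned, key=lambda kv: len(kv[1]))

def release_cards(table, bus, cards):
    """Return released cards to their bus entry and re-sort ascending."""
    updated = [(b, lst + cards if b == bus else lst) for b, lst in table]
    return sorted(updated, key=lambda kv: len(kv[1]))

table = [('PCI-E0', ['GPU0']), ('PCI-E1', ['GPU1']),
         ('PCI-E2', ['GPU2', 'GPU3']), ('PCI-E3', ['GPU4', 'GPU5'])]
table = commit_allocation(table, ['GPU0', 'GPU1', 'GPU2', 'GPU3'])
table = release_cards(table, 'PCI-E2', ['GPU2', 'GPU3'])
print(table)  # empty buses sort first; PCI-E2 and PCI-E3 each hold two cards again
```

Re-sorting after every commit and release is what keeps the ascending invariant that both the first-fit search and the multi-bus fallback rely on.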
Fig. 2 is a schematic structural diagram of an apparatus for scheduling graphics card resources in a distributed cluster according to an embodiment of the present invention. As shown in Fig. 2, the apparatus 200 for scheduling graphics card resources in a distributed cluster comprises:
a recording unit 210, adapted to acquire the graphics card resources in the distributed cluster and to record the number of available graphics cards on each PCI-E bus in a graphics card resource scheduling table;
and a scheduling unit 220, adapted to receive a submitted job carrying the number of graphics cards it requests, to search the graphics card resource scheduling table, and, when the number of available graphics cards on one PCI-E bus satisfies the request, to select from that bus a matching number of graphics cards as the graphics card resources allocated to the job.
In practice it has been found that efficiency drops sharply when the multiple graphics cards assigned to a job must communicate across PCI-E buses, and is comparatively high when they all sit on the same PCI-E bus. This embodiment is proposed to avoid such cross-bus communication.
As the apparatus of Fig. 2 shows, through the cooperation of its units, once the graphics card resources in the distributed cluster have been acquired, the number of available graphics cards on each PCI-E bus is recorded in a graphics card resource scheduling table; when a job arrives carrying the number of graphics cards it requests, the scheduling table is searched, a PCI-E bus that can satisfy the request is selected from it, and the corresponding number of cards on that bus is allocated to the job. This scheme ensures, as far as possible, that each submitted job is executed by graphics cards that need no communication across PCI-E buses, avoiding the inefficiency such communication causes; it greatly improves the efficiency of deep learning jobs and other job types that demand substantial graphics card resources, offers fine scheduling granularity, and meets the needs of distributed clusters.
In an embodiment of the present invention, the recording unit 210 is adapted to read, from the PCI-E buses of each computing device deployed in the distributed cluster, the graphics card resources on that device.
For example, all devices on the PCI-E buses can be listed with the lspci command, and the graphics cards screened out of the listing.
In an embodiment of the present invention, the recording unit 210 is adapted to record the IDs of the available graphics cards on each PCI-E bus in an open linked list, sorted by the number of available graphics cards on each bus.
For example: PCI-E0 [GPU0, GPU1], PCI-E1 [GPU2, GPU3], and so on. Graphics cards on the same PCI-E bus have high affinity with one another, e.g. GPU0 and GPU1. This yields the graphics card resource scheduling table; the remaining work is to assign high-affinity cards to each job. In an embodiment of the present invention, the recording unit 210 sorts the open linked list in ascending order, and the scheduling unit 220 is adapted to traverse the open linked list with a depth-first algorithm, judging whether the number of available graphics cards on each PCI-E bus satisfies the number of graphics cards requested by the job.
In the example above, if a job needs one graphics card, the cards on both PCI-E0 and PCI-E1 obviously satisfy the condition; in the ordering above, GPU0 on PCI-E0, the first bus examined, would be allocated to the job.
Now consider: PCI-E0 [GPU0], PCI-E1 [GPU1, GPU2, GPU3], with a job that needs two graphics cards. The card on PCI-E0 is not allocated to the job; GPU1 and GPU2 on PCI-E1 are allocated instead. The depth-first algorithm saves time and quickly finds cards that meet the job's demand. The remaining problem is that this only serves jobs with modest card counts: when no single PCI-E bus has enough available graphics cards to satisfy the request, the job cannot be processed.
Therefore, in an embodiment of the present invention, the scheduling unit 220 is further adapted to, when no single PCI-E bus has enough available graphics cards to satisfy the job's request, traverse the open linked list again with the depth-first algorithm and select graphics cards matching the requested number from multiple PCI-E buses as the graphics card resources allocated to the job.
A second traversal, however, introduces a new question. Suppose a job needs four graphics cards, and the available cards are GPU0 on PCI-E0, GPU1 on PCI-E1, GPU2 and GPU3 on PCI-E2, and GPU4 and GPU5 on PCI-E3. Is the combination of PCI-E2 and PCI-E3 better, or that of PCI-E0, PCI-E1 and PCI-E2?
Since both options force the graphics cards to communicate across buses anyway, the combination of PCI-E0, PCI-E1 and PCI-E2 is chosen in order to reduce leftover fragments. To implement this selection, in an embodiment of the present invention, the scheduling unit 220 is adapted to allocate to the job all available graphics cards on the first PCI-E bus found, then to judge whether the number of available graphics cards on the next bus satisfies the remainder of the job's request; if so, to select from that bus a number of graphics cards matching the remainder as resources allocated to the job; if not, to allocate all available graphics cards on that bus to the job and repeat the check on the next bus, until the remainder of the request is satisfied.
To keep scheduling accurate, in an embodiment of the present invention, the recording unit 210 is adapted to delete all available graphics cards allocated to the job from the open linked list and re-sort it, and/or to modify the open linked list according to released graphics card resources and re-sort it. This guarantees the correct operation of the scheduling algorithm described above.
In summary, according to the technical solution of the present invention, once the graphics card resources in the distributed cluster have been acquired, the number of available graphics cards on each PCI-E bus is recorded in a graphics card resource scheduling table; when a job arrives carrying the number of graphics cards it requests, the scheduling table is searched, a PCI-E bus that can satisfy the request is selected from it, and the corresponding number of cards on that bus is allocated to the job. This scheme ensures, as far as possible, that each submitted job is executed by graphics cards that need no communication across PCI-E buses, avoiding the inefficiency such communication causes; it greatly improves the efficiency of deep learning jobs and other job types that demand substantial graphics card resources, offers fine scheduling granularity, and meets the needs of distributed clusters.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components of the apparatus for scheduling graphics card resources in a distributed cluster according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names.
The embodiment of the invention discloses A1, a method for scheduling graphics card resources in a distributed cluster, wherein the method comprises the following steps:
acquiring graphics card resources in the distributed cluster, and recording the number of available graphics cards on each PCI-E bus in a graphics card resource scheduling table;
receiving a submitted job, wherein the job specifies the number of graphics cards it requests;
searching the graphics card resource scheduling table, and when the number of available graphics cards on one PCI-E bus satisfies the number of graphics cards requested by the job, selecting from that PCI-E bus a number of graphics cards matching the request as the graphics card resources allocated to the job.
A2, the method as in A1, wherein the acquiring graphics card resources in the distributed cluster comprises:
reading, from the PCI-E buses of each computing device deployed in the distributed cluster, the graphics card resources on that computing device.
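On a Linux computing device this reading step might, for example, enumerate devices under `/sys/bus/pci/devices` or via `lspci`; the embodiment does not prescribe a mechanism, so the sketch below shows only the pure grouping step on PCI `domain:bus:device.function` addresses. The addresses are illustrative, and treating a shared `domain:bus` prefix as "same PCI-E bus" is an assumption of the sketch.

```python
from collections import defaultdict

def group_cards_by_bus(pci_addresses):
    """Group PCI device addresses by their `domain:bus` prefix.

    Cards sharing a prefix are treated as sitting on the same PCI-E bus.
    """
    by_bus = defaultdict(list)
    for addr in pci_addresses:
        domain, bus, _dev_fn = addr.split(":")  # e.g. "0000", "02", "00.0"
        by_bus[f"{domain}:{bus}"].append(addr)
    return dict(by_bus)
```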
A3, the method as in A1, wherein the recording the number of available graphics cards on each PCI-E bus in the graphics card resource scheduling table comprises:
recording the IDs of the available graphics cards on each PCI-E bus in an open linked list, sorted by the number of available graphics cards on each PCI-E bus.
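For illustration, the open linked list of the embodiment can be modelled as an ordinary Python list of (bus, card IDs) pairs kept sorted by free-card count; the bus names and card IDs below are invented.

```python
def build_schedule(by_bus):
    """Return [(bus, card_ids), ...] ordered by how many cards each bus has free."""
    return sorted(by_bus.items(), key=lambda entry: len(entry[1]))
```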
A4, the method as in A3, wherein the sorting is in ascending order, and the searching the graphics card resource scheduling table comprises:
traversing the open linked list through a depth-first algorithm to judge whether the number of available graphics cards on each PCI-E bus satisfies the number of graphics cards requested by the job.
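Because the list is kept in ascending order of free-card count, a simple linear traversal finds the smallest bus that can still serve the job alone, a best-fit choice that preserves the larger all-on-one-bus groups for later jobs. A hedged sketch of this traversal over the list-of-pairs model:

```python
def find_bus(schedule, requested):
    """Index of the first (smallest) bus that can serve the job alone, or -1."""
    for i, (_bus, cards) in enumerate(schedule):
        if len(cards) >= requested:
            return i
    return -1
```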
A5, the method as in A4, wherein when the number of available graphics cards on every PCI-E bus fails to satisfy the number of graphics cards requested by the job, the open linked list is traversed again through a depth-first algorithm, and graphics cards matching the requested number are selected from a plurality of PCI-E buses as the graphics card resources allocated to the job.
A6, the method as in A5, wherein the traversing the open linked list again through a depth-first algorithm and selecting, from the plurality of PCI-E buses, graphics cards matching the requested number as the graphics card resources allocated to the job comprises:
allocating all of the available graphics cards found on the first PCI-E bus to the job, then judging whether the number of available graphics cards on the next PCI-E bus satisfies the number of graphics cards the job still requires; if so, selecting from that PCI-E bus a number of graphics cards matching the remainder as graphics card resources allocated to the job; if not, allocating all of the available graphics cards on that PCI-E bus to the job and judging the next PCI-E bus in turn, until the job's remaining requirement is satisfied.
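The fallback step can be sketched as follows, again on the illustrative list-of-pairs model rather than the embodiment's actual linked-list structures: whole buses are taken greedily in list order, and the allocation finishes on the first bus that covers what remains.

```python
def allocate_across_buses(schedule, requested):
    """Card IDs drawn from several buses in list order; None if too few in total."""
    granted, remaining = [], requested
    for _bus, cards in schedule:
        if len(cards) >= remaining:      # this bus covers what is still needed
            granted.extend(cards[:remaining])
            return granted
        granted.extend(cards)            # take the whole bus and keep going
        remaining -= len(cards)
    return None                          # the cluster lacks enough free cards
```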
A7, the method of A3, wherein the method further comprises:
deleting all available graphics cards allocated to the job from the open linked list, and re-sorting the open linked list;
and/or,
modifying the open linked list according to released graphics card resources, and re-sorting the open linked list.
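Both bookkeeping directions, removing cards after an allocation and returning them on release, can be sketched on the same illustrative model; each operation re-sorts so the ascending-order invariant of the table is restored.

```python
def remove_allocated(schedule, allocated):
    """Drop the allocated card IDs from every bus entry, then re-sort."""
    taken = set(allocated)
    kept = [(bus, [c for c in cards if c not in taken]) for bus, cards in schedule]
    return sorted(kept, key=lambda entry: len(entry[1]))

def release(schedule, bus, freed):
    """Return released card IDs to their bus entry, then re-sort."""
    merged = [(b, cards + freed if b == bus else cards) for b, cards in schedule]
    return sorted(merged, key=lambda entry: len(entry[1]))
```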
The embodiment of the invention also discloses B8, an apparatus for scheduling graphics card resources in a distributed cluster, wherein the apparatus comprises:
a recording unit adapted to acquire graphics card resources in the distributed cluster and record the number of available graphics cards on each PCI-E bus in a graphics card resource scheduling table;
a scheduling unit adapted to receive a submitted job specifying the number of graphics cards it requests, search the graphics card resource scheduling table, and, when the number of available graphics cards on one PCI-E bus satisfies the request, select from that PCI-E bus a number of graphics cards matching the request as the graphics card resources allocated to the job.
B9, the apparatus of B8, wherein
the recording unit is adapted to read, from the PCI-E buses of each computing device deployed in the distributed cluster, the graphics card resources on that computing device.
B10, the apparatus of B8, wherein
the recording unit is adapted to record the IDs of the available graphics cards on each PCI-E bus in the open linked list, sorted by the number of available graphics cards on each PCI-E bus.
B11, the apparatus of B10, wherein
the recording unit sorts the open linked list in ascending order;
the scheduling unit is adapted to traverse the open linked list through a depth-first algorithm and judge whether the number of available graphics cards on each PCI-E bus satisfies the number of graphics cards requested by the job.
B12, the apparatus of B11, wherein
the scheduling unit is further adapted to traverse the open linked list again through a depth-first algorithm when the number of available graphics cards on every PCI-E bus fails to satisfy the request, and to select graphics cards matching the requested number from a plurality of PCI-E buses as the graphics card resources allocated to the job.
B13, the apparatus of B12, wherein
the scheduling unit is adapted to allocate all of the available graphics cards found on the first PCI-E bus to the job, then judge whether the number of available graphics cards on the next PCI-E bus satisfies the job's remaining requirement; if so, to select from that PCI-E bus a number of graphics cards matching the remainder as graphics card resources allocated to the job; if not, to allocate all of the available graphics cards on that PCI-E bus to the job and judge the next PCI-E bus in turn, until the job's remaining requirement is satisfied.
B14, the apparatus of B10, wherein
the recording unit is adapted to delete all available graphics cards allocated to the job from the open linked list and re-sort the open linked list; and/or to modify the open linked list according to released graphics card resources and re-sort the open linked list.

Claims (14)

1. A method for scheduling graphics card resources in a distributed cluster, wherein the method comprises:
acquiring graphics card resources in the distributed cluster, and recording the number of available graphics cards on each PCI-E bus in a graphics card resource scheduling table;
receiving a submitted job, wherein the job specifies the number of graphics cards it requests;
searching the graphics card resource scheduling table, and when the number of available graphics cards on one PCI-E bus satisfies the number of graphics cards requested by the job, selecting from that PCI-E bus a number of graphics cards matching the request as the graphics card resources allocated to the job;
allocating all of the available graphics cards found on the first PCI-E bus to the job, and judging whether the number of available graphics cards on the next PCI-E bus satisfies the job's remaining requirement; and if so, selecting from that PCI-E bus a number of graphics cards matching the request as graphics card resources allocated to the job.
2. The method of claim 1, wherein the acquiring graphics card resources in the distributed cluster comprises:
reading, from the PCI-E buses of each computing device deployed in the distributed cluster, the graphics card resources on that computing device.
3. The method of claim 1, wherein the recording the number of available graphics cards on each PCI-E bus in the graphics card resource scheduling table comprises:
recording the IDs of the available graphics cards on each PCI-E bus in an open linked list, sorted by the number of available graphics cards on each PCI-E bus.
4. The method of claim 3, wherein the sorting is in ascending order, and the searching the graphics card resource scheduling table comprises:
traversing the open linked list through a depth-first algorithm to judge whether the number of available graphics cards on each PCI-E bus satisfies the number of graphics cards requested by the job.
5. The method of claim 4, wherein when the number of available graphics cards on every PCI-E bus fails to satisfy the number of graphics cards requested by the job, the open linked list is traversed again through a depth-first algorithm, and graphics cards matching the requested number are selected from a plurality of PCI-E buses as the graphics card resources allocated to the job.
6. The method of claim 5, wherein the traversing the open linked list again through a depth-first algorithm and selecting, from the plurality of PCI-E buses, graphics cards matching the requested number as the graphics card resources allocated to the job comprises:
allocating all of the available graphics cards found on the first PCI-E bus to the job, and judging whether the number of available graphics cards on the next PCI-E bus satisfies the job's remaining requirement; if not, allocating all of the available graphics cards on that PCI-E bus to the job and judging the next PCI-E bus in turn, until the job's remaining requirement is satisfied.
7. The method of claim 3, wherein the method further comprises:
deleting all available graphics cards allocated to the job from the open linked list, and re-sorting the open linked list;
and/or,
modifying the open linked list according to released graphics card resources, and re-sorting the open linked list.
8. An apparatus for scheduling graphics card resources in a distributed cluster, the apparatus comprising:
a recording unit adapted to acquire graphics card resources in the distributed cluster and record the number of available graphics cards on each PCI-E bus in a graphics card resource scheduling table;
a scheduling unit adapted to receive a submitted job specifying the number of graphics cards it requests, search the graphics card resource scheduling table, and, when the number of available graphics cards on one PCI-E bus satisfies the request, select from that PCI-E bus a number of graphics cards matching the request as the graphics card resources allocated to the job;
wherein the scheduling unit is adapted to allocate all of the available graphics cards found on the first PCI-E bus to the job, judge whether the number of available graphics cards on the next PCI-E bus satisfies the job's remaining requirement, and if so, select from that PCI-E bus a number of graphics cards matching the request as graphics card resources allocated to the job.
9. The apparatus of claim 8, wherein
the recording unit is adapted to read, from the PCI-E buses of each computing device deployed in the distributed cluster, the graphics card resources on that computing device.
10. The apparatus of claim 8, wherein
the recording unit is adapted to record the IDs of the available graphics cards on each PCI-E bus in the open linked list, sorted by the number of available graphics cards on each PCI-E bus.
11. The apparatus of claim 10, wherein
the recording unit sorts the open linked list in ascending order;
the scheduling unit is adapted to traverse the open linked list through a depth-first algorithm and judge whether the number of available graphics cards on each PCI-E bus satisfies the number of graphics cards requested by the job.
12. The apparatus of claim 11, wherein
the scheduling unit is further adapted to traverse the open linked list again through a depth-first algorithm when the number of available graphics cards on every PCI-E bus fails to satisfy the request, and to select graphics cards matching the requested number from a plurality of PCI-E buses as the graphics card resources allocated to the job.
13. The apparatus of claim 12, wherein
the scheduling unit is adapted to allocate all of the available graphics cards found on the first PCI-E bus to the job, and to judge whether the number of available graphics cards on the next PCI-E bus satisfies the job's remaining requirement; if not, to allocate all of the available graphics cards on that PCI-E bus to the job and judge the next PCI-E bus in turn, until the job's remaining requirement is satisfied.
14. The apparatus of claim 10, wherein
the recording unit is adapted to delete all available graphics cards allocated to the job from the open linked list and re-sort the open linked list; and/or to modify the open linked list according to released graphics card resources and re-sort the open linked list.
CN201710250265.0A 2017-04-17 2017-04-17 Method and device for scheduling display card resources in distributed cluster Active CN107193650B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710250265.0A CN107193650B (en) 2017-04-17 2017-04-17 Method and device for scheduling display card resources in distributed cluster

Publications (2)

Publication Number Publication Date
CN107193650A CN107193650A (en) 2017-09-22
CN107193650B true CN107193650B (en) 2021-01-19


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109144578B (en) * 2018-06-28 2021-09-03 中国船舶重工集团公司第七0九研究所 Display card resource allocation method and device based on Loongson computer
CN115129483B (en) * 2022-09-01 2022-12-02 武汉凌久微电子有限公司 Multi-display-card cooperative display method based on display area division

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6421755B1 (en) * 1999-05-26 2002-07-16 Dell Usa, L.P. System resource assignment for a hot inserted device
CN101387993A (en) * 2007-09-14 2009-03-18 凹凸科技(中国)有限公司 Method and system for dynamically collocating resource for equipment in computer system
CN101916209A (en) * 2010-08-06 2010-12-15 华东交通大学 Cluster task resource allocation method for multi-core processor
CN102609978A (en) * 2012-01-13 2012-07-25 中国人民解放军信息工程大学 Method for accelerating cone-beam CT (computerized tomography) image reconstruction by using GPU (graphics processing unit) based on CUDA (compute unified device architecture) architecture
CN102902589A (en) * 2012-08-31 2013-01-30 浪潮电子信息产业股份有限公司 Method for managing and scheduling cluster MIS (Many Integrated Core) job
CN103105895A (en) * 2011-11-15 2013-05-15 辉达公司 Computer system and display cards thereof and method for processing graphs of computer system
CN103248659A (en) * 2012-02-13 2013-08-14 北京华胜天成科技股份有限公司 Method and system for dispatching cloud computed resources
CN104954400A (en) * 2014-03-27 2015-09-30 中国电信股份有限公司 Cloud computing system and realizing method thereof
CN105718316A (en) * 2014-12-01 2016-06-29 中国移动通信集团公司 Job scheduling method and apparatus
CN106557366A (en) * 2015-09-28 2017-04-05 阿里巴巴集团控股有限公司 Task distribution method, apparatus and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7398337B2 (en) * 2005-02-25 2008-07-08 International Business Machines Corporation Association of host translations that are associated to an access control level on a PCI bridge that supports virtualization
US8319782B2 (en) * 2008-07-08 2012-11-27 Dell Products, Lp Systems and methods for providing scalable parallel graphics rendering capability for information handling systems
JP5180729B2 (en) * 2008-08-05 2013-04-10 株式会社日立製作所 Computer system and bus allocation method
US9524138B2 (en) * 2009-12-29 2016-12-20 Nvidia Corporation Load balancing in a system with multi-graphics processors and multi-display systems
US10310879B2 (en) * 2011-10-10 2019-06-04 Nvidia Corporation Paravirtualized virtual GPU




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant