CN113742064A - Resource arrangement method, system, equipment and medium for server cluster - Google Patents
Resource arrangement method, system, equipment and medium for server cluster
- Publication number
- CN113742064A (application number CN202110904366.1A)
- Authority
- CN
- China
- Prior art keywords
- node
- list
- gpus
- nodes
- screening
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a resource arrangement method, system, device and medium for a server cluster, wherein the method comprises the following steps: acquiring a node list of the server cluster and a task list corresponding to each node in the node list; screening the nodes in the node list according to the number of remaining free GPUs and the computing environments in the task lists corresponding to those GPUs, and generating a screening list; selecting two nodes from the screening list, deleting the creation information corresponding to the remaining GPUs of one node from that node's task list, and generating the creation information of those remaining GPUs in the task list of the other node; and deleting the two nodes from the screening list and returning to the step of selecting two nodes, until no node, or only one node, remains in the screening list. The scheme of the invention ensures that the nodes in the server cluster have sufficient idle GPU resources and effectively supports the requests of subsequent tasks; it is simple to implement, requires no complex operations, and does not damage the original code.
Description
Technical Field
The present invention relates to the technical field of computing resource management, and in particular to a resource arrangement method, system, device and medium for a server cluster.
Background
With the continuous development of artificial intelligence (AI) technology and the continuous advance of industrial AI, more and more enterprise users are building their own AI resource management platforms to support the development and operation of enterprise AI services, allocating resources and creating computing environments by binding them to Docker containers.
When scheduling and allocating server cluster resources among multiple users, GPU fragment resources often appear after long-running operation: when the number of GPUs required by a pending task is smaller than the number of GPUs installed in a GPU server, the leftover GPUs become fragments. To address GPU fragmentation, conventional AI computing platforms take one of two approaches. The first imposes a uniform specification on users' resource requests; this restricts users, who may still have to wait for idle resources or find no resources matching the specification. The second uses a distributed framework, for example transforming a two-card computing task on a single machine into a distributed task using one card on each of two machines; this requires intrusive modification of the task, raises the operating threshold for users, and makes it difficult to guarantee the precision and performance of the transformed computation. The invention therefore provides a GPU fragment resource arrangement method.
Disclosure of Invention
In view of this, the invention provides a resource arrangement method, system, device and medium for a server cluster. By consolidating fragmented resources, the nodes in the server cluster are guaranteed sufficient idle GPU resources and subsequent task requests are effectively supported; the scheme is simple to implement, requires no complex operations, and does not damage the original code of the server cluster.
Based on the above object, an aspect of the embodiments of the present invention provides a resource arrangement method for a server cluster, which specifically includes the following steps:
acquiring a node list of a server cluster and a task list corresponding to each node in the node list, wherein each node comprises GPUs, and the task list comprises the creation information and computing environment of each GPU in the node;
screening the nodes in the node list according to the number of remaining free GPUs and the computing environments in the task lists corresponding to those GPUs, and generating a screening list;
selecting two nodes from the screening list, deleting the creation information corresponding to the remaining GPUs of one node from that node's task list, and generating the creation information of those remaining GPUs in the task list of the other node;
and deleting the two nodes from the screening list, and returning to the step of selecting two nodes from the screening list until no node or only one node remains in the screening list.
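The four steps above can be sketched as a single arrangement pass. This is a minimal illustration under assumed data structures (a node as a dict of total GPU count and a task list of creation-info/card-count pairs); it is not the patent's disclosed implementation.

```python
def consolidate(nodes, size=1):
    """One arrangement pass over a server cluster, following the four
    steps above. `nodes` maps a node name to a dict with the node's
    total GPU count and its task list as (creation_info, num_gpus)
    pairs. Layout and names are illustrative assumptions.
    """
    # Steps 1-2: screen nodes that have exactly `size` free GPUs and
    # whose task list contains a computing environment of `size` cards.
    screening = []
    for name, node in nodes.items():
        free = node["total"] - sum(n for _, n in node["tasks"])
        if free == size and any(n == size for _, n in node["tasks"]):
            screening.append(name)

    # Steps 3-4: repeatedly pair two screened nodes, moving the
    # `size`-card environment of the first onto the free GPUs of the
    # second, until no node or only one node remains in the list.
    while len(screening) >= 2:
        src, dst = screening.pop(), screening.pop()
        task = next(t for t in nodes[src]["tasks"] if t[1] == size)
        nodes[src]["tasks"].remove(task)   # delete creation info on src
        nodes[dst]["tasks"].append(task)   # recreate it on dst
    return nodes
```

After the pass, one node of each pair is fully occupied while the other gains twice the fragment size in contiguous free GPUs, which is what allows the next pass to work on larger fragments.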
In some embodiments, screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task lists corresponding to those GPUs comprises:
screening out, from the node list, nodes with a preset number of free GPUs;
and screening out, based on the computing environments in the task lists corresponding to the nodes with the preset number of free GPUs, the nodes containing a computing environment of a preset card number.
In some embodiments, screening out nodes with a preset number of free GPUs from the node list comprises:
screening out, from the node list, nodes with 1 free GPU.
In some embodiments, screening out the nodes containing a computing environment of the preset card number comprises:
screening out, based on the computing environments in the task lists corresponding to the nodes with 1 free GPU, the nodes containing a 1-card computing environment.
In some embodiments, screening the nodes in the node list according to the number of remaining GPUs and the corresponding computing environments further comprises:
in response to finishing the arrangement of the nodes with 1 remaining GPU and a 1-card computing environment, screening out nodes with 2 free GPUs from the node list;
and screening out, based on the computing environments in the task lists corresponding to the nodes with 2 free GPUs, the nodes containing 2-card computing environments, and generating a screening list of nodes with 2 free GPUs.
In some embodiments, screening the nodes in the node list according to the number of remaining GPUs and the corresponding computing environments further comprises:
in response to finishing the arrangement of the nodes with 2 remaining GPUs and 2-card computing environments, screening out nodes with 4 free GPUs from the node list;
and screening out, based on the computing environments in the task lists corresponding to the nodes with 4 free GPUs, the nodes containing 4-card computing environments, and generating a screening list of nodes with 4 free GPUs.
In some embodiments, the 2-card computing environment comprises two 1-card environments or one 2-card environment, and the 4-card computing environment comprises four 1-card environments, two 2-card environments, one 2-card environment and two 1-card environments, one 1-card environment and one 3-card environment, or one 4-card environment.
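The card-number combinations just listed are exactly the integer partitions of the free-GPU count. A small illustrative helper (not part of the disclosed method) that enumerates them:

```python
def card_combinations(total):
    """Enumerate the ways `total` free GPUs can be filled by computing
    environments of 1..total cards (integer partitions). For 2 this
    gives [2] and [1,1]; for 4 it gives the five combinations listed
    in the paragraph above. Illustrative helper only.
    """
    def partitions(n, max_part):
        if n == 0:
            yield []
            return
        for p in range(min(n, max_part), 0, -1):
            for rest in partitions(n - p, p):
                yield [p] + rest
    return list(partitions(total, total))
```

Enumerating these combinations is what lets the screening step recognize every task mix that exactly fills a node's fragment.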
In another aspect of the embodiments of the present invention, a resource arrangement system for a server cluster is further provided, including:
an acquisition module, configured to acquire a node list of a server cluster and a task list corresponding to each node in the node list, wherein each node comprises GPUs, and the task list comprises the creation information and computing environment of each GPU in the node;
a screening module, configured to screen the nodes in the node list according to the number of remaining GPUs and the computing environments in the task lists corresponding to those GPUs, and to generate a screening list;
an arrangement module, configured to select two nodes from the screening list, delete the creation information corresponding to the remaining GPUs of one node from that node's task list, and generate the creation information of those remaining GPUs in the task list of the other node;
and a return arrangement module, configured to delete the two nodes from the screening list and return to the step of selecting two nodes from the screening list until no node or only one node remains in the screening list.
In another aspect of the embodiments of the present invention, there is also provided a computer device, including: at least one processor; and a memory storing a computer program executable on the processor, the computer program when executed by the processor implementing the steps of the method as above.
In a further aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, which stores a computer program that, when executed by a processor, implements the above method steps.
The invention has the following beneficial technical effects: by consolidating fragment resources, the nodes in the server cluster are guaranteed sufficient idle GPU resources and subsequent task requests are effectively supported; the scheme is simple to implement, requires no complex operations, and does not damage the original code of the server cluster.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; those skilled in the art can obtain other embodiments from these drawings without creative effort.
Fig. 1 is a block diagram of an embodiment of a resource arrangement method for a server cluster according to the present invention;
FIG. 2 is a schematic diagram of an embodiment of a resource arrangement system of a server cluster provided in the present invention;
FIG. 3 is a schematic structural diagram of an embodiment of a computer device provided in the present invention;
fig. 4 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share a name but are not identical. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention; the following embodiments do not describe this again.
Based on the above purpose, a first aspect of the embodiments of the present invention provides an embodiment of a resource arrangement method for a server cluster. As shown in fig. 1, it includes the following steps:
S101, acquiring a node list of a server cluster and a task list corresponding to each node in the node list, wherein each node comprises GPUs, and the task list comprises the creation information and computing environment of each GPU in the node;
S103, screening the nodes in the node list according to the number of remaining free GPUs and the computing environments in the task lists corresponding to those GPUs, and generating a screening list;
S105, selecting two nodes from the screening list, deleting the creation information corresponding to the remaining GPUs of one node from that node's task list, and generating the creation information of those remaining GPUs in the task list of the other node;
S107, deleting the two nodes from the screening list, and returning to the step of selecting two nodes from the screening list until no node or only one node remains in the screening list.
Take a GPU server cluster for AI computation as an example: each node in the server cluster is one GPU server, and each GPU server may contain 2, 4, 8, 16, 32 or even more GPUs. Each GPU executes to-be-processed tasks as scheduled by the server cluster.
Specifically, a resource management interface is called to obtain the node list of the server cluster and the task list corresponding to each node in the node list. A node is a GPU server containing GPUs; the task list comprises the creation information and computing environment of each GPU in each node. The creation information comprises the Docker image, the number of CPUs, the number of GPUs, directory mounts, and the like; the computing environment indicates how many GPUs are needed to execute the computing task to which the current GPU is assigned.
Assume four nodes a, b, c and d: a contains 8 GPUs and executes a computing task requiring 6 GPUs; b contains 8 GPUs and executes a computing task requiring 8 GPUs; c contains 8 GPUs and executes a computing task requiring 7 GPUs; and d contains 8 GPUs and executes a computing task requiring 16 GPUs. Nodes with remaining GPUs, i.e. nodes containing GPU fragments, are screened out, namely a and c. After the computing tasks associated with the GPU fragments in a and c are checked, the eligible nodes among them are screened out to form a screening list. In an actual application scenario, a server cluster connects many servers, and their number is not limited to the 4 above. Suppose that 11 eligible nodes are screened into the current screening list: two nodes 1 and 2 are selected from the screening list, the creation information corresponding to the GPU fragment in node 1 is deleted from the task list of node 1, the creation information of that GPU fragment is newly created in the task list of node 2, then nodes 1 and 2 are deleted from the screening list, and two further nodes are selected from the screening list for arrangement, until one node remains in the screening list.
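The first screening step of the example above can be reproduced with a short sketch. The node names and GPU counts are taken from the example; the dictionary layout is an illustrative assumption, and node d is modeled as fully occupied since its 16-GPU task leaves it no free cards.

```python
# Nodes a-d from the example: (installed GPUs, GPUs occupied by tasks).
# d's 16-GPU task presumably spans multiple nodes; here it is modeled
# simply as leaving d with no free GPUs.
cluster = {"a": (8, 6), "b": (8, 8), "c": (8, 7), "d": (8, 8)}

# Screen out the nodes with remaining GPUs, i.e. the GPU fragments.
fragmented = [name for name, (total, used) in cluster.items() if total > used]
print(fragmented)  # prints ['a', 'c']
```

As in the example, only a (2 free GPUs) and c (1 free GPU) carry fragments and proceed to the next screening stage.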
Through the above embodiment, GPU fragment resources in the nodes are eliminated by consolidating them, the nodes in the server cluster are guaranteed sufficient idle GPU resources, and subsequent task requests are effectively supported; the scheme is simple to implement, requires no complex operations, and does not damage the original code of the server cluster.
In some embodiments, screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task lists corresponding to those GPUs comprises:
screening out, from the node list, nodes with a preset number of free GPUs;
and screening out, based on the computing environments in the task lists corresponding to the nodes with the preset number of free GPUs, the nodes containing a computing environment of a preset card number.
In some embodiments, screening out nodes with a preset number of free GPUs from the node list comprises:
screening out, from the node list, nodes with 1 free GPU.
In some embodiments, screening out the nodes containing a computing environment of the preset card number comprises:
screening out, based on the computing environments in the task lists corresponding to the nodes with 1 free GPU, the nodes containing a 1-card computing environment.
In some embodiments, screening the nodes in the node list according to the number of remaining GPUs and the corresponding computing environments further comprises:
in response to finishing the arrangement of the nodes with 1 remaining GPU and a 1-card computing environment, screening out nodes with 2 free GPUs from the node list;
and screening out, based on the computing environments in the task lists corresponding to the nodes with 2 free GPUs, the nodes containing 2-card computing environments, and generating a screening list of nodes with 2 free GPUs.
In some embodiments, screening the nodes in the node list according to the number of remaining GPUs and the corresponding computing environments further comprises:
in response to finishing the arrangement of the nodes with 2 remaining GPUs and 2-card computing environments, screening out nodes with 4 free GPUs from the node list;
and screening out, based on the computing environments in the task lists corresponding to the nodes with 4 free GPUs, the nodes containing 4-card computing environments, and generating a screening list of nodes with 4 free GPUs.
In some embodiments, the 2-card computing environment comprises two 1-card environments or one 2-card environment, and the 4-card computing environment comprises four 1-card environments, two 2-card environments, one 2-card environment and two 1-card environments, one 1-card environment and one 3-card environment, or one 4-card environment.
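The progressive passes described in these embodiments, first over 1-GPU fragments, then 2-GPU, then 4-GPU, reuse the same screening shape each time. A minimal sketch of that screening, under the assumption that a task list can be reduced to the card counts of its computing environments (an illustrative layout, not the disclosed API):

```python
def screen(nodes, size):
    """Screening for one pass: keep the nodes with exactly `size` free
    GPUs whose task lists contain at least one `size`-card computing
    environment. `nodes` maps a node name to {"total": installed GPUs,
    "envs": card counts of its running computing environments}.
    """
    picked = []
    for name, node in nodes.items():
        free = node["total"] - sum(node["envs"])
        if free == size and size in node["envs"]:
            picked.append(name)
    return picked

# The passes run in the order described above: size 1, then 2, then 4,
# each pass starting only after the previous fragment size is arranged.
```

Running the sizes in increasing order matters: pairing two 1-GPU fragments produces a 2-GPU fragment that the next pass can then consolidate further.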
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 2, an embodiment of the present invention further provides a resource arrangement system for a server cluster, including:
an obtaining module 110, configured to obtain a node list of a server cluster and a task list corresponding to each node in the node list, wherein each node comprises GPUs, and the task list comprises the creation information and computing environment of each GPU in the node;
a screening module 120, configured to screen the nodes in the node list according to the number of remaining GPUs and the computing environments in the task lists corresponding to those GPUs, and to generate a screening list;
an arrangement module 130, configured to select two nodes from the screening list, delete the creation information corresponding to the remaining GPUs of one node from that node's task list, and generate the creation information of those remaining GPUs in the task list of the other node;
and a return arrangement module 140, configured to delete the two nodes from the screening list and return to the step of selecting two nodes from the screening list until no node or only one node remains in the screening list.
In some embodiments, screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task lists corresponding to those GPUs comprises:
screening out, from the node list, nodes with a preset number of free GPUs;
and screening out, based on the computing environments in the task lists corresponding to the nodes with the preset number of free GPUs, the nodes containing a computing environment of a preset card number.
In some embodiments, screening out nodes with a preset number of free GPUs from the node list comprises:
screening out, from the node list, nodes with 1 free GPU.
In some embodiments, screening out the nodes containing a computing environment of the preset card number comprises:
screening out, based on the computing environments in the task lists corresponding to the nodes with 1 free GPU, the nodes containing a 1-card computing environment.
In some embodiments, the system further comprises an arrangement submodule configured to:
in response to finishing the arrangement of the nodes with 1 remaining GPU and a 1-card computing environment, screen out nodes with 2 free GPUs from the node list;
and screen out, based on the computing environments in the task lists corresponding to the nodes with 2 free GPUs, the nodes containing 2-card computing environments, and generate a screening list of nodes with 2 free GPUs.
In some embodiments, the arrangement submodule is further configured to:
in response to finishing the arrangement of the nodes with 2 remaining GPUs and 2-card computing environments, screen out nodes with 4 free GPUs from the node list;
and screen out, based on the computing environments in the task lists corresponding to the nodes with 4 free GPUs, the nodes containing 4-card computing environments, and generate a screening list of nodes with 4 free GPUs.
In some embodiments, the 2-card computing environment comprises two 1-card environments or one 2-card environment, and the 4-card computing environment comprises four 1-card environments, two 2-card environments, one 2-card environment and two 1-card environments, one 1-card environment and one 3-card environment, or one 4-card environment.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 3, an embodiment of the present invention further provides a computer device 20. The computer device 20 includes a processor 210 and a memory 220; the memory 220 stores a computer program 221 executable on the processor, and the processor 210, when executing the program, performs the following method steps:
acquiring a node list of a server cluster and a task list corresponding to each node in the node list, wherein each node comprises GPUs, and the task list comprises the creation information and computing environment of each GPU in the node;
screening the nodes in the node list according to the number of remaining free GPUs and the computing environments in the task lists corresponding to those GPUs, and generating a screening list;
selecting two nodes from the screening list, deleting the creation information corresponding to the remaining GPUs of one node from that node's task list, and generating the creation information of those remaining GPUs in the task list of the other node;
and deleting the two nodes from the screening list, and returning to the step of selecting two nodes from the screening list until no node or only one node remains in the screening list.
In some embodiments, screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task lists corresponding to those GPUs comprises:
screening out, from the node list, nodes with a preset number of free GPUs;
and screening out, based on the computing environments in the task lists corresponding to the nodes with the preset number of free GPUs, the nodes containing a computing environment of a preset card number.
In some embodiments, screening out nodes with a preset number of free GPUs from the node list comprises:
screening out, from the node list, nodes with 1 free GPU.
In some embodiments, screening out the nodes containing a computing environment of the preset card number comprises:
screening out, based on the computing environments in the task lists corresponding to the nodes with 1 free GPU, the nodes containing a 1-card computing environment.
In some embodiments, screening the nodes in the node list according to the number of remaining GPUs and the corresponding computing environments further comprises:
in response to finishing the arrangement of the nodes with 1 remaining GPU and a 1-card computing environment, screening out nodes with 2 free GPUs from the node list;
and screening out, based on the computing environments in the task lists corresponding to the nodes with 2 free GPUs, the nodes containing 2-card computing environments, and generating a screening list of nodes with 2 free GPUs.
In some embodiments, screening the nodes in the node list according to the number of remaining GPUs and the corresponding computing environments further comprises:
in response to finishing the arrangement of the nodes with 2 remaining GPUs and 2-card computing environments, screening out nodes with 4 free GPUs from the node list;
and screening out, based on the computing environments in the task lists corresponding to the nodes with 4 free GPUs, the nodes containing 4-card computing environments, and generating a screening list of nodes with 4 free GPUs.
In some embodiments, the 2-card computing environment comprises two 1-card environments or one 2-card environment, and the 4-card computing environment comprises four 1-card environments, two 2-card environments, one 2-card environment and two 1-card environments, one 1-card environment and one 3-card environment, or one 4-card environment.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 4, an embodiment of the present invention further provides a computer-readable storage medium 30, the computer-readable storage medium 30 storing a computer program 310 which, when executed by a processor, performs the following method steps:
acquiring a node list of a server cluster and a task list corresponding to each node in the node list, wherein each node comprises GPUs, and the task list comprises the creation information and computing environment of each GPU in the node;
screening the nodes in the node list according to the number of remaining free GPUs and the computing environments in the task lists corresponding to those GPUs, and generating a screening list;
selecting two nodes from the screening list, deleting the creation information corresponding to the remaining GPUs of one node from that node's task list, and generating the creation information of those remaining GPUs in the task list of the other node;
and deleting the two nodes from the screening list, and returning to the step of selecting two nodes from the screening list until no node or only one node remains in the screening list.
In some embodiments, the screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task list corresponding to the remaining GPUs includes:
screening out nodes with idle preset number of GPUs from the node list;
and screening out the GPU containing the preset card number computing environment from the node list based on the computing environment in the task list corresponding to the nodes with the idle preset number of GPUs.
In some embodiments, screening the node list for a preset number of free GPUs includes:
and screening out nodes with 1 GPU in idle from the node list.
In some embodiments, screening out GPUs including a preset card number computing environment from the node list based on the computing environments in the task list corresponding to the nodes with the free preset number of GPUs includes:
and screening out the GPU containing 1-card computing environment from the node list based on the computing environment in the task list corresponding to the nodes of the idle 1 GPU.
In some embodiments, the screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task list corresponding to the remaining GPUs includes:
in response to finishing the arrangement of the nodes of the remaining 1 GPU and the GPU containing 1 card computing environment, screening out nodes with 2 free GPUs from the node list;
and screening the GPU containing 2-card computing environments from the node list based on the computing environments in the task list corresponding to the nodes of the idle 2 GPUs, and generating a screening list of the idle 2 GPUs.
In some embodiments, the screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task list corresponding to the remaining GPUs includes:
in response to finishing the arrangement of the nodes of the remaining 2 GPUs which comprise 2-card computing environments, screening out nodes with free 4 GPUs from the node list;
and screening the GPU containing 4-card computing environments from the node list based on the computing environments in the task list corresponding to the nodes of the free 4 GPUs, and generating a screening list of the free 4 GPUs.
In some implementations, the 2-card computing environment includes one 2-card or two 1-card computing environments, and the 4-card computing environment includes four 1-card, two 2-card, two 1-card and one 2-card, one 1-card and one 3-card, or one 4-card computing environments.
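The staged screening described above (first nodes with 1 free GPU, then 2, then 4, each paired with nodes whose busy GPUs form a matching card-count combination) can be sketched as follows. This is a minimal illustration, not the patent's implementation: `Node`, `task_cards`, and `screen` are hypothetical names, and `COMBOS` encodes the combinations listed in the preceding paragraph.

```python
from dataclasses import dataclass

@dataclass
class Node:
    """Hypothetical node record: name, total GPUs, and the card count
    of each task currently running on the node."""
    name: str
    total_gpus: int
    task_cards: list

    @property
    def free_gpus(self):
        # GPUs not occupied by any task on this node
        return self.total_gpus - sum(self.task_cards)

# Allowed card-count combinations per pass, following the description:
# a 2-card environment is one 2-card or two 1-card tasks; a 4-card
# environment is four 1-card, two 2-card, two 1-card plus one 2-card,
# one 1-card plus one 3-card, or one 4-card task.
COMBOS = {
    1: [[1]],
    2: [[1, 1], [2]],
    4: [[1, 1, 1, 1], [2, 2], [1, 1, 2], [1, 3], [4]],
}

def screen(nodes, n):
    """Pass n (run with n = 1, then 2, then 4): return the nodes with
    exactly n free GPUs, and the nodes whose busy GPUs form an n-card
    combination whose tasks could be moved onto those free GPUs."""
    free_nodes = [x for x in nodes if x.free_gpus == n]
    movable = [x for x in nodes if sorted(x.task_cards) in COMBOS[n]]
    return free_nodes, movable
```

Running the passes in ascending order (1, 2, 4) matches the "in response to completing the arrangement" wording: smaller fragments are consolidated before larger ones are considered.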
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes of the methods of the above embodiments may be implemented by a computer program; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the spirit of the embodiments of the invention, technical features in the above embodiments or in different embodiments may also be combined, and many other variations of the different aspects of the embodiments exist, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like made within the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.
Claims (10)
1. A resource arrangement method for a server cluster is characterized by comprising the following steps:
acquiring a node list of a server cluster and a task list corresponding to each node in the node list, wherein each node comprises GPUs, and the task list comprises creation information and a computing environment for each GPU in the node;
screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task lists corresponding to the remaining GPUs, and generating a screening list;
selecting two nodes from the screening list, deleting from the task list of one of the nodes the creation information corresponding to its remaining GPUs, and generating the creation information of those remaining GPUs in the task list of the other node;
and deleting the two nodes from the screening list and returning to the step of selecting two nodes from the screening list, until no node or only one node remains in the screening list.
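The pairing loop of claim 1 can be sketched as a short Python routine. This is a hedged illustration, not the claimed implementation: `consolidate`, `screening_list`, and `task_lists` are hypothetical names standing in for the claim's screening list and per-node task lists.

```python
def consolidate(screening_list, task_lists):
    """Sketch of the pairing step in claim 1: repeatedly take two nodes
    from the screening list, delete the remaining-GPU task creation
    info from one node and recreate it on the other, then drop both
    nodes from the list, until no node or only one node remains."""
    pending = list(screening_list)
    moves = []
    while len(pending) >= 2:
        src, dst = pending[0], pending[1]
        tasks = task_lists[src]
        task_lists[src] = []                       # delete creation info on src
        task_lists[dst] = task_lists[dst] + tasks  # recreate it on dst
        moves.append((src, dst, tasks))
        pending = pending[2:]                      # remove both from the list
    return moves, task_lists
```

Because the screening pass already matched free-GPU counts against card-count combinations, each move fills one node's fragments with another node's tasks, emptying the source node's remaining GPUs.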
2. The method of claim 1, wherein screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task lists corresponding to the remaining GPUs comprises:
screening out nodes with a preset number of free GPUs from the node list;
and screening out, from the node list, the GPUs that host a computing environment of the preset card count, based on the computing environments in the task lists corresponding to the nodes with the preset number of free GPUs.
3. The method of claim 2, wherein screening out nodes with a preset number of free GPUs from the node list comprises:
screening out, from the node list, the nodes with 1 free GPU.
4. The method of claim 3, wherein screening out, from the node list, the GPUs that host a computing environment of the preset card count based on the computing environments in the task lists corresponding to the nodes with the preset number of free GPUs comprises:
screening out, from the node list, the GPUs that host a 1-card computing environment, based on the computing environments in the task lists corresponding to the nodes with 1 free GPU.
5. The method of claim 4, wherein screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task lists corresponding to the remaining GPUs comprises:
in response to completing the arrangement of the nodes with 1 remaining GPU and of the GPUs hosting a 1-card computing environment, screening out nodes with 2 free GPUs from the node list;
and screening out, from the node list, the GPUs that host a 2-card computing environment, based on the computing environments in the task lists corresponding to the nodes with 2 free GPUs, and generating a screening list for 2 free GPUs.
6. The method of claim 5, wherein screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task lists corresponding to the remaining GPUs comprises:
in response to completing the arrangement of the nodes with 2 remaining GPUs that host a 2-card computing environment, screening out nodes with 4 free GPUs from the node list;
and screening out, from the node list, the GPUs that host a 4-card computing environment, based on the computing environments in the task lists corresponding to the nodes with 4 free GPUs, and generating a screening list for 4 free GPUs.
7. The method of claim 6, wherein the 2-card computing environment comprises one 2-card or two 1-card computing environments, and the 4-card computing environment comprises four 1-card, two 2-card, two 1-card and one 2-card, one 1-card and one 3-card, or one 4-card computing environments.
8. A resource arrangement system for a server cluster, characterized by comprising:
an acquisition module configured to acquire a node list of a server cluster and a task list corresponding to each node in the node list, wherein each node comprises GPUs, and the task list comprises creation information and a computing environment for each GPU in the node;
a screening module configured to screen the nodes in the node list according to the number of remaining GPUs and the computing environments in the task lists corresponding to the remaining GPUs, and to generate a screening list;
a sorting module configured to select two nodes from the screening list, delete from the task list of one of the nodes the creation information corresponding to its remaining GPUs, and generate the creation information of those remaining GPUs in the task list of the other node;
and a return module configured to delete the two nodes from the screening list and return to the step of selecting two nodes from the screening list, until no node or only one node remains in the screening list.
9. A computer device, comprising:
at least one processor; and
a memory storing a computer program operable on the processor, wherein the processor executes the program to perform the steps of the method according to any one of claims 1-7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110904366.1A CN113742064B (en) | 2021-08-06 | 2021-08-06 | Resource arrangement method, system, equipment and medium of server cluster |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113742064A true CN113742064A (en) | 2021-12-03 |
CN113742064B CN113742064B (en) | 2023-08-04 |
Family
ID=78730587
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110904366.1A Active CN113742064B (en) | 2021-08-06 | 2021-08-06 | Resource arrangement method, system, equipment and medium of server cluster |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113742064B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160314008A1 (en) * | 2013-12-31 | 2016-10-27 | Huawei Technologies Co., Ltd. | Method for implementing gpu virtualization and related apparatus, and system |
CN107577534A (en) * | 2017-08-31 | 2018-01-12 | 郑州云海信息技术有限公司 | A kind of resource regulating method and device |
CN107590002A (en) * | 2017-09-15 | 2018-01-16 | 东软集团股份有限公司 | Method for allocating tasks, device, storage medium, equipment and distributed task scheduling system |
CN108363623A (en) * | 2018-02-27 | 2018-08-03 | 郑州云海信息技术有限公司 | GPU resource dispatching method, device, equipment and computer readable storage medium |
CN109144710A (en) * | 2017-06-16 | 2019-01-04 | 中国移动通信有限公司研究院 | Resource regulating method, device and computer readable storage medium |
CN109726008A (en) * | 2017-10-31 | 2019-05-07 | 阿里巴巴集团控股有限公司 | Resource allocation methods and equipment |
CN110413412A (en) * | 2019-07-19 | 2019-11-05 | 苏州浪潮智能科技有限公司 | A kind of method and apparatus based on GPU cluster resource allocation |
CN110688218A (en) * | 2019-09-05 | 2020-01-14 | 广东浪潮大数据研究有限公司 | Resource scheduling method and device |
CN111324457A (en) * | 2020-02-15 | 2020-06-23 | 苏州浪潮智能科技有限公司 | Method, device, equipment and medium for issuing inference service in GPU cluster |
CN112272203A (en) * | 2020-09-18 | 2021-01-26 | 苏州浪潮智能科技有限公司 | Cluster service node selection method, system, terminal and storage medium |
CN112463349A (en) * | 2021-01-28 | 2021-03-09 | 北京睿企信息科技有限公司 | Load balancing method and system for efficiently scheduling GPU (graphics processing Unit) capability |
CN112486689A (en) * | 2020-12-10 | 2021-03-12 | 苏州浪潮智能科技有限公司 | Resource management platform resource recovery method, device, equipment and readable medium |
CN112860396A (en) * | 2021-01-28 | 2021-05-28 | 福建紫辰信息科技有限公司 | GPU (graphics processing Unit) scheduling method and system based on distributed deep learning |
CN113204428A (en) * | 2021-05-28 | 2021-08-03 | 北京市商汤科技开发有限公司 | Resource scheduling method, device, electronic equipment and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113742064B (en) | 2023-08-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110162413B (en) | Event-driven method and device | |
US20220066813A1 (en) | Dynamically generating an optimized processing pipeline for tasks | |
US20200364258A1 (en) | Container Image Size Reduction Via Runtime Analysis | |
CN104699363B (en) | A kind of window interface shows method and system | |
CN106557307B (en) | Service data processing method and system | |
CN110633135A (en) | Asynchronous task allocation method and device, computer equipment and storage medium | |
CN111400005A (en) | Data processing method and device and electronic equipment | |
CN111158800B (en) | Method and device for constructing task DAG based on mapping relation | |
CN109977168A (en) | The method for synchronizing data of database and equipment preloaded based on data page | |
CN112650449B (en) | Method and system for releasing cache space, electronic device and storage medium | |
CN112068812B (en) | Micro-service generation method and device, computer equipment and storage medium | |
CN113742064A (en) | Resource arrangement method, system, equipment and medium for server cluster | |
CN114721801A (en) | Dynamic scheduling method and device for batch task execution time | |
CN114978686A (en) | Digital asset chaining method and device | |
CN112395081B (en) | Online automatic resource recycling method, system, server and storage medium | |
CN114237902A (en) | Service deployment method and device, electronic equipment and computer readable medium | |
CN114595047A (en) | Batch task processing method and device | |
CN112905223A (en) | Method, device and equipment for generating upgrade package | |
CN111782363A (en) | Method and flow system for supporting multi-service scene calling | |
CN113742052B (en) | Batch task processing method and device | |
CN117539451B (en) | Flow execution method, device, electronic equipment and storage medium | |
CN111414162B (en) | Data processing method, device and equipment thereof | |
CN114553700A (en) | Equipment grouping method and device, computer equipment and storage medium | |
US20210158644A1 (en) | Peer partitioning to reduce strategy-driven bias in automated peer-selection systems | |
CN116501445A (en) | Virtual machine creation scheduling method, system, equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||