CN113742064A - Resource arrangement method, system, equipment and medium for server cluster

Resource arrangement method, system, equipment and medium for server cluster

Publication number: CN113742064A (application CN202110904366.1A; granted as CN113742064B)
Original language: Chinese (zh)
Inventor: 胡叶
Assignee: Suzhou Inspur Intelligent Technology Co Ltd
Legal status: Active (granted)
Classifications

    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061: Partitioning or combining of resources
    • G06F 9/5066: Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a resource arrangement method, system, device and medium for a server cluster. The method comprises: acquiring a node list of the server cluster and a task list corresponding to each node in the node list; screening the nodes in the node list according to the number of remaining (free) GPUs and the computing environments in the corresponding task lists, and generating a screening list; selecting two nodes from the screening list, deleting the creation information corresponding to the remaining GPUs of one node from that node's task list, and generating the same creation information in the task list of the other node; and deleting the two nodes from the screening list and returning to the selection step until no node, or only one node, remains in the screening list. The scheme ensures that nodes in the server cluster retain sufficient idle GPU resources and supports resource applications by subsequent tasks; it is simple to implement, requires no complex operations, and does not modify the cluster's existing code.

Description

Resource arrangement method, system, equipment and medium for server cluster
Technical Field
The present invention relates to the technical field of computing resource management, and in particular, to a method, a system, a device, and a medium for organizing resources of a server cluster.
Background
With the continuous development of artificial intelligence (AI) technology and its growing industrial adoption, more and more enterprise users are building their own AI resource management platforms to support the development and operation of enterprise AI services, typically performing resource allocation and computing-environment creation by binding Docker containers.
When a multi-user server cluster schedules and allocates resources over a long period, GPU fragment resources often appear: after running tasks occupy part of each GPU server, the free GPUs remaining on any single server are fewer than a pending task requires, even though the cluster as a whole may still have enough idle GPUs. Conventional AI computing platforms address GPU fragmentation in two ways. The first enforces a uniform specification on users' resource requests; this restricts users, and requests may still have to wait when no resource of the required specification is free. The second uses a distributed framework, for example transforming a pair of single-machine, single-card computing tasks into a two-machine, two-card distributed task; this requires intrusive modification of the task code, raises the operational threshold for users, and makes it hard to guarantee the precision and performance of the transformed computation. The present invention therefore provides a method for consolidating GPU fragment resources.
Disclosure of Invention
In view of this, the invention provides a resource arrangement method, system, device and medium for a server cluster, which consolidate fragmented resources so that nodes in the cluster have sufficient idle GPU resources and continued resource applications by subsequent tasks are effectively supported; the scheme is simple to implement, requires no complex operations, and does not modify the cluster's existing code.
Based on the above object, an aspect of the embodiments of the present invention provides a resource arrangement method for a server cluster, which specifically includes the following steps:
acquiring a node list of a server cluster and a task list corresponding to each node in the node list, wherein each node comprises GPUs, and the task list comprises the creation information and computing environment of each GPU in the node;
screening the nodes in the node list according to the number of remaining (free) GPUs and the computing environments in the corresponding task lists, and generating a screening list;
selecting two nodes from the screening list, deleting the creation information corresponding to the remaining GPUs of one node from that node's task list, and generating the same creation information in the task list of the other node;
and deleting the two nodes from the screening list, and returning to the step of selecting two nodes from the screening list until no node, or only one node, remains in the screening list.
In some embodiments, screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the corresponding task lists comprises:
screening out nodes with a preset number of free GPUs from the node list;
and, based on the computing environments in the task lists corresponding to those nodes, screening out the nodes containing a computing environment of the preset card number.
In some embodiments, screening out nodes with a preset number of free GPUs comprises:
screening out nodes with 1 free GPU from the node list.
In some embodiments, screening out the nodes containing a computing environment of the preset card number comprises:
screening out, from the nodes with 1 free GPU, those whose task lists contain a 1-card computing environment.
In some embodiments, the screening further comprises:
in response to finishing the arrangement of the nodes with 1 remaining GPU and a 1-card computing environment, screening out nodes with 2 free GPUs from the node list;
and, based on the computing environments in the corresponding task lists, screening out the nodes containing 2-card computing environments, and generating a screening list of nodes with 2 free GPUs.
In some embodiments, the screening further comprises:
in response to finishing the arrangement of the nodes with 2 remaining GPUs and 2-card computing environments, screening out nodes with 4 free GPUs from the node list;
and, based on the computing environments in the corresponding task lists, screening out the nodes containing 4-card computing environments, and generating a screening list of nodes with 4 free GPUs.
In some implementations, a 2-card computing environment comprises two 1-card environments or one 2-card environment, and a 4-card computing environment comprises four 1-card environments; two 2-card environments; two 1-card environments and one 2-card environment; one 1-card environment and one 3-card environment; or one 4-card environment.
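The progressive screening passes described above can be sketched as follows. The data model (dicts with `free` and `movable_envs` fields) and all names are illustrative assumptions, not the patent's implementation: a node qualifies for a pass when its free-GPU count matches and the card counts of its movable computing environments form one of the allowed partitions.

```python
# Allowed partitions of movable card counts per free-GPU count,
# following the 1-, 2- and 4-card cases in the embodiments above.
ALLOWED = {
    1: {(1,)},
    2: {(1, 1), (2,)},
    4: {(1, 1, 1, 1), (2, 2), (1, 1, 2), (1, 3), (4,)},
}

def screen(nodes, free_count):
    """Return names of nodes whose free-GPU count equals free_count and
    whose movable environments match an allowed partition.

    nodes: {name: {"free": int, "movable_envs": list of card counts}}
    (an assumed representation of the node list and task lists).
    """
    out = []
    for name, info in nodes.items():
        if info["free"] != free_count:
            continue
        if tuple(sorted(info["movable_envs"])) in ALLOWED[free_count]:
            out.append(name)
    return out
```

For example, a node with 2 free GPUs passes the 2-GPU screening pass if its task list holds either two movable 1-card environments or one movable 2-card environment.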
In another aspect of the embodiments of the present invention, a resource arrangement system for a server cluster is further provided, including:
an acquisition module, configured to acquire a node list of a server cluster and a task list corresponding to each node in the node list, wherein each node comprises GPUs, and the task list comprises the creation information and computing environment of each GPU in the node;
the screening module is configured to screen the nodes in the node list according to the number of the remaining GPUs and the computing environments in the task list corresponding to the remaining GPUs, and generate a screening list;
the sorting module is configured to select two nodes from the screening list, delete the creation information corresponding to the remaining GPUs of one node from the task list of the node, and generate the creation information of the remaining GPUs in the task list of the other node;
a return sorting module configured to delete the two nodes from the filter list and return to the step of selecting two nodes from the filter list until there is no node or only one node remains in the filter list.
In another aspect of the embodiments of the present invention, there is also provided a computer device, including: at least one processor; and a memory storing a computer program executable on the processor, the computer program when executed by the processor implementing the steps of the method as above.
In a further aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, which stores a computer program that, when executed by a processor, implements the above method steps.
The invention has the following beneficial technical effects: the method has the advantages that the nodes in the server cluster are guaranteed to have sufficient GPU idle resources by integrating the fragment resources, follow-up task continuous application is effectively supported, the method is simple to implement, complex operation is not needed, and original codes of the server cluster cannot be damaged.
Drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a block diagram of an embodiment of a resource arrangement method for a server cluster according to the present invention;
FIG. 2 is a schematic diagram of an embodiment of a resource arrangement system of a server cluster provided in the present invention;
FIG. 3 is a schematic structural diagram of an embodiment of a computer device provided in the present invention;
fig. 4 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that the terms "first" and "second" in the embodiments of the present invention are used only to distinguish two entities or parameters with the same name; they are merely for convenience of description, should not be construed as limiting the embodiments, and are not described again in the following embodiments.
Based on the above purpose, a first aspect of the embodiments of the present invention provides an embodiment of a resource arrangement method for a server cluster. As shown in fig. 1, it includes the following steps:
s101, a node list of a server cluster and a task list corresponding to each node in the node list are obtained, wherein the node comprises a GPU, and the task list comprises creation information and a computing environment of each GPU in the nodes;
s103, screening the nodes in the node list according to the number of the residual GPUs and the computing environments in the task list corresponding to the GPUs in the residual GPUs, and generating a screening list;
s105, selecting two nodes from the screening list, deleting the creating information corresponding to the residual GPU of one node from the task list of the node, and generating the creating information of the residual GPU in the task list of the other node;
s107, deleting the two nodes from the screening list, and returning to the step of selecting two nodes from the screening list until no node or only one node remains in the screening list.
Taking a GPU server cluster for AI computing as an example, each node in the cluster is one GPU server, and each GPU server may contain 2, 4, 8, 16, 32, or even more GPUs. Each GPU is scheduled by the cluster to execute tasks to be processed.
Specifically, a resource management interface is called to obtain the node list of the server cluster and the task list corresponding to each node in the node list. A node is a GPU server containing GPUs; the task list records the creation information and computing environment of each GPU in the node. The creation information includes the Docker image, the number of CPUs, the number of GPUs, directory mounts, and the like; the computing environment indicates how many GPUs are required by the computing task currently occupying the GPU.
assuming that 4 nodes a, b, c, d, a currently contain 8 GPUs and execute a computation task requiring 6 GPUs, b contains 8 GPUs and execute a computation task requiring 8 GPUs, c contains 8 GPUs and execute a computation task requiring 7 GPUs, and d contains 8 GPUs and execute a computation task requiring 16 GPUs, screening out the remaining nodes of the GPUs, namely screening out GPU fragments in the current node, namely a and c, then screening out eligible nodes from a and c after checking the computation tasks contained in the GPU fragments in a and c, and forming a screening list. In an actual application scenario, there are many servers connected in the server cluster, and the number of the servers is not limited to the above 4. Suppose that 11 eligible nodes are screened out from the current screening list, two nodes 1 and 2 are selected from the screening list, the creation information corresponding to the GPU fragmentation in the node 1 is deleted from the task list of the node 1, the creation information of the GPU fragmentation of the node 1 is newly created in the task list of the node 2, then the nodes 1 and 2 are deleted from the screening list, and two nodes are continuously selected from the screening list for sorting until one node remains in the screening list.
Through the above arrangement of GPU fragment resources, the fragments inside the nodes are eliminated, nodes in the server cluster are guaranteed sufficient idle GPU resources, and continued resource applications by subsequent tasks are effectively supported; the scheme is simple to implement, requires no complex operations, and does not modify the cluster's existing code.
In some embodiments, the screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task list corresponding to the remaining GPUs includes:
screening out nodes with idle preset number of GPUs from the node list;
and screening out the GPU containing the preset card number computing environment from the node list based on the computing environment in the task list corresponding to the nodes with the idle preset number of GPUs.
In some embodiments, screening the node list for a preset number of free GPUs includes:
and screening out nodes with 1 GPU in idle from the node list.
In some embodiments, screening out GPUs including a preset card number computing environment from the node list based on the computing environments in the task list corresponding to the nodes with the free preset number of GPUs includes:
and screening out the GPU containing 1-card computing environment from the node list based on the computing environment in the task list corresponding to the nodes of the idle 1 GPU.
In some embodiments, the screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task list corresponding to the remaining GPUs includes:
in response to finishing the arrangement of the nodes of the remaining 1 GPU and the GPU containing 1 card computing environment, screening out nodes with 2 free GPUs from the node list;
and screening the GPU containing 2-card computing environments from the node list based on the computing environments in the task list corresponding to the nodes of the idle 2 GPUs, and generating a screening list of the idle 2 GPUs.
In some embodiments, the screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task list corresponding to the remaining GPUs includes:
in response to finishing the arrangement of the nodes of the remaining 2 GPUs which comprise 2-card computing environments, screening out nodes with free 4 GPUs from the node list;
and screening the GPU containing 4-card computing environments from the node list based on the computing environments in the task list corresponding to the nodes of the free 4 GPUs, and generating a screening list of the free 4 GPUs.
In some implementations, a 2-card computing environment comprises two 1-card environments or one 2-card environment, and a 4-card computing environment comprises four 1-card environments; two 2-card environments; two 1-card environments and one 2-card environment; one 1-card environment and one 3-card environment; or one 4-card environment.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 2, an embodiment of the present invention further provides a resource arrangement system for a server cluster, including:
an obtaining module 110, configured to obtain a node list of a server cluster and a task list corresponding to each node in the node list, where the node includes a GPU, and the task list includes creation information and a computing environment of each GPU in the node;
a screening module 120 configured to screen the nodes in the node list according to the number of remaining GPUs and the computing environments in the task list corresponding to the GPUs in the remaining GPUs, and generate a screening list;
a sorting module 130 configured to select two nodes from the filter list, delete the creation information corresponding to the remaining GPUs of one of the nodes from the task list of the node, and generate the creation information of the remaining GPUs in the task list of the other node;
a return sorting module 140 configured to delete the two nodes from the filter list and return the step of selecting two nodes from the filter list until there is no node or only one node remaining in the filter list.
In some embodiments, the screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task list corresponding to the remaining GPUs includes:
screening out nodes with idle preset number of GPUs from the node list;
and screening out the GPU containing the preset card number computing environment from the node list based on the computing environment in the task list corresponding to the nodes with the idle preset number of GPUs.
In some embodiments, screening the node list for a preset number of free GPUs includes:
and screening out nodes with 1 GPU in idle from the node list.
In some embodiments, screening out GPUs including a preset card number computing environment from the node list based on the computing environments in the task list corresponding to the nodes with the free preset number of GPUs includes:
and screening out the GPU containing 1-card computing environment from the node list based on the computing environment in the task list corresponding to the nodes of the idle 1 GPU.
In some embodiments, the system further comprises a sorting submodule configured to:
in response to finishing the arrangement of the nodes of the remaining 1 GPU and the GPU containing 1 card computing environment, screening out nodes with 2 free GPUs from the node list;
and screening the GPU containing 2-card computing environments from the node list based on the computing environments in the task list corresponding to the nodes of the idle 2 GPUs, and generating a screening list of the idle 2 GPUs.
In some embodiments, the collation sub-module is further configured to:
in response to finishing the arrangement of the nodes of the remaining 2 GPUs which comprise 2-card computing environments, screening out nodes with free 4 GPUs from the node list;
and screening the GPU containing 4-card computing environments from the node list based on the computing environments in the task list corresponding to the nodes of the free 4 GPUs, and generating a screening list of the free 4 GPUs.
In some implementations, a 2-card computing environment comprises two 1-card environments or one 2-card environment, and a 4-card computing environment comprises four 1-card environments; two 2-card environments; two 1-card environments and one 2-card environment; one 1-card environment and one 3-card environment; or one 4-card environment.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 3, an embodiment of the present invention further provides a computer device 20, in which the computer device 20 includes a processor 210 and a memory 220, the memory 220 stores a computer program 221 executable on the processor, and the processor 210 executes the program to perform the following method steps:
acquiring a node list of a server cluster and a task list corresponding to each node in the node list, wherein the node comprises a GPU, and the task list comprises creation information and a computing environment of each GPU in the node;
screening the nodes in the node list according to the number of the residual GPUs and the computing environments in the task list corresponding to the GPUs in the residual GPUs, and generating a screening list;
selecting two nodes from the screening list, deleting the creating information corresponding to the residual GPU of one node from the task list of the node, and generating the creating information of the residual GPU in the task list of the other node;
and deleting the two nodes from the screening list, and returning to the step of selecting two nodes from the screening list until no node or only one node remains in the screening list.
In some embodiments, the screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task list corresponding to the remaining GPUs includes:
screening out nodes with idle preset number of GPUs from the node list;
and screening out the GPU containing the preset card number computing environment from the node list based on the computing environment in the task list corresponding to the nodes with the idle preset number of GPUs.
In some embodiments, screening the node list for a preset number of free GPUs includes:
and screening out nodes with 1 GPU in idle from the node list.
In some embodiments, screening out GPUs including a preset card number computing environment from the node list based on the computing environments in the task list corresponding to the nodes with the free preset number of GPUs includes:
and screening out the GPU containing 1-card computing environment from the node list based on the computing environment in the task list corresponding to the nodes of the idle 1 GPU.
In some embodiments, the screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task list corresponding to the remaining GPUs includes:
in response to finishing the arrangement of the nodes of the remaining 1 GPU and the GPU containing 1 card computing environment, screening out nodes with 2 free GPUs from the node list;
and screening the GPU containing 2-card computing environments from the node list based on the computing environments in the task list corresponding to the nodes of the idle 2 GPUs, and generating a screening list of the idle 2 GPUs.
In some embodiments, the screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task list corresponding to the remaining GPUs includes:
in response to finishing the arrangement of the nodes of the remaining 2 GPUs which comprise 2-card computing environments, screening out nodes with free 4 GPUs from the node list;
and screening the GPU containing 4-card computing environments from the node list based on the computing environments in the task list corresponding to the nodes of the free 4 GPUs, and generating a screening list of the free 4 GPUs.
In some implementations, a 2-card computing environment comprises two 1-card environments or one 2-card environment, and a 4-card computing environment comprises four 1-card environments; two 2-card environments; two 1-card environments and one 2-card environment; one 1-card environment and one 3-card environment; or one 4-card environment.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 4, an embodiment of the present invention further provides a computer-readable storage medium 30, the computer-readable storage medium 30 storing a computer program 310 which, when executed by a processor, performs the following method:
acquiring a node list of a server cluster and a task list corresponding to each node in the node list, wherein the node comprises a GPU, and the task list comprises creation information and a computing environment of each GPU in the node;
screening the nodes in the node list according to the number of the residual GPUs and the computing environments in the task list corresponding to the GPUs in the residual GPUs, and generating a screening list;
selecting two nodes from the screening list, deleting the creating information corresponding to the residual GPU of one node from the task list of the node, and generating the creating information of the residual GPU in the task list of the other node;
and deleting the two nodes from the screening list, and returning to the step of selecting two nodes from the screening list until no node or only one node remains in the screening list.
In some embodiments, the screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task list corresponding to the remaining GPUs includes:
screening out nodes with idle preset number of GPUs from the node list;
and screening out the GPU containing the preset card number computing environment from the node list based on the computing environment in the task list corresponding to the nodes with the idle preset number of GPUs.
In some embodiments, screening the node list for a preset number of free GPUs includes:
and screening out nodes with 1 GPU in idle from the node list.
In some embodiments, screening out GPUs including a preset card number computing environment from the node list based on the computing environments in the task list corresponding to the nodes with the free preset number of GPUs includes:
and screening out the GPU containing 1-card computing environment from the node list based on the computing environment in the task list corresponding to the nodes of the idle 1 GPU.
In some embodiments, the screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task list corresponding to the remaining GPUs includes:
in response to finishing the arrangement of the nodes of the remaining 1 GPU and the GPU containing 1 card computing environment, screening out nodes with 2 free GPUs from the node list;
and screening the GPU containing 2-card computing environments from the node list based on the computing environments in the task list corresponding to the nodes of the idle 2 GPUs, and generating a screening list of the idle 2 GPUs.
In some embodiments, the screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task list corresponding to the remaining GPUs includes:
in response to finishing the arrangement of the nodes of the remaining 2 GPUs which comprise 2-card computing environments, screening out nodes with free 4 GPUs from the node list;
and screening the GPU containing 4-card computing environments from the node list based on the computing environments in the task list corresponding to the nodes of the free 4 GPUs, and generating a screening list of the free 4 GPUs.
In some implementations, a 2-card computing environment comprises two 1-card environments or one 2-card environment, and a 4-card computing environment comprises four 1-card environments; two 2-card environments; two 1-card environments and one 2-card environment; one 1-card environment and one 3-card environment; or one 4-card environment.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium of the program may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The sequence numbers of the embodiments disclosed above are merely for description and do not indicate the relative merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of this disclosure, including the claims, is limited to these examples. Within the spirit of the embodiments of the present invention, technical features in the above embodiments, or in different embodiments, may also be combined, and many other variations of the different aspects described above exist that are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like made without departing from the spirit and principles of the embodiments of the present invention are intended to fall within their scope.

Claims (10)

1. A resource arrangement method for a server cluster, characterized by comprising the following steps:
acquiring a node list of the server cluster and a task list corresponding to each node in the node list, wherein each node comprises GPUs, and the task list comprises creation information and a computing environment for each GPU in the node;
screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task lists corresponding to the remaining GPUs, and generating a screening list;
selecting two nodes from the screening list, deleting the creation information corresponding to the remaining GPUs of one node from that node's task list, and generating the creation information of those remaining GPUs in the task list of the other node; and
deleting the two nodes from the screening list, and returning to the step of selecting two nodes from the screening list until no node, or only one node, remains in the screening list.
2. The method of claim 1, wherein screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task lists corresponding to the remaining GPUs comprises:
screening out, from the node list, nodes with a preset number of free GPUs; and
screening out, from the node list, the GPUs running computing environments of a preset card number, based on the computing environments in the task lists corresponding to the nodes with the preset number of free GPUs.
3. The method of claim 2, wherein screening out, from the node list, nodes with a preset number of free GPUs comprises:
screening out, from the node list, nodes with 1 free GPU.
4. The method of claim 3, wherein screening out, from the node list, the GPUs running computing environments of a preset card number based on the computing environments in the task lists corresponding to the nodes with the preset number of free GPUs comprises:
screening out, from the node list, the GPUs running 1-card computing environments, based on the computing environments in the task lists corresponding to the nodes with 1 free GPU.
5. The method of claim 4, wherein screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task lists corresponding to the remaining GPUs comprises:
in response to completing the arrangement of the nodes with 1 remaining GPU and the GPUs running 1-card computing environments, screening out nodes with 2 free GPUs from the node list; and
screening out, from the node list, the GPUs running 2-card computing environments based on the computing environments in the task lists corresponding to the nodes with 2 free GPUs, and generating a screening list of nodes with 2 free GPUs.
6. The method of claim 5, wherein screening the nodes in the node list according to the number of remaining GPUs and the computing environments in the task lists corresponding to the remaining GPUs comprises:
in response to completing the arrangement of the nodes with 2 remaining GPUs running 2-card computing environments, screening out nodes with 4 free GPUs from the node list; and
screening out, from the node list, the GPUs running 4-card computing environments based on the computing environments in the task lists corresponding to the nodes with 4 free GPUs, and generating a screening list of nodes with 4 free GPUs.
7. The method of claim 6, wherein a 2-card computing environment comprises one 2-card task or two 1-card tasks, and a 4-card computing environment comprises four 1-card tasks, two 2-card tasks, two 1-card tasks plus one 2-card task, one 1-card task plus one 3-card task, or one 4-card task.
8. A resource arrangement system for a server cluster, comprising:
an acquisition module configured to acquire a node list of the server cluster and a task list corresponding to each node in the node list, wherein each node comprises GPUs, and the task list comprises creation information and a computing environment for each GPU in the node;
a screening module configured to screen the nodes in the node list according to the number of remaining GPUs and the computing environments in the task lists corresponding to the remaining GPUs, and to generate a screening list;
an arrangement module configured to select two nodes from the screening list, delete the creation information corresponding to the remaining GPUs of one node from that node's task list, and generate the creation information of those remaining GPUs in the task list of the other node; and
a return module configured to delete the two nodes from the screening list and return to the step of selecting two nodes from the screening list until no node, or only one node, remains in the screening list.
9. A computer device, comprising:
at least one processor; and
a memory storing a computer program executable on the processor, wherein the processor, when executing the program, performs the steps of the method according to any one of claims 1-7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method according to any one of claims 1 to 7.
CN202110904366.1A 2021-08-06 2021-08-06 Resource arrangement method, system, equipment and medium of server cluster Active CN113742064B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110904366.1A CN113742064B (en) 2021-08-06 2021-08-06 Resource arrangement method, system, equipment and medium of server cluster


Publications (2)

Publication Number Publication Date
CN113742064A true CN113742064A (en) 2021-12-03
CN113742064B CN113742064B (en) 2023-08-04

Family

ID=78730587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110904366.1A Active CN113742064B (en) 2021-08-06 2021-08-06 Resource arrangement method, system, equipment and medium of server cluster

Country Status (1)

Country Link
CN (1) CN113742064B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160314008A1 (en) * 2013-12-31 2016-10-27 Huawei Technologies Co., Ltd. Method for implementing gpu virtualization and related apparatus, and system
CN107577534A (en) * 2017-08-31 2018-01-12 郑州云海信息技术有限公司 A kind of resource regulating method and device
CN107590002A (en) * 2017-09-15 2018-01-16 东软集团股份有限公司 Method for allocating tasks, device, storage medium, equipment and distributed task scheduling system
CN108363623A (en) * 2018-02-27 2018-08-03 郑州云海信息技术有限公司 GPU resource dispatching method, device, equipment and computer readable storage medium
CN109144710A (en) * 2017-06-16 2019-01-04 中国移动通信有限公司研究院 Resource regulating method, device and computer readable storage medium
CN109726008A (en) * 2017-10-31 2019-05-07 阿里巴巴集团控股有限公司 Resource allocation methods and equipment
CN110413412A (en) * 2019-07-19 2019-11-05 苏州浪潮智能科技有限公司 A kind of method and apparatus based on GPU cluster resource allocation
CN110688218A (en) * 2019-09-05 2020-01-14 广东浪潮大数据研究有限公司 Resource scheduling method and device
CN111324457A (en) * 2020-02-15 2020-06-23 苏州浪潮智能科技有限公司 Method, device, equipment and medium for issuing inference service in GPU cluster
CN112272203A (en) * 2020-09-18 2021-01-26 苏州浪潮智能科技有限公司 Cluster service node selection method, system, terminal and storage medium
CN112463349A (en) * 2021-01-28 2021-03-09 北京睿企信息科技有限公司 Load balancing method and system for efficiently scheduling GPU (graphics processing Unit) capability
CN112486689A (en) * 2020-12-10 2021-03-12 苏州浪潮智能科技有限公司 Resource management platform resource recovery method, device, equipment and readable medium
CN112860396A (en) * 2021-01-28 2021-05-28 福建紫辰信息科技有限公司 GPU (graphics processing Unit) scheduling method and system based on distributed deep learning
CN113204428A (en) * 2021-05-28 2021-08-03 北京市商汤科技开发有限公司 Resource scheduling method, device, electronic equipment and computer readable storage medium


Also Published As

Publication number Publication date
CN113742064B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
CN110162413B (en) Event-driven method and device
US20220066813A1 (en) Dynamically generating an optimized processing pipeline for tasks
US20200364258A1 (en) Container Image Size Reduction Via Runtime Analysis
CN104699363B (en) A kind of window interface shows method and system
CN106557307B (en) Service data processing method and system
CN110633135A (en) Asynchronous task allocation method and device, computer equipment and storage medium
CN111400005A (en) Data processing method and device and electronic equipment
CN111158800B (en) Method and device for constructing task DAG based on mapping relation
CN109977168A (en) The method for synchronizing data of database and equipment preloaded based on data page
CN112650449B (en) Method and system for releasing cache space, electronic device and storage medium
CN112068812B (en) Micro-service generation method and device, computer equipment and storage medium
CN113742064A (en) Resource arrangement method, system, equipment and medium for server cluster
CN114721801A (en) Dynamic scheduling method and device for batch task execution time
CN114978686A (en) Digital asset chaining method and device
CN112395081B (en) Online automatic resource recycling method, system, server and storage medium
CN114237902A (en) Service deployment method and device, electronic equipment and computer readable medium
CN114595047A (en) Batch task processing method and device
CN112905223A (en) Method, device and equipment for generating upgrade package
CN111782363A (en) Method and flow system for supporting multi-service scene calling
CN113742052B (en) Batch task processing method and device
CN117539451B (en) Flow execution method, device, electronic equipment and storage medium
CN111414162B (en) Data processing method, device and equipment thereof
CN114553700A (en) Equipment grouping method and device, computer equipment and storage medium
US20210158644A1 (en) Peer partitioning to reduce strategy-driven bias in automated peer-selection systems
CN116501445A (en) Virtual machine creation scheduling method, system, equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant