CN117472570A - Method, apparatus, electronic device and medium for scheduling accelerator resources - Google Patents


Info

Publication number
CN117472570A
Authority
CN
China
Prior art keywords
accelerator
container group
rescheduled
node
resources
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311413945.1A
Other languages
Chinese (zh)
Inventor
杨子夜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Volcano Engine Technology Co Ltd
Original Assignee
Beijing Volcano Engine Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Volcano Engine Technology Co Ltd
Priority to CN202311413945.1A
Publication of CN117472570A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061: Partitioning or combining of resources
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present disclosure relate to methods, apparatuses, electronic devices, and media for scheduling accelerator resources. The method includes obtaining an accelerator resource allocation request related to a machine learning model, wherein the accelerator resource allocation request indicates an amount of accelerator resources required to run a container group of the machine learning model. The method further includes, in response to the accelerator resource allocation request triggering a rescheduling condition, determining a container group to be rescheduled according to a rescheduling policy. The method further includes allocating accelerator resources for the accelerator resource allocation request at the first node by migrating the container group to be rescheduled from the first node to the second node. According to embodiments of the present disclosure, when the accelerator resource allocation request triggers a rescheduling condition, the container group to be rescheduled is determined according to the rescheduling policy and then migrated to another node to satisfy the resource allocation request, so that the utilization efficiency of the accelerator resources can be improved.

Description

Method, apparatus, electronic device and medium for scheduling accelerator resources
Technical Field
The present disclosure relates generally to the field of computers, and more particularly, to methods, apparatus, electronic devices, and media for scheduling accelerator resources.
Background
A machine learning model is a model that makes predictions about new data by learning from a large amount of data. It can automatically extract useful features from the data and make predictions based on those features. The learning process of a machine learning model typically includes two stages: training and inference. During the training phase, the model learns and adjusts its parameters based on known data. In the inference phase, the model can be applied to data with known results to evaluate its accuracy and generalization ability. The range of applications of machine learning models is very broad, including but not limited to image recognition, speech recognition, natural language processing, recommendation systems, and the like. For example, in image recognition, a machine learning model can automatically extract features in an image by learning from a large amount of image data and predict the class of a new image. In natural language processing, a machine learning model can automatically extract language features in text by learning from a large amount of text data and predict the topic of new text.
While machine learning models bring broader application scenarios and greater expressive power, they also entail longer training and inference times, larger model scales, and higher storage costs. In addition, machine learning models demand greater computational power, requiring more powerful computers and computing resources and incurring higher costs.
Disclosure of Invention
Embodiments of the present disclosure provide a method, apparatus, electronic device, and medium for scheduling accelerator resources.
According to a first aspect of the disclosure, a method for scheduling accelerator resources is provided. The method includes obtaining an accelerator resource allocation request related to a machine learning model, the accelerator resource allocation request indicating an amount of accelerator resources required to run a container group of the machine learning model. The method also includes, in response to the accelerator resource allocation request triggering a rescheduling condition, determining a container group to be rescheduled according to a rescheduling policy. The method further includes allocating accelerator resources for the accelerator resource allocation request at the first node by migrating the container group to be rescheduled from the first node to the second node.
In a second aspect of the disclosure, an apparatus for scheduling accelerator resources is provided. The apparatus includes a request acquisition module configured to acquire an accelerator resource allocation request related to a machine learning model, the accelerator resource allocation request indicating an amount of accelerator resources required to run a container group of the machine learning model. The apparatus also includes a container group determination module configured to, in response to the accelerator resource allocation request triggering a rescheduling condition, determine a container group to be rescheduled according to a rescheduling policy. The apparatus further includes a container group migration module configured to allocate accelerator resources for the accelerator resource allocation request at the first node by migrating the container group to be rescheduled from the first node to the second node.
In a third aspect of the present disclosure, an electronic device is provided. The electronic device comprises a processor and a memory coupled to the processor, the memory having instructions stored therein, which when executed by the processor, cause the electronic device to perform the method according to the first aspect.
In a fourth aspect of the present disclosure, a computer-readable storage medium is provided. The computer-readable storage medium has stored thereon computer-executable instructions, wherein the computer-executable instructions are executed by a processor to implement the method according to the first aspect.
The summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, like or similar reference numerals denote like or similar elements:
FIG. 1 illustrates a schematic diagram of an example environment in which some embodiments of the present disclosure may be implemented;
FIG. 2 illustrates a flow chart of a method for scheduling accelerator resources in accordance with some embodiments of the present disclosure;
FIG. 3 illustrates a schematic diagram of allocating accelerator resources in accordance with some embodiments of the present disclosure;
FIG. 4 illustrates a schematic diagram of a method workflow of scheduling accelerator resources in accordance with some embodiments of the present disclosure;
FIG. 5 illustrates an architectural diagram of an example migration container group of some embodiments of the present disclosure;
FIG. 6 illustrates a schematic diagram of a scheduler interacting with daemons within a group of containers according to some embodiments of the present disclosure;
FIG. 7 illustrates a block diagram of an apparatus for scheduling accelerator resources in accordance with some embodiments of the present disclosure; and
FIG. 8 illustrates a block diagram of an example electronic device of some embodiments of the present disclosure.
The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements.
Detailed Description
It will be appreciated that the data (including but not limited to the data itself, the acquisition or use of the data) involved in the present technical solution should comply with the corresponding legal regulations and the requirements of the relevant regulations.
It will be appreciated that, prior to using the technical solutions disclosed in the embodiments of the present disclosure, the user should be informed, in an appropriate manner and in accordance with the relevant legal regulations, of the type, scope of use, and usage scenarios of the personal information involved in the present disclosure, and the user's authorization should be obtained.
For example, upon receiving an active request from a user, a prompt message is sent to the user to explicitly inform the user that the operation the user has requested will require the acquisition and use of the user's personal information. Thus, the user can autonomously choose, according to the prompt information, whether to provide personal information to the software or hardware, such as an electronic device, application program, server, or storage medium, that executes the operations of the technical solution of the present disclosure.
As an alternative but non-limiting implementation, in response to receiving an active request from a user, the prompt information may be sent to the user, for example, in a popup window, in which the prompt information may be presented as text. In addition, the popup window may carry a selection control allowing the user to choose whether to provide personal information to the electronic device in a "consent" or "decline" manner.
It will be appreciated that the above-described notification and user authorization process is merely illustrative and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be more thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
In describing embodiments of the present disclosure, the term "comprising" and its like should be understood to be open-ended, i.e., including, but not limited to. The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first," "second," and the like, may refer to different or the same object unless explicitly stated otherwise. Other explicit and implicit definitions are also possible below.
In order for more accelerator resources to participate in the development of machine learning models, a variety of service modes have evolved. Infrastructure as a service (IaaS) provides information technology infrastructure as a service over a network; a user pays to deploy and run arbitrary software, including operating systems and application programs. Platform as a service (PaaS) provides a server platform as a service; in PaaS, a user does not need to manage or control the underlying infrastructure, including networks, servers, operating systems, storage, and so on, but can control the deployed applications and possibly also the hosting environment configuration in which the applications run. In PaaS and similar service modes, finer-grained resource management may be employed, such as management at the container group or container level. In container-granularity resource management, the user only needs to pay attention to the container group or the services in the containers, i.e., to indicators such as the user's own data or model, training quality, and inference quality. In IaaS, even when the user is not using the infrastructure, the user has already paid for it; that is, whether the infrastructure is idle or loaded, the cloud service provider does not interfere with the user's use of the resources. In the service framework of model as a service (MaaS), by contrast, the cloud service provider benefits only if the user actually uses the relevant resources. If there are a large number of free or fragmented computing resources in the platform, the resource utilization of the cloud service provider will decrease.
Various algorithms in traditional cloud service models do not take full utilization of accelerator resources into account, resulting in accelerator resource fragmentation. This fragmentation not only increases the latency of model training or inference but also wastes computing resources and raises running costs. One improvement can guarantee that accelerators in a cluster continue to meet computing-power demands by configuring accelerator resources without limit, but in this case the fragmentation problem is not solved, and because accelerator resources are expensive this adds extra cost. Another improvement addresses low accelerator utilization through accelerator resource virtualization, but virtualization does not dynamically balance accelerator usage across the entire cluster. Yet another improvement selects accelerator resources through forced manual intervention, at the price of added overhead. Therefore, to avoid the situation in which a large amount of idle accelerator resources in the cluster could meet the computing-power demand but cannot be fully utilized, a method that automatically resolves accelerator resource fragmentation in the cluster is needed.
To this end, embodiments of the present disclosure obtain, when scheduling accelerator resources, an accelerator resource allocation request related to a machine learning model, the request indicating the amount of accelerator resources required to run a container group of the machine learning model. If the accelerator resource allocation request triggers a rescheduling condition, a container group to be rescheduled is determined according to a rescheduling policy. The container group to be rescheduled is then migrated from its original node to another node, so that accelerator resources can be allocated at the original node to the container group that runs the machine learning model. In this way, idle fragmented accelerator resources can be utilized automatically and fully, the running of machine learning training or inference tasks is effectively accelerated, and the utilization efficiency of accelerator resources is improved.
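To make this flow concrete, the following Python sketch models accelerator sets as plain dictionaries of free-GPU counts and walks through the three steps (direct allocation, rescheduling-condition check, victim selection plus migration plan). All identifiers and data structures here are hypothetical illustrations, not the disclosed implementation or any particular scheduler API.

```python
# Illustrative sketch only; function and field names are hypothetical, not the
# disclosed implementation.

def plan_allocation(request_gpus, nodes, groups):
    """Decide how to satisfy a container group's GPU request.

    nodes:  {node_name: free_gpu_count}
    groups: running container groups, each {'name': ..., 'node': ..., 'gpus': ...}
    Returns ('direct', node), ('reschedule', victim, src, dst), or ('insufficient', None).
    """
    # Step 1: a single node already has enough free GPUs -- allocate directly.
    for node, free in nodes.items():
        if free >= request_gpus:
            return ('direct', node)

    # Step 2: rescheduling condition -- enough free GPUs in total, but fragmented.
    if sum(nodes.values()) < request_gpus:
        return ('insufficient', None)

    # Step 3: find a container group whose migration frees enough GPUs on its own
    # node and that fits into the free GPUs of some other node.
    for g in groups:
        src = g['node']
        if nodes[src] + g['gpus'] < request_gpus:
            continue
        for dst, free in nodes.items():
            if dst != src and free >= g['gpus']:
                return ('reschedule', g['name'], src, dst)
    return ('insufficient', None)


if __name__ == '__main__':
    # State of FIG. 3 just before request 356 (4 GPUs): sets 310/320/330/340.
    nodes = {'set-310': 0, 'set-320': 3, 'set-330': 2, 'set-340': 1}
    groups = [{'name': 'pod-a', 'node': 'set-340', 'gpus': 3}]
    print(plan_allocation(4, nodes, groups))   # ('reschedule', 'pod-a', 'set-340', 'set-320')
```

Run on the FIG. 3 state before request 356, the sketch proposes migrating a 3-GPU container group from set 340 to set 320, matching the example walked through below.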
FIG. 1 illustrates a schematic diagram of an example environment 100 in which some embodiments of the present disclosure may be implemented. The example environment of the present disclosure includes at least a plurality of accelerator clusters and a scheduler, each accelerator cluster being configured with a number of accelerators. In some embodiments of the present disclosure, a graphics processing unit (GPU) is taken as an example of an accelerator resource; however, embodiments of the present disclosure may also be used with other accelerator resources. Referring to FIG. 1, the example environment 100 includes at least a GPU cluster 110, a GPU cluster 120, a GPU cluster 130, and a scheduler 140.
As shown in FIG. 1, GPU cluster 110 includes at least GPU 110-1, GPU 110-2, GPU 110-3, GPU 110-4, GPU 110-5, GPU 110-6, GPU 110-7, and GPU 110-8. It should be appreciated that a GPU cluster may include more or fewer GPUs. GPU cluster 120 includes at least GPU 120-1, GPU 120-2, GPU 120-3, GPU 120-4, GPU 120-5, GPU 120-6, GPU 120-7, and GPU 120-8. GPU cluster 130 includes at least GPU 130-1, GPU 130-2, GPU 130-3, GPU 130-4, GPU 130-5, GPU 130-6, GPU 130-7, and GPU 130-8. In some embodiments, the GPUs in GPU cluster 110, GPU cluster 120, or GPU cluster 130 are homogeneous, i.e., of the same type. Alternatively, the GPUs in GPU cluster 110, GPU cluster 120, or GPU cluster 130 are heterogeneous, i.e., a GPU cluster includes multiple GPUs of different types or from different vendors.
In some embodiments, GPU clusters 110, 120, and 130 are configured at different nodes, where a single container or container group may use one or more GPUs within the same node. In a GPU cluster, a container group is the smallest basic unit of deployment and management in the cluster, with one or more containers packaged in one container group. In general, a container group is not rescheduled after it has been bound to a node. The machine learning model file is loaded into the GPU cluster for operation; for example, a container may occupy 5 GPUs in GPU cluster 110, and those 5 GPUs may be GPU 110-1, GPU 110-2, GPU 110-3, GPU 110-4, and GPU 110-5, or may be GPU 110-4, GPU 110-5, GPU 110-6, GPU 110-7, and GPU 110-8.
In some embodiments, when allocating GPU resources, the scheduler may prioritize allocating the remaining GPU resources of a machine that is already partially in use rather than drawing on the accelerator resources of a completely idle machine. For example, when the GPUs in GPU clusters 110, 120, and 130 are all in an idle state and a container group requests allocation of 3 GPU resources, 3 idle GPUs of GPU cluster 110 may be allocated to the container group. Then, upon receiving a request from a container group to allocate 1 GPU resource, the scheduler allocates one of the 5 remaining free GPUs of GPU cluster 110 to that container group instead of allocating GPU resources of GPU cluster 120 or GPU cluster 130.
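One plausible reading of this preference, offered purely as an illustrative assumption rather than the disclosed algorithm, is a best-fit packing rule: among nodes whose free GPUs can satisfy the request, pick the one with the fewest free GPUs so that fully idle machines remain available for larger requests.

```python
# Hypothetical best-fit packing rule; an assumption for illustration, not the
# disclosed algorithm.

def pick_node(request_gpus, nodes):
    """nodes: {node_name: free_gpus}. Among nodes that can hold the request,
    prefer the one with the fewest free GPUs so idle machines stay untouched."""
    candidates = {name: free for name, free in nodes.items() if free >= request_gpus}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


if __name__ == '__main__':
    # After 3 GPUs were allocated from cluster 110, a 1-GPU request arrives.
    nodes = {'cluster-110': 5, 'cluster-120': 8, 'cluster-130': 8}
    print(pick_node(1, nodes))   # cluster-110: reuse the partially used machine first
```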
In some embodiments, an entire machine learning task is configured at the same node. For example, suppose accelerator cluster 110 has 3 idle GPUs, accelerator cluster 120 has 2 idle GPUs, accelerator cluster 130 has 1 idle GPU, and a machine learning task request requires 4 GPU resources. The task of a container group occupying 3 GPU resources in GPU cluster 130 is migrated to the 3 idle GPUs of GPU cluster 110, and the container group of the machine learning task is then configured in GPU cluster 130. In this way, the time delay produced by cross-node communication between GPUs can be avoided, and fragmented GPUs in the GPU clusters are fully utilized. In some embodiments, model files for inference or training are scheduled by the scheduler into the GPU cluster for operation or to perform some other task.
It should be understood that the architecture and functionality in the example environment 100 are described for illustrative purposes only and are not meant to suggest any limitation as to the scope of the disclosure. Embodiments of the present disclosure may also be applied to other environments having different structures and/or functions.
A process according to an embodiment of the present disclosure will be described in detail below in conjunction with fig. 2 to 8. For ease of understanding, specific data set forth in the following description are intended to be exemplary and are not intended to limit the scope of the disclosure. It will be appreciated that the embodiments described below may also include additional actions not shown and/or may omit shown actions, the scope of the present disclosure being not limited in this respect.
FIG. 2 illustrates a flow chart of a method 200 for scheduling accelerator resources according to some embodiments of the present disclosure, which may be performed by the scheduler 140 described in FIG. 1. At block 202, an accelerator resource allocation request related to a machine learning model is obtained, wherein the accelerator resource allocation request indicates the amount of accelerator resources required to run a container group of the machine learning model. For example, referring to FIG. 3, if a container group running a machine learning model requires 16 GPU resources, request 351 requests that 16 GPU resources be allocated from the accelerator resource sets. In some embodiments, the accelerator resource allocation request related to the machine learning model may be obtained from a scheduler.
At block 204, in response to the accelerator resource allocation request triggering a rescheduling condition, a container group to be rescheduled is determined according to a rescheduling policy. In some embodiments, whether the rescheduling condition is triggered may be determined from the relationship between the accelerator resource allocation request and the numbers of accelerator resources in the accelerator sets. For example, referring to FIG. 3, accelerator resource set 310, accelerator resource set 320, accelerator resource set 330, and accelerator resource set 340 are each configured with 8 accelerator resources, and there are currently 6 free accelerator resources in total. After receiving request 356 to allocate 4 GPU resources, it is determined whether request 356 triggers a rescheduling condition: for example, if the number of GPUs required by request 356 is less than or equal to the 6 accelerator resources that are free across the accelerator sets, and greater than the number of free resources in each individual accelerator resource set, then request 356 is confirmed as triggering the rescheduling condition.
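The trigger described here can be expressed compactly: the request fits within the total free capacity of the cluster but not within any single accelerator set. A minimal sketch, with hypothetical names, follows.

```python
# Minimal trigger check; names are illustrative.

def triggers_rescheduling(request_gpus, free_per_set):
    """free_per_set: one free-GPU count per accelerator set."""
    total_free = sum(free_per_set)
    largest_single_set = max(free_per_set, default=0)
    # Enough capacity overall, but no single set can hold the request on its own.
    return largest_single_set < request_gpus <= total_free


if __name__ == '__main__':
    # FIG. 3 state before request 356: sets 310/320/330/340 have 0, 3, 2, 1 free GPUs.
    print(triggers_rescheduling(4, [0, 3, 2, 1]))   # True: rescheduling condition triggered
```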
In some embodiments, the container group to be rescheduled is determined from the relationship between the accelerator resources required by the accelerator resource allocation request and the numbers of accelerator resources in the accelerator sets. With continued reference to FIG. 3, the number of free accelerator resources in accelerator set 320 is 3, which is greater than or equal to the 3 accelerator resources in accelerator set 340 occupied by the container group to be rescheduled; and after those 3 accelerator resources are released, accelerator set 340 has 4 free accelerator resources, which is greater than or equal to the 4 accelerator resources required by request 356. It is therefore determined that the container group occupying the 3 accelerator resources in accelerator set 340 is the container group to be rescheduled, and that the node where accelerator set 320 is located is the target node for the container group to be rescheduled.
In some embodiments, if there are multiple candidate container groups to be rescheduled, the container group whose migration takes the least time may be selected as the container group to be rescheduled. For example, referring to FIG. 3, when accelerator resource allocation request 356 triggers a rescheduling condition, either a container group occupying 2 accelerator resources in accelerator set 340 or a container group occupying 3 accelerator resources in accelerator set 340 may be determined, according to the rescheduling policy, to be the container group to be rescheduled. If rescheduling the container group occupying 2 accelerator resources in accelerator set 340 to the target node takes less time than rescheduling the container group occupying 3 accelerator resources in accelerator set 340 to the target node, the container group occupying 2 accelerator resources in accelerator set 340 is determined to be the container group to be rescheduled.
In some embodiments, if there are multiple candidate container groups to be rescheduled, a container group whose migration does not affect ongoing system tasks may be selected as the container group to be rescheduled. For example, a container group performing a training task or a container group performing a background task may be selected as the container group to be rescheduled, so as not to affect the normal provision of network services. In some embodiments, if all of the candidate container groups perform inference tasks, a lower-priority container group is selected as the container group to be rescheduled. In some embodiments, a container group hosting a web service or a stateless service may simply be stopped and then quickly restarted on another node.
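Taken together, the heuristics in the preceding paragraphs suggest a selection routine along the following lines. The ordering of the tie-breakers (task type, then priority, then estimated migration time) and all field names are assumptions made for this sketch, not rules stated by the disclosure.

```python
# Illustrative victim-selection sketch; the tie-breaker ordering and task-type rules
# are assumptions for this example, not values given in the disclosure.

def choose_victim(request_gpus, nodes, groups, migration_seconds):
    """nodes: {node: free_gpus}; groups: dicts with 'name', 'node', 'gpus',
    'task' in {'training', 'background', 'inference'}, and 'priority' (lower = less important).
    migration_seconds: {group_name: estimated time to checkpoint, move, and restart}."""
    candidates = []
    for g in groups:
        src = g['node']
        frees_enough = nodes[src] + g['gpus'] >= request_gpus
        has_target = any(n != src and f >= g['gpus'] for n, f in nodes.items())
        if frees_enough and has_target:
            candidates.append(g)
    if not candidates:
        return None
    # Prefer groups that do not disturb online service (training/background first),
    # then lower priority, then the smallest estimated migration time.
    def rank(g):
        disturbs_service = 0 if g['task'] in ('training', 'background') else 1
        return (disturbs_service, g['priority'], migration_seconds[g['name']])
    return min(candidates, key=rank)


if __name__ == '__main__':
    nodes = {'set-320': 3, 'set-340': 1}
    groups = [
        {'name': 'train-a', 'node': 'set-340', 'gpus': 3, 'task': 'training', 'priority': 5},
        {'name': 'infer-b', 'node': 'set-340', 'gpus': 3, 'task': 'inference', 'priority': 2},
    ]
    times = {'train-a': 40, 'infer-b': 15}
    print(choose_victim(4, nodes, groups, times)['name'])  # train-a: avoid moving inference
```

In the toy example, the training group is chosen over the inference group even though its estimated migration takes longer, reflecting the preference for not disturbing online service.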
At block 206, accelerator resources are allocated for the accelerator resource allocation request at the first node by migrating the container group to be rescheduled from the first node to the second node. Referring to FIG. 3, accelerator resource request 356 requests allocation of 4 GPU resources; the container group occupying 3 GPU resources in accelerator set 340 is rescheduled from the node where accelerator set 340 is located to the node where accelerator set 320 is located, and GPU resources are allocated to accelerator resource request 356 at the node where accelerator set 340 is located. According to the scheme of the present disclosure, the utilization rate of accelerator resources can be improved.
In some embodiments, it is also necessary to stop the processes within the container group to be rescheduled and save the state of the relevant running data before migrating the container group from the original node to the target node. For example, referring to FIG. 5, when container group 512 is migrated from host 570 to host 580, MaaS 514 within the container group needs to be stopped and its running-data state, such as a checkpoint, saved to shared file system 590.
In some embodiments, stopping the container group to be rescheduled involves the scheduler interacting with a daemon within the container group. For example, referring to FIG. 6, scheduler 620 sends a stop instruction to container group 616; upon receiving the instruction, daemon 617 in container group 616 saves the state file of the running data in the container group to the file system and then informs the scheduler that the operation has been completed. In some embodiments, the state file of the running data is a checkpoint.
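A minimal, self-contained sketch of this stop-and-checkpoint handshake is shown below; the SharedFileSystem, Daemon, and Scheduler classes and the JSON checkpoint format are invented for illustration and do not correspond to any specific daemon protocol in the disclosure.

```python
# Minimal message-flow sketch of the stop-and-checkpoint handshake; all classes,
# file formats, and names are invented for illustration.
import json
import pathlib
import tempfile

class SharedFileSystem:
    def __init__(self, root):
        self.root = pathlib.Path(root)
    def save_checkpoint(self, group, state):
        path = self.root / f"{group}.ckpt.json"
        path.write_text(json.dumps(state))
        return path

class Daemon:
    """Runs inside the container group; reacts to the scheduler's stop instruction."""
    def __init__(self, group, fs):
        self.group, self.fs = group, fs
    def handle_stop(self):
        state = {"step": 1200, "model": "demo"}        # placeholder running state
        ckpt = self.fs.save_checkpoint(self.group, state)
        return {"group": self.group, "status": "stopped", "checkpoint": str(ckpt)}

class Scheduler:
    def migrate(self, daemon, src, dst):
        ack = daemon.handle_stop()                      # 1. stop + checkpoint on the source node
        print(f"{ack['group']} checkpointed to {ack['checkpoint']}; GPUs on {src} released")
        print(f"restart {ack['group']} on {dst} from the saved checkpoint")

if __name__ == "__main__":
    fs = SharedFileSystem(tempfile.mkdtemp())
    Scheduler().migrate(Daemon("pod-512", fs), src="host-570", dst="host-580")
```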
In some embodiments, after resources have been allocated for the container group to be rescheduled and for the accelerator resource allocation request, the container group to be rescheduled and the container group running the machine learning model file may be started simultaneously. In some embodiments, the container group to be rescheduled and the container group running the machine learning model file have equal resource-request priorities, and these priorities are higher than the resource-request priority of any other container group. According to the scheme of the present disclosure, fragmented accelerator resources in the cluster can be fully utilized and waiting time is reduced, thereby accelerating the overall task.
FIG. 3 illustrates a schematic diagram 300 of allocating accelerator resources in accordance with some embodiments of the present disclosure. As shown in FIG. 3, accelerator resource set 310, accelerator resource set 320, accelerator resource set 330, and accelerator resource set 340 are each configured with 8 accelerators.
Referring to FIG. 3, an accelerator resource allocation request 351 requests allocation of 16 accelerator resources. At this time, all of the accelerators in the accelerator resource set 310, the accelerator resource set 320, the accelerator resource set 330, and the accelerator resource set 340 are in an idle state, and the idle accelerator resources in the accelerator resource set 310 and the accelerator resource set 320 may be allocated to the request 351. After allocation 361, accelerator resource set 310 has 0 free accelerators, accelerator resource set 320 has 0 free accelerators, accelerator resource set 330 has 8 free accelerators, and accelerator resource set 340 has 8 free accelerators.
Next, accelerator resource allocation request 352 requests allocation of 6 accelerator resources. At this point, accelerator resource set 310 has 0 free accelerators, accelerator resource set 320 has 0 free accelerators, accelerator resource set 330 has 8 free accelerators, accelerator resource set 340 has 8 free accelerators, and free accelerator resources in accelerator resource set 330 may be allocated to the request 352. After allocation 362, accelerator resource set 310 has 0 free accelerators, accelerator resource set 320 has 0 free accelerators, accelerator resource set 330 has 2 free accelerators, and accelerator resource set 340 has 8 free accelerators.
With continued reference to FIG. 3, accelerator resource allocation request 353 requests allocation of 4 accelerator resources. At this time, accelerator resource set 310 has 0 free accelerators, accelerator resource set 320 has 0 free accelerators, accelerator resource set 330 has 2 free accelerators, accelerator resource set 340 has 8 free accelerators, and free accelerator resources in accelerator resource set 340 may be allocated to request 353. After allocation 363, accelerator resource set 310 has 0 free accelerators, accelerator resource set 320 has 0 free accelerators, accelerator resource set 330 has 2 free accelerators, and accelerator resource set 340 has 4 free accelerators.
Referring next to FIG. 3, a new accelerator resource allocation request 354 requests allocation of 3 accelerator resources. At this point, accelerator resource set 310 has 0 free accelerators, accelerator resource set 320 has 0 free accelerators, accelerator resource set 330 has 2 free accelerators, accelerator resource set 340 has 4 free accelerators, and free accelerator resources in accelerator resource set 340 may be allocated to request 354. After allocation 364, accelerator resource set 310 has 0 free accelerators, accelerator resource set 320 has 0 free accelerators, accelerator resource set 330 has 2 free accelerators, and accelerator resource set 340 has 1 free accelerator.
At this time, in the example of FIG. 3, accelerator resource release request 355 requests that accelerator resource set 320 release 3 accelerator resources. After release 365, accelerator resource set 310 has 0 free accelerators, accelerator resource set 320 has 3 free accelerators, accelerator resource set 330 has 2 free accelerators, and accelerator resource set 340 has 1 free accelerator.
Next, a new accelerator resource allocation request 356 requests allocation of 4 accelerator resources. At this time, accelerator resource set 310 has 0 free accelerators, accelerator resource set 320 has 3 free accelerators, accelerator resource set 330 has 2 free accelerators, and accelerator resource set 340 has 1 free accelerator. Although there are 6 free accelerator resources across the accelerator sets, they are scattered over 3 different accelerator sets; if these scattered accelerator resources are to be used together, remote communication must be established between the accelerators, for example through the transmission control protocol/internet protocol (TCP/IP), remote direct memory access (RDMA), or a direct GPU-to-GPU communication channel. This reduces the performance of the machine learning model, because remote communication increases communication latency.
According to embodiments of the present disclosure, the processes running on the 3 accelerator resources previously allocated in accelerator resource set 340 may be rescheduled to accelerator resource set 320, and the accelerators in accelerator resource set 340 may be allocated to request 356. After rescheduling 366, accelerator resource set 310 has 0 free accelerators, accelerator resource set 320 has 0 free accelerators, accelerator resource set 330 has 2 free accelerators, and accelerator resource set 340 has 4 free accelerators. The 4 free accelerators in accelerator resource set 340 are then allocated to request 356; after allocation 367, accelerator resource set 310 has 0 free accelerators, accelerator resource set 320 has 0 free accelerators, accelerator resource set 330 has 2 free accelerators, and accelerator resource set 340 has 0 free accelerators. In this way, fragmented accelerator resources in the cluster can be fully utilized, maximizing the utilization of accelerator resources without degrading the user's experience of using them.
In some embodiments, before the container group is rescheduled from its original accelerator resource set (e.g., accelerator resource set 340) to the target accelerator resource set (e.g., accelerator resource set 320), the data state of the container group's operation needs to be saved, for example as a checkpoint in a file system, and the operation of the container group stopped. In some embodiments, the container group to be rescheduled and the new container group have equal resource-request priorities, and these priorities are higher than the resource-request priorities of container groups other than the container group to be rescheduled and the new container group.
FIG. 4 illustrates a schematic workflow diagram of a method 400 of scheduling accelerator resources in accordance with some embodiments of the present disclosure. At block 402, a process for dynamic allocation of accelerator resources is initiated. At block 404, allocation of accelerator resources is requested, and an accelerator resource allocation request associated with a machine learning model file is obtained, the request indicating the number of accelerator resources required to run the machine learning model file. For example, referring to FIG. 3, accelerator resource allocation request 351 requests allocation of 16 GPU resources.
At block 406, a determination is made as to whether there are complete-machine accelerator resources that satisfy the acquired accelerator resource allocation request. For example, when 6 accelerator resources are requested and the 8 free accelerators in accelerator set 330 can satisfy the request, the determination result is yes. For another example, when 4 accelerator resources are requested and no accelerator set has more than 3 idle accelerators, the determination result is no.
If it is determined at block 406 that there are complete-machine accelerator resources that satisfy the acquired accelerator resource allocation request, accelerator resources are allocated directly at block 410. For example, if a request asks for 6 accelerator resources and the 8 free accelerators in accelerator set 330 can satisfy it, 6 free accelerators in accelerator resource set 330 are allocated to the request. At block 416, the requested task is processed using the allocated accelerator resources. For example, if an accelerator resource allocation request asks for 4 accelerator resources and the 8 free accelerators in accelerator set 330 can satisfy it, 4 of the free accelerator resources in accelerator set 330 are allocated to the request at block 410, and the allocated accelerator resources are used to process the task request at block 416. For another example, if a request asks for 3 accelerator resources and accelerator set 330 has 4 free accelerators, 3 of the free accelerator resources in accelerator set 330 are allocated to the request at block 410, and the allocated accelerator resources are used to process the task request at block 416.
If it is determined at block 406 that no complete-machine accelerator resources satisfy the acquired accelerator resource allocation request, then at block 408 it is determined whether fragmented accelerator resources can satisfy the request. If not, accelerator resources are added at block 414. If so, the container group to be rescheduled is found at block 412. In some embodiments, a request to allocate accelerator resources is obtained at block 404, e.g., a new container group requests allocation of 8 accelerator resources. At block 406, it is determined whether any accelerator set has complete accelerator resources that satisfy the request; for example, none of accelerator set 310, accelerator set 320, accelerator set 330, and accelerator set 340 can satisfy it. At block 408, it is then determined whether fragmented accelerator resources satisfy the request; for example, the 6 free accelerator resources spread over accelerator set 310, accelerator set 320, accelerator set 330, and accelerator set 340 cannot satisfy it, and after a certain number of attempts (e.g., 3), not enough fragmented accelerator resources have been released to satisfy the request of the new container group. In that case, accelerator resources are added at block 414, e.g., one accelerator cluster configured with 8 accelerator resources is added. The accelerator resources are then allocated to the request at block 410, e.g., the 8 newly added accelerator resources are allocated to the new container group. With this accelerator resource allocation method, fragmented accelerator resources in the cluster can be fully utilized without waiting for other accelerator resources to be released, so that accelerator resource utilization is maximized.
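The fallback path in this paragraph (retry the fragmented-resource check a bounded number of times, then expand capacity) might be sketched as follows; the retry count of 3 and the add_cluster callback are assumptions drawn from the example above, not a prescribed implementation.

```python
# Hedged sketch of the fallback path; the retry count and add_cluster callback are
# assumptions for illustration only.
def acquire_accelerators(request_gpus, free_per_set, add_cluster, max_attempts=3):
    """free_per_set: mutable list of free-GPU counts per accelerator set."""
    for attempt in range(1, max_attempts + 1):
        if any(free >= request_gpus for free in free_per_set):
            return f"direct allocation on attempt {attempt}"                       # block 410
        if sum(free_per_set) >= request_gpus:
            return f"fragmented allocation via rescheduling on attempt {attempt}"  # blocks 412-420
        # Not enough capacity even counting fragments; in a real scheduler the state
        # would be refreshed here as other container groups release accelerators.
    free_per_set.append(add_cluster())                                              # block 414
    return "allocated on the newly added accelerator cluster"


if __name__ == "__main__":
    state = [0, 1, 2, 1]   # only 4 free GPUs in total, scattered across sets
    print(acquire_accelerators(8, state, add_cluster=lambda: 8))
```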
In some embodiments, when request 356 requests allocation of 4 accelerator resources and the 6 remaining accelerator resources across accelerator set 310, accelerator set 320, accelerator set 330, and accelerator set 340 can satisfy the request, the container group to be rescheduled is found at block 412; for example, the container group occupying 3 accelerator resources in accelerator set 340 is found to be the container group to be rescheduled. At block 418, the operation of the container group to be rescheduled is stopped; for example, the container group occupying 3 accelerator resources in accelerator set 340 is stopped, i.e., the accelerator resources it occupies in accelerator set 340 are released, and the data state of its operation is saved to the shared storage/file system. The container group is then scheduled to the target node, such as the node of accelerator set 320, at block 420. Resources are allocated again at block 410, for example by allocating the free resources in accelerator set 340 to the new container group and allocating 3 free accelerator resources in accelerator set 320 to the container group to be rescheduled that previously released 3 accelerator resources in accelerator set 340. Next, at block 416, the rescheduled container group is restarted from the saved data running state or checkpoint, and the new container group is started simultaneously. With this accelerator resource allocation method, fragmented accelerator resources in the cluster can be fully utilized without waiting for other accelerator resources to be released, so that accelerator resource utilization is maximized.
In some embodiments, a request from a new container group to allocate accelerator resources is obtained at block 404, e.g., a new container group requests allocation of 4 accelerator resources. At block 406, it is determined whether any accelerator set has complete accelerator resources that satisfy the request; for example, none of accelerator set 310, accelerator set 320, accelerator set 330, and accelerator set 340 can satisfy it. At block 408, it is then determined whether fragmented accelerator resources satisfy the request; for example, the 6 remaining accelerator resources across accelerator set 310, accelerator set 320, accelerator set 330, and accelerator set 340 can satisfy it. The container group to be rescheduled is then sought at block 412; for example, a container group occupying 1 accelerator resource in accelerator set 320 is found to be the container group to be rescheduled. The container group to be rescheduled is stopped at block 418, for example by stopping the container group occupying 1 accelerator resource in accelerator set 320, i.e., releasing the accelerator resource in accelerator set 320 occupied by that container group, and saving the data state of its operation to the shared storage/file system. The container group is then scheduled to the target node, such as the node of accelerator set 330, at block 420. Resources are then allocated at block 410, for example by allocating the free resources in accelerator set 320 to the new container group and allocating 1 free accelerator resource in accelerator set 330 to the container group that previously released 1 accelerator resource in accelerator set 320. Next, at block 416, the rescheduled container group is restarted from the saved data running state, and the new container group is started simultaneously to execute the task request.
In some embodiments, the container group to be rescheduled is a container group that performs a background task or a training task. A background task is a process provided by the system that can run in the background; even if the application has been suspended or is no longer running, it does not affect the execution and provision of online services. In some embodiments, the container group to be rescheduled and the new container group have equal resource-request priorities, and these priorities are higher than the resource-request priorities of container groups other than the container group to be rescheduled and the new container group.
In some embodiments, a nearby deployment node may be selected as the target node of the container group to be rescheduled, implementing a nearby-routing capability and thereby reducing network loss during scheduling. In some embodiments, when there are multiple candidate container groups to be rescheduled, the container group with the smallest scheduling delay is determined to be the container group to be rescheduled. For example, if scheduling a container group to the node of accelerator set 320 and resuming its operation takes longer than scheduling it to the node of accelerator set 330 and resuming its operation, the node of accelerator set 330 is determined to be the target node of the container group to be rescheduled. In this way, the latency impact on container services can be reduced and the user's service experience improved.
FIG. 5 illustrates a schematic diagram of an architecture 500 for an example container group migration of the present disclosure. As shown in FIG. 5, host 570 and host 580 are both configured at the hardware level with multiple GPUs and several remote direct memory access network interface controllers (RNICs). Using direct memory access (DMA), when two or more computers communicate, the memory of one host can be accessed directly from another host, and RNICs allow data transfer between servers and storage devices. For example, host 580, configured with RNIC 584-1 and RNIC 584-2, can perform direct memory access with host 570, which is likewise configured with RNIC 574-1 and RNIC 574-2, avoiding unnecessary latency. In some embodiments, the GPUs may be homogeneous or heterogeneous accelerators.
Referring to FIG. 5, host 570 and host 580 have, at the software level, a host operating system kernel 550 and a host operating system kernel 560, respectively, and a kernel-based virtual machine (KVM) module 552 and a KVM module 562 are configured in kernels 550 and 560, respectively. KVM is a full virtualization solution that uses hardware-assisted virtualization technology based on the operating system kernel: a virtual machine is implemented as a regular operating system process scheduled by the standard operating system scheduler, and each virtual CPU (central processing unit) of the virtual machine is likewise implemented as a regular operating system process, which enables KVM to reuse the existing functions of the operating system kernel.
With continued reference to FIG. 5, an elastic cloud server (ECS) is configured on each host, such as ECS 510, ECS 520, ECS 530, and ECS 540; one or more container groups, such as container group 512, container group 522, and container group 532, run within each ECS, and a MaaS, such as MaaS 514, MaaS 524, and MaaS 534, runs within each container group. An ECS is a resource set composed of CPUs, memory, cloud disks, and the like. MaaS is a cloud-based service mode that integrates the development, deployment, operation, and management of models into a unified platform, such as a container cluster platform, and provides users with a convenient way to use and manage models without concern for the models' implementation details and underlying technologies.
In some embodiments, the ECS may be replaced with a bare metal server (BMS), in which case the bare metal server occupies the entire host (not shown in the figures). A BMS is hardware that has the characteristics of a traditional physical server while also providing the virtualized service capabilities of cloud computing; it is the product of combining the advantages of hardware and software.
A container cluster platform is an application for managing containerized applications on multiple hosts in a cloud platform. In a container cluster platform, a container group (pod) is the smallest basic unit of deployment and management in the cluster. A container group encapsulates one or more containers, storage resources, an independent network address, and policy options that manage and control how the containers run, and each container has its own file system. A container contains an application's software packages and the environment on which its execution depends, by means of which programs can run in a relatively independent environment.
Using ECSs as the basic resource of the container service, a highly available container cluster can be built in the cloud with one click. In some embodiments, a node is an ECS, and a user can flexibly choose a service deployment mode according to service requirements. In some embodiments, the node types in the container service are not limited and may be x86, heterogeneous, or bare metal nodes. For example, the node configured on host 570 is a bare metal node, while the node configured on host 580 is a cloud server node. In some embodiments, a single container may use one or more accelerator resources of the same node; for example, a container of container group 512 may use GPU 572-1 on host 570, a container of container group 522 may use GPU 572-2 or GPU 572-3 on host 570, and a container of container group 532 may use GPU 582-1, GPU 582-2, or GPU 583-1 on host 580, or any combination thereof.
With continued reference to FIG. 5, container group 512 and MaaS 514 within container group 512 migrate from ECS 510 to ECS 540; before container group 512 is migrated, the state file of the data running within the container group must be saved to shared storage/file system 590. In some embodiments, shared storage/file system 590 may be a distributed file system, including a local file system and a remote file system. In some embodiments, the container group to be rescheduled is a container group that performs a background task or a training task; for example, container group 512 is a container group that performs background tasks or training tasks. In this way, the rescheduling behavior can be prevented from affecting the execution of foreground tasks and disturbing normal program flow. In some embodiments, if container group 512 is a container group that hosts a web service or a stateless service, it may be shut down directly.
In some embodiments, a sidecar-mode container is configured within the container group. The sidecar is a design pattern that strips application functionality out of the application itself as a separate process, allowing functions to be added to the application non-intrusively and avoiding the addition of extra code to satisfy third-party needs. In the software architecture, a sidecar is attached to a host application, or parent application, to extend and enhance its functional features while remaining loosely coupled to it. If the sidecar container provides an executable file that waits for the sidecar to be ready, that file may be invoked in the post-start hook of a container to hold back the start of the remaining containers in the container group until the sidecar is ready. Through the sidecar approach, the scheduler can interact with daemons within the container group.
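As a hypothetical illustration of the readiness-wait executable mentioned above, the following Python helper could be invoked from a post-start hook and block until a sidecar endpoint accepts connections. The local port, timeout, and polling interval are invented for this sketch; the disclosure does not specify how readiness is signaled.

```python
# Hypothetical readiness-wait helper invoked from a post-start hook; the port,
# timeout, and polling interval are invented for this sketch.
import socket
import sys
import time

def wait_for_sidecar(host="127.0.0.1", port=15020, timeout_s=60.0, interval_s=0.5):
    """Block until the sidecar's local endpoint accepts connections, or give up."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(interval_s)
    return False

if __name__ == "__main__":
    sys.exit(0 if wait_for_sidecar() else 1)
```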
In some embodiments, before container group 512 is rescheduled from ECS 510 to ECS 540, the scheduler sends an instruction to the daemon within container group 512; after the daemon within container group 512 receives the instruction, the operation of container group 512 is stopped and the running state of the relevant data is saved to shared storage/file system 590. Through this interaction, on the one hand, the degradation of user experience that would result from the rescheduled container group having to start its work again from scratch is effectively avoided; on the other hand, fragmented accelerator resources in the cluster can be fully utilized.
FIG. 6 illustrates a schematic diagram of an interaction 600 between a scheduler and a daemon within a container group according to some embodiments of the present disclosure. As shown in FIG. 6, container group 612, container group 614, and container group 616 are configured in cloud server 610. A daemon is configured in each container group; a daemon is a process that runs in the background at all times and exits automatically when the main thread ends. Daemons can interact with the scheduler to control the container groups; for example, scheduler 620 can interact with daemon 617 in container group 616, daemon 615 in container group 614, and daemon 613 in container group 612, respectively. The scheduler sends a stop instruction to notify a container group to stop running; after the daemon in the container group receives the instruction, it saves the data of the container group's running state to the shared file system, stops the operation of the container group, and sends a notification message informing the scheduler that these operations are complete. The scheduler is the component responsible for application scheduling; it schedules containers to run on working nodes by configuring node or container group affinities and the like.
In some embodiments, scheduler 620 sends a stop instruction to notify container group 616 to stop running; after receiving the instruction, daemon 617 within container group 616 saves the data of the running state of container group 616 to the shared file system, stops container group 616, and sends a notification message informing scheduler 620 that these operations are complete. In some embodiments, scheduler 620 monitors the status of all accelerator resources in the cluster, i.e., whether each accelerator resource is in an idle state or an occupied state, and may update the status information of the accelerator resources after they are released or occupied.
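The scheduler's bookkeeping of accelerator status described here could be as simple as the following table of idle counts per node, updated on release and occupation; the class and method names are illustrative assumptions, not part of the disclosed scheduler.

```python
# Illustrative bookkeeping of accelerator status in the scheduler; names are assumed.

class AcceleratorStateTable:
    def __init__(self, free_per_node):
        self.free = dict(free_per_node)   # node name -> number of idle accelerators

    def occupy(self, node, count):
        if self.free[node] < count:
            raise ValueError(f"{node} has only {self.free[node]} idle accelerators")
        self.free[node] -= count

    def release(self, node, count):
        self.free[node] += count

    def snapshot(self):
        return dict(self.free)


if __name__ == "__main__":
    table = AcceleratorStateTable({"set-320": 3, "set-340": 1})
    table.release("set-340", 3)   # stopped container group's GPUs on set 340 become idle
    table.occupy("set-320", 3)    # rescheduled container group restarts on set 320
    table.occupy("set-340", 4)    # new container group is allocated on set 340
    print(table.snapshot())       # {'set-320': 0, 'set-340': 0}
```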
FIG. 7 illustrates a block diagram of an apparatus 700 for scheduling accelerator resources in accordance with some embodiments of the present disclosure. As shown in FIG. 7, the apparatus 700 includes a request acquisition module 702 configured to acquire an accelerator resource allocation request related to a machine learning model, the accelerator resource allocation request indicating the amount of accelerator resources required to run a container group of the machine learning model. The apparatus 700 further includes a container group determination module 704 configured to, in response to the accelerator resource allocation request triggering a rescheduling condition, determine a container group to be rescheduled according to a rescheduling policy. The apparatus 700 also includes a container group migration module 706 configured to allocate accelerator resources for the accelerator resource allocation request at the first node by migrating the container group to be rescheduled from the first node to the second node.
FIG. 8 shows a block diagram of an electronic device 800 of some embodiments of the present disclosure; the device 800 may be a device or apparatus described in embodiments of the present disclosure. As shown in FIG. 8, device 800 includes a central processing unit (CPU) and/or graphics processing unit (GPU) 801 that may perform various suitable actions and processes in accordance with computer program instructions stored in a read-only memory (ROM) 802 or loaded from a storage unit 808 into a random access memory (RAM) 803. The CPU/GPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804. Although not shown in FIG. 8, device 800 may also include a coprocessor.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The various methods or processes described above may be performed by the CPU/GPU 801. For example, in some embodiments, the method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809. When the computer program is loaded into RAM 803 and executed by CPU/GPU 801, one or more steps or actions in the above-described methods or processes may be performed.
In some embodiments, the methods and processes described above may be implemented as a computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: portable computer disks, hard disks, Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM or flash memory), Static Random Access Memory (SRAM), portable Compact Disk Read-Only Memory (CD-ROM), Digital Versatile Disks (DVD), memory sticks, floppy disks, mechanical encoding devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
The computer program instructions for performing the operations of the present disclosure can be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language and conventional procedural programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information of computer readable program instructions, which can execute the computer readable program instructions.
These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of devices, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two consecutive blocks may in fact be performed substantially in parallel, and they may sometimes be performed in the reverse order, depending on the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvement of the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Some example implementations of the present disclosure are listed below.
Example 1. A method for scheduling accelerator resources, comprising:
obtaining an accelerator resource allocation request related to a machine learning model, the accelerator resource allocation request indicating an amount of accelerator resources required to run a container group of the machine learning model;
in response to the accelerator resource allocation request triggering a rescheduling condition, determining a container group to be rescheduled according to a rescheduling policy; and
allocating accelerator resources on a first node for the accelerator resource allocation request by migrating the container group to be rescheduled from the first node to a second node.
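By way of non-limiting illustration, the overall decision flow of Example 1, together with the conditions elaborated in Examples 2, 6, and 10 below, can be sketched as follows (Python). The names and the returned strings are assumptions made for the sketch, not part of the disclosure.

# Illustrative top-level handling of an accelerator resource allocation request.
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    free_accelerators: int  # remaining accelerator resources on this node


def handle_allocation_request(requested: int, nodes: List[Node]) -> str:
    """Return a description of the action a scheduler might take for the request."""
    # Case 1: some node already has enough free accelerators -> allocate directly.
    for node in nodes:
        if node.free_accelerators >= requested:
            return f"allocate {requested} accelerator(s) on {node.name}"

    # Case 2: no single node fits, but the cluster total does -> the rescheduling
    # condition is triggered; a container group is chosen and migrated from a
    # first node to a second node to free room for the request.
    if requested <= sum(n.free_accelerators for n in nodes):
        return "rescheduling condition triggered: determine a container group to migrate"

    # Case 3: even the cluster total is insufficient -> configure new nodes.
    return "configure one or more new nodes"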
Example 2 the method of example 1, wherein the rescheduling condition comprises at least one of:
the amount of accelerator resources required to run the container group of the machine learning model is greater than the remaining amount of accelerator resources on each node; and
the amount of accelerator resources required to run the container group of the machine learning model is less than or equal to the total remaining amount of accelerator resources of the respective nodes.
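Read together, the two conditions of Example 2 describe the situation where no single node can satisfy the request on its own but the cluster as a whole can. A minimal predicate capturing that reading, with assumed names, is sketched below; for instance, with two nodes holding 3 and 2 free accelerators, a request for 4 triggers rescheduling, whereas a request for 6 would instead call for configuring new nodes (Example 6).

# Illustrative check of the rescheduling condition in Example 2.
from typing import List


def triggers_rescheduling(requested: int, free_per_node: List[int]) -> bool:
    """True when no single node fits the request but the cluster total does."""
    no_single_node_fits = all(requested > free for free in free_per_node)
    cluster_total_fits = requested <= sum(free_per_node)
    return no_single_node_fits and cluster_total_fits


# e.g. free accelerators per node = [3, 2]: a request for 4 triggers rescheduling,
# a request for 6 does not (it calls for new nodes instead, see Example 6).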
Example 3. The method of any of examples 1-2, wherein determining the container group to be rescheduled according to the rescheduling policy comprises:
determining the container group to be rescheduled and the first node in response to the accelerator resource free amount of one or more nodes being greater than or equal to the accelerator resource amount occupied by the container group to be rescheduled, and in response to the accelerator resource free amount of the one or more nodes being greater than or equal to the accelerator resource amount required to run the container group of the machine learning model after the accelerator resource amount occupied by the container group to be rescheduled has been released.
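One way to read the policy of Example 3: a candidate container group on a first node qualifies if (a) some other node has enough free accelerators to receive it, and (b) releasing its accelerators leaves the first node with enough room for the pending request. The sketch below encodes that reading; the names (pick_reschedule_candidate, Pod, NodeState) are assumptions for illustration only.

# Illustrative selection of a container group to reschedule (Example 3).
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Pod:
    name: str
    accelerators: int  # accelerator resources currently occupied by the container group


@dataclass
class NodeState:
    name: str
    free_accelerators: int
    pods: List[Pod] = field(default_factory=list)


def pick_reschedule_candidate(
    requested: int, nodes: Dict[str, NodeState]
) -> Optional[Tuple[Pod, str, str]]:
    """Return (container group to reschedule, first node, second node), or None."""
    for first in nodes.values():
        for pod in first.pods:
            # (b) releasing this group must make room for the new request on the first node
            if first.free_accelerators + pod.accelerators < requested:
                continue
            # (a) some other node must be able to receive the rescheduled group
            for second in nodes.values():
                if second.name != first.name and second.free_accelerators >= pod.accelerators:
                    return pod, first.name, second.name
    return None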
Example 4. The method of any of examples 1-3, further comprising: determining a plurality of candidate container groups, the plurality of container groups comprising a first container group and a second container group; and
in response to determining that the time period of restarting the second container group is greater than the time period of restarting the first container group, determining that the first container group is the container group to be rescheduled.
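Example 4 prefers the candidate that is cheaper to restart. A minimal sketch, assuming an illustrative restart_seconds estimate that is not part of the disclosure:

# Illustrative tie-break between candidate container groups (Example 4).
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    restart_seconds: float  # assumed estimate of how long this group takes to restart


def pick_cheapest_restart(candidates: List[Candidate]) -> Candidate:
    """Choose the candidate container group with the smallest estimated restart time."""
    return min(candidates, key=lambda c: c.restart_seconds)


# e.g. between a group that restarts in 30 s and one that needs 300 s,
# the 30 s group is the one to reschedule.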
Example 5 the method of any of examples 1-4, wherein the group of containers to be rescheduled is not a group of containers performing a front-end task.
Example 6. The method of any one of examples 1-5, further comprising:
one or more new nodes are configured in response to the amount of accelerator resources required to run the container group of the machine learning model being greater than the remaining amount of accelerator resources on the one or more nodes.
Example 7 the method of any one of examples 1-6, wherein migrating the container group to be rescheduled from the first node to the second node comprises:
acquiring a stop instruction for the container group to be rescheduled, the stop instruction instructing the container group to be rescheduled to stop running;
storing the running state data of the container group to be rescheduled in response to obtaining the stop instruction; and
stopping the running of the container group to be rescheduled in response to receiving a notification indicating that the running state data of the container group to be rescheduled has been saved to a file system.
Example 8 the method of any one of examples 1-7, further comprising:
initiating, at the first node, the container group running the machine learning model; and
allocating accelerator resources on the second node for the container group to be rescheduled.
Example 9 the method of any of examples 1-8, wherein allocating accelerator resources for the container group to be rescheduled on the second node is based on a priority of accelerator resource allocation requests of the container group to be rescheduled being higher than a priority of other container groups on the second node.
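Example 9 makes the allocation on the second node depend on the migrated group's request outranking the other container groups there. A minimal sketch of that comparison, under the assumption (made only for this sketch) that a larger priority value means a more important request:

# Illustrative priority check for admitting the rescheduled group on the second node (Example 9).
from dataclasses import dataclass
from typing import List


@dataclass
class GroupRequest:
    name: str
    accelerators: int
    priority: int  # assumed convention: higher value = higher priority


def admit_on_second_node(
    rescheduled: GroupRequest, others_on_node: List[GroupRequest], free_accelerators: int
) -> bool:
    """Admit the rescheduled container group when its request outranks the other
    groups on the second node and the node still has room for it."""
    outranks_others = all(rescheduled.priority > other.priority for other in others_on_node)
    return outranks_others and rescheduled.accelerators <= free_accelerators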
Example 10 the method of any one of examples 1-9, further comprising:
in response to an amount of accelerator resources required to run the container group of the machine learning model being less than or equal to a remaining amount of accelerator resources on one or more nodes, accelerator resources are allocated for the accelerator resource allocation request in a node of the one or more nodes.
Example 11 an apparatus for scheduling accelerator resources, comprising:
a request acquisition module configured to acquire an accelerator resource allocation request related to a machine learning model, the accelerator resource allocation request indicating an amount of accelerator resources required to run a container group of the machine learning model;
a container group determination module configured to determine, in response to the accelerator resource allocation request triggering a rescheduling condition, a container group to be rescheduled according to a rescheduling policy; and
a container group migration module configured to allocate accelerator resources for the accelerator resource allocation request on a first node by migrating the container group to be rescheduled from the first node to a second node.
Example 12 the apparatus of example 11, wherein the rescheduling condition comprises at least one of:
the amount of accelerator resources required to run the container group of the machine learning model is greater than the remaining amount of accelerator resources on each node; and
the amount of accelerator resources required to run the container group of the machine learning model is less than or equal to the total remaining amount of accelerator resources of the respective nodes.
Example 13 the apparatus of any one of examples 11-12, wherein the container group determination module comprises:
a first node determination module configured to determine the container group to be rescheduled and the first node in response to the accelerator resource free amount of one or more nodes being greater than or equal to the accelerator resource amount occupied by the container group to be rescheduled, and in response to the accelerator resource free amount of the one or more nodes being greater than or equal to the accelerator resource amount required to run the container group of the machine learning model after the accelerator resource amount occupied by the container group to be rescheduled has been released.
Example 14 the apparatus of any one of examples 11-13, further comprising:
a candidate container group determination module configured to determine a plurality of candidate container groups, the plurality of container groups including a first container group and a second container group; and
a to-be-rescheduled container group determination module configured to determine the first container group as the container group to be rescheduled in response to determining that the restart time of the second container group is greater than the restart time of the first container group.
Example 15 the apparatus of any one of examples 11-14, wherein the group of containers to be rescheduled is not a group of containers to perform a front-end task.
Example 16 the apparatus of any one of examples 11-15, further comprising:
a node configuration module configured to configure one or more new nodes in response to a quantity of accelerator resources required to run a container group of the machine learning model being greater than a remaining quantity of accelerator resources on the one or more nodes.
Example 17 the apparatus of any one of examples 11-16, wherein the container group migration module comprises:
an instruction acquisition module configured to acquire a stop instruction for the container group to be rescheduled, the stop instruction instructing the container group to be rescheduled to stop running;
a data storage module configured to store the running state data of the container group to be rescheduled in response to obtaining the stop instruction; and
an operation stopping module configured to stop the running of the container group to be rescheduled in response to receiving a notification indicating that the running state data of the container group to be rescheduled has been saved to a file system.
Example 18 the apparatus of any one of examples 11-17, further comprising:
a launch module configured to start the container group running the machine learning model on the first node; and
a resource allocation module configured to allocate accelerator resources for the container group to be rescheduled at the second node.
Example 19 the apparatus of any one of examples 11-18, wherein allocating accelerator resources for the container group to be rescheduled on the second node is based on a priority of accelerator resource allocation requests of the container group to be rescheduled being higher than a priority of other container groups on the second node.
Example 20 the apparatus of any one of examples 11-19, further comprising:
a resource allocation module configured to allocate accelerator resources for the accelerator resource allocation request in a node of the one or more nodes in response to an amount of accelerator resources required to run a container group of the machine learning model being less than or equal to a remaining amount of accelerator resources on the one or more nodes.
Example 21. An electronic device, comprising:
a processor; and
a memory coupled with the processor, the memory having instructions stored therein, which when executed by the processor, cause the electronic device to perform actions comprising:
obtaining an accelerator resource allocation request related to a machine learning model, the accelerator resource allocation request indicating an amount of accelerator resources required to run a container group of the machine learning model;
in response to the accelerator resource allocation request triggering a rescheduling condition, determining a container group to be rescheduled according to a rescheduling policy; and
allocating accelerator resources on a first node for the accelerator resource allocation request by migrating the container group to be rescheduled from the first node to a second node.
Example 22 the electronic device of example 21, wherein the rescheduling condition comprises at least one of:
the amount of accelerator resources required to run the container group of the machine learning model is greater than the remaining amount of accelerator resources on each node; and
the amount of accelerator resources required to run the container group of the machine learning model is less than or equal to the total remaining amount of accelerator resources of the respective nodes.
Example 23 the electronic device of any of examples 21-22, wherein determining the container group to be rescheduled according to the rescheduling policy comprises:
determining the container group to be rescheduled and the first node in response to the accelerator resource free amount of one or more nodes being greater than or equal to the accelerator resource amount occupied by the container group to be rescheduled, and in response to the accelerator resource free amount of the one or more nodes being greater than or equal to the accelerator resource amount required to run the container group of the machine learning model after the accelerator resource amount occupied by the container group to be rescheduled has been released.
Example 24 the electronic device of any of examples 21-23, the acts further comprising:
determining a plurality of candidate container groups, the plurality of container groups including a first container group and a second container group; and
in response to determining that the time period of restarting the second container group is greater than the time period of restarting the first container group, determining that the first container group is the container group to be rescheduled.
Example 25 the electronic device of any of examples 21-24, wherein the group of containers to be rescheduled is not a group of containers to perform a front-end task.
Example 26 the electronic device of any of examples 21-25, the acts further comprising:
one or more new nodes are configured in response to the amount of accelerator resources required to run the container group of the machine learning model being greater than the remaining amount of accelerator resources on the one or more nodes.
Example 27 the electronic device of any of examples 21-26, wherein migrating the container group to be rescheduled from the first node to the second node comprises:
acquiring a stop instruction for the container group to be rescheduled, the stop instruction instructing the container group to be rescheduled to stop running;
storing the running state data of the container group to be rescheduled in response to obtaining the stop instruction; and
stopping the running of the container group to be rescheduled in response to receiving a notification indicating that the running state data of the container group to be rescheduled has been saved to a file system.
Example 28 the electronic device of any of examples 21-27, wherein the actions further comprise:
initiating, at the first node, the container group running the machine learning model; and
allocating accelerator resources on the second node for the container group to be rescheduled.
Example 29 the electronic device of any of examples 21-28, wherein allocating accelerator resources for the container group to be rescheduled on the second node is based on a priority of accelerator resource allocation requests of the container group to be rescheduled being higher than a priority of other container groups on the second node.
Example 30 the electronic device of any of examples 21-29, wherein the actions further comprise:
in response to an amount of accelerator resources required to run the container group of the machine learning model being less than or equal to a remaining amount of accelerator resources on one or more nodes, accelerator resources are allocated for the accelerator resource allocation request in a node of the one or more nodes.
Example 31. A computer-readable storage medium having stored thereon computer-executable instructions, wherein the computer-executable instructions are executed by a processor to implement the method according to any of examples 1 to 10.
Example 32. A computer program product tangibly stored on a computer-readable medium and comprising computer-executable instructions that, when executed by an apparatus, cause the apparatus to perform the method of any one of examples 1 to 10.
Although the disclosure has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (13)

1. A method for scheduling accelerator resources, comprising:
obtaining an accelerator resource allocation request related to a machine learning model, the accelerator resource allocation request indicating an amount of accelerator resources required to run a container group of the machine learning model;
in response to the accelerator resource allocation request triggering a rescheduling condition, determining a container group to be rescheduled according to a rescheduling policy; and
allocating accelerator resources on a first node for the accelerator resource allocation request by migrating the container group to be rescheduled from the first node to a second node.
2. The method of claim 1, wherein the rescheduling condition comprises at least one of:
the amount of accelerator resources required to run the container group of the machine learning model is greater than the remaining amount of accelerator resources on each node; and
the amount of accelerator resources required to run the container group of the machine learning model is less than or equal to the total remaining amount of accelerator resources of the respective nodes.
3. The method of claim 1, wherein determining the container group to be rescheduled according to the rescheduling policy comprises:
determining the container group to be rescheduled and the first node in response to the accelerator resource free amount of one or more nodes being greater than or equal to the accelerator resource amount occupied by the container group to be rescheduled, and in response to the accelerator resource free amount of the one or more nodes being greater than or equal to the accelerator resource amount required to run the container group of the machine learning model after the accelerator resource amount occupied by the container group to be rescheduled has been released.
4. A method according to claim 3, further comprising:
determining a plurality of candidate container groups, the plurality of container groups including a first container group and a second container group; and
in response to determining that the length of time of restart of the second container group is greater than the length of time of restart of the first container group, determining the first container group as the container group to be rescheduled.
5. A method according to claim 3, wherein the set of containers to be rescheduled is not a set of containers performing a front-end task.
6. A method according to claim 3, further comprising:
one or more new nodes are configured in response to the amount of accelerator resources required to run the container group of the machine learning model being greater than the remaining amount of accelerator resources on the one or more nodes.
7. The method of claim 1, wherein migrating the group of containers to be rescheduled from a first node to a second node comprises:
acquiring a stop instruction for the container group to be rescheduled, the stop instruction instructing the container group to be rescheduled to stop running;
storing the running state data of the container group to be rescheduled in response to obtaining the stop instruction; and
stopping the running of the container group to be rescheduled in response to receiving a notification indicating that the running state data of the container group to be rescheduled has been saved to a file system.
8. The method of claim 1, further comprising:
initiating, at the first node, the container group running the machine learning model; and
allocating accelerator resources on the second node for the container group to be rescheduled.
9. The method of claim 8, wherein allocating accelerator resources for the container group to be rescheduled on the second node is based on a priority of accelerator resource allocation requests of the container group to be rescheduled being higher than priorities of other container groups on the second node.
10. The method of claim 1, further comprising:
in response to an amount of accelerator resources required to run the container group of the machine learning model being less than or equal to a remaining amount of accelerator resources on one or more nodes, accelerator resources are allocated for the accelerator resource allocation request in a node of the one or more nodes.
11. An apparatus for scheduling accelerator resources, comprising:
a request acquisition module configured to acquire an accelerator resource allocation request related to a machine learning model, the accelerator resource allocation request indicating an amount of accelerator resources required to run a container group of the machine learning model;
a container group determination module configured to determine, in response to the accelerator resource allocation request triggering a rescheduling condition, a container group to be rescheduled according to a rescheduling policy; and
a container group migration module configured to allocate accelerator resources for the accelerator resource allocation request on a first node by migrating the container group to be rescheduled from the first node to a second node.
12. An electronic device, comprising:
a processor; and
a memory coupled with the processor, the memory having instructions stored therein, which when executed by the processor, cause the electronic device to perform the method of any of claims 1-10.
13. A computer readable storage medium having stored thereon computer executable instructions, wherein the computer executable instructions are executed by a processor to implement the method of any of claims 1 to 10.
CN202311413945.1A 2023-10-27 2023-10-27 Method, apparatus, electronic device and medium for scheduling accelerator resources Pending CN117472570A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311413945.1A CN117472570A (en) 2023-10-27 2023-10-27 Method, apparatus, electronic device and medium for scheduling accelerator resources

Publications (1)

Publication Number Publication Date
CN117472570A (en) 2024-01-30

Family

ID=89630486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311413945.1A Pending CN117472570A (en) 2023-10-27 2023-10-27 Method, apparatus, electronic device and medium for scheduling accelerator resources

Country Status (1)

Country Link
CN (1) CN117472570A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination