CN117032935A - Management scheduling system and method for heterogeneous accelerator card based on K8s - Google Patents


Info

Publication number
CN117032935A
Authority
CN
China
Prior art keywords
card
resource information
resource
module
accelerator card
Prior art date
Legal status
Granted
Application number
CN202311182106.3A
Other languages
Chinese (zh)
Other versions
CN117032935B (en)
Inventor
蔚滢璐
杨超
张耒
侯雨希
Current Assignee
Transwarp Technology Shanghai Co Ltd
Original Assignee
Transwarp Technology Shanghai Co Ltd
Priority date
Filing date
Publication date
Application filed by Transwarp Technology Shanghai Co Ltd filed Critical Transwarp Technology Shanghai Co Ltd
Priority to CN202311182106.3A priority Critical patent/CN117032935B/en
Publication of CN117032935A publication Critical patent/CN117032935A/en
Application granted granted Critical
Publication of CN117032935B publication Critical patent/CN117032935B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F 9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a management scheduling system and method for heterogeneous accelerator cards based on K8s. The resource discovery module is used for detecting heterogeneous accelerator card resource information; the resource management module is used for acquiring the heterogeneous accelerator card resource information from the interface service module so as to manage the heterogeneous accelerator card resource information; the scheduling module is used for acquiring heterogeneous accelerator card resource information from the resource management module based on the resource request sent by the interface service module, determining a target accelerator card according to the heterogeneous accelerator card resource information, scheduling the container set to a node where the target accelerator card is located, and sending the resource information of the target accelerator card to the resource discovery module for caching; the container runtime module is used for acquiring the resource information of the target accelerator card from the resource discovery module and creating containers in the container set according to the resource information of the target accelerator card. The complexity of managing and scheduling heterogeneous accelerator cards can thereby be reduced.

Description

Management scheduling system and method for heterogeneous accelerator card based on K8s
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a management scheduling system and method of heterogeneous accelerator cards based on K8s.
Background
With the continuous popularization of Kubernetes (K8s for short) technology, more and more vendors have abandoned deploying applications on virtual machines or physical machines and have gradually adopted container technology, reducing the difficulty of deploying and bringing applications online through containerization. In the field of artificial intelligence (AI), most AI applications require heterogeneous accelerator cards (AI accelerator cards for short) to deliver their maximum service capacity.
As the vendors and models of AI accelerator cards keep increasing, a containerization scheme, a K8s plug-in scheme, a metrics monitoring scheme and a resource usage interface specific to each vendor or model need to be deployed in the K8s cluster, so an effective scheme for managing and scheduling heterogeneous accelerator cards is needed.
Disclosure of Invention
The embodiment of the invention provides a management scheduling system and a management scheduling method for heterogeneous accelerator cards based on K8s, which can reduce the complexity of management and scheduling of the heterogeneous accelerator cards.
In a first aspect, an embodiment of the present invention provides a management scheduling system for a heterogeneous accelerator card based on K8s, including: the system comprises a resource discovery module, a resource management module, a scheduling module, a container runtime module and an interface service module;
the resource discovery module, the resource management module and the scheduling module are all connected with the interface service module, the resource discovery module is connected with the container runtime module, and the resource management module is connected with the scheduling module;
the resource discovery module is used for detecting heterogeneous accelerator card resource information and sending the heterogeneous accelerator card resource information to the interface service module; the heterogeneous accelerator card resource information comprises memory resource information, computing resource information and use resource information;
the resource management module is used for acquiring the heterogeneous accelerator card resource information from the interface service module so as to manage the heterogeneous accelerator card resource information;
the scheduling module is used for acquiring heterogeneous acceleration card resource information from the resource management module based on the resource request sent by the interface service module, determining a target acceleration card according to the heterogeneous acceleration card resource information, scheduling a container set to a node where the target acceleration card is located, and sending the resource information of the target acceleration card to the resource discovery module for caching;
the container runtime module is used for acquiring the resource information of the target accelerator card from the resource discovery module, and creating a container in the container set according to the resource information of the target accelerator card so as to start the container set.
In a second aspect, an embodiment of the present invention further provides a management scheduling method for a heterogeneous accelerator card based on K8s, including:
acquiring a plurality of heterogeneous accelerator card resource information based on a resource request; the heterogeneous accelerator card resource information comprises memory resource information, computing resource information and use resource information;
determining at least one target accelerator card according to the heterogeneous accelerator card resource information;
dispatching the container set to a node where the target accelerator card is located;
and creating a container in the container set according to the resource information of the at least one target accelerator card so as to start the container set.
The embodiment of the invention discloses a management scheduling system and method of heterogeneous accelerator cards based on K8s. The system comprises a resource discovery module, a resource management module, a scheduling module, a container runtime module and an interface service module; the resource discovery module, the resource management module and the scheduling module are all connected with the interface service module, the resource discovery module is connected with the container runtime module, and the resource management module is connected with the scheduling module; the resource discovery module is used for detecting heterogeneous accelerator card resource information and sending the heterogeneous accelerator card resource information to the interface service module; the heterogeneous accelerator card resource information comprises memory resource information, computing resource information and usage resource information; the resource management module is used for acquiring the heterogeneous accelerator card resource information from the interface service module so as to manage the heterogeneous accelerator card resource information; the scheduling module is used for acquiring heterogeneous accelerator card resource information from the resource management module based on the resource request sent by the interface service module, determining a target accelerator card according to the heterogeneous accelerator card resource information, scheduling the container set to a node where the target accelerator card is located, and sending the resource information of the target accelerator card to the resource discovery module for caching; the container runtime module is used for acquiring the resource information of the target accelerator card from the resource discovery module, and creating a container in the container set according to the resource information of the target accelerator card so as to start the container set. The management and scheduling system for heterogeneous accelerator cards provided by the embodiment of the invention can reduce the complexity of managing and scheduling heterogeneous accelerator cards.
Drawings
FIG. 1 is a schematic diagram of a management scheduling system of heterogeneous accelerator cards based on K8s according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of a resource discovery module according to a first embodiment of the invention;
FIG. 3 is a schematic diagram of a container runtime module in accordance with a first embodiment of the invention;
FIG. 4 is a block diagram of a management scheduling system for heterogeneous accelerator cards based on K8s according to a first embodiment of the present invention;
fig. 5 is a flowchart of a management scheduling method of heterogeneous accelerator cards based on K8s in a second embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
The AI accelerator card is a module for processing the large number of computing tasks in artificial intelligence, and is widely used in fields such as face recognition, autonomous driving, security and unmanned aerial vehicles. The AI accelerator card is typically installed in a server as a high-speed serial expansion bus (peripheral component interconnect express, PCIe) device; the operating system recognizes the corresponding AI accelerator card once the corresponding driver is installed, and a program uses the AI accelerator card through the corresponding software development kit (SDK). In a container cloud platform, because a container runtime and the Kubernetes management platform are used, discovery of AI accelerator card resources, management and scheduling of AI accelerator card resources, and containerized use of AI accelerator cards are all involved.
The high-level container runtime module interacts with the underlying low-level container runtime module via the OCI specification, and the low-level container runtime performs the actual creation and management of containers; that is, the binding between a container and an AI accelerator card is performed by the low-level container runtime module. To support the use of AI accelerator cards in containers, each AI accelerator card vendor may implement a low-level container runtime module that supports its own AI accelerator cards. Before the low-level container runtime executes the container-creation action, it checks the incoming configuration such as environment variables; if, according to this interface, a container needs to use the vendor's AI accelerator card, the container runtime module injects the configuration information of that AI accelerator card into the container.
In the container cloud platform K8s, memory resources and computing resources can be requested from the K8s underlying layer, and the scheduler in K8s schedules a container set (pod) to a suitable node according to the actual running state of each node in the cluster. To enable K8s to discover and manage AI accelerator cards, K8s provides a device plugin mechanism for exposing the device resource status on a node and for injecting device information into a container before the container starts. Meanwhile, K8s also provides an extension mechanism for the scheduler, so that the scheduling algorithm can be extended or customized according to scenario requirements.
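As one way to picture the device plugin mechanism just described, the following is a minimal sketch in Go assuming the standard k8s.io/kubelet device plugin v1beta1 API; the AI_CARD_DEVICE_IDS variable, the fake device IDs, and the omission of kubelet registration and health checking are illustrative assumptions and are not specified by the patent.

```go
package main

import (
	"context"
	"fmt"
	"strings"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// aiCardPlugin advertises accelerator-card units as a K8s extended resource
// and injects the allocated card IDs into containers through an env var.
type aiCardPlugin struct {
	devices []*pluginapi.Device
}

// ListAndWatch streams the current device list to the kubelet; a full
// implementation would re-send the list whenever card health changes.
func (p *aiCardPlugin) ListAndWatch(_ *pluginapi.Empty, s pluginapi.DevicePlugin_ListAndWatchServer) error {
	return s.Send(&pluginapi.ListAndWatchResponse{Devices: p.devices})
}

// Allocate is called by the kubelet before the container starts and returns
// the environment variables that the container runtime module can later read
// to bind the selected accelerator cards to the newly created container.
func (p *aiCardPlugin) Allocate(_ context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	resp := &pluginapi.AllocateResponse{}
	for _, creq := range req.ContainerRequests {
		resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{
			Envs: map[string]string{
				// Hypothetical variable name; not specified by the patent.
				"AI_CARD_DEVICE_IDS": strings.Join(creq.DevicesIDs, ","),
			},
		})
	}
	return resp, nil
}

func main() {
	// Two fake card units; real code would discover them via the vendor
	// plug-ins of the accelerator card management unit and then register
	// this plugin with the kubelet over gRPC (omitted here).
	p := &aiCardPlugin{devices: []*pluginapi.Device{
		{ID: "card-0-unit-0", Health: pluginapi.Healthy},
		{ID: "card-0-unit-1", Health: pluginapi.Healthy},
	}}
	fmt.Printf("advertising %d device units\n", len(p.devices))
}
```

In the system described below, the memory resource reporting subunit and the computing resource reporting subunit would each expose such a device list for their respective resource type, though the exact wiring is an assumption here.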
Owing to internal and external factors, enterprises increasingly face scenarios in which the same cluster uses heterogeneous AI accelerator cards, for example AI accelerator cards of different models from the same vendor, or AI accelerator cards of different models from different vendors.
Currently, to support heterogeneous AI acceleration cards in a container cloud platform or cluster, the following problems and challenges exist:
1. Containers created in K8s currently cannot dynamically specify a low-level container runtime module, which means that the pods created on a single node can only use the same default low-level container runtime module. If AI accelerator cards from multiple vendors are installed on one node, only one vendor's AI accelerator card devices can be used.
2. Each node of the K8s cluster needs to install different types of device plugin components according to the AI accelerator cards installed and the functions to be used, which can lead to device plugin conflicts and management difficulties.
3. Cluster AI accelerator card resources lack a global management view. There is no intuitive way to obtain the information of the different AI accelerator cards at the cluster level, such as the binding relations between pods or containers and AI accelerator cards, the number of AI accelerator cards in use, and the computing power and device memory used by each AI accelerator card.
4. The default K8s scheduler schedules only on the basis of AI accelerator card resource quantities and cannot use richer information.
5. When declaring the use of AI accelerator card resources, vendor-specific resource names and syntax have to be used.
6. When multiple types of AI accelerator cards exist in the cluster, the monitoring component needs to interface with the monitoring metrics of each vendor.
Example 1
Fig. 1 is a schematic structural diagram of a management scheduling system of a heterogeneous accelerator card based on K8s according to a first embodiment of the present invention, as shown in fig. 1, the system includes: a resource discovery module 110, a resource management module 120, a scheduling module 130, a container runtime module 140, and an interface service module 150.
The resource discovery module 110, the resource management module 120, and the scheduling module 130 are all connected to the interface service module 150, the resource discovery module 110 is connected to the container runtime module 140, and the resource management module 120 is connected to the scheduling module 130. The interface service module 150 may be understood as an API server in K8s, which is used as a hub for data interaction and communication between other modules.
In this embodiment, the resource discovery module 110 is configured to detect heterogeneous accelerator card resource information, and send the heterogeneous accelerator card resource information to the interface service module 150. The resource management module 120 is configured to obtain heterogeneous accelerator card resource information from the interface service module 150, so as to manage the heterogeneous accelerator card resource information. The scheduling module 130 is configured to obtain heterogeneous acceleration card resource information from the resource management module 120 based on the resource request sent by the interface service module 150, determine a target acceleration card according to the heterogeneous acceleration card resource information, schedule the container set to a node where the target acceleration card is located, and send the resource information of the target acceleration card to the resource discovery module 110 for caching. The container runtime module 140 is configured to obtain the resource information of the target accelerator card from the resource discovery module 110, and create a container in the container set according to the resource information of the target accelerator card, so as to start the container set.
The heterogeneous accelerator card resource information comprises memory resource information, computing resource information and usage resource information. The memory resource information may be understood as the memory capacity of the accelerator card, the computing resource information may be understood as the number of computing units of the accelerator card, and the usage resource information may include the memory resources and computing resources already used by the accelerator card.
In this embodiment, the resource discovery module 110 is deployed on each node (which may also be referred to as a server) that has an AI accelerator card in the K8s cluster, and is configured to monitor the AI accelerator cards, report the AI accelerator card resource information to the interface service module 150, and respond to requests from the container runtime module 140. In order to better interface with K8s, the resource discovery module 110 may implement functional extension in the form of a device plugin, and may exchange data with the resource management module 120 through the API Server.
Specifically, fig. 2 is a schematic structural diagram of a resource discovery module in the present embodiment, and as shown in fig. 2, the resource discovery module 110 includes an acceleration card management unit 111, a resource reporting unit 112, a management interface unit 113, and an acceleration card service unit 114.
Wherein a plurality of accelerator card plug-ins are provided in the accelerator card management unit 111, and the heterogeneous accelerator card resource information is acquired through these accelerator card plug-ins. The resource reporting unit 112 is configured to report the memory resource information and the computing resource information to the interface service module through the management interface unit 113. The accelerator card service unit 114 is configured to provide an interface for the container runtime module 140 to call for queries.
In this embodiment, the accelerator card management unit 111 adapts to AI accelerator cards of different types or models through a plug-in mechanism. When the accelerator card management unit 111 starts, it automatically scans the AI accelerator cards in the node, loads the corresponding plug-ins according to the basic information (such as identifier, name and model) of the scanned AI accelerator cards, and obtains the accelerator card resource information through the loaded plug-ins. The memory resource information and the computing resource information in the accelerator card resource information are sent to the resource reporting unit 112, and the usage resource information is sent to the accelerator card service unit 114. The resource reporting unit 112 includes a memory resource reporting subunit and a computing resource (core) reporting subunit, both implemented as plug-ins. The memory resource reporting subunit is used for reporting the memory resource information, and the computing resource reporting subunit is used for reporting the computing resource information. In this embodiment, the resource reporting unit 112 first reports the memory resource information and the computing resource information to the K8s component Kubelet through the management interface unit 113, and the K8s component then sends the memory resource information and the computing resource information to the interface service module 150. The management interface unit 113 is configured to enable communication between the resource reporting unit 112 and the K8s components, and may be implemented based on remote procedure call (RPC) techniques. The accelerator card service unit 114 is configured to provide an interface for the container runtime module 140 to call to query the accelerator card resource information, so that the container runtime module can bind the AI accelerator card to the newly created container; it also sends the usage resource information to the interface service module 150, so that the interface service module 150 forwards the usage resource information to the resource management module 120.
In this embodiment, the resource discovery module may adapt to a plurality of heterogeneous AI acceleration cards (different manufacturers, different models, etc.) through a plug-in mechanism, and expose the memory resource information and the computing resource information of the acceleration card to K8s through the memory resource reporting subunit and the computing resource reporting subunit in a device plug-in manner. In addition, the resource discovery module reports the usage resource information to the resource management module and is invoked by the container runtime module.
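To make the plug-in mechanism concrete, a possible shape for a vendor accelerator card plug-in is sketched below; the AcceleratorCardPlugin interface, its method names and the CardInfo fields are hypothetical illustrations rather than interfaces defined by the patent.

```go
package discovery

// CardInfo is a hypothetical record for one physical accelerator card
// as reported by a vendor plug-in.
type CardInfo struct {
	ID          string // device identifier, e.g. a UUID or index
	Model       string
	Vendor      string
	MemoryBytes uint64 // total card memory (memory resource information)
	CoreCount   int    // number of compute units (computing resource information)
	MemoryUsed  uint64 // already-used memory (usage resource information)
	CoresUsed   int    // already-used compute units (usage resource information)
	HostPath    string // path of the card device on the node host
}

// AcceleratorCardPlugin is a hypothetical per-vendor plug-in loaded by the
// accelerator card management unit after scanning the node's devices.
type AcceleratorCardPlugin interface {
	// Matches reports whether this plug-in can handle a scanned device,
	// based on its basic information (identifier, name, model).
	Matches(vendorID, deviceModel string) bool
	// ListCards returns the resource information of all cards it manages.
	ListCards() ([]CardInfo, error)
}
```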
Specifically, the resource management module 120 is further configured to group heterogeneous accelerator card resource information according to accelerator cards, obtain a plurality of accelerator card groups, and set group names of the accelerator card groups.
The heterogeneous accelerator card resource information further comprises: the name of the node where the accelerator card is located, the use state of the accelerator card, the identifier of the accelerator card, the path of the accelerator card, the model of the accelerator card, the provider to which the accelerator card belongs, the use mode of the accelerator card and the container set (pod) using the accelerator card.
The accelerator card usage state includes two states, in use and not in use; the accelerator card path can be understood as the path of the accelerator card on the node host; the accelerator card usage mode includes an exclusive mode, a shared mode and an unlimited mode; and the container set using the accelerator card includes information such as the pod name and the namespace it belongs to. In this embodiment, the resource management module 120 may manage the heterogeneous accelerator card resource information in the form of a list, or construct a resource information model (e.g., a tree model) and manage the heterogeneous accelerator card resource information in the form of a model. The heterogeneous accelerator card resource information may be stored in the K8s storage component etcd.
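For illustration, the per-card record managed by the resource management module could be modeled roughly as follows; the CardRecord struct, its field names and the usage-mode constants are assumptions that merely mirror the fields listed above, not the patent's actual data model.

```go
package management

// UsageMode mirrors the three accelerator card usage modes described above.
type UsageMode string

const (
	ModeExclusive UsageMode = "exclusive"
	ModeShared    UsageMode = "shared"
	ModeUnlimited UsageMode = "unlimited"
)

// CardRecord is a hypothetical per-card entry kept by the resource
// management module (and persisted, for example, in the K8s etcd store).
type CardRecord struct {
	NodeName  string    // name of the node where the card is located
	InUse     bool      // accelerator card usage state
	CardID    string    // accelerator card identifier
	HostPath  string    // path of the card on the node host
	Model     string    // accelerator card model
	Vendor    string    // vendor the card belongs to
	Mode      UsageMode // accelerator card usage mode
	PodName   string    // pod currently using the card
	Namespace string    // namespace of that pod
	GroupName string    // group name assigned when cards are grouped
}
```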
Specifically, the scheduling module 130 is configured to obtain a preset scheduling condition, and determine a target accelerator card based on the scheduling condition and heterogeneous accelerator card resource information.
Wherein the scheduling conditions may include: the accelerator card usage mode, the accelerator card model, and the group name of an accelerator card group. The usage mode condition means filtering accelerator cards according to the usage mode preset in the pod; the model condition means filtering accelerator cards according to the model preset in the pod; and the group name condition means filtering accelerator cards according to the group name preset in the pod.
In this embodiment, the scheduling module 130 may be implemented by a Filter plugin, a Score plugin, a Reserve plugin, and a PreBind plugin. In the scheduling process of a pod, the Filter plugin is called first; its main task is to judge whether a node meets the conditions for running the pod, and if so it returns a success status code. When all Filter plugins return success, the node is selected as a schedulable node. The Score plugin is then run, and the node with the highest score is selected from the list of schedulable nodes as the suggested scheduling node for the pod. Although the pod has not actually been scheduled to that node at this moment, if the Score plugin makes its scheduling decision based on certain saved state, the plugin state needs to be updated, for example by reserving the resources required by the pod, in order to prevent pods in the scheduling stage from competing for resources with pods in the binding stage. At this point the Reserve plugin is called; it provides two methods: the Reserve method is called before the pod enters the binding stage to update the plugin state, and if a later plugin call fails or the pod is refused scheduling, the Unreserve method is called to roll back the state update made by the Reserve method.
In this embodiment, when scheduling the multiple pods of a distributed multi-node multi-card task, the Reserve plugin is needed to ensure that sufficient resources are allocated to the task, or, when resources are insufficient, to avoid a situation where resources are only partially reserved.
The PreBind plugin is invoked before the pod is actually bound to the suggested scheduling node, and the AI accelerator card information is injected into the pod through this plugin. For example, an annotation of the form "AI-devices/assigned-AI-of-container-x" can be used, where x represents the sequence number of the container in the pod object configuration file and the value of the annotation is the ID of the accelerator card; if there are multiple accelerator cards, the IDs are separated by ",". Finally, the pod in the K8s cluster is updated with the updated pod object.
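A minimal sketch of writing such annotations into a pod object (before the PreBind plugin pushes the updated pod object back through the API server) might look like the following; the assignCards helper and its parameters are hypothetical, while the annotation key pattern follows the example above.

```go
package scheduling

import (
	"fmt"
	"strings"

	v1 "k8s.io/api/core/v1"
)

// assignCards records the accelerator cards chosen for each container of the
// pod in the pod's annotations, following the key pattern described above:
// "AI-devices/assigned-AI-of-container-<index>" -> "id1,id2,...".
func assignCards(pod *v1.Pod, cardIDsByContainer map[int][]string) {
	if pod.Annotations == nil {
		pod.Annotations = map[string]string{}
	}
	for containerIndex, ids := range cardIDsByContainer {
		key := fmt.Sprintf("AI-devices/assigned-AI-of-container-%d", containerIndex)
		pod.Annotations[key] = strings.Join(ids, ",")
	}
}
```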
The scheduling module 130 completes the scheduling decision for the accelerator cards and injects the resource information of the AI accelerator cards into the pod's annotations. For containers using AI accelerator cards, what matters is whether the resource information of the AI accelerator cards on the node meets the resource conditions set by the pod.
Specifically, the container runtime module 140 is configured to create containers that use accelerator cards, and provides a plug-in mechanism to facilitate the accelerator-card-related initialization process. Fig. 3 is a schematic structural diagram of the container runtime module in this embodiment; as shown in fig. 3, the container runtime module includes a call layer interface 141, a configuration unit 142, and an initialization plug-in 143.
Wherein the call layer (Open Container Initiative, OCI) interface 141 is configured to communicate with the container management tool (containerd) and call the container management tool interface to query the configuration information of the container; the configuration unit 142 is configured to adjust the configuration information based on the resource information of the target accelerator card; and the initialization plug-in 143 is used for creating a container in the container set based on the adjusted configuration information, mounting the runtime plug-in of the target accelerator card on the container, and initializing the container on which the target accelerator card is mounted.
The configuration unit 142 may be understood as a container runtime library used for performing resource configuration. There may be a plurality of initialization plug-ins 143, each being a plug-in provided by a vendor for mounting its accelerator cards into a container and executing the associated initialization logic.
Optionally, the configuration unit 142 is configured to write the identifier of the target accelerator card into an environment variable field of the configuration information; the configuration information further includes a container set identifier and a container name.
By way of example, fig. 4 is a schematic diagram of the K8s-based management and scheduling system for heterogeneous accelerator cards in this embodiment. As shown in fig. 4, the process of creating a container may be as follows: after the pod is scheduled to the designated node, the Kubelet component invokes the container engine Docker, which invokes the container tool containerd, which ultimately invokes the container runtime module to create the containers for the pod. The container runtime module deserializes the configuration file of the container into a configuration object; it then resolves the container set identifier (pod UID) and the container name from the configuration object, and requests the resource information list of the target accelerator card from the resource discovery module; if one pod is bound to a plurality of AI accelerator cards, the plug-ins of the different vendors are called in turn to complete the relevant configuration of the environment parameters and to mount the runtime plug-ins of the target accelerator cards; finally, the container runtime module writes the configuration information back into the original configuration file and creates the container.
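The environment-parameter configuration step can be illustrated with the OCI runtime-spec Go types; in this sketch the card identifiers are passed through a hypothetical AI_CARD_DEVICE_IDS variable, since the patent does not name the environment variable or define this helper.

```go
package runtimeshim

import (
	"strings"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// injectCardEnv writes the target accelerator card identifiers into the
// environment variable field of the deserialized OCI configuration object,
// replacing the variable if an earlier step already set one.
func injectCardEnv(spec *specs.Spec, cardIDs []string) {
	if spec.Process == nil {
		spec.Process = &specs.Process{}
	}
	entry := "AI_CARD_DEVICE_IDS=" + strings.Join(cardIDs, ",")
	for i, env := range spec.Process.Env {
		if strings.HasPrefix(env, "AI_CARD_DEVICE_IDS=") {
			spec.Process.Env[i] = entry
			return
		}
	}
	spec.Process.Env = append(spec.Process.Env, entry)
}
```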
The management scheduling system of heterogeneous accelerator cards based on K8s of this embodiment comprises a resource discovery module, a resource management module, a scheduling module, a container runtime module and an interface service module; the resource discovery module, the resource management module and the scheduling module are all connected with the interface service module, the resource discovery module is connected with the container runtime module, and the resource management module is connected with the scheduling module; the resource discovery module is used for detecting heterogeneous accelerator card resource information and sending the heterogeneous accelerator card resource information to the interface service module; the heterogeneous accelerator card resource information comprises memory resource information, computing resource information and usage resource information; the resource management module is used for acquiring the heterogeneous accelerator card resource information from the interface service module so as to manage the heterogeneous accelerator card resource information; the scheduling module is used for acquiring heterogeneous accelerator card resource information from the resource management module based on the resource request sent by the interface service module, determining a target accelerator card according to the heterogeneous accelerator card resource information, scheduling the container set to a node where the target accelerator card is located, and sending the resource information of the target accelerator card to the resource discovery module for caching; the container runtime module is used for acquiring the resource information of the target accelerator card from the resource discovery module, and creating a container in the container set according to the resource information of the target accelerator card so as to start the container set. The management and scheduling system for heterogeneous accelerator cards provided by the embodiment of the invention can reduce the complexity of managing and scheduling heterogeneous accelerator cards.
Example two
Fig. 5 is a flowchart of a management scheduling method of heterogeneous accelerator cards based on K8s according to a second embodiment of the present invention, as shown in fig. 5, the method includes the following steps:
S510, acquiring a plurality of heterogeneous accelerator card resource information based on the resource request.
The heterogeneous accelerator card resource information comprises memory resource information, computing resource information and use resource information;
S520, determining at least one target accelerator card according to the heterogeneous accelerator card resource information.
The method for determining at least one target accelerator card to be scheduled according to the heterogeneous accelerator card resource information may be to obtain a preset scheduling condition; screening the heterogeneous accelerator card resource information based on the scheduling conditions to obtain candidate heterogeneous accelerator card resource information; and processing the candidate heterogeneous accelerator card resource information based on a set scheduling algorithm to obtain at least one target accelerator card for scheduling.
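As an illustration of this filtering and selection step, the sketch below filters candidate cards by the preset scheduling conditions and then picks the matching card with the most free memory; the selectTargetCard function, its types and the most-free-memory scoring rule are illustrative assumptions, not the scheduling algorithm specified by the method.

```go
package scheduling

// candidateCard is a reduced view of one accelerator card's resource
// information; the fields are illustrative.
type candidateCard struct {
	ID        string
	Model     string
	GroupName string
	Mode      string // "exclusive", "shared" or "unlimited"
	MemFree   uint64 // free memory on the card
	CoresFree int    // free compute units
}

// condition holds the preset scheduling conditions taken from the pod spec.
type condition struct {
	Mode      string
	Model     string
	GroupName string
	MemNeed   uint64
	CoresNeed int
}

// selectTargetCard filters the candidate cards by the scheduling conditions
// and returns the matching card with the most free memory, or nil if none fits.
func selectTargetCard(cards []candidateCard, c condition) *candidateCard {
	var best *candidateCard
	for i := range cards {
		card := &cards[i]
		if c.Mode != "" && card.Mode != c.Mode {
			continue
		}
		if c.Model != "" && card.Model != c.Model {
			continue
		}
		if c.GroupName != "" && card.GroupName != c.GroupName {
			continue
		}
		if card.MemFree < c.MemNeed || card.CoresFree < c.CoresNeed {
			continue
		}
		if best == nil || card.MemFree > best.MemFree {
			best = card
		}
	}
	return best
}
```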
S530, dispatching the container set to a node where the target accelerator card is located;
S540, creating a container in the container set according to the resource information of the at least one target accelerator card to start the container set.
Specifically, the manner of creating the container according to the resource information of the at least one target accelerator card may be: acquiring configuration information; adjusting an environment variable field of the configuration information according to the identification of the target accelerator card; and creating containers in the container set based on the adjusted configuration information.
In this embodiment, the management scheduling method of the heterogeneous accelerator card based on K8s may refer to the functions of each module of the management scheduling system of the heterogeneous accelerator card based on K8s in the foregoing embodiment, which is not described herein.
According to the technical scheme of this embodiment, a plurality of heterogeneous accelerator card resource information is acquired based on a resource request; the heterogeneous accelerator card resource information comprises memory resource information, computing resource information and usage resource information; at least one target accelerator card is determined according to the heterogeneous accelerator card resource information; the container set is dispatched to the node where the target accelerator card is located; and a container is created in the container set according to the resource information of the at least one target accelerator card to start the container set. The complexity of managing and scheduling heterogeneous accelerator cards can thereby be reduced.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A heterogeneous accelerator card management scheduling system based on K8s, comprising: the system comprises a resource discovery module, a resource management module, a scheduling module, a container runtime module and an interface service module;
the resource discovery module, the resource management module and the scheduling module are all connected with the interface service module, the resource discovery module is connected with the container runtime module, and the resource management module is connected with the scheduling module;
the resource discovery module is used for detecting heterogeneous accelerator card resource information and sending the heterogeneous accelerator card resource information to the interface service module; the heterogeneous accelerator card resource information comprises memory resource information, computing resource information and use resource information;
the resource management module is used for acquiring heterogeneous acceleration card resource information from the interface service module so as to manage the heterogeneous acceleration card resource information;
the scheduling module is used for acquiring heterogeneous acceleration card resource information from the resource management module based on the resource request sent by the interface service module, determining a target acceleration card according to the heterogeneous acceleration card resource information, scheduling a container set to a node where the target acceleration card is located, and sending the resource information of the target acceleration card to the resource discovery module for caching;
the container runtime module is used for acquiring the resource information of the target acceleration card from the resource discovery module, and creating a container in the container set according to the resource information of the target acceleration card so as to start the container set.
2. The system of claim 1, wherein the resource discovery module comprises an accelerator card management unit, a resource reporting unit, a management interface unit, and an accelerator card service unit;
the acceleration card management unit is provided with a plurality of acceleration card plug-ins, and heterogeneous acceleration card resource information is acquired through the acceleration card plug-ins;
the resource reporting unit is used for reporting the memory resource information and the computing resource information to the interface service module through the management interface unit;
the accelerator card service unit is used for providing an interface for the container runtime module to call for queries.
3. The system of claim 1, wherein the resource management module is further configured to group the heterogeneous accelerator card resource information according to accelerator cards, obtain a plurality of accelerator card groups, and set a group name of each accelerator card group; the heterogeneous accelerator card resource information further includes: node names of the accelerator cards, accelerator card usage states, accelerator card identifiers, accelerator card paths, accelerator card models, suppliers of the accelerator cards, accelerator card usage modes, and container sets using the accelerator cards.
4. The system of claim 3, wherein the scheduling module is configured to obtain a preset scheduling condition, and determine a target accelerator card based on the scheduling condition and the heterogeneous accelerator card resource information.
5. The system of claim 4, wherein the scheduling conditions comprise: accelerator card usage pattern, accelerator card model number, and group name of heterogeneous accelerator card group.
6. The system of claim 3, wherein the container runtime module comprises a call layer interface, a configuration unit, and an initialization plug-in;
the call layer interface is used for communicating with the container management tool and calling the container management tool interface to query the configuration information of the container; the configuration unit is used for adjusting the configuration information based on the resource information of the target accelerator card; the initialization plug-in is used for creating a container in the container set based on the adjusted configuration information, mounting the runtime plug-in of the target accelerator card on the container, and initializing the container on which the target accelerator card is mounted.
7. The system of claim 6, wherein the configuration unit is configured to write an identifier of the target accelerator card into an environment variable field of the configuration information; the configuration information further comprises a container set identifier and a container name.
8. A method for managing and scheduling heterogeneous accelerator cards based on K8s, wherein the method is performed by the managing and scheduling system of heterogeneous accelerator cards based on K8s according to any one of claims 1 to 7, and comprises:
acquiring a plurality of heterogeneous accelerator card resource information based on a resource request; the heterogeneous accelerator card resource information comprises memory resource information, computing resource information and use resource information;
determining at least one target accelerator card according to the heterogeneous accelerator card resource information;
dispatching the container set to a node where the target accelerator card is located;
and creating a container in the container set according to the resource information of the at least one target accelerator card so as to start the container set.
9. The method of claim 8, wherein determining at least one target accelerator card for scheduling based on the plurality of heterogeneous accelerator card resource information comprises:
acquiring preset scheduling conditions;
screening the heterogeneous accelerator card resource information based on the scheduling conditions to obtain candidate heterogeneous accelerator card resource information;
and processing the candidate heterogeneous accelerator card resource information based on a set scheduling algorithm to obtain at least one target accelerator card for scheduling.
10. The method of claim 8, wherein creating a container from the resource information of the at least one target accelerator card comprises:
acquiring configuration information;
adjusting an environment variable field of the configuration information according to the identification of the target accelerator card;
and creating containers in the container set based on the adjusted configuration information.
CN202311182106.3A 2023-09-13 2023-09-13 Management scheduling system and method for heterogeneous accelerator card based on K8s Active CN117032935B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311182106.3A CN117032935B (en) 2023-09-13 2023-09-13 Management scheduling system and method for heterogeneous accelerator card based on K8s

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311182106.3A CN117032935B (en) 2023-09-13 2023-09-13 Management scheduling system and method for heterogeneous accelerator card based on K8s

Publications (2)

Publication Number Publication Date
CN117032935A true CN117032935A (en) 2023-11-10
CN117032935B (en) 2024-05-31

Family

ID=88633900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311182106.3A Active CN117032935B (en) 2023-09-13 2023-09-13 Management scheduling system and method for heterogeneous accelerator card based on K8s

Country Status (1)

Country Link
CN (1) CN117032935B (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022095348A1 (en) * 2020-11-06 2022-05-12 浪潮(北京)电子信息产业有限公司 Remote mapping method and apparatus for computing resources, device and storage medium
CN113377529A (en) * 2021-05-24 2021-09-10 阿里巴巴新加坡控股有限公司 Intelligent accelerator card and data processing method based on intelligent accelerator card
CN116260876A (en) * 2023-01-31 2023-06-13 苏州浪潮智能科技有限公司 AI application scheduling method and device based on K8s and electronic equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117971906A (en) * 2024-04-02 2024-05-03 山东浪潮科学研究院有限公司 Multi-card collaborative database query method, device, equipment and storage medium
CN118426971A (en) * 2024-07-03 2024-08-02 中国人民解放军96901部队 AI acceleration card resource scheduling method based on double optimization models
CN118502965A (en) * 2024-07-16 2024-08-16 苏州元脑智能科技有限公司 Acceleration card distribution method and device and artificial intelligent platform

Also Published As

Publication number Publication date
CN117032935B (en) 2024-05-31

Similar Documents

Publication Publication Date Title
CN117032935B (en) Management scheduling system and method for heterogeneous accelerator card based on K8s
CN111324571B (en) Container cluster management method, device and system
CN112269640B (en) Method for realizing life cycle management of container cloud component
KR102419704B1 (en) Security protection methods and devices
US10025630B2 (en) Operating programs on a computer cluster
CN113296792A (en) Storage method, device, equipment, storage medium and system
CN110838939B (en) Scheduling method based on lightweight container and edge Internet of things management platform
CN115328529B (en) Application management method and related equipment
CN111443984B (en) Container deployment method and device of network function virtualization NVF system
CN111522623B (en) Modularized software multi-process running system
CN117519972A (en) GPU resource management method and device
CN117234741A (en) Resource management and scheduling method and device, electronic equipment and storage medium
CN111447076B (en) Container deployment method and network element of network function virtualization (NVF) system
CN116974689A (en) Cluster container scheduling method, device, equipment and computer readable storage medium
CN115525391A (en) Embedded cloud platform management monitoring system
CN115422277A (en) Data source connection pool control method and device and server
CN115480910A (en) Multi-cluster resource management method and device and electronic equipment
CN112379867B (en) Embedded operating system, method and storage medium based on modular development
CN118413573B (en) Resource management method, device, computer equipment, storage medium and product
CN113127257A (en) Software upgrading method
CN115484231B (en) Pod IP distribution method and related device
CN112558991B (en) Mirror image management method and system, cloud management platform and storage medium
US20230325267A1 (en) Selective privileged container augmentation
CN118827566A (en) Communication architecture, communication method, device and medium on AUTOSAR
CN117632351A (en) Container deployment method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant