CN116661979A - Heterogeneous job scheduling system and method - Google Patents


Publication number
CN116661979A
Authority
CN
China
Prior art keywords
job
plug
driver
information
virtual node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310962913.0A
Other languages
Chinese (zh)
Other versions
CN116661979B (en
Inventor
王易围
高翔
潘爱民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202310962913.0A
Publication of CN116661979A
Application granted
Publication of CN116661979B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44521 Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading
    • G06F9/44526 Plug-ins; Add-ons
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5021 Priority
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The application relates to a heterogeneous job scheduling system and a heterogeneous job scheduling method. The system comprises a plug-in bus, plug-in drivers, and a plurality of computing clusters; each computing cluster is connected to the plug-in bus through a plug-in driver. The plug-in bus comprises a driver controller, a virtual node controller, a computing job controller, and a scheduler. The driver controller is configured to acquire the registration information of the plug-in drivers and to register and authenticate all the plug-in drivers according to the registration information. The virtual node controller is configured to acquire the software and hardware information of the computing cluster corresponding to each plug-in driver and the partition list of that computing cluster, allocate a virtual node for each partition according to the partition list and the software and hardware information, and determine the software and hardware information corresponding to each virtual node. The computing job controller is configured to acquire the jobs uploaded by users and the resources required by each job. The scheduler is configured to acquire the software and hardware information corresponding to each virtual node and the resources required by each job, and to determine the scheduling results of the virtual nodes and the jobs. With this system, cluster deployment can be simplified and job scheduling automated, which improves job scheduling efficiency.

Description

Heterogeneous job scheduling system and method
Technical Field
The application relates to the technical field of computers, in particular to a heterogeneous job scheduling system and a heterogeneous job scheduling method.
Background
With the development of information technology, artificial intelligence and cloud computing have grown rapidly, and the goal of building a computing center has gradually shifted from meeting capacity requirements to meeting diverse computing requirements. Diverse computing requirements, however, mean diverse hardware and software requirements for the computing clusters.
In the conventional technology, deploying computing clusters to a management platform requires an operator to perform the deployment on each node. During operation, the operator must also determine the availability of the underlying resources of each computing cluster and assign jobs to the appropriate computing clusters based on the computing requirements of the jobs. Connecting a computing cluster to the management platform is complex and involves highly repetitive operations, so the efficiency of cluster access and job scheduling remains low.
Therefore, the conventional technology still suffers from complex operation and low efficiency, both in connecting computing clusters to the management platform and in job scheduling.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a heterogeneous job scheduling system and method that can simplify the operation flow of a computing cluster access management platform and job scheduling, and improve the processing efficiency.
In a first aspect, the present embodiment provides a heterogeneous job scheduling system, the system including: a plug-in bus, a plug-in driver, and a plurality of computing clusters; each computing cluster is connected to the plug-in bus through the plug-in driver; the plug-in bus includes: a driver controller, a virtual node controller, a scheduler, and a computing job controller;
the driver controller is connected with the plug-in driver and is configured to acquire registration information of the plug-in driver and to register and authenticate all the plug-in drivers according to the registration information;
the virtual node controller is connected with the plug-in driver and is configured to acquire the software and hardware information of the computing cluster corresponding to the plug-in driver and the partition list of the computing cluster, allocate a virtual node for each partition according to the partition list and the software and hardware information, and determine the software and hardware information corresponding to each virtual node;
the computing job controller is connected with the scheduler and is configured to acquire the jobs uploaded by the user and the resources required by the jobs;
the scheduler is also connected with the virtual node controller, and is configured to acquire the software and hardware information corresponding to each virtual node and the resources required by each job, and to determine the scheduling results of the virtual nodes and the jobs.
In one embodiment, the driver controller is further configured to perform health detection on each plug-in driver that has completed registration authentication; when the feedback information of the plug-in driver for the health detection is health information, to issue a security certificate to the plug-in driver and set the driver state of the plug-in driver to running; and when the feedback information of the plug-in driver for the health detection is timeout information or non-health information, to set the driver state of the plug-in driver to failed and to place the plug-in driver in queue isolation.
In one embodiment, the driver controller is further configured to perform health detection on all the plug-in drivers at a preset time interval, and to delete a plug-in driver when its feedback information for the health detection is timeout information or non-health information more than a preset number of times.
In one embodiment, the scheduler comprises a primary scheduler and the computing cluster comprises a secondary scheduler;
the primary scheduler is used for acquiring software and hardware information corresponding to each virtual node and resources required by the job corresponding to each job and determining a first scheduling result of the virtual node and the job;
The secondary scheduler is used for acquiring software and hardware information corresponding to each virtual node and resources required by the job corresponding to each job and determining a second scheduling result of the virtual node and the job;
the primary scheduler is further configured to generate a final scheduling result according to the first scheduling result and the second scheduling result.
In one embodiment, the virtual node controller is connected with the scheduler, and is configured to obtain a scheduling result of a virtual node and a job, and issue the job to a plug-in driver corresponding to the virtual node according to the scheduling result, so that the job is run by a computing cluster corresponding to the plug-in driver.
In one embodiment, the virtual node controller is further configured to obtain, through the plug-in driver, job operation information of a corresponding computing cluster, where the job operation information includes: job log, job alert record, and job completion progress.
In one embodiment, the computing job controller is connected with the virtual node controller, and is further configured to obtain a job query instruction of a user, and obtain, according to the job query instruction, the job running information of the corresponding job from the virtual node controller.
In one embodiment, the plug-in bus is deployed through a Kubernetes container cluster.
In a second aspect, the present embodiment provides a heterogeneous job scheduling method, where the method is applied to a heterogeneous job scheduling system as described above, and the method includes:
the driver controller acquires registration information of the plug-in drivers, and registers and authenticates all the plug-in drivers according to the registration information;
the virtual node controller acquires the software and hardware information of the computing cluster corresponding to the plug-in driver and the partition list of the computing cluster, allocates a virtual node for each partition according to the partition list and the software and hardware information, and determines the software and hardware information corresponding to each virtual node;
the computing job controller acquires the jobs uploaded by the user and the resources required by the jobs;
the scheduler acquires the software and hardware information corresponding to each virtual node and the resources required by each job, and determines the scheduling results of the virtual nodes and the jobs.
In one embodiment, the driver controller performs health detection on each plug-in driver that has completed registration authentication; when the feedback information of the plug-in driver for the health detection is health information, it issues a security certificate to the plug-in driver and sets the driver state of the plug-in driver to running; and when the feedback information of the plug-in driver for the health detection is timeout information or non-health information, it sets the driver state of the plug-in driver to failed and places the plug-in driver in queue isolation.
According to the heterogeneous job scheduling system and method, a plug-in bus, plug-in drivers, and a plurality of computing clusters are arranged in the system, wherein the plug-in bus comprises the driver controller, the virtual node controller, the scheduler, and the computing job controller. The driver controller acquires the registration information of the plug-in drivers and registers and authenticates them based on that information, so that computing cluster deployment can be automated. The virtual node controller acquires the software and hardware information and the partition list of each computing cluster, allocates virtual nodes, and determines their software and hardware information, and the scheduler determines the scheduling result of the virtual nodes and the jobs according to the software and hardware information of each virtual node and the resources required by each job. Heterogeneous computing clusters can thus be connected as plug-ins and hot-plugged, which simplifies cluster access and improves access efficiency. Automating job scheduling reduces the labor cost of scheduling, improves job scheduling efficiency, and further improves the overall computing efficiency of the computing clusters.
Drawings
FIG. 1 is a block diagram of a heterogeneous job scheduling system in one embodiment;
FIG. 2 is a timing diagram of a heterogeneous job scheduling system in one embodiment;
FIG. 3 is a timing diagram of a heterogeneous job scheduling system in one embodiment;
FIG. 4 is a timing diagram of a heterogeneous job scheduling system in one embodiment;
FIG. 5 is a flow diagram of a heterogeneous job scheduling method in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
As shown in FIG. 1, the present embodiment provides a heterogeneous job scheduling system, which includes: a plug-in bus 100, a plug-in driver 200, and a plurality of computing clusters 300; each of the computing clusters 300 accesses the plug-in bus 100 through the plug-in driver 200; the plug-in bus 100 includes: a driver controller 110, a virtual node controller 120, a scheduler 130, and a computing job controller 140;
the driver controller 110 is connected to the plug-in driver 200, and is configured to obtain registration information of the plug-in driver 200 and to register and authenticate all the plug-in drivers 200 according to the registration information;
the virtual node controller 120 is connected with the plug-in driver 200, and is configured to obtain the software and hardware information of the computing cluster 300 corresponding to the plug-in driver 200 and the partition list of the computing cluster 300, allocate a virtual node for each partition according to the partition list and the software and hardware information, and determine the software and hardware information corresponding to each virtual node;
the computing job controller 140 is connected to the scheduler 130, and is configured to obtain the jobs uploaded by a user and the resources required by the jobs;
the scheduler 130 is connected with the virtual node controller 120, and is configured to obtain the software and hardware information corresponding to each virtual node and the resources required by each job, and to determine the scheduling results of the virtual nodes and the jobs.
The plug-in bus 100 may be a common communication trunk for transmitting information between the functional modules of the heterogeneous job scheduling system; the functional modules may include the above-mentioned driver controller 110, virtual node controller 120, scheduler 130, and computing job controller 140. The plug-in bus 100 may be comprised of wires for transmitting data and associated control signals.
The plug-in driver 200 may be a plug-in driver of a cluster device corresponding to the computing cluster 300, where the plug-in driver 200 may be written by a vendor and implemented using any programming language, and the computing cluster 300 is connected to the plug-in bus 100 through the plug-in driver 200, so that a controller or a scheduler in the plug-in bus 100 can control or schedule the operation of the computing cluster 300 corresponding to the plug-in driver 200.
The computing cluster 300 may be a computer system, may be a single computer, or may be composed of multiple computers. The computing clusters of the embodiment can be applied to the fields of high-performance computing, big data processing, cloud computing and the like, including but not limited to application scenes such as scientific computing, data analysis, image processing, video processing and the like.
The driver controller 110 may be a controller that controls the plug-in driver 200, including acquiring registration information of the plug-in driver 200 and registering the plug-in driver 200 in the driver controller according to the registration information. Illustratively, the computing cluster corresponding to the plug-in driver 200 may be added to the available cluster device list.
The virtual node controller 120 may be a controller that synchronizes virtual nodes and handles job assignment. A virtual node may be the minimum unit of resource partitioning defined by the plug-in driver. In this embodiment, the number of virtual nodes equals the number of partitions of the computing cluster, that is, virtual nodes and partitions correspond one-to-one. It may be understood that the computing cluster corresponding to each plug-in driver may include a plurality of partitions, each with its own software and hardware information; through the one-to-one correspondence between virtual nodes and partitions, the virtual node controller 120 may also store the software and hardware information of the partition corresponding to each virtual node.
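As an illustrative, non-limiting sketch, the one-to-one correspondence between partitions and virtual nodes described above may be expressed as follows. The class names, the `allocate_virtual_nodes` function, the node naming scheme, and the information fields are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    cluster: str  # computing cluster the partition belongs to
    name: str     # partition name reported by the plug-in driver
    info: dict    # software and hardware information (hypothetical keys)


@dataclass
class VirtualNode:
    name: str
    cluster: str
    partition: str
    info: dict = field(default_factory=dict)


def allocate_virtual_nodes(partitions):
    """Create exactly one virtual node per partition and copy its information."""
    nodes = {}
    for p in partitions:
        node_name = f"{p.cluster}-{p.name}"  # assumed naming convention
        nodes[node_name] = VirtualNode(node_name, p.cluster, p.name, dict(p.info))
    return nodes


partitions = [
    Partition("cluster-a", "gpu", {"gpus": 8, "cpus": 64}),
    Partition("cluster-a", "cpu", {"gpus": 0, "cpus": 128}),
]
nodes = allocate_virtual_nodes(partitions)
```

Each virtual node carries a copy of its partition's information, so other components can read it without contacting the plug-in driver.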
The computation job controller 140 is connected to the scheduler 130, and may send the job and the resource required for the job to the scheduler 130 for scheduling after receiving the job and the resource required for the job uploaded by the user.
The scheduler 130 is configured to schedule the jobs uploaded by the user. Specifically, the scheduler 130 is connected to both the computing job controller 140 and the virtual node controller 120. It receives the jobs and the resources they require from the computing job controller 140, and the virtual nodes and their software and hardware information from the virtual node controller 120. It then matches jobs with virtual nodes to obtain the scheduling result and sends the scheduling result to the virtual node controller 120, which distributes each job to the computing cluster partition corresponding to its virtual node for processing.
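The matching step may be sketched as follows, purely as an illustration. The first-fit policy, the resource keys, and the function names `fits` and `schedule` are assumptions; the application does not prescribe a particular matching algorithm.

```python
def fits(required, available):
    """True if every required resource is available in sufficient quantity."""
    return all(available.get(key, 0) >= amount for key, amount in required.items())


def schedule(jobs, node_info):
    """Bind each job to the first virtual node that fits (first-fit, an assumption).

    Returns {job_name: node_name}, with None for jobs that cannot be placed.
    """
    free = {name: dict(info) for name, info in node_info.items()}
    result = {}
    for job_name, required in jobs.items():
        chosen = next((n for n, info in free.items() if fits(required, info)), None)
        if chosen is not None:
            # Reserve the resources so later jobs see the reduced capacity.
            for key, amount in required.items():
                free[chosen][key] = free[chosen].get(key, 0) - amount
        result[job_name] = chosen
    return result


node_info = {"vn-gpu": {"cpus": 16, "gpus": 2}, "vn-cpu": {"cpus": 32, "gpus": 0}}
jobs = {
    "train": {"cpus": 8, "gpus": 2},
    "etl": {"cpus": 24},
    "infer": {"cpus": 4, "gpus": 1},
}
bindings = schedule(jobs, node_info)
```

In this sketch a job bound to `None` would remain queued; the binding map plays the role of the scheduling result passed to the virtual node controller.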
The steps performed by the driver controller 110, the virtual node controller 120, the scheduler 130, and the computing job controller 140 may be performed in a single overall controller or in a plurality of separate sub-controllers.
According to the heterogeneous job scheduling system provided by the embodiment, a plug-in bus, a plug-in driver, and a plurality of computing clusters are arranged in the system, wherein the plug-in bus comprises a driver controller, a virtual node controller, a scheduler, and a computing job controller. The driver controller obtains the registration information of the plug-in driver and registers and authenticates the plug-in driver based on that information, so that computing cluster deployment can be automated. The virtual node controller acquires the software and hardware information and the partition list of the computing cluster, allocates virtual nodes, and determines their software and hardware information, and the scheduler determines the scheduling result of the virtual nodes and the jobs according to the software and hardware information of each virtual node and the resources required by each job. Heterogeneous computing clusters can thus be connected as plug-ins and hot-plugged, which simplifies system deployment and improves deployment efficiency. Automating job scheduling reduces the labor cost of scheduling and improves the overall computing efficiency of the computing clusters.
In one embodiment, the driver controller is further configured to perform health detection on each plug-in driver that has completed registration authentication; when the feedback information of the plug-in driver for the health detection is health information, to issue a security certificate to the plug-in driver and set the driver state of the plug-in driver to running; and when the feedback information of the plug-in driver for the health detection is timeout information or non-health information, to set the driver state of the plug-in driver to failed and to place the plug-in driver in queue isolation.
The health state may be the operating state of the computing cluster, including the connection state, the data processing state, and the like; if the operating state is normal, the cluster is regarded as healthy. If the health detection call returns health information, the driver controller generates a security certificate, issues it to the plug-in driver of the cluster device, and sets the state of that plug-in driver to running. When the feedback information is timeout information or non-health information, the plug-in driver is placed in queue isolation, that is, it is moved to a queue of failed cluster device plug-ins, and its state is set to failed.
It can be understood that, once the plug-in driver is running, it requests registration from the driver controller, and the driver controller sets the state of the computing cluster corresponding to the plug-in driver to ready. The driver controller may also notify a cluster administrator to authenticate the plug-in, that is, to authenticate the computing cluster. If the authentication succeeds, health detection is performed, and a security certificate is generated based on the health detection result and sent to the plug-in driver. The driver controller then sets the state of the computing cluster to running, and the computing cluster may begin receiving job assignments.
Furthermore, a plurality of plug-in drivers of the same type can be installed on a plurality of login nodes of the same computing cluster to achieve high availability and load balancing. These plug-in drivers can share the same security certificate, and when jobs are distributed, the job load can be balanced across any of them.
According to the heterogeneous job scheduling system, health detection is carried out on plug-in drivers after registration and authentication are completed, safety certificates are issued or queue isolation is carried out based on detection results, real-time update of the running state of a computing cluster can be achieved, computing jobs are prevented from being distributed into wrong computing clusters, and the effects of improving the computing efficiency of the jobs and the stability of the heterogeneous job scheduling system can be achieved.
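The state transitions described for registration and health detection may be sketched as a small state machine. This is an assumption-laden illustration: the `DriverController` class, the feedback strings, and the use of a random token in place of a real security certificate are all hypothetical.

```python
import secrets
from enum import Enum


class DriverState(Enum):
    READY = "ready"      # registered, awaiting authentication and health detection
    RUNNING = "running"  # healthy, certificate issued
    FAILED = "failed"    # timed out or unhealthy, isolated


class DriverController:
    def __init__(self):
        self.states = {}
        self.certificates = {}
        self.isolated = []  # queue of failed plug-in drivers

    def register(self, driver_id):
        self.states[driver_id] = DriverState.READY

    def apply_health_result(self, driver_id, feedback):
        """feedback is 'healthy', 'unhealthy' or 'timeout' (hypothetical values)."""
        if feedback == "healthy":
            # Placeholder token standing in for a real security certificate.
            self.certificates[driver_id] = secrets.token_hex(16)
            self.states[driver_id] = DriverState.RUNNING
        else:
            self.states[driver_id] = DriverState.FAILED
            if driver_id not in self.isolated:
                self.isolated.append(driver_id)
        return self.states[driver_id]


controller = DriverController()
controller.register("driver-a")
controller.register("driver-b")
state_a = controller.apply_health_result("driver-a", "healthy")
state_b = controller.apply_health_result("driver-b", "timeout")
```

Only drivers in the running state would be offered to the scheduler; isolated drivers stay out of job distribution.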
In one embodiment, the driver controller is further configured to perform health detection on all the plug-in drivers at a preset time interval, and to delete a plug-in driver when its feedback information for the health detection is timeout information or non-health information more than a preset number of times.
The preset time interval may be set based on actual requirements, and may be determined based on the detection duration corresponding to the preset number of times. When the feedback information of a plug-in driver for the health detection is timeout information or non-health information more than the preset number of times, the plug-in driver is deleted, that is, the computing cluster corresponding to the plug-in driver is removed from the available cluster device list.
According to the heterogeneous job scheduling system provided by the embodiment, when the feedback information is timeout information or non-health information and exceeds the preset times, the plug-in driver is deleted, the current plug-in driver can be determined to be in a long-term non-health or long-term timeout state, repeated scheduling in a job scheduling matching process can be reduced by deleting the plug-in driver, and the effect of improving the job scheduling efficiency can be achieved.
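The deletion rule may be illustrated with the following sketch. It assumes that failures are counted consecutively and that the counter resets on a healthy probe; the application only states that the failure feedback "exceeds a preset number of times", so this counting policy, like the `HealthMonitor` name, is an assumption.

```python
class HealthMonitor:
    def __init__(self, max_failures):
        self.max_failures = max_failures  # the preset number of times
        self.failures = {}                # consecutive failure count per driver
        self.available = set()            # available cluster device list

    def add(self, driver_id):
        self.available.add(driver_id)
        self.failures[driver_id] = 0

    def record(self, driver_id, feedback):
        """Process one periodic probe; return True if the driver was deleted."""
        if driver_id not in self.available:
            return False
        if feedback == "healthy":
            self.failures[driver_id] = 0  # assumed: a healthy probe resets the count
            return False
        self.failures[driver_id] += 1
        if self.failures[driver_id] > self.max_failures:
            self.available.discard(driver_id)
            del self.failures[driver_id]
            return True
        return False


monitor = HealthMonitor(max_failures=2)
monitor.add("driver-a")
outcomes = [monitor.record("driver-a", f) for f in ("timeout", "unhealthy", "timeout")]
```

In a running system `record` would be called once per preset time interval for every registered plug-in driver.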
In one embodiment, the scheduler comprises a primary scheduler and the computing cluster comprises a secondary scheduler;
The primary scheduler is used for acquiring software and hardware information corresponding to each virtual node and resources required by the job corresponding to each job and determining a first scheduling result of the virtual node and the job;
the secondary scheduler is used for acquiring software and hardware information corresponding to each virtual node and resources required by the job corresponding to each job and determining a second scheduling result of the virtual node and the job;
the primary scheduler is further configured to generate a final scheduling result according to the first scheduling result and the second scheduling result.
The primary scheduler may schedule a job based on the resources required by the job and the software and hardware information of the virtual nodes, and allocate the job to a virtual node whose computing resources are more idle or better matched to the job.
The secondary scheduler may be a virtual node scheduler not perceived by the primary scheduler, and is configured to feed back a scheduling result based on a running condition of each virtual node, and may determine whether the allocated virtual node is suitable for calculation by determining information such as resource occupation, priority, and the like in the current virtual node.
The final scheduling result may be generated from the first scheduling result and the second scheduling result in either of two ways: a weight may be assigned to each of the first scheduling result and the second scheduling result, and a composite score calculated to determine the final scheduling result; or, when the second scheduling result indicates that a virtual node is unsuitable, the scheduling to that virtual node is canceled and the first scheduling result is determined again to generate the final scheduling result. It can be appreciated that the scheduler collects from the virtual nodes the software and hardware information and the constraints needed for job scheduling, filters for suitable virtual nodes according to the resources and constraints of the job request, and binds a suitable virtual node to the job. If there is no suitable virtual node, the job state is set to queued and the job is moved to the wait queue for retry. When the computing cluster further comprises a secondary scheduler, the primary scheduler performs weak scheduling: it neither perceives nor influences the secondary scheduler, but guarantees the final scheduling result of the job through pre-allocation, driver suggestions, timeout rescheduling, dynamic priority, and similar means.
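The two ways of combining the scheduling results may be sketched together as follows. The weights, the score range, and the convention that the secondary scheduler reports `None` for an unsuitable virtual node are assumptions introduced for illustration; the application leaves them open.

```python
def final_schedule(first, second, w_first=0.6, w_second=0.4):
    """Combine primary and secondary scores into a final choice.

    first:  {node: score} from the primary scheduler (assumed range 0..1)
    second: {node: score or None}; None marks a node the secondary
            scheduler considers unsuitable (hypothetical convention)
    Returns the best node name, or None if the job must wait in the queue.
    """
    candidates = {}
    for node, primary_score in first.items():
        secondary_score = second.get(node)
        if secondary_score is None:
            continue  # cancel scheduling to this virtual node
        candidates[node] = w_first * primary_score + w_second * secondary_score
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


first = {"vn-a": 0.9, "vn-b": 0.7, "vn-c": 0.4}
second = {"vn-a": 0.2, "vn-b": 0.8, "vn-c": None}
best = final_schedule(first, second)
```

Here `vn-b` wins with a composite score of 0.74 against 0.62 for `vn-a`, even though the primary scheduler alone preferred `vn-a`.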
Further, the software and hardware information may include the storage supported by the computing cluster, the computing software, the accelerator card models, and the like, and may also include network topology information, computing resource usage, computing power information, and the like. The computing resource usage may include the usage of the CPU, memory, network card, accelerator card, and the like. The plug-in driver may also provide additional constraints, such as accepting only stand-alone jobs. The virtual node controller may write this information to the labels of the virtual node so that other control components can read it.
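As a hedged illustration of writing such information to labels, the sketch below flattens a dictionary of software and hardware information into string key-value pairs. The `to_labels` function, the `example.org/` prefix, and the field names are hypothetical, and the output is not claimed to satisfy the label syntax of any particular platform.

```python
def to_labels(info, prefix="example.org/"):
    """Flatten software/hardware information into string labels."""
    labels = {}
    for key, value in info.items():
        if isinstance(value, (list, tuple, set)):
            # Join multi-valued fields in a stable order.
            value = "_".join(sorted(str(v) for v in value))
        labels[f"{prefix}{key}"] = str(value)
    return labels


labels = to_labels({"gpus": 8, "software": ["mpi", "cuda"], "standalone-only": True})
```

Other control components, such as the scheduler, could then filter virtual nodes by comparing these labels with the constraints of a job.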
In one embodiment, the primary scheduler is the plug-in bus scheduler and the secondary scheduler is the job scheduler of a Slurm cluster. The primary scheduler filters the virtual nodes, that is, the Slurm partitions, according to hard conditions; for example, a job that requires a GPU is not scheduled to a partition without GPUs. The primary scheduler may also score partitions according to their status. For example, if partition A is idle while partition B and partition C both have queued jobs, and partition B has fewer queued jobs than partition C, then partition A scores highest and partition C lowest, and the job is most likely to be dispatched to partition A. Before scheduling, the scheduler interacts with the plug-in driver of the virtual node corresponding to partition A to obtain the second scheduling result from the job scheduler of the Slurm cluster, and determines whether the job is suitable for partition A. In addition, if a job has been scheduled and allocated but is not processed within a preset time, the scheduler may cancel and reschedule it.
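The filter-then-score behavior in the partition A, B, C example may be sketched as follows. The scoring formula, the field names, and the extra GPU-free partition D are assumptions added for illustration; the application does not specify how the score is computed.

```python
def filter_partitions(job, partitions):
    """Hard condition: drop partitions that lack the GPUs the job requires."""
    return [p for p in partitions if p["gpus"] >= job.get("gpus", 0)]


def score(partition):
    """Assumed scoring rule: fewer queued jobs gives a higher score."""
    return 1.0 / (1 + partition["queued_jobs"])


partitions = [
    {"name": "A", "gpus": 4, "queued_jobs": 0},
    {"name": "B", "gpus": 4, "queued_jobs": 2},
    {"name": "C", "gpus": 4, "queued_jobs": 9},
    {"name": "D", "gpus": 0, "queued_jobs": 0},  # no GPUs, filtered out
]
job = {"gpus": 1}
ranked = sorted(filter_partitions(job, partitions), key=score, reverse=True)
names = [p["name"] for p in ranked]
```

The top-ranked partition would then be confirmed with the secondary scheduler through the plug-in driver before the job is dispatched.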
According to the heterogeneous job scheduling system provided by the embodiment, through the first scheduling result and the second scheduling result of the primary scheduler and the secondary scheduler, reasonable distribution of jobs can be performed according to the software and hardware information of the virtual node, and rescheduling can be performed in time under the condition that scheduling jobs are queued overtime, so that the effect of improving job scheduling efficiency can be achieved.
In one embodiment, the virtual node controller is connected with the scheduler, and is configured to obtain a scheduling result of a virtual node and a job, and issue the job to a plug-in driver corresponding to the virtual node according to the scheduling result, so that the job is run by a computing cluster corresponding to the plug-in driver.
When the scheduler decides to issue the job to a certain virtual node, the virtual node controller communicates with the plug-in driver corresponding to that virtual node based on the scheduling result; after the job information is issued, the job is rendered and then executed by the computing cluster.
In one particular embodiment, as shown in FIG. 3, the partition list of the computing cluster may be a Slurm partition list. The virtual node controller is further configured to perform the following:
Acquiring healthy cluster device plug-ins: the virtual node controller acquires information such as the port, resources, characteristic information, and limiting conditions from the available device plug-in cache provided by the drive manager, and acquires a driver operator so as to call the node management interface to obtain node information.
Synchronizing virtual nodes: the virtual node controller interacts with the plug-in driver through the driver operator to obtain the Slurm partition list, compares the partition list with the virtual node information already present in the cluster, and then creates and deletes virtual nodes of the corresponding number and types, ensuring eventual consistency between the virtual nodes in the cluster and the Slurm partitions.
Synchronizing virtual node resources and characteristic information: the virtual node controller interacts with the plug-in driver to acquire the Slurm partition information, including the resource information of the Slurm partition, and synchronizes it to the resource manager of the virtual node; the characteristic information and limiting conditions are synchronized to the labels of the virtual node, and the resource information is synchronized to the resource field of the virtual node, as the basis for scheduling by the scheduler.
Resource allocation: when a job is scheduled to a certain virtual node, the resource manager of the virtual node logically pre-deducts the resources of the virtual node according to the job requirements; at this point other jobs cannot use the pre-deducted resources, although the resources have not actually been allocated yet. The virtual node controller can then call the job management interface of the plug-in driver to allocate the Slurm resources.
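The logical pre-deduction in the resource allocation step can be illustrated with the following Python sketch. The class and method names are hypothetical; the sketch only tracks reservations in memory and does not call any real Slurm or plug-in driver interface.

```python
from typing import Dict


class VirtualNodeResources:
    """Tracks logical reservations against a virtual node's capacity."""

    def __init__(self, capacity: Dict[str, int]) -> None:
        self.capacity = dict(capacity)
        self.reserved: Dict[str, Dict[str, int]] = {}  # job id -> reservation

    def available(self) -> Dict[str, int]:
        """Capacity minus everything that has been pre-deducted."""
        avail = dict(self.capacity)
        for request in self.reserved.values():
            for name, amount in request.items():
                avail[name] = avail.get(name, 0) - amount
        return avail

    def pre_deduct(self, job_id: str, request: Dict[str, int]) -> bool:
        """Reserve resources for a job; nothing is really allocated yet."""
        avail = self.available()
        if any(avail.get(name, 0) < amount for name, amount in request.items()):
            return False
        self.reserved[job_id] = dict(request)
        return True

    def release(self, job_id: str) -> None:
        """Return a job's reservation, e.g. after completion or cancellation."""
        self.reserved.pop(job_id, None)
```

Keeping the reservation separate from the real allocation means that a second job cannot be bound to resources already promised to the first, even before the cluster has confirmed the allocation.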
According to the heterogeneous job scheduling system provided by this embodiment, the virtual node controller obtains the scheduling result and issues the job to the plug-in driver based on that result, so that jobs are distributed according to the scheduling result and reasonable distribution of jobs can be achieved.
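The virtual node synchronization step described above amounts to comparing a desired set with an existing set. A minimal Python sketch, assuming partitions and virtual nodes are identified by the same names (an assumption not stated in the application), is:

```python
from typing import Iterable, Set, Tuple


def reconcile(partitions: Iterable[str],
              virtual_nodes: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (nodes to create, nodes to delete) so that the set of
    virtual nodes converges to the set of reported partitions."""
    desired = set(partitions)
    existing = set(virtual_nodes)
    to_create = desired - existing
    to_delete = existing - desired
    return to_create, to_delete
```

Running this comparison on every synchronization cycle is one simple way to reach the eventual consistency between virtual nodes and partitions mentioned above.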
In one embodiment, the virtual node controller is further configured to obtain, through the plug-in driver, job operation information of a corresponding computing cluster, where the job operation information includes: job log, job alert record, and job completion progress.
In one particular embodiment, when the job is executed, the computing cluster generates a job ID and returns it to the virtual node controller. The virtual node controller can continuously interact with the computing cluster through the job ID to track the job, obtain the job state and alarm information returned by the computing cluster, update the job state accordingly, and record the alarm information in the job information. When the job fails or completes, the virtual node controller may also collect the job log and job results to the management platform of the plug-in bus for subsequent retrieval and viewing.
According to the heterogeneous job scheduling system provided by this embodiment, job operation information of the computing cluster is acquired through the plug-in driver, so that the progress of job execution can be tracked in real time and corresponding adjustments can be made promptly when abnormal conditions occur, which improves job scheduling efficiency.
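The tracking by job ID described above can be sketched as a polling loop in Python. The `query` callable stands in for the plug-in driver's job query; its return shape (a dict with a `state` key and an optional `alerts` list), the state names, and the polling limits are assumptions for illustration.

```python
import time
from typing import Callable, Dict

TERMINAL_STATES = {"completed", "failed"}  # assumed state names


def track_job(job_id: str,
              query: Callable[[str], Dict],
              record: Dict,
              max_polls: int = 10,
              interval_s: float = 0.0) -> Dict:
    """Poll the cluster by job ID, mirroring state and alarms into `record`."""
    for _ in range(max_polls):
        info = query(job_id)
        record["state"] = info["state"]
        record.setdefault("alerts", []).extend(info.get("alerts", []))
        if info["state"] in TERMINAL_STATES:
            break  # logs and results would be collected at this point
        time.sleep(interval_s)
    return record
```

The loop stops as soon as a terminal state is seen, which is where the job log and job results would be collected to the management platform.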
In one embodiment, the computing job controller is connected with the virtual node controller, and is further configured to obtain a job query instruction of a user, and obtain, according to the job query instruction, the job running information of the corresponding job from the virtual node controller.
According to the heterogeneous job scheduling system provided by the embodiment, the job inquiry instruction is acquired by the calculation job controller, the job running information of the corresponding job is acquired from the virtual node controller, the user side feedback of the job running state can be realized, the user can also perform corresponding scheduling adjustment based on the job running state, and the flexibility of job scheduling is improved.
In a specific embodiment, as shown in fig. 4, the calculation job issuing process in this embodiment is as follows:
The computing job controller receives a unified job definition from a user and checks whether the job definition is valid; if it is valid, a to-be-scheduled label is set.
The scheduler acquires the software and hardware information and the state of the virtual nodes, screens suitable virtual nodes according to the resources, characteristic information, and limiting conditions required by the job, binds the job load to the selected virtual node, and schedules the job to that virtual node; the computing job controller sets the status of the job to scheduled.
The selected virtual node calls the job management interface of the plug-in driver's workload management service to issue the job information, and the plug-in driver renders the executable job script and runs it on Slurm. The computing job controller sets the status of the job to issued.
The selected virtual node continuously monitors the status of the running job load by calling the job management interface of the plug-in driver's workload management service, for example by calling the plug-in driver to query the Slurm job, and synchronizes the status to the computing job controller. On completion or error, the job log and job results are synchronized to the container cluster. The computing job controller queries the job status from the virtual node and sets the job status accordingly.
When a log query request from a user is received, the computing job controller determines the route of the log query request and calls the virtual node to perform the log query; the virtual node calls the plug-in driver to query the Slurm log, and the log query result is finally returned.
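The job status changes in the issuing process above can be summarized as a small state machine. The following Python sketch is one possible reading; the state names beyond queued, scheduled, and issued, and the transitions back to queued for timeout rescheduling, are assumptions rather than details taken from the application.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"        # valid definition, to-be-scheduled label set
    SCHEDULED = "scheduled"  # bound to a virtual node
    ISSUED = "issued"        # job information sent to the plug-in driver
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


ALLOWED = {
    JobState.QUEUED: {JobState.SCHEDULED},
    # Back to QUEUED models cancel-and-reschedule after a queueing timeout.
    JobState.SCHEDULED: {JobState.ISSUED, JobState.QUEUED},
    JobState.ISSUED: {JobState.RUNNING, JobState.COMPLETED,
                      JobState.FAILED, JobState.QUEUED},
    JobState.RUNNING: {JobState.COMPLETED, JobState.FAILED},
    JobState.COMPLETED: set(),
    JobState.FAILED: set(),
}


def advance(current: JobState, new: JobState) -> JobState:
    """Return the new state, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```

Centralizing the allowed transitions in one table makes it easier for the computing job controller to reject status updates that arrive out of order.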
In one embodiment, the plug-in bus is deployed through a Kubernetes container cluster.
Kubernetes, abbreviated as k8s, is a container cluster management system. It is the de facto standard in the field of container orchestration and a key project in the cloud-native field; it helps users build application services that span containers and continuously manage the health of those containers over the long term. It is the most widely used container cluster management system in general-purpose computing, but no mature solution exists in the field of dedicated computing.
Accordingly, the computing job of the present embodiment may be a Kubernetes CRD (Custom Resource Definition), which includes the information required by the computing job, such as the resource information required by the job, the job script directory, the working directory, and the operation parameters. When a computing job is created in the Kubernetes cluster, the computing job controller checks whether its parameters are valid, sets a to-be-scheduled label for it, and sets the status of the computing job to queued. The computing job is then taken over by the scheduler and the virtual node, while the computing job controller remains responsible for interacting with the scheduler and the virtual node to synchronize the job state. When a user queries a job log, the computing job controller is also responsible for routing the log query request to the corresponding virtual node.
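The validity check performed by the computing job controller can be sketched in Python as follows. The field names, the label key `example.org/to-be-scheduled`, and the validation rules are hypothetical; the application only lists the kinds of information a computing job contains.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ComputeJob:
    name: str
    resources: Dict[str, int]  # e.g. {"cpu": 4, "gpu": 1}
    script_dir: str            # job script directory
    work_dir: str              # working directory
    params: List[str] = field(default_factory=list)
    labels: Dict[str, str] = field(default_factory=dict)
    status: str = ""


def admit(job: ComputeJob) -> List[str]:
    """Validate a job; on success mark it for scheduling and queue it."""
    errors: List[str] = []
    if not job.name:
        errors.append("name is required")
    if not job.resources or any(v <= 0 for v in job.resources.values()):
        errors.append("resources must be positive")
    if not job.script_dir:
        errors.append("script_dir is required")
    if not job.work_dir:
        errors.append("work_dir is required")
    if not errors:
        job.labels["example.org/to-be-scheduled"] = "true"  # hypothetical key
        job.status = "queued"
    return errors
```

A job that fails validation is left without the label and without a status, so the scheduler never sees it.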
Correspondingly, the embodiment also comprises the step of creating a CRD which can be managed and scheduled in the Kubernetes and is used as a directly scheduled job object.
Accordingly, the virtual node of the present embodiment may be a Kubernetes Node, which executes the workload by placing the related processes into Kubernetes Pods running on it. Each Kubernetes Node may be managed by the control plane.
Accordingly, in this embodiment, the security certificate generated by the driving manager may be compatible with the certificate of Kubernetes, and provided to the Kubernetes cluster administrator, where the administrator operates on the corresponding computing clusters, so that exposure of information of all computing clusters may be avoided.
Further, the computing cluster of the present embodiment may include a Slurm job scheduling framework. Accordingly, the partition of the computing cluster may be a Slurm Partition, which may be the smallest unit of the Slurm computing cluster for resource allocation.
In this embodiment, the drive controller, the virtual node controller, and the computation job controller may be located in the same controller manager program and deployed by Kubernetes Pod. The scheduler may be a separate program and deployed by Kubernetes Pod.
According to the heterogeneous job scheduling system provided by the embodiment, the scheme of the heterogeneous job scheduling system can be realized by adopting the Kubernetes and corresponding settings to deploy the plug-in buses.
In one embodiment, as shown in FIG. 2, the cluster device plug-in states include in preparation, running, failed, and deleted. Throughout the states from in preparation to deleted, the plug-in driver may continuously provide workload management services.
Acquiring the registration information of the plug-in drivers and performing registration authentication on all the plug-in drivers according to the registration information can comprise the following steps:
The plug-in driver calls the registration interface of the drive manager's registration service and reports to the drive manager the port on which it is running, the computing resources it supports, the characteristic information, and the limiting conditions. The drive manager sets the state of the cluster device plug-in to in preparation. For example, the computing resources may be hardware resources with an exclusive attribute in Slurm, such as the number of CPUs and threads, the memory size, and the number of GPUs; the characteristic information may be attributes of the Slurm partition expected by the user that do not change, such as the Slurm version, CPU model, and GPU model; and the limiting conditions may be variable attributes of the Slurm cluster expected by the user, such as the number of available nodes and resource pool information. The resources, characteristic information, and limiting conditions are collected by the scheduler and serve as the basis for scheduling.
The drive manager sends confirmation information to the cluster administrator, who manually authenticates the cluster device plug-in to be registered; the drive manager then performs a health check on the plug-in driver and, when the health check result is healthy, generates a security certificate and issues it to the plug-in driver.
After authentication succeeds, the plug-in state is changed to running, and the plug-in driver is added to the queue of available cluster device plug-ins.
During operation, the drive manager continuously performs health checks on the plug-in driver. When the health information fed back by the plug-in driver is unhealthy, the plug-in driver can be moved to the failed cluster device plug-in queue according to a preset rule, and the plug-in state is changed to failed.
When the number of unhealthy results exceeds the preset number of times, the retries are judged to have timed out, the plug-in driver is deleted from the list, and the plug-in state is changed to deleted.
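The plug-in lifecycle in the steps above can be sketched as a small Python class. The class name, the default failure threshold, the placeholder certificate value, and the recovery from failed back to running after a healthy check are assumptions; the application does not state whether a failed plug-in driver can recover.

```python
from typing import Optional


class PluginLifecycle:
    """Tracks one cluster device plug-in through its states."""

    def __init__(self, max_failures: int = 3) -> None:
        self.state = "in preparation"  # set when the driver registers
        self.failures = 0
        self.max_failures = max_failures
        self.certificate: Optional[str] = None

    def authenticate(self, admin_approved: bool, healthy: bool) -> None:
        """Manual approval plus a healthy check moves the plug-in to running."""
        if self.state == "in preparation" and admin_approved and healthy:
            self.certificate = "issued"  # placeholder for a real certificate
            self.state = "running"

    def health_check(self, healthy: bool) -> None:
        """Apply one periodic health check result."""
        if self.state not in ("running", "failed"):
            return
        if healthy:
            self.failures = 0
            self.state = "running"  # assumed recovery path
        else:
            self.failures += 1
            if self.failures > self.max_failures:
                self.state = "deleted"  # retries timed out
            else:
                self.state = "failed"   # isolated in the failed queue
```

With `max_failures=2`, two consecutive unhealthy results leave the plug-in in the failed state and the third moves it to deleted.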
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the steps are not strictly limited to this order of execution and may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with at least some of the other steps or stages.
The various controllers and schedulers in the heterogeneous job scheduling system described above may be implemented in whole or in part in software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.
Based on the same inventive concept, the embodiment of the application also provides a heterogeneous job scheduling method for realizing the heterogeneous job scheduling system. The implementation of the solution to the problem provided by the method is similar to that described in the above system, so the specific limitation in the embodiment of the method for scheduling heterogeneous jobs provided below may be referred to the limitation of the heterogeneous job scheduling system hereinabove, and will not be described herein.
In one embodiment, as shown in fig. 5, there is provided a heterogeneous job scheduling method, the method comprising:
Step S100, the drive controller acquires registration information of the plug-in drivers and performs registration authentication on all the plug-in drivers according to the registration information;
Step S200, the virtual node controller obtains the software and hardware information of the computing cluster corresponding to the plug-in driver and the partition list of the computing cluster, allocates a virtual node for each partition according to the partition list and the software and hardware information, and determines the software and hardware information corresponding to each virtual node;
Step S300, the computing job controller acquires the job uploaded by a user and the resources required by the job;
Step S400, the scheduler obtains the software and hardware information corresponding to each virtual node and the resources required by each job, and determines the scheduling results of the virtual nodes and the jobs.
In one embodiment, the driver controller performs health detection on the plug-in driver for which registration authentication is completed; when the feedback information of the plug-in driver based on the health detection is health information, issuing a safety certificate for the plug-in driver, and setting the driving state of the plug-in driver to be in operation; and when the feedback information of the plug-in driver based on the health detection is timeout information or non-health information, setting the driving state of the plug-in driver as failure, and carrying out queue isolation on the plug-in driver.
In one embodiment, the driver controller performs health detection on all the plug-in drivers with a preset time as an interval, and deletes the plug-in driver when feedback information of the plug-in driver based on health detection is timeout information or non-health information and exceeds a preset number of times.
In one embodiment, the scheduler comprises a primary scheduler and the computing cluster comprises a secondary scheduler;
The primary scheduler acquires software and hardware information corresponding to each virtual node and resources required by the job corresponding to each job, and determines a first scheduling result of the virtual node and the job;
the secondary scheduler acquires software and hardware information corresponding to each virtual node and resources required by the job corresponding to each job, and determines a second scheduling result of the virtual node and the job;
and the primary scheduler generates a final scheduling result according to the first scheduling result and the second scheduling result.
In one embodiment, the virtual node controller is connected with the scheduler, and the virtual node controller obtains a scheduling result of the virtual node and the job, and issues the job to the plug-in driver corresponding to the virtual node according to the scheduling result, so that the computing cluster corresponding to the plug-in driver runs the job.
In one embodiment, the virtual node controller obtains job operation information of a corresponding computing cluster through the plug-in driver, where the job operation information includes: job log, job alert record, and job completion progress.
In one embodiment, the computing job controller is connected with the virtual node controller, and the computing job controller obtains a job query instruction of a user, and obtains the job running information of the corresponding job from the virtual node controller according to the job query instruction.
In one embodiment, the plug-in bus is deployed through a Kubernetes container cluster.
The user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims (10)

1. A heterogeneous job scheduling system, the system comprising: a plug-in bus, a plug-in driver, and a plurality of computing clusters; each computing cluster is connected to the plug-in bus through the plug-in driver; the plug-in bus includes: a drive controller, a virtual node controller, a scheduler, and a computing job controller;
the driving controller is connected with the plug-in driver and is used for acquiring registration information of the plug-in driver and carrying out registration authentication on all the plug-in drivers according to the registration information;
The virtual node controller is connected with the plug-in driver and is used for acquiring the software and hardware information of the corresponding computing cluster of the plug-in driver and the partition list of the computing cluster, distributing virtual nodes for each partition according to the partition list and the software and hardware information, and determining the software and hardware information corresponding to each virtual node;
the computing job controller is connected with the scheduler and is used for acquiring the jobs uploaded by the user and resources required by the jobs;
the scheduler is also connected with the virtual node controller; and the method is used for acquiring the software and hardware information corresponding to each virtual node and the resources required by the job corresponding to each job, and determining the scheduling results of the virtual nodes and the jobs.
2. The heterogeneous job scheduling system of claim 1, wherein,
the drive controller is also used for carrying out health detection on the plug-in drive with the registration authentication completed; when the feedback information of the plug-in driver based on the health detection is health information, issuing a safety certificate for the plug-in driver, and setting the driving state of the plug-in driver to be in operation; and when the feedback information of the plug-in driver based on the health detection is timeout information or non-health information, setting the driving state of the plug-in driver as failure, and carrying out queue isolation on the plug-in driver.
3. The heterogeneous job scheduling system of claim 2, wherein,
and the drive controller is also used for carrying out health detection on all the plug-in drivers by taking preset time as an interval, and deleting the plug-in drivers when the feedback information of the plug-in drivers based on the health detection is overtime information or non-health information and exceeds preset times.
4. The heterogeneous job scheduling system of claim 1, wherein the scheduler comprises a primary scheduler and the computing cluster comprises a secondary scheduler;
the primary scheduler is used for acquiring software and hardware information corresponding to each virtual node and resources required by the job corresponding to each job and determining a first scheduling result of the virtual node and the job;
the secondary scheduler is used for acquiring software and hardware information corresponding to each virtual node and resources required by the job corresponding to each job and determining a second scheduling result of the virtual node and the job;
the primary scheduler is further configured to generate a final scheduling result according to the first scheduling result and the second scheduling result.
5. The heterogeneous job scheduling system of claim 1, wherein,
The virtual node controller is connected with the scheduler and is used for acquiring a scheduling result of the virtual node and the job and issuing the job to the plug-in driver corresponding to the virtual node according to the scheduling result, so that the computing cluster corresponding to the plug-in driver runs the job.
6. The heterogeneous job scheduling system of claim 1, wherein,
the virtual node controller is further configured to obtain, through the plug-in driver, job operation information of a corresponding computing cluster, where the job operation information includes: job log, job alert record, and job completion progress.
7. The heterogeneous job scheduling system of claim 6, wherein,
the computing job controller is connected with the virtual node controller, and is further configured to obtain a job query instruction of a user, and obtain the job operation information of the corresponding job from the virtual node controller according to the job query instruction.
8. The heterogeneous job scheduling system of claim 1, wherein,
the plug-in bus is deployed through a Kubernetes container cluster.
9. A heterogeneous job scheduling method, wherein the method is applied to the heterogeneous job scheduling system of any one of claims 1 to 8, the method comprising:
the drive controller obtains registration information of the plug-in drivers and performs registration authentication on all the plug-in drivers according to the registration information;
the virtual node controller obtains the software and hardware information of the computing cluster corresponding to the plug-in driver and the partition list of the computing cluster, allocates a virtual node for each partition according to the partition list and the software and hardware information, and determines the software and hardware information corresponding to each virtual node;
the computing job controller obtains a job uploaded by a user and the resources required by the job;
the scheduler acquires the software and hardware information corresponding to each virtual node and the resources required by each job, and determines the scheduling results of the virtual nodes and the jobs.
10. The heterogeneous job scheduling method of claim 9, wherein,
the driver controller carries out health detection on the plug-in driver with the registered authentication completed; when the feedback information of the plug-in driver based on the health detection is health information, issuing a safety certificate for the plug-in driver, and setting the driving state of the plug-in driver to be in operation; and when the feedback information of the plug-in driver based on the health detection is timeout information or non-health information, setting the driving state of the plug-in driver as failure, and carrying out queue isolation on the plug-in driver.
CN202310962913.0A 2023-08-02 2023-08-02 Heterogeneous job scheduling system and method Active CN116661979B (en)

Publications (2)

Publication Number Publication Date
CN116661979A true CN116661979A (en) 2023-08-29
CN116661979B CN116661979B (en) 2023-11-28

Family

ID=87714025

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103297543A (en) * 2013-06-24 2013-09-11 浪潮电子信息产业股份有限公司 Job scheduling method based on computer cluster
CN108427604A (en) * 2018-02-06 2018-08-21 华为技术有限公司 Resource adjusting method, device and the cloud platform of cluster
CN113382074A (en) * 2021-06-10 2021-09-10 东南大学 Micro-service load balancing optimization method based on dynamic feedback
CN114741207A (en) * 2022-06-10 2022-07-12 之江实验室 GPU resource scheduling method and system based on multi-dimensional combination parallelism
CN115237547A (en) * 2022-09-21 2022-10-25 之江实验室 Unified container cluster hosting system and method for non-intrusive HPC computing cluster
CN115292026A (en) * 2022-10-10 2022-11-04 济南浪潮数据技术有限公司 Management method, device and equipment of container cluster and computer readable storage medium
CN115729924A (en) * 2022-12-13 2023-03-03 苏银凯基消费金融有限公司 Method for transmitting warehouse-counting mass data based on plug-in heterogeneous data source
WO2023109650A1 (en) * 2021-12-15 2023-06-22 深圳先进技术研究院 Micro-service-based intelligent space concurrent service process execution method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
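程振京; 李海波; 黄秋兰; 程耀东; 陈刚: "Elastic computing resource management mechanism in a high-energy physics cloud platform", 计算机工程与应用 (Computer Engineering and Applications), no. 08 *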

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant