WO2024037173A1 - Scheduler, job scheduling method, and related device - Google Patents


Info

Publication number
WO2024037173A1
Authority
WO
WIPO (PCT)
Prior art keywords
duration
job
cluster
target cluster
target
Application number
PCT/CN2023/101278
Inventor
丁肇辉
朱波
王飞
周风帆
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2024037173A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of Internet technology, and in particular to a scheduler, a job scheduling method and related equipment.
  • A computing power network usually includes multiple cross-regional data centers interconnected through a wide area network. It can flexibly schedule its computing, storage, and network resources on demand.
  • Each data center is typically equipped with a cluster scheduler. Examples are the Simple Linux Utility for Resource Management (SLURM) and the Load Sharing Facility (LSF) in high performance computing (HPC) scenarios.
  • A cross-data-center scheduler is deployed in the computing power network. It schedules resources across data centers by dynamically interfacing with the cluster scheduler in each data center.
  • The scheduler schedules the user's job, together with the data required to process it, to a cluster in a data center for processing. If the waiting time from job submission to execution is too long, the job stays unprocessed for a long time and the user experience suffers. Improving job processing efficiency, and with it the user experience, is therefore an urgent problem.
  • This application provides a scheduler that reduces the waiting time from job submission to execution. This improves job processing efficiency and user experience.
  • This application further provides a job scheduling method, a computer-readable storage medium, and a computer program product.
  • In a first aspect, this application provides a scheduler that manages multiple clusters. It includes a management module and an agent module.
  • The management module obtains a job to be processed, for example a job submitted by a user through a terminal or a client. It determines, from the multiple clusters, a target cluster for processing the job, and delivers the job to the agent module. The agent module instructs the target cluster to schedule resources for the job and to obtain target data while those resources are being scheduled. The target data is the data needed to execute the job.
  • Under the control of the scheduler, the target cluster transmits the target data and schedules resources in parallel. This effectively reduces the waiting time from job submission to execution, improving job processing efficiency and thus user experience.
  • It also prevents the resources allocated to the job in the target cluster from sitting idle for a long time, which alleviates resource waste.
  • To determine the target cluster, the management module first makes two estimates for each of the multiple clusters:
    • a first duration, which the cluster needs to schedule resources for the job;
    • a second duration, which the cluster needs to obtain the target data.
    It takes the larger of the two as that cluster's waiting duration for processing the job. The cluster with the smallest waiting duration becomes the target cluster.
  • Scheduling the job to the cluster with the smallest waiting duration shortens the time needed to process the job, which further improves job processing efficiency.
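The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation. The `ClusterEstimate` type, the `select_target_cluster` function, and the sample durations are hypothetical, and the code assumes the first and second durations have already been estimated.

```python
from dataclasses import dataclass


@dataclass
class ClusterEstimate:
    name: str
    first_duration: float   # estimated time (s) to schedule resources for the job
    second_duration: float  # estimated time (s) to obtain the target data

    @property
    def waiting_duration(self) -> float:
        # Scheduling and data acquisition overlap, so the job waits for
        # whichever of the two takes longer.
        return max(self.first_duration, self.second_duration)


def select_target_cluster(estimates: list[ClusterEstimate]) -> ClusterEstimate:
    """Return the candidate cluster with the smallest waiting duration."""
    if not estimates:
        raise ValueError("no candidate clusters")
    return min(estimates, key=lambda e: e.waiting_duration)


if __name__ == "__main__":
    candidates = [
        ClusterEstimate("A", first_duration=120.0, second_duration=30.0),
        ClusterEstimate("B", first_duration=40.0, second_duration=60.0),
        ClusterEstimate("C", first_duration=90.0, second_duration=10.0),
    ]
    best = select_target_cluster(candidates)
    print(best.name, best.waiting_duration)  # B 60.0
```

Taking the maximum rather than the sum reflects the assumption that each cluster overlaps data acquisition with resource scheduling. Ties go to the earliest candidate in the list, because `min` returns the first minimum it finds.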
  • Alternatively, the management module can schedule jobs to the least-loaded cluster, based on each cluster's load. This balances load across the computing power network.
  • The management module may also estimate the first duration for the target cluster to schedule resources for the job and calculate the second duration for the target cluster to obtain the target data. It then sends both durations to the agent module.
  • When the first duration is greater than the second duration (that is, resource scheduling takes longer than obtaining the data), the agent module instructs the target cluster to schedule resources for the job and to obtain the target data during scheduling.
  • The target cluster thus performs resource scheduling and data acquisition in parallel. As soon as scheduling completes, the scheduled resources can be used to execute the job, which improves job processing efficiency.
  • Alternatively, the agent module itself can estimate the first duration and calculate the second duration for the target cluster. Based on both durations, it then instructs the target cluster to obtain the target data while scheduling resources for the job.
  • In another case, the management module again estimates the first duration and calculates the second duration for the target cluster, and sends both to the agent module.
  • When the first duration is not greater than the second duration, the agent module sequences the two activities:
    • at a first moment, it instructs the target cluster to start acquiring the target data, before resource scheduling begins;
    • at a second moment, it instructs the target cluster to schedule resources for the job.
    The interval between the first moment and the second moment equals the second duration minus the first duration.
  • The target cluster thus starts acquiring data first and schedules resources while the acquisition is in progress. Resource scheduling and data acquisition then finish at or about the same time, and the scheduled resources can immediately execute the job. This improves job processing efficiency and keeps the scheduled resources from idling while they wait for the target data.
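The two timing cases can be expressed as a small planning function: either the first duration is greater than the second, or it is not. The sketch below measures offsets in seconds from the moment the job is dispatched.

- In the first case it starts data acquisition immediately. This is one valid choice among several the text allows, since any start that finishes before scheduling completes would do.
- `StartPlan` and `plan_start_times` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StartPlan:
    data_start: float      # offset (s) at which target-data acquisition begins
    schedule_start: float  # offset (s) at which resource scheduling begins
    ready_at: float        # offset (s) at which both are done and the job can run


def plan_start_times(first_duration: float, second_duration: float) -> StartPlan:
    """Plan when to start data acquisition and resource scheduling.

    first_duration:  estimated time to schedule resources for the job
    second_duration: estimated time to obtain the target data
    """
    if first_duration < 0 or second_duration < 0:
        raise ValueError("durations must be non-negative")
    if first_duration > second_duration:
        # Scheduling dominates: start both at once; the data arrives while
        # resources are still being scheduled.
        return StartPlan(0.0, 0.0, first_duration)
    # Data acquisition dominates: fetch data from the first moment (t=0) and
    # start scheduling at the second moment, T2 - T1 later, so both end together.
    gap = second_duration - first_duration
    return StartPlan(0.0, gap, second_duration)


if __name__ == "__main__":
    print(plan_start_times(40.0, 60.0))
    # StartPlan(data_start=0.0, schedule_start=20.0, ready_at=60.0)
```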
  • To estimate the first duration, the management module may take three steps:
    • determine the target category to which the job belongs;
    • calculate the average duration the target cluster took to schedule resources for historical jobs of that category;
    • obtain the target cluster's current available resource ratio and historical available resource ratio.
    The current available resource ratio is the target cluster's currently available resources divided by its total resources. The historical available resource ratio is its available resources over a past period divided by its total resources.
  • The management module then estimates the first duration from the average duration, the current available resource ratio, and the historical available resource ratio.
  • In this way, the management module uses each cluster's recent scheduling times for jobs of the same category to estimate the scheduling time for the current job. This improves the accuracy and reliability of the estimate.
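The text does not give a formula for combining the average duration with the two resource ratios. One plausible heuristic, assumed here purely for illustration, scales the historical average by the historical ratio divided by the current ratio. If fewer resources are free now than in the past, the estimate grows accordingly. `estimate_first_duration` is a hypothetical name.

```python
def estimate_first_duration(
    average_duration: float,
    current_ratio: float,
    historical_ratio: float,
) -> float:
    """Estimate the first duration (time to schedule resources) for a job.

    average_duration: mean scheduling time (s) of historical jobs in the
                      same target category on the target cluster
    current_ratio:    currently available resources / total resources
    historical_ratio: available resources / total resources in the past period
    """
    if not (0.0 < current_ratio <= 1.0 and 0.0 < historical_ratio <= 1.0):
        raise ValueError("resource ratios must lie in (0, 1]")
    # Hypothetical heuristic: if fewer resources are free now than were free
    # historically, expect proportionally longer queuing, and vice versa.
    return average_duration * historical_ratio / current_ratio


if __name__ == "__main__":
    print(estimate_first_duration(100.0, current_ratio=0.25, historical_ratio=0.5))  # 200.0
```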
  • When the available resources are of a single type, such as computing resources, the current available resource ratio may be the number of available processors (or processor cores) in the target cluster divided by the total number of processors (or cores).
  • When the available resources span multiple types, the current available resource ratio is calculated from each type's available and total amounts. It may be a weighted sum of the per-type ratios, or be computed in other ways.
  • The historical available resource ratio can be computed in one of two ways:
    • Average over several moments: the management module records the available resources at multiple past moments when the target cluster allocated resources to historical jobs, averages those amounts, and divides the average by the target cluster's total resources.
    • Single moment: the management module takes the ratio of available to total resources at one past moment when the target cluster allocated resources to a historical job.
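The following is a minimal sketch of both ratio calculations. It assumes resource amounts are plain numbers keyed by resource type (for example `"cpu"` cores or `"gpu"` cards).

- The weighted sum follows the option described above.
- The default equal weights are an assumption.
- Using a single resource amount for the historical ratio is an assumption.
- The function names are hypothetical.

```python
from statistics import mean


def current_available_ratio(available, total, weights=None):
    """Weighted sum of per-type ratios available[t] / total[t].

    available, total: dicts keyed by resource type, e.g. {"cpu": 32, "gpu": 2}
    weights: per-type weights summing to 1 (assumed equal if omitted)
    """
    if not total:
        raise ValueError("total must contain at least one resource type")
    if weights is None:
        weights = {t: 1.0 / len(total) for t in total}
    return sum(weights[t] * available.get(t, 0) / total[t] for t in total)


def historical_available_ratio(available_samples, total_amount):
    """Mean available amount at past allocation moments, divided by the total."""
    if not available_samples:
        raise ValueError("need at least one historical sample")
    return mean(available_samples) / total_amount


if __name__ == "__main__":
    print(current_available_ratio({"cpu": 32, "gpu": 2}, {"cpu": 64, "gpu": 8},
                                  weights={"cpu": 0.5, "gpu": 0.5}))  # 0.375
    print(historical_available_ratio([16, 32, 48], 64))  # 0.5
```

For a single resource type, pass one key, for example `current_available_ratio({"cpu": 48}, {"cpu": 64})`. This reduces to the plain ratio of available to total processors.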
  • The target category of the user's job is determined by one or more of the following attributes:
    • the type of resources requested;
    • the quantity of resources requested;
    • the amount of data the job depends on;
    • the application to which the job belongs;
    • the job priority;
    • the queue in which the job is located;
    • the computational case to which the job belongs.
    The target category may also depend on the job owner, that is, the entity the job belongs to, such as a tenant or a user.
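Grouping history by category is what makes the per-category average duration possible. The sketch below uses a hypothetical subset of the listed attributes as the category key: requested resource type and quantity, application, and queue. The text leaves open which attributes to combine, and whether numeric ones such as data size should be bucketed. `JobAttributes`, `category_key`, and `average_duration_by_category` are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class JobAttributes:
    # Illustrative subset of the attributes listed in the text.
    resource_type: str   # type of resources requested, e.g. "cpu", "gpu"
    resource_count: int  # quantity of resources requested
    application: str     # application the job belongs to
    queue: str           # queue the job is submitted to


def category_key(job: JobAttributes) -> tuple:
    """Hypothetical rule: jobs with identical attributes share a category."""
    return (job.resource_type, job.resource_count, job.application, job.queue)


def average_duration_by_category(history):
    """history: iterable of (JobAttributes, scheduling_duration_seconds)."""
    buckets = defaultdict(list)
    for job, duration in history:
        buckets[category_key(job)].append(duration)
    return {key: mean(values) for key, values in buckets.items()}


if __name__ == "__main__":
    history = [
        (JobAttributes("gpu", 8, "train", "q1"), 100.0),
        (JobAttributes("gpu", 8, "train", "q1"), 200.0),
        (JobAttributes("cpu", 64, "cfd", "q2"), 30.0),
    ]
    print(average_duration_by_category(history))
```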
  • The agent module is further configured to monitor a third duration: the time the target cluster actually takes to schedule resources for the job. Based on the third duration and the estimated first duration, it instructs the management module to send an adjustment instruction to the network controller of the target cluster.
  • The adjustment instruction tells the network controller to adjust the target cluster's network bandwidth or number of network channels. Suppose the target cluster actually schedules resources for the job faster than estimated. Adjusting the bandwidth or channel count then lets it finish resource scheduling and target data acquisition at or about the same time.
  • Specifically, when the third duration is less than the first duration, the management module sends an adjustment instruction to the network controller. The instruction increases the target cluster's network bandwidth or number of network channels.
  • This raises the rate at which the target cluster obtains the target data and shortens the acquisition time. Job processing speeds up further, and the resources already scheduled for the job do not sit idle waiting for the target data.
  • Alternatively, the management module sends an adjustment instruction to the network controller to reduce the target cluster's network bandwidth or number of network channels.
  • The target cluster can still finish receiving the target data at about the time resource scheduling completes, so the resources allocated to the job do not sit idle. Meanwhile, the computing power network consumes less bandwidth.
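The sketch below shows how the comparison between the third duration and the estimated first duration might drive an adjustment. The text does not say how large an adjustment the network controller should make, so several details here are assumptions for illustration:

- the proportional scaling rule;
- the 10% tolerance band;
- treating a longer-than-estimated scheduling time as the trigger for a decrease;
- modelling only bandwidth, not channel count.

`plan_bandwidth_adjustment` is a hypothetical name.

```python
def plan_bandwidth_adjustment(
    estimated_first: float,
    observed_third: float,
    current_bandwidth: float,
    tolerance: float = 0.1,
) -> tuple[str, float]:
    """Suggest a bandwidth change for the target cluster.

    estimated_first:   estimated scheduling time (first duration), seconds
    observed_third:    monitored actual scheduling time (third duration), seconds
    current_bandwidth: current network bandwidth of the target cluster, e.g. Mbit/s
    """
    if estimated_first <= 0 or observed_third <= 0:
        raise ValueError("durations must be positive")
    ratio = estimated_first / observed_third
    if ratio > 1 + tolerance:
        # Scheduling is faster than estimated: speed up the data transfer in
        # proportion (hypothetical rule) so the data does not become the bottleneck.
        return "increase", current_bandwidth * ratio
    if ratio < 1 - tolerance:
        # Scheduling is slower than estimated (assumed trigger for a decrease):
        # release bandwidth the transfer does not need to finish in time.
        return "decrease", current_bandwidth * ratio
    return "keep", current_bandwidth


if __name__ == "__main__":
    print(plan_bandwidth_adjustment(100.0, 50.0, 200.0))   # ('increase', 400.0)
    print(plan_bandwidth_adjustment(100.0, 200.0, 200.0))  # ('decrease', 100.0)
    print(plan_bandwidth_adjustment(100.0, 105.0, 200.0))  # ('keep', 200.0)
```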
  • The management module and the agent module can be deployed on one computing device that connects to and manages the multiple clusters.
  • Alternatively, the management module is deployed on the computing device and the agent module on the target cluster. The management module can then control, through that agent module, the target cluster's parallel data acquisition and resource scheduling.
  • In a second aspect, this application provides a job scheduling method applied to a scheduler.
  • The scheduler includes a management module and an agent module.
  • The management module obtains a job to be processed, determines a target cluster for it from multiple clusters, and delivers the job to the agent module. The agent module instructs the target cluster to schedule resources for the job and to obtain the target data while resources are being scheduled. The target data is the data needed to execute the job.
  • The management module determines the target cluster as follows. For each cluster, it estimates the first duration for scheduling resources for the job and calculates the second duration for obtaining the target data. It takes the larger of the two as that cluster's waiting duration, and selects the cluster with the smallest waiting duration as the target cluster.
  • Alternatively, the management module can schedule jobs to the least-loaded cluster, based on each cluster's load. This balances load across the computing power network.
  • Before the agent module instructs the target cluster to obtain the target data during scheduling, the management module estimates the first duration and calculates the second duration for the target cluster. It then sends both to the agent module. When the first duration is greater than the second duration, the agent module instructs the target cluster to schedule resources for the job and obtains the target data during scheduling.
  • Alternatively, the agent module itself can estimate the first duration and calculate the second duration for the target cluster. Based on both durations, it then instructs the target cluster to obtain the target data while scheduling resources for the job.
  • Alternatively, the management module again estimates the first duration and calculates the second duration for the target cluster beforehand, and sends both to the agent module. When the first duration is not greater than the second duration:
    • at a first moment, the agent module instructs the target cluster to start acquiring the target data, before any resource scheduling begins;
    • at a second moment, it instructs the target cluster to schedule resources for the job.
  • The interval between the first moment and the second moment equals the second duration minus the first duration.
  • To estimate the first duration, the management module may take three steps:
    • determine the target category to which the job belongs;
    • calculate the average duration the target cluster took to schedule resources for historical jobs of that category;
    • obtain the target cluster's current and historical available resource ratios.
  • The current available resource ratio is the target cluster's currently available resources divided by its total resources.
  • The historical available resource ratio is the target cluster's available resources over a past period divided by its total resources. The management module estimates the first duration from the average duration and these two ratios.
  • When the available resources are of a single type, such as computing resources, the current available resource ratio may be the number of available processors (or processor cores) in the target cluster divided by the total number of processors (or cores).
  • When the available resources span multiple types, the current available resource ratio is calculated from each type's available and total amounts. It may be a weighted sum of the per-type ratios, or be computed in other ways.
  • The historical available resource ratio can be computed in one of two ways:
    • Average over several moments: the management module records the available resources at multiple past moments when the target cluster allocated resources to historical jobs, averages those amounts, and divides the average by the target cluster's total resources.
    • Single moment: the management module takes the ratio of available to total resources at one past moment when the target cluster allocated resources to a historical job.
  • The target category is determined by one or more of the following attributes:
    • the type of resources requested;
    • the quantity of resources requested;
    • the amount of data the job depends on;
    • the application to which the job belongs;
    • the job priority;
    • the queue in which the job is located;
    • the computational case to which the job belongs.
  • The target category can also be determined based on the job owner.
  • The agent module can also monitor the third duration for the target cluster to schedule resources for the job. Based on the first and third durations, it instructs the management module to send an adjustment instruction to the target cluster's network controller. The instruction tells the network controller to adjust the target cluster's network bandwidth or number of network channels.
  • The adjustment instruction may tell the network controller to increase the target cluster's network bandwidth or number of network channels.
  • Alternatively, the adjustment instruction may tell the network controller to reduce the target cluster's network bandwidth or number of network channels.
  • The management module and the agent module are deployed on a computing device that is connected to the multiple clusters.
  • Alternatively, the management module is deployed on the computing device, and the agent module is deployed on the target cluster.
  • The job scheduling method of the second aspect corresponds to the scheduler of the first aspect. For the technical effects of the second aspect and its possible implementations, refer to the first aspect and its corresponding implementations; they are not repeated here.
  • This application provides a scheduler that includes a processor and a memory. The memory stores instructions. When the scheduler runs, the processor executes those instructions, so that the scheduler performs the steps of the management module and the agent module in the second aspect or any of its possible implementations.
  • The memory may be integrated in the processor or be independent of it.
  • The scheduler may further include a bus, through which the processor is connected to the memory.
  • The memory may include read-only memory and random access memory.
  • This application provides a scheduler that includes a processor and a memory. The memory stores instructions. When the scheduler runs, the processor executes those instructions, so that the scheduler performs the steps of the management module in the second aspect or any of its possible implementations.
  • The memory may be integrated in the processor or be independent of it.
  • The scheduler may further include a bus, through which the processor is connected to the memory.
  • The memory may include read-only memory and random access memory.
  • This application provides a scheduler that includes a processor and a memory. The memory stores instructions. When the scheduler runs, the processor executes those instructions, so that the scheduler performs the steps of the agent module in the second aspect or any of its possible implementations.
  • The memory may be integrated in the processor or be independent of it.
  • The scheduler may further include a bus, through which the processor is connected to the memory.
  • The memory may include read-only memory and random access memory.
  • This application provides a computer-readable storage medium that stores instructions. When run on a scheduler, the instructions cause it to perform the method of the second aspect or any of its implementations.
  • This application provides a computer program product containing instructions. When run on a scheduler, the instructions cause it to perform the method of the second aspect or any of its implementations.
  • Figure 1 is a schematic diagram of an exemplary computing power network provided by an embodiment of this application.
  • Figure 2 is a schematic diagram of another exemplary computing power network provided by an embodiment of this application.
  • Figure 3 is a schematic diagram of another exemplary computing power network provided by an embodiment of this application.
  • Figure 4 is a schematic flowchart of a job scheduling method provided by an embodiment of this application.
  • Figure 5 is a schematic flowchart of another job scheduling method provided by an embodiment of this application.
  • Figure 6 is a schematic flowchart of a network bandwidth adjustment method provided by an embodiment of this application.
  • Figure 7 is a schematic structural diagram of a scheduler provided by an embodiment of this application.
  • Figure 8 is a schematic structural diagram of a scheduler provided by an embodiment of this application.
  • A centralized networking mode can be used to build a computing power network among multiple clusters.
  • In this mode, cluster A, cluster B, and cluster C have a strong management and control relationship: cluster A exercises strong management and control over cluster B and cluster C.
  • Alternatively, multiple clusters can build a computing power network in a decentralized networking mode. Here, cluster X, cluster Y, and cluster Z have a peer-to-peer relationship with each other.
  • Multiple clusters can also use a hybrid networking mode that combines strong management and control relationships with peer-to-peer relationships.
  • In that case, the computing power network includes multiple partitions. Clusters within a partition have a strong management and control relationship, while clusters in different partitions are peers.
  • Cross-cluster resource scheduling (and data scheduling) is usually implemented by a cross-cluster scheduler together with the cluster schedulers deployed in each cluster.
  • The following takes resource scheduling in the computing power network 100 shown in Figure 1 as an example; other types of computing power networks work similarly.
  • One or more devices may be configured in cluster B and in cluster C, respectively.
  • The computing power network 100 further includes a scheduler, which consists of a management module 101 and multiple agent modules 102.
  • As shown in Figure 1, the agent modules 102 are deployed on cluster A to cluster C, respectively.
  • The management module 101 can be deployed on a computing device in cluster A, or deployed independently in the computing power network 100.
  • Computing devices for each cluster are usually used to implement cross-cluster resource scheduling (and data scheduling).
  • Alternatively, the management module 101 and the multiple agent modules 102 can be deployed on the same computing device, such as one that is independent of every cluster.
  • That computing device is connected to the multiple clusters, so the scheduler can schedule jobs to different clusters over these connections.
  • The management module 101 in the scheduler can schedule a job to cluster B, whose available resources meet the job's requirements; it could equally schedule it to cluster A or cluster C. The job is first delivered to the agent module 102 on cluster B. That agent module then instructs cluster B to schedule the corresponding resources for the job, such as computing, storage, and network resources. Cluster B can use its internally configured cluster scheduler to do so.
  • The data required to execute the job (hereinafter, target data) may be stored in cluster C. In that case the scheduler can instruct cluster C to transmit the target data to cluster B.
  • Alternatively, user 201 can upload the target data to the scheduler, which then delivers it to cluster B.
  • Alternatively, the scheduler can send cluster B the storage address of the target data, such as a uniform resource locator (URL). Cluster B then accesses cluster C at that address and obtains the target data from it.
  • Suppose cluster B first queues to allocate resources for the job and only then obtains the target data from cluster C or the scheduler. The time from job submission to execution is then mainly the sum of two parts:
    • the time the job spends queuing for resource allocation in cluster B;
    • the time taken to transmit the target data from cluster C to cluster B.
    User 201's job therefore goes unexecuted for a long time after submission, which degrades the user experience.
  • To address this, the agent module 102 instructs the target cluster to schedule resources for the job and, at the same time, to obtain the target data required to execute it.
  • This effectively reduces the waiting time from job submission to execution, improving job processing efficiency and thus user experience.
  • It also keeps the resources that cluster B allocates to the job from sitting idle for a long time, which mitigates resource waste.
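The benefit can be checked with simple arithmetic. In the comparison below, T_1 stands for the time cluster B spends queuing for and allocating resources, and T_2 for the time to transfer the target data from cluster C. Both symbols are introduced here for illustration, and both phases are assumed to start when the job is dispatched.

```latex
\begin{aligned}
T_{\text{serial}}  &= T_1 + T_2 \\
T_{\text{overlap}} &= \max(T_1, T_2) \\
T_{\text{serial}} - T_{\text{overlap}} &= T_1 + T_2 - \max(T_1, T_2) = \min(T_1, T_2)
\end{aligned}
```

Take illustrative values of T_1 = 40 min and T_2 = 60 min. The serial approach waits 100 min, the overlapped approach waits 60 min, and the saving is 40 min, which equals the shorter of the two phases.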
  • The above example describes deploying the management module 101 and the agent modules 102 in the computing power network shown in Figure 1.
  • A corresponding scheduler can likewise be deployed and used to schedule jobs to the appropriate clusters in the computing power network for execution.
  • The computing power networks shown in Figures 1 to 3 are merely illustrative. The scheduler's management module 101 and agent modules 102 may be deployed in other locations.
  • For example, the management module 101 and the agent modules 102 can be integrated on one computing device that connects to multiple clusters through a wired or wireless network. Different agent modules 102 in the scheduler are then responsible for scheduling jobs in different clusters.
  • As another example, clusters within each partition may be peers of each other while different partitions have a strong management and control relationship. This embodiment does not limit this.
  • Figure 4 is a schematic flowchart of a job scheduling method provided by an embodiment of this application. The method can be applied to the computing power networks shown in Figures 1 to 3, or to other computing power networks. As shown in Figure 4, the method may specifically include the following steps:
  • S401 The management module 101 in the scheduler obtains the jobs to be processed.
  • users can submit jobs to the computing power network and request the computing power network to process them.
  • the user can remotely log in to the scheduler through a terminal or a client provided externally by the management module 101, and submit a job on the terminal or client.
  • the terminal or client can generate a data processing request that includes the job submitted by the user.
  • the terminal or client can send the job to be processed to the management module 101; accordingly, the management module 101 can obtain the job submitted by the user by parsing the received data processing request.
  • the job submitted by the user may be an HPC job, an AI model training job, etc.
  • the computing power network receives the job through the scheduler, and uses the scheduler to schedule the job to the corresponding cluster in the computing power network for processing.
  • when submitting a job, the user can also specify the resources required to process the job, including resource types, resource specifications, etc.
  • the resources specified by the user may include one or more of computing resources, storage resources, and network resources.
  • computing resources can be resources with computing capability, such as a central processing unit (CPU), a data processing unit (DPU), or an infrastructure processing unit (IPU); storage resources can be resources with storage capability, such as caches, memory, and storage devices; network resources can be network transmission resources such as uplink bandwidth, downlink bandwidth, and network card type.
  • the scheduler can also automatically determine the resources required to process the job based on the type of job, the amount of data the job depends on, and other information, which is not limited in this embodiment.
  • S402 The management module 101 determines, from multiple clusters, a target cluster for processing the job.
  • the computing power network may include multiple clusters, management modules 101 and agent modules 102, for example, as shown in Figures 1 to 3, and each cluster may include a data center device (such as a computing device, etc.).
  • a cluster can also be built based on multiple data centers. For example, when the physical distance between two data centers does not exceed the preset range, a cluster can be built based on the two data centers. Jobs submitted to the computing power network can be executed by the cluster in the computing power network. Therefore, after obtaining the job submitted by the user, the scheduler can determine the cluster used to process the job from multiple clusters (hereinafter referred to as the target cluster for ease of differentiation).
  • this embodiment provides the following ways of determining the target cluster.
  • in one implementation, the management module 101 may determine the target cluster based on a load balancing policy. Specifically, the management module 101 can obtain the current load of each cluster, or predict the load of each cluster over a future period (such as the next 2 seconds), and determine the cluster with the smallest load as the target cluster for processing the job submitted by the user.
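  • A minimal sketch of the load-balancing option, assuming each cluster's current or predicted load has already been reduced to a single number keyed by a hypothetical cluster ID. How the load is measured or predicted is not specified in the text, and the tie-breaking rule is an illustrative choice.

```python
from typing import Mapping


def pick_least_loaded(loads: Mapping[str, float]) -> str:
    """Return the cluster with the smallest (current or predicted) load.

    `loads` maps a hypothetical cluster ID to a load figure, for example a
    utilization ratio in [0, 1]. Ties are broken by cluster ID.
    """
    if not loads:
        raise ValueError("no candidate clusters")
    # Compare (load, id) pairs so equal loads resolve deterministically.
    return min(loads, key=lambda cid: (loads[cid], cid))


print(pick_least_loaded({"A": 0.72, "B": 0.35, "C": 0.35}))  # prints B
```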
  • in another implementation, the management module 101 estimates, for each of the multiple clusters, the waiting duration before the cluster can process the job. The waiting duration of each cluster is determined from the estimated first duration for the cluster to allocate resources to the job submitted by the user and the second duration for the cluster to receive the target data on which the job depends. The management module 101 can then determine the cluster with the smallest waiting duration among the multiple clusters as the target cluster that actually processes the job.
  • each cluster may receive multiple jobs, and thus multiple jobs are typically queued waiting for the cluster to allocate the resources required to process the job. Therefore, the first duration for the cluster to schedule resources for a job submitted by the user is the sum of the queuing duration for the job and the duration for the cluster to allocate resources to the job.
  • specifically, the management module 101 can perform step S4021: based on the time each cluster took to schedule resources for jobs in a past time period (hereinafter referred to as historical jobs), it estimates the first duration for the cluster to schedule resources for the job submitted by the user. The first duration is the sum of the queuing duration of the job and the duration for the cluster to allocate resources to the job.
  • the target data on which execution of the job relies may be stored only in some clusters of the computing power network, so a cluster that processes the job but does not store the target data itself usually needs to obtain that data.
  • the target data can be obtained from another cluster, for example the nearest cluster that stores the target data. Alternatively, when submitting the job, the user also uploads the target data required to execute the job to the management module 101, so that the management module 101 can send the uploaded target data to the cluster. As a third option, the management module 101 can send the storage address of the target data in the computing power network to the cluster, so that the cluster can obtain the target data from the computing power network based on that address.
  • the management module 101 can execute step S4022, specifically calculating the second duration for each cluster to obtain the target data.
  • the second duration can be, for example, the ratio of the data volume of the target data to the transmission bandwidth.
  • for a cluster that already stores the target data, the corresponding second duration is 0, meaning that no data transmission is needed.
  • in some cases, the second duration of the cluster is the sum of the time taken to transmit the target data between the two clusters and the time the user's job spends queued on the cluster waiting for that data transmission.
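  • The second-duration rule above can be sketched as follows. This is an illustration under stated assumptions, not the patent's code: data volume is in bytes, bandwidth is in bytes per second, and `transfer_queue_s` is a hypothetical input for the time the job waits in the cluster's transfer queue.

```python
def second_duration(data_bytes: float, bandwidth_bps: float,
                    has_local_copy: bool, transfer_queue_s: float = 0.0) -> float:
    """Estimate Tt, the time for a cluster to obtain the job's target data.

    Assumed units: bytes and bytes/second. transfer_queue_s is the
    (hypothetical) time the job waits before its transfer starts.
    """
    if has_local_copy:
        return 0.0  # data already on the cluster: no transmission needed
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    # Queueing time plus data volume divided by transmission bandwidth.
    return transfer_queue_s + data_bytes / bandwidth_bps


# 1 GB over a 100 MB/s link after a 5 s queue wait
print(second_duration(1e9, 1e8, has_local_copy=False, transfer_queue_s=5.0))  # 15.0
```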
  • the management module 101 may then perform steps S4023 and S4024 to determine the target cluster. Specifically, for each cluster, the management module 101 compares the first duration and the second duration of the cluster and takes the larger value as the time the job would wait before being processed if it were delivered to that cluster, that is, the waiting duration described above. Based on the waiting duration of each cluster, the management module 101 can then select the cluster with the smallest job waiting duration as the target cluster. For example, the management module 101 can sort the waiting durations of all clusters in ascending order and determine the cluster with the smallest waiting duration as the target cluster.
  • in the process of determining the target cluster, the management module 101 can first filter out clusters in the computing power network whose remaining available resources do not meet the resource conditions, and then determine the target cluster from the clusters that remain. This reduces the amount of computation the management module 101 needs to determine the target cluster.
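  • Steps S4023 and S4024 together with the pre-filtering can be sketched as below. The sketch assumes that remaining resources and the job's requirement are each a single scalar, and `Candidate` and its fields are hypothetical names. The larger of Tq and Tt is used because the target cluster obtains data while it schedules resources, so the job can start only when the slower of the two finishes.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Candidate:
    cluster_id: str        # hypothetical cluster identifier
    free_resources: float  # remaining available resources, assumed to be one scalar
    tq: float              # estimated first duration (resource scheduling), seconds
    tt: float              # estimated second duration (data acquisition), seconds


def pick_target_cluster(cands: Iterable[Candidate], required: float) -> Optional[str]:
    """Return the ID of the eligible cluster with the smallest waiting duration."""
    # Filter first, so no waiting duration is computed for clusters that
    # cannot satisfy the job's resource needs.
    eligible = [c for c in cands if c.free_resources >= required]
    if not eligible:
        return None
    # Waiting duration = max(Tq, Tt); ties broken by cluster ID.
    best = min(eligible, key=lambda c: (max(c.tq, c.tt), c.cluster_id))
    return best.cluster_id


clusters = [
    Candidate("A", free_resources=8, tq=30.0, tt=50.0),   # waits 50 s
    Candidate("B", free_resources=16, tq=40.0, tt=10.0),  # waits 40 s
    Candidate("C", free_resources=2, tq=5.0, tt=0.0),     # filtered out
]
print(pick_target_cluster(clusters, required=4))  # prints B
```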
  • when estimating the first duration, the management module 101 may use the time the cluster took to allocate resources to historical jobs in a past period (such as the past month).
  • the management module 101 can divide the jobs in the computing power network into multiple categories in advance. For example, jobs can be classified by one or more of the following: the resource application category of the job, the number of resources requested, the amount of data the job depends on, the application to which the job belongs, the job priority, the job owner (such as the user or tenant to which the job belongs), the queue in which the job is located, and the computing case to which the job belongs.
  • the job categories in the computing power network can be configured in the management module 101 in advance by technical personnel.
  • the management module 101 can calculate the average time the cluster took in the past period to schedule resources for the multiple historical jobs in each category (if a category contains only one historical job, the average is simply the time the cluster took to schedule resources for that job). In this way, the management module 101 obtains, for each cluster, the average duration of scheduling resources for historical jobs of each category.
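  • The per-category averaging above can be sketched as follows. The `(cluster_id, category, duration_s)` record layout is an assumption made for illustration, not a format defined in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def avg_schedule_time_by_category(
    history: Iterable[Tuple[str, str, float]]
) -> Dict[Tuple[str, str], float]:
    """Average resource-scheduling duration per (cluster, job category).

    history: (cluster_id, category, duration_s) records for historical jobs
    in the look-back window (assumed layout).
    """
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for cluster, category, duration in history:
        totals[(cluster, category)] += duration
        counts[(cluster, category)] += 1
    # A category with a single job averages to that job's own duration.
    return {key: totals[key] / counts[key] for key in totals}


records = [("B", "ai", 30.0), ("B", "ai", 50.0), ("B", "hpc", 12.0)]
print(avg_schedule_time_by_category(records))  # {('B', 'ai'): 40.0, ('B', 'hpc'): 12.0}
```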
  • the management module 101 also obtains the proportion of currently available resources on the cluster and, for the historical jobs of each category on the cluster, the proportion of historical available resources. The proportion of currently available resources is the ratio of the cluster's currently available resources to its total resources. For example, when the available resources consist of one kind of resource, such as computing resources, this proportion can be the ratio of the number of processors (or processor cores) currently available in the cluster to the total number of processors (or processor cores). When the available resources include multiple kinds, such as computing, storage, and network resources, the proportion can be obtained by calculating, for each kind, the ratio of the available amount to the total amount of that kind and then taking a weighted sum of these ratios. It can also be calculated in other ways; this embodiment imposes no limitation.
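  • A minimal sketch of the weighted-sum option for the proportion of currently available resources. The resource-type names and weights are hypothetical configuration values, and the weights are assumed to sum to 1 so that the result stays in [0, 1]. With a single resource type and weight 1.0, the function reduces to the plain ratio of available to total.

```python
from typing import Mapping


def available_ratio(available: Mapping[str, float], total: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Weighted sum of per-resource-type available/total ratios.

    Weights are assumed to sum to 1; resource type names are illustrative.
    """
    ratio = 0.0
    for rtype, w in weights.items():
        if total[rtype] <= 0:
            raise ValueError(f"total {rtype} must be positive")
        ratio += w * available[rtype] / total[rtype]
    return ratio


# 32 of 128 cores and 256 of 512 GB memory free, weighted equally
print(available_ratio({"cpu": 32, "mem": 256}, {"cpu": 128, "mem": 512},
                      {"cpu": 0.5, "mem": 0.5}))  # 0.375
```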
  • the proportion of historical available resources is the ratio, relative to the cluster's total resources, of the average amount of resources that were available when the cluster allocated resources to the multiple historical jobs in each category during a period before it schedules resources for the job submitted by the user (hereinafter referred to as the average available resource amount).
  • for example, the management module 101 can record the amount of available resources each time the cluster allocated resources to historical jobs at multiple moments in the past period. It then calculates the average of these amounts and, from that average, the ratio of the average available resource amount to the cluster's total resources.
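  • The averaging step can be sketched as follows, assuming that each sample is a single scalar amount of available resources recorded at one past allocation of a historical job in the category.

```python
from statistics import mean
from typing import Sequence


def historical_available_ratio(samples: Sequence[float], total: float) -> float:
    """Average available resources at past allocations, as a share of the total.

    samples: available-resource amounts recorded each time the cluster
    allocated resources to a historical job of the category (assumed scalar).
    """
    if not samples or total <= 0:
        raise ValueError("need at least one sample and a positive total")
    return mean(samples) / total


print(historical_available_ratio([40, 60, 80], total=200))  # 0.3
```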
  • alternatively, the proportion of historical available resources may refer to the ratio, relative to the cluster's total resources, of the resources that were available when the cluster allocated resources to historical jobs in each category at a certain moment before it schedules resources for the job submitted by the user. This embodiment imposes no limitation on this.
  • the management module 101 can calculate the proportion of current available resources and the proportion of historical available resources corresponding to various historical jobs on each cluster based on the resource usage of each cluster.
  • for each cluster, the management module 101 can determine, according to the target category to which the user's job belongs, the average duration of scheduling resources for historical jobs of the target category. It then estimates, from that average duration, the proportion of currently available resources, and the proportion of historical available resources for the target category, the first duration for which the job submitted by the user will wait in the cluster's queue for resources to be allocated.
  • the management module 101 may use formula (1) to calculate the first duration, where:
  • Tq is the estimated first duration for the cluster to schedule resources for the job submitted by the user;
  • Tq_x is the average duration for the cluster to allocate resources to historical jobs of the target category;
  • RP_x is the proportion of historical available resources; and
  • RP_current is the proportion of currently available resources.
  • in this way, the management module 101 can determine, for each cluster, the first duration of scheduling resources for jobs of each category.
  • the management module 101 may also estimate the first duration in other ways. For example, after the scheduler determines how long the target cluster took to allocate resources to multiple historical jobs in the past period, it can estimate the first duration for the target cluster to schedule resources for the job from the median of those durations together with the above-mentioned proportion of currently available resources and proportion of historical available resources.
  • whether the management module 101 uses the median or the average of the durations of multiple historical jobs to calculate the first duration may be configured in the management module 101 in advance by a technician.
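  • The body of formula (1) is not given here, so the following is only a sketch under an explicit assumption. It assumes formula (1) scales the historical statistic Tq_x by RP_x / RP_current, so that having more free resources now than in the past shortens the estimate. The `use_median` flag mirrors the configurable choice between the average and the median. Both the formula's form and the parameter names are hypothetical.

```python
from statistics import mean, median
from typing import Sequence


def estimate_first_duration(hist_durations: Sequence[float], rp_hist: float,
                            rp_current: float, use_median: bool = False) -> float:
    """Estimate Tq for a job of the target category on one cluster.

    ASSUMPTION: Tq = Tq_x * RP_x / RP_current. This form is illustrative
    and is not taken from the text of formula (1).
    """
    if not hist_durations or rp_current <= 0:
        raise ValueError("need history and a positive current ratio")
    # Tq_x: average of historical durations by default, median if configured.
    tq_x = median(hist_durations) if use_median else mean(hist_durations)
    return tq_x * rp_hist / rp_current


durations = [30.0, 50.0, 100.0]
print(estimate_first_duration(durations, rp_hist=0.2, rp_current=0.4))  # roughly 30 s
print(estimate_first_duration(durations, 0.2, 0.4, use_median=True))    # roughly 25 s
```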
  • S403 The management module 101 delivers the job to the agent module 102 corresponding to the target cluster.
  • the management module 101 can send an execution instruction to the agent module 102 corresponding to the target cluster.
  • the execution instruction includes the job submitted by the user and instructs the agent module 102 to control the target cluster to execute the job.
  • the management module 101 can also send the target data to the target cluster.
  • the target data can, for example, be uploaded by the user, so that after determining the target cluster, the management module 101 delivers the job and the target data to the target cluster together.
  • the target data may be stored in other clusters in the computing power network.
  • the management module 101 may instruct the target cluster to request the target data from other clusters that store it, or the management module 101 may send a data sharing request to those clusters to instruct them to share the target data with the target cluster.
  • S404 The agent module 102 instructs the target cluster to schedule resources for the job and to obtain the target data while doing so.
  • the target data is the data needed to execute the job.
  • the agent module 102 may instruct the target cluster to obtain target data in the process of scheduling resources for the job. In this way, the target cluster can achieve accelerated job processing by executing in parallel the process of obtaining target data and scheduling resources for the job.
  • the agent module 102 can send the job to the cluster scheduler of the target cluster to trigger the cluster scheduler to schedule resources for the job.
  • the agent module 102 can trigger the cluster scheduler of the target cluster to schedule resources for the job.
  • specifically, the agent module 102 can perform step S4041: it receives from the management module 101 the estimated first duration Tq for the target cluster to schedule resources for the job and the second duration Tt for the target cluster to obtain the target data.
  • after determining the target cluster, the management module 101 may send the first duration Tq and the second duration Tt calculated for the target cluster to the agent module 102 corresponding to the target cluster.
  • the agent module 102 may then perform steps S4042, S4043, and S4044. Specifically, the agent module 102 compares the first duration Tq with the second duration Tt. When Tq is greater than Tt, the agent module 102 instructs the target cluster to obtain the target data while scheduling resources for the job.
  • the agent module 102 can submit the job to the cluster scheduler inside the target cluster to trigger the cluster scheduler to start allocating resources in the target cluster to the job. The agent module 102 can also instruct the target cluster to obtain the target data, for example by receiving the target data sent by the management module 101 or by other clusters. In this way, the target cluster obtains the target data while scheduling resources for the job, so resource scheduling and data acquisition proceed in parallel and the job is processed faster.
  • when the first duration Tq is not greater than the second duration Tt, the agent module 102 can calculate the difference between Tq and Tt and use it to determine two times: time 1, at which the target cluster is triggered to schedule resources for the job, and time 2, at which the target cluster starts obtaining the target data. Time 2 is earlier than time 1, and the interval between time 2 and time 1 equals the difference. The agent module 102 then instructs the target cluster to start acquiring the target data at time 2. When time 1 is reached while the target cluster is still receiving the target data, the agent module 102 instructs the target cluster to schedule resources for the job.
  • in this way, the target cluster obtains the target data while scheduling resources for the job, so resource scheduling and data acquisition proceed in parallel and the job is processed faster. Moreover, the target cluster starts obtaining the target data first and starts scheduling resources for the job only after it has been receiving data for some time. As a result, data acquisition and resource scheduling finish at about the same time. This prevents the resources the target cluster has scheduled for the job from sitting idle while waiting for data, which would waste those resources.
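  • The agent-side decision between the two cases can be sketched as below. Offsets are measured in seconds from the moment the agent receives the job, which is an assumption, and `StartPlan` is a hypothetical name. In the sketch, data acquisition always starts at offset 0, and scheduling is delayed by Tt - Tq only when data acquisition is the slower step.

```python
from dataclasses import dataclass


@dataclass
class StartPlan:
    data_start: float      # when data acquisition starts ("time 2"), s after receipt
    schedule_start: float  # when resource scheduling starts ("time 1"), s after receipt


def plan_start_times(tq: float, tt: float) -> StartPlan:
    """Decide when to start data acquisition and resource scheduling."""
    if tq > tt:
        # Scheduling is the slower step: run both in parallel from the start.
        return StartPlan(data_start=0.0, schedule_start=0.0)
    # Data acquisition is the slower step: start it first, then delay
    # scheduling by Tt - Tq so both are expected to finish at time Tt.
    return StartPlan(data_start=0.0, schedule_start=tt - tq)


print(plan_start_times(tq=20.0, tt=50.0))  # StartPlan(data_start=0.0, schedule_start=30.0)
```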
  • the above description takes as an example the case in which the management module 101 calculates the first duration Tq and the second duration Tt and sends them to the agent module 102.
  • in other embodiments, the agent module 102 can itself estimate the first duration Tq for the target cluster to schedule resources for the job and the second duration Tt for the target cluster to obtain the target data. For example, the management module 101 may determine the target cluster based on the current load of each cluster and therefore not calculate the first and second durations for each cluster. In that case, the agent module 102 can estimate Tq and Tt for the target cluster after receiving the job.
  • the specific implementation process can be found in the above-mentioned relevant descriptions, and will not be described in detail here.
  • because the agent module 102 instructs the target cluster to obtain the target data and schedule resources for the job in parallel, the waiting time of the job from submission to execution can be effectively reduced, which improves job processing efficiency and user experience.
  • the above embodiments mainly describe how the scheduler determines the target cluster and instructs it to obtain the data for the user's job and schedule resources for the job in parallel.
  • in practice, the first duration Tq estimated by the management module 101 for the target cluster to schedule resources for the job may differ significantly from the time the target cluster actually takes. For example, some jobs queued on the target cluster and waiting for resource allocation may be suspended, or the target cluster may schedule resources for higher-priority jobs first.
  • the scheduler can synchronously adjust the time it takes for the target cluster to receive target data by adjusting the network bandwidth for the target cluster to receive target data.
  • Figure 5 shows a schematic flow chart of another job scheduling method in an embodiment of the present application. As shown in Figure 5, the method may specifically include:
  • S501 The management module 101 obtains the target job to be processed.
  • S502 The management module 101 estimates, for each cluster in the computing power network, the first duration for the cluster to schedule resources for the job submitted by the user.
  • S503 The management module 101 calculates the second duration for each cluster to obtain target data.
  • S504 The management module 101 takes the larger of the first duration and the second duration of each cluster as the waiting duration for the cluster to process the job.
  • S505 The management module 101 takes the cluster with the smallest waiting duration among the multiple clusters as the target cluster.
  • for the specific implementation of steps S501 to S505, refer to the relevant description of steps S401 to S402 in the embodiment shown in Figure 4; details are not repeated here.
  • S506 The management module 101 delivers the job to the agent module 102 corresponding to the target cluster.
  • S507 The agent module 102 obtains the first duration Tq for the target cluster to schedule resources for the job, and the second duration Tt for the target cluster to obtain target data.
  • in one implementation, the management module 101 has already calculated the first duration and the second duration of each cluster in steps S502 and S503. It can therefore deliver the first duration Tq and the second duration Tt of the target cluster to the agent module 102, for example together with the job.
  • in another implementation, the agent module 102 can itself calculate the first duration Tq for the target cluster to schedule resources for the job and the second duration Tt for the target cluster to obtain the target data. For how the agent module 102 calculates Tq and Tt, refer to the description in the previous embodiment of how the management module 101 calculates the first and second durations of each cluster; details are not repeated here.
  • S508 The agent module 102 compares the first duration Tq with the second duration Tt. When Tq is greater than Tt, step S509 is executed; when Tq is not greater than Tt, step S510 is executed.
  • S509 The agent module 102 instructs the target cluster to schedule resources for the job and to obtain the target data while doing so.
  • S510 The agent module 102 instructs the target cluster to start acquiring the target data at time 2. When time 1 is reached, the agent module 102 instructs the target cluster to schedule resources for the job. Time 2 is earlier than time 1, and the interval between time 2 and time 1 is the difference between the first duration Tq and the second duration Tt.
  • S511 The agent module 102 monitors the third duration Tq' that the target cluster takes to schedule resources for the job.
  • S512 Determine whether the difference between the first duration Tq and the third duration Tq' is greater than a preset value. If yes, continue to step S513; if not, no processing is performed.
  • S513 The agent module 102 instructs the management module 101 to send an adjustment instruction to the network controller of the target cluster.
  • the adjustment instruction is used to instruct the network controller to adjust the network bandwidth of the target cluster or adjust the number of network channels of the target cluster.
  • in this way, the scheduler can synchronously adjust the time the target cluster takes to receive the target data by adjusting the network bandwidth over which it receives that data. The target cluster then finishes resource scheduling and target data acquisition at the same or a similar time.
  • specifically, the agent module 102 can monitor in real time the third duration Tq' that the target cluster takes to schedule resources for the job submitted by the user. Tq' may differ greatly from the estimated first duration Tq; specifically, the difference between Tq and Tq' may be greater than the preset value. This indicates that the target cluster is scheduling resources for the job much faster than estimated.
  • if the target cluster continues to receive the target data at its current bandwidth, it may finish scheduling resources for the job and then have to wait until the target data has been received before it can start executing the job. The scheduled resources would then sit idle for a long time, wasting them.
  • in this case, the network bandwidth of the target cluster can be increased.
  • for example, the agent module 102 can send a bandwidth adjustment request to the management module 101 according to the difference between the first duration Tq and the third duration Tq'. The management module 101 can then send an adjustment instruction to the network controller, requesting it either to increase the network bandwidth of the target cluster (that is, increase the bandwidth of the existing network channels used for data transmission) or to add a network channel for the target cluster.
  • the added network channel is used to speed up the rate at which the target cluster receives the target data.
  • the second time period Tt for the target cluster to receive the target data can be reduced due to the increase in network bandwidth, so that the target cluster has completed receiving the target data when completing resource scheduling, thereby avoiding the problem of resource waste.
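  • How much extra bandwidth to request is not specified. The following is a sketch under one plausible assumption: after the S512 threshold check, the agent sizes the request so that the remaining data arrives by the time scheduling is expected to finish. `time_left`, `remaining_bytes`, and the other parameter names are hypothetical, with units of bytes and bytes per second.

```python
def extra_bandwidth_needed(tq_est: float, tq_observed: float, threshold: float,
                           remaining_bytes: float, current_bw: float,
                           time_left: float) -> float:
    """Bandwidth increase (bytes/s) to request, or 0.0 if none is needed.

    Assumptions: tq_observed is the actual or projected scheduling time Tq',
    time_left is how long until scheduling completes, and the goal is to
    finish receiving the remaining data by then.
    """
    if tq_est - tq_observed <= threshold:
        return 0.0  # scheduling is not meaningfully faster than estimated
    if time_left <= 0:
        raise ValueError("time_left must be positive")
    needed_bw = remaining_bytes / time_left
    return max(0.0, needed_bw - current_bw)


# 6 GB still to receive in 30 s over a 100 MB/s link -> ask for 100 MB/s more
print(extra_bandwidth_needed(100.0, 40.0, 10.0, 6e9, 1e8, 30.0))  # 100000000.0
```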
  • when increasing the network bandwidth of the target cluster, the network controller can first increase the bandwidth of the target cluster's existing network channels. If the target cluster's network bandwidth meets the demand before the existing channels reach their bandwidth upper limit, the network controller can end the adjustment without creating a new network channel for the target cluster.
  • otherwise, the network controller can create a new network channel and allocate it to the target cluster to increase its network bandwidth. The target cluster can then receive the target data faster over both the existing and the newly created network channels.
  • the network controller may directly create a new network channel for the target cluster, and when the number of network channels reaches the upper limit, the network controller may increase the network bandwidth of each network channel, etc. In this embodiment, there is no limitation on the specific implementation method of the network controller increasing the network bandwidth for the target cluster.
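  • The first strategy above, where existing channels are raised first and new channels are added only if needed, can be sketched as follows. The per-channel cap, the channel limit, and representing channels as a list of bandwidth values are all hypothetical controller settings, not an interface defined in the text. The reverse order described in the preceding paragraph would simply swap the two steps.

```python
from typing import List


def grow_bandwidth(channels: List[float], demand: float, per_channel_cap: float,
                   max_channels: int) -> List[float]:
    """Raise total bandwidth toward `demand`; return new per-channel bandwidths.

    The result may fall short of demand if max_channels is reached.
    """
    chans = list(channels)
    shortfall = demand - sum(chans)
    # Step 1: top up existing channels, each up to per_channel_cap.
    for i, bw in enumerate(chans):
        if shortfall <= 0:
            break
        add = min(per_channel_cap - bw, shortfall)
        if add > 0:
            chans[i] = bw + add
            shortfall -= add
    # Step 2: only if demand is still unmet, create new channels.
    while shortfall > 0 and len(chans) < max_channels:
        add = min(per_channel_cap, shortfall)
        chans.append(add)
        shortfall -= add
    return chans


print(grow_bandwidth([4.0, 6.0], demand=25.0, per_channel_cap=10.0, max_channels=4))
# [10.0, 10.0, 5.0]
```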
  • the agent module 102 can also instruct the management module 101 to request the network controller to reduce the network bandwidth of the target cluster, which reduces the waste of bandwidth resources in the computing power network.
  • the agent module 102 may instruct the management module 101 to request the network controller to reduce the network bandwidth for the target cluster to receive target data, or to reduce the number of network channels of the target cluster.
  • although the time it takes for the target cluster to receive the target data increases accordingly, the target cluster can still finish receiving the data at about the same time as it finishes scheduling resources. The resources allocated to the job therefore do not sit idle, while the reduced network bandwidth of the target cluster lowers bandwidth consumption in the computing power network.
  • the above mainly describes how the scheduler dynamically adjusts the network bandwidth of the target cluster for a single job.
  • in practice, multiple jobs may be submitted to the target cluster, and different jobs may have different requirements for the network bandwidth of the target cluster.
  • the scheduler can take the network bandwidth requirements of multiple jobs into account together to determine how much network bandwidth to request from the network controller for the target cluster. The following describes in detail how the scheduler adjusts the network bandwidth of the target cluster for multiple jobs.
  • the method may specifically include:
  • S601 The management module 101 obtains multiple bandwidth adjustment requests sent by the agent module 102 for different jobs on the target cluster.
  • Each bandwidth adjustment request includes instruction information for adjusting the bandwidth of the target cluster.
  • multiple jobs may exist on the target cluster, and the target cluster may execute data transmission processes in parallel for multiple jobs at the same time, so as to simultaneously receive data required for executing multiple jobs respectively.
  • the amount of data required to execute different jobs is different, which results in different jobs having different network bandwidth requirements for the target cluster.
  • the amount of data required to execute job A is 10 megabytes (MB).
  • the amount of data required to execute job B is 1 gigabyte (GB), etc.
  • for example, after job P completes its data transmission, it can request a reduction in the target cluster's network bandwidth to reduce wasted bandwidth. Meanwhile, job Q, whose data has not yet been transmitted, can request an increase in the target cluster's network bandwidth to shorten the time needed to transmit its data.
  • the indication information carried in a bandwidth adjustment request may, for example, be a positive or negative value. A positive value means increasing the bandwidth of the target cluster by that value; a negative value means reducing the bandwidth of the target cluster by the absolute value of that value.
  • the indication information in the bandwidth adjustment request may also consist of an adjustment direction and an adjustment magnitude, for example indicating that the bandwidth of the target cluster is to be increased by a given amount.
  • the agent module 102 on the target cluster can generate multiple different bandwidth adjustment requests for different jobs, and send the multiple bandwidth adjustment requests to the management module 101, so that the management module 101 requests the network controller Adjust the network bandwidth of the target cluster accordingly.
  • S602 The management module 101 calculates the update amount of the network bandwidth of the target cluster based on the multiple bandwidth adjustment requests.
  • specifically, the management module 101 may divide the bandwidth adjustment requests into two categories, bandwidth expansion and bandwidth reduction, according to the indication information each request carries for adjusting the target cluster's bandwidth. It calculates the total expansion value of the target cluster's bandwidth from the one or more requests in the expansion category and the total reduction value from the one or more requests in the reduction category. It then calculates the difference between the total expansion value and the total reduction value.
  • the difference is the update amount of the network bandwidth of the target cluster, and when the difference is greater than 0, the management module 101 determines to increase the network bandwidth of the target cluster, and the increase in bandwidth is the difference; and when When the difference is less than 0, the management module 101 determines to reduce the network bandwidth of the target cluster, and the bandwidth reduction amount is the absolute value of the difference.
  • the management module 101 may receive multiple bandwidth adjustment requests for the same job. For example, when there is a delay or other abnormality in the communication network between the agent module 102 and the management module 101, the agent module 102 may repeatedly send multiple bandwidth adjustment requests to the management module 101 to ensure that the management module 101 can receive the agent module 102 Bandwidth adjustment request sent. Therefore, after obtaining multiple bandwidth adjustment requests, the management module 101 can first filter the multiple bandwidth adjustment requests to filter out repeated bandwidth adjustment requests for the same job, and calculate the target cluster based on the remaining bandwidth adjustment requests. The updated amount of network bandwidth.
  • the management module 101 since the management module 101 is usually responsible for scheduling jobs within the entire computing network, the management module 101 may receive bandwidth adjustment requests for jobs on different clusters. To this end, the management module 101 can also divide the obtained multiple bandwidth adjustment requests according to clusters, determine one or more bandwidth adjustment requests belonging to each cluster, and thereby adjust the network bandwidth of the target cluster based on the bandwidth adjustment requests belonging to the target cluster. Make adjustments.
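The bullets above describe how requests are aggregated: group them by cluster, drop repeated requests for the same job, and net the expansions against the reductions. A minimal Python sketch of that aggregation follows. It is not the patent's implementation. The `BandwidthRequest` record, its field names, and the rule of keeping only the first request seen per job are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BandwidthRequest:
    # Hypothetical request record. delta_mbps uses the signed encoding:
    # positive = expand the cluster's bandwidth, negative = reduce it.
    job_id: str
    cluster_id: str
    delta_mbps: float


def bandwidth_update_amount(requests: Iterable[BandwidthRequest], target_cluster: str) -> float:
    """Net bandwidth update for target_cluster in Mbps (> 0 expand, < 0 reduce)."""
    seen_jobs = set()
    total_expand = 0.0
    total_reduce = 0.0
    for req in requests:
        # Group by cluster: only requests for the target cluster count.
        if req.cluster_id != target_cluster:
            continue
        # Assumption: repeated requests for the same job (e.g. re-sent after a
        # network delay) are duplicates, so only the first one is kept.
        if req.job_id in seen_jobs:
            continue
        seen_jobs.add(req.job_id)
        if req.delta_mbps > 0:
            total_expand += req.delta_mbps
        else:
            total_reduce += -req.delta_mbps
    # Update amount = total expansion value - total reduction value.
    return total_expand - total_reduce
```

As an example, suppose cluster `A` receives two identical +100 Mbps requests for job `j1` and a -30 Mbps request for job `j2`. The net update for `A` is then +70 Mbps.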
  • this embodiment may also include:
  • step S603 The management module 101 determines, according to the update amount, whether to expand or reduce the network bandwidth of the target cluster. If the network bandwidth is to be expanded, step S604 is executed; if it is to be reduced, step S610 is executed.
  • step S604 The management module 101 determines whether the network bandwidth of the current network channel of the target cluster reaches the upper limit. If so, step S607 is executed. If not, step S605 is continued.
  • step S605 The management module 101 requests the network controller to increase the network bandwidth of the current network channel of the target cluster according to the update amount.
  • step S606 The management module 101 determines whether the size of this bandwidth adjustment matches the update amount. If it matches, this bandwidth adjustment ends. If it does not match, step S607 continues.
  • even if the current network bandwidth of one or more network channels of the target cluster has not reached the upper limit, the total amount by which the target cluster's bandwidth can be expanded on those channels may still be less than the update amount. In this case, the network bandwidth adjustment target of the target cluster cannot be met with the existing network channels alone, so step S607 is executed to expand the remaining portion of the target cluster's network bandwidth.
  • step S607 The management module 101 requests the network controller to determine whether a network channel can be added to the target cluster. If yes, step S608 is continued. If not, step S609 is executed.
  • step S608 The management module 101 requests the network controller to create a new network channel for the target cluster, and the procedure returns to step S606.
  • step S609 The management module 101 determines that the network bandwidth of the target cluster has reached the upper limit of adjustment.
  • the management module 101 may notify the agent module 102 deployed on the target cluster that the network bandwidth of the target cluster has been adjusted to the maximum value and cannot be expanded further.
  • step S610 The management module 101 determines, based on the update amount, the network channel(s) to be released.
  • for example, the scheduler may determine to release network channel 1 (100 Mbps), reducing the network bandwidth of the target cluster by 100 Mbps.
  • step S611 The management module 101 requests the network controller to release the determined network channel(s).
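Steps S604 to S609 can be sketched as follows. The text leaves several points open, so the sketch makes these assumptions:
- Each channel has a hypothetical per-channel upper limit, `channel_cap`.
- The network controller can provide at most `max_channels` channels.
- A new channel is created with up to `channel_cap` of bandwidth.

The function returns the resulting channel bandwidths and any unmet expansion. A non-zero unmet amount corresponds to the notification in step S609.

```python
from typing import List, Tuple


def expand_bandwidth(channels: List[float], update: float, channel_cap: float,
                     max_channels: int) -> Tuple[List[float], float]:
    """Sketch of steps S604-S609: returns (new channel bandwidths, unmet amount)."""
    channels = list(channels)
    remaining = update
    # S604/S605: first raise existing channels that are below the per-channel upper limit.
    for i, bw in enumerate(channels):
        if remaining <= 0:
            break
        increase = min(channel_cap - bw, remaining)
        if increase > 0:
            channels[i] = bw + increase
            remaining -= increase
    # S606-S608: while the adjustment still falls short, add new channels if allowed.
    while remaining > 0 and len(channels) < max_channels:
        new_bw = min(channel_cap, remaining)
        channels.append(new_bw)
        remaining -= new_bw
    # S609: whatever is left cannot be satisfied; the agent module would be notified.
    return channels, max(remaining, 0.0)
```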
  • step S612 The management module 101 determines whether the size of this bandwidth adjustment matches the update amount. If it matches, this bandwidth adjustment ends. If it does not match, step S613 continues.
  • step S613 The management module 101 determines whether the network bandwidth of the remaining network channels of the target cluster reaches the lower limit. If so, the current network bandwidth adjustment ends. If not, step S614 continues.
  • if the network bandwidth of the remaining network channels of the target cluster has reached the lower limit, no further network channels of the target cluster can be released and the bandwidth of each remaining network channel is already at its lower limit.
  • step S614 The management module 101 requests the network controller to scale down the bandwidth of the target cluster's remaining network channels, so as to reduce the remaining portion of the target cluster's network bandwidth.
  • for example, assume that the target cluster currently includes network channel 1, network channel 2, and network channel 3, whose bandwidths are 100 megabits per second (Mbps), 200 Mbps, and 500 Mbps respectively, and that the update amount is a reduction of 180 Mbps.
  • after releasing network channel 1 (100 Mbps), the scheduler can request the network controller to reduce the target cluster's bandwidth by the remaining 80 Mbps. For example, it can reduce the bandwidth of network channel 2 from 200 Mbps to 120 Mbps, or that of network channel 3 from 500 Mbps to 420 Mbps, so that the total reduction of the target cluster's network bandwidth reaches 180 Mbps.
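The release-then-shrink logic of steps S610 to S614 can be sketched the same way. The following are illustrative assumptions:
- The greedy policy: release the smallest channels first while each still fits in the outstanding reduction, and always keep at least one channel.
- A per-channel lower limit, `channel_floor`.

With the figures of the example above, this policy reproduces the outcome described: channel 1 is released and channel 2 is reduced from 200 Mbps to 120 Mbps.

```python
from typing import Dict, List, Tuple


def reduce_bandwidth(channels: Dict[int, float], update: float, channel_floor: float
                     ) -> Tuple[List[int], Dict[int, float], float]:
    """Sketch of steps S610-S614.

    Returns (released channel ids, remaining channel bandwidths, unmet reduction).
    """
    channels = dict(channels)
    remaining = update
    released: List[int] = []
    # S610/S611: release whole channels, smallest first, while each fits in the remaining reduction.
    for cid, bw in sorted(channels.items(), key=lambda kv: kv[1]):
        if len(channels) == 1:
            break  # assumption: always keep at least one channel
        if bw <= remaining:
            released.append(cid)
            del channels[cid]
            remaining -= bw
    # S613/S614: shrink the remaining channels, never below the per-channel lower limit.
    for cid in sorted(channels):
        if remaining <= 0:
            break
        cut = min(channels[cid] - channel_floor, remaining)
        if cut > 0:
            channels[cid] -= cut
            remaining -= cut
    return released, channels, remaining
```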
  • FIG. 7 is a schematic structural diagram of a scheduler provided by this application. As shown in FIG. 7, the scheduler 700 is configured to manage multiple clusters and includes a management module 701 and an agent module 702. The management module 701 and the agent module 702 can be used to implement the methods executed by the management module 101 and the agent module 102 in the embodiments shown in FIG. 4 to FIG. 6.
  • the management module 701 is used for:
  • the agent module 702 is configured to instruct the target cluster to schedule resources for the job, and obtain target data in the process of scheduling resources for the job, where the target data is data required for executing the job.
  • when determining, from the multiple clusters, a target cluster for processing the job, the management module 701 is specifically configured to:
  • the cluster with the smallest waiting time among the multiple clusters is used as the target cluster.
  • the management module 701 may alternatively schedule the job, based on the load of each cluster, to the cluster with the smallest load, so as to achieve load balancing in the computing power network.
  • the management module 701 is also used to:
  • when instructing the target cluster to schedule resources for the job and to obtain target data in the process, the agent module 702 is specifically configured to:
  • when the first duration is greater than the second duration, instruct the target cluster to schedule resources for the job and to obtain the target data while scheduling resources for the job.
  • alternatively, the agent module 702 may itself estimate the first duration for the target cluster to schedule resources for the job and calculate the second duration for the target cluster to obtain the target data, and then instruct the target cluster, according to the first duration and the second duration, to obtain the target data while scheduling resources for the job.
  • the management module 701 is also used to:
  • when instructing the target cluster to schedule resources for the job and to obtain target data in the process, the agent module 702 is specifically configured to:
  • when the first duration is not greater than the second duration, instruct the target cluster at a first moment, before instructing it to schedule resources for the job, to start obtaining the target data, and instruct the target cluster at a second moment to schedule resources for the job,
  • where the interval between the first moment and the second moment is the second duration minus the first duration.
  • when estimating the first duration for the target cluster to schedule resources for the job, the management module 701 is specifically configured to:
  • the current available resource proportion is the ratio of the current available resource amount of the target cluster to the total resource amount.
  • the historical available resource proportion is the ratio of the average amount of available resources of the target cluster, over a period of time before resources are scheduled for the job, to the amount of total resources;
  • estimate, based on the average duration, the current available resource proportion, and the historical available resource proportion, the first duration for the target cluster to schedule resources for the job.
  • the target category is determined based on one or more of the following: the resource application category, the number of resources applied for, the volume of data the job depends on, the application to which the job belongs, the job priority, the queue in which the job is located, and the computational case to which the job belongs.
  • the agent module 702 is also used to:
  • the adjustment instruction is used to instruct the network controller to adjust the network bandwidth of the target cluster or adjust the number of network channels of the target cluster.
  • when the third duration is less than the first duration and the difference between them is greater than a preset value, the adjustment instruction instructs the network controller to increase the network bandwidth of the target cluster or increase the number of network channels of the target cluster.
  • when the third duration is greater than the first duration and the difference between them is greater than the preset value, the adjustment instruction instructs the network controller to reduce the network bandwidth of the target cluster or reduce the number of network channels of the target cluster.
  • the management module 701 and the agent module 702 are deployed on one computing device, and the computing device is connected to the multiple clusters.
  • the management module 701 is deployed on the computing device, and the agent module 702 is deployed on the target cluster.
  • the scheduler 700 shown in FIG. 7 corresponds to the job scheduling method in the embodiments shown in FIG. 4 to FIG. 6. Therefore, for the functions and technical effects of the scheduler 700, refer to the related descriptions in those embodiments; details are not repeated here.
  • FIG 8 is a schematic structural diagram of a scheduler provided by this application.
  • the scheduler 800 includes a processor 801, a memory 802, a communication interface 803 and a bus 804.
  • the processor 801, the memory 802 and the communication interface 803 communicate through the bus 804.
  • the bus 804 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one thick line is used in Figure 8, but it does not mean that there is only one bus or one type of bus.
  • the communication interface 803 is used to communicate with the outside, such as receiving jobs (and target data) submitted by users through a terminal or client, etc.
  • the processor 801 may be a CPU.
  • the processor 801 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general-purpose processor can be a microprocessor or any conventional processor, etc.
  • the memory 802 may include read-only memory and random access memory and provides instructions and data to the processor 801 .
  • Memory 802 may also include non-volatile random access memory.
  • memory 802 may also store device type information.
  • the memory 802 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory.
  • Volatile memory can be random access memory (RAM), which is used as an external cache.
  • the RAM may take many forms, for example static random access memory (static RAM, SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and direct Rambus random access memory (direct rambus RAM, DR RAM).
  • the memory 802 stores executable code
  • the processor 801 executes the executable code to perform the method executed by the foregoing management module 101, the method executed by the foregoing agent module 102, or the methods executed by both the management module 101 and the agent module 102.
  • the scheduler 800 may correspond to the scheduler 700 in the embodiments of this application, and may correspond to the management module 101 and the agent module 102 that perform the methods shown in FIG. 4 to FIG. 6 according to the embodiments of this application.
  • the above and other operations and/or functions implemented by the scheduler 800 are respectively intended to implement the corresponding procedures of the methods in FIG. 4 to FIG. 6. For brevity, they are not described again here.
  • embodiments of this application further provide a computer-readable storage medium storing instructions that, when run on the scheduler, cause the scheduler to perform the methods executed by the management module 101 and the agent module 102 in the foregoing embodiments.
  • embodiments of the present application also provide a computer program product.
  • when the computer program product is executed by one or more schedulers, the one or more schedulers perform any one of the foregoing job scheduling methods.
  • the computer program product may be a software installation package. If it is necessary to use any of the foregoing job scheduling methods, the computer program product may be downloaded and executed on the computer.
  • the device embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units.
  • such units may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between modules indicates that there are communication connections between them, which can be specifically implemented as one or more communication buses or signal lines.
  • this application may be implemented by software plus necessary general-purpose hardware, or by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may vary, for example analog circuits, digital circuits, or dedicated circuits. However, for this application, a software program implementation is the better implementation in most cases. Based on such an understanding, the technical solutions of this application, or the part of them that contributes to the prior art, may essentially be embodied in the form of a software product.
  • the computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions for instructing a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of this application.
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • when the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are generated in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more usable media.
  • the usable media may be magnetic media (eg, floppy disk, hard disk, tape), optical media (eg, DVD), or semiconductor media.
  • the semiconductor medium may be a solid state drive (SSD)

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Multi Processors (AREA)
  • Computer And Data Communications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

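This application provides a scheduler configured to manage a plurality of clusters. The scheduler includes a management module and an agent module. The management module is configured to obtain a to-be-processed job, determine from the plurality of clusters a target cluster for processing the job, and deliver the job to the agent module. The agent module is configured to instruct the target cluster to schedule resources for the job and to obtain target data while the target cluster is scheduling resources for the job, where the target data is data required for executing the job. In this way, under the control of the scheduler, the target cluster can transmit the target data and schedule resources in parallel. This effectively reduces the waiting time of a job from submission to execution and thereby improves job processing efficiency. It also prevents allocated resources from sitting idle for a long time, alleviating resource waste. In addition, this application further provides a corresponding job scheduling method and related devices.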

Description

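Scheduler, Job Scheduling Method, and Related Device
This application claims priority to Chinese Patent Application No. 202210988472.7, filed with the China National Intellectual Property Administration on August 17, 2022 and entitled "Scheduler, Job Scheduling Method, and Related Device", which is incorporated herein by reference in its entirety.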
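Technical Field
This application relates to the field of Internet technologies, and in particular, to a scheduler, a job scheduling method, and a related device.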
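Background
A computing power network (computing network) typically includes multiple geographically distributed data centers interconnected through a wide area network. The computing, storage, and network resources in the computing power network can be scheduled flexibly on demand. Each data center is configured with a cluster scheduler. Examples include one or more of the HPC cluster schedulers Simple Linux Utility for Resource Management (SLURM), Load Sharing Facility (LSF), and Portable Batch System (PBS), as well as Kubernetes and Yet Another Resource Negotiator (YARN). In addition, a cross-data-center scheduler is deployed in the computing power network; by dynamically interfacing with the cluster schedulers in the individual data centers, it implements resource scheduling across data centers.
Currently, the scheduler dispatches a user's job, together with the data required to process it, to a cluster in one data center for processing. In this process, if the job waits too long between submission and execution, the user experience suffers because the job remains unprocessed for a long time. Therefore, how to improve job processing efficiency, and thus user experience, has become an important problem that urgently needs to be solved.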
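Summary
This application provides a scheduler that reduces the waiting time of a job from submission to execution, thereby improving job processing efficiency and, in turn, user experience. In addition, this application further provides a job scheduling method, a computer-readable storage medium, and a computer program product.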
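According to a first aspect, this application provides a scheduler. The scheduler is configured to manage a plurality of clusters and includes a management module and an agent module. The management module is configured to obtain a to-be-processed job, for example a job submitted by a user through a terminal or a client. It then determines, from the plurality of clusters, a target cluster for processing the job and delivers the job to the agent module. The agent module is configured to instruct the target cluster to schedule resources for the job and to obtain target data while resources are being scheduled for the job, where the target data is data required for executing the job.
In this way, under the control of the scheduler, the target cluster can transmit the target data and schedule resources in parallel. This effectively reduces the waiting time of the job from submission to execution, improves job processing efficiency, and thus improves user experience. It also prevents the resources allocated by the target cluster to the job from sitting idle for a long time, alleviating resource waste.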
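In a possible implementation, the management module determines the target cluster as follows. For each of the plurality of clusters, it estimates a first duration for the cluster to schedule resources for the job and calculates a second duration for the cluster to obtain the target data. It then takes the larger of the first duration and the second duration as the waiting duration for that cluster to process the job, and uses the cluster with the smallest waiting duration as the target cluster. By scheduling the job to the target cluster with the smallest waiting duration, the management module helps reduce the time taken to process the job, that is, further improves the efficiency with which the job is processed.
Optionally, the management module may instead schedule the job, based on the load of each cluster, to the cluster with the smallest load, thereby achieving load balancing in the computing power network.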
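In a possible implementation, the management module is further configured to estimate the first duration for the target cluster to schedule resources for the job, calculate the second duration for the target cluster to obtain the target data, and send both durations to the agent module. The agent module then instructs the target cluster to schedule resources for the job and to obtain the target data in the process. Specifically, when the first duration is greater than the second duration (that is, resource scheduling takes longer than data acquisition), the agent module instructs the target cluster to obtain the target data while scheduling resources for the job. The target cluster can thus perform resource scheduling and target data acquisition in parallel. Once resource scheduling is complete, the job can be executed immediately on the scheduled resources, improving the processing efficiency of the job.
Optionally, after the management module delivers the job to the agent module, the agent module may itself estimate the first duration and calculate the second duration. Based on the two durations, it then instructs the target cluster to obtain the target data while scheduling resources for the job.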
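In a possible implementation, the management module is further configured to estimate the first duration for the target cluster to schedule resources for the job, calculate the second duration for the target cluster to obtain the target data, and send both durations to the agent module. The agent module then instructs the target cluster to schedule resources for the job and to obtain the target data in the process. Specifically, when the first duration is not greater than the second duration, the agent module instructs the target cluster at a first moment, before resources are scheduled for the job, to start obtaining the target data. It then instructs the target cluster at a second moment to schedule resources for the job, where the interval between the first moment and the second moment is the second duration minus the first duration. The target cluster thus starts obtaining the data first and schedules resources while the data is still arriving. As a result, resource scheduling and data acquisition finish at the same moment or at nearly the same moment, and the job can then be executed on the scheduled resources. This not only improves job processing efficiency but also avoids wasting the resources scheduled for the job while they wait for the target data.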
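In a possible implementation, the management module may estimate the first duration for the target cluster to schedule resources for the job as follows. It determines the target category to which the job belongs and calculates the average duration for the target cluster to schedule resources for historical jobs of that category. It also obtains the current available resource proportion and the historical available resource proportion of the target cluster:
- The current available resource proportion is the ratio of the amount of currently available resources of the target cluster to the amount of its total resources.
- The historical available resource proportion is the ratio of the amount of available resources of the target cluster over a past period of time to the amount of its total resources.

The management module then estimates the first duration based on the average duration, the current available resource proportion, and the historical available resource proportion. In this way, the management module uses how long each cluster has taken over a past period to schedule resources for jobs of the same category. From this it estimates how long the cluster will take to schedule resources for the newly submitted job, which improves the accuracy and reliability of the estimate.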
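When calculating the current available resource proportion, if the available resources include a single type of resource, such as computing resources, the proportion may be the ratio of the number of available processors (or processor cores) in the target cluster to the total number of processors (or processor cores). If the available resources include multiple types, such as computing, storage, and network resources, the proportion may be obtained as a weighted sum of the per-type ratios of available amount to total amount, or may be calculated in other ways.
When calculating the historical available resource proportion, the management module may record the amount of available resources of the target cluster at multiple moments over a past period when it allocated resources to historical jobs. It then averages these amounts and calculates the ratio of that average to the amount of total resources of the target cluster. Alternatively, the management module may use a single past moment: the ratio of the amount of available resources when the target cluster allocated resources to a historical job to the amount of total resources of the target cluster serves as the historical available resource proportion.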
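In a possible implementation, the target category to which the job submitted by the user belongs is determined based on one or more of the following: the resource application category, the number of resources applied for, the volume of data the job depends on, the application to which the job belongs, the job priority, the queue in which the job is located, and the computational case to which the job belongs. Further, the target category may also be determined based on the job owner, that is, the entity to which the job belongs, which may be a tenant, a user, or the like.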
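In a possible implementation, the agent module is further configured to monitor a third duration for the target cluster to schedule resources for the job. Based on the third duration and the estimated first duration, it instructs the management module to send an adjustment instruction to the network controller of the target cluster. The adjustment instruction instructs the network controller to adjust the network bandwidth of the target cluster or the number of its network channels. In this way, when the actual time the target cluster takes to schedule resources for the job is shorter than the estimated time, adjusting the network bandwidth or the number of network channels lets the target cluster finish resource scheduling and target data acquisition at the same moment or at nearly the same moment.
In a possible implementation, the third duration may be less than the first duration, with the difference between them greater than a preset value. This indicates that the target cluster actually takes less time to schedule resources for the job than estimated. In this case, the adjustment instruction sent by the management module to the network controller instructs it to increase the network bandwidth of the target cluster or the number of its network channels. This raises the rate at which the target cluster obtains the target data and shortens the time this takes. It not only further speeds up job processing but also avoids wasting the resources scheduled for the job while they wait a long time for the target data.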
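In a possible implementation, the third duration may be greater than the first duration, with the difference between them greater than the preset value. This indicates that the target cluster actually takes more time to schedule resources for the job than estimated. In this case, the adjustment instruction sent by the management module to the network controller instructs it to reduce the network bandwidth of the target cluster or the number of its network channels. The target cluster can still finish receiving the target data at about the time it finishes scheduling resources for the job. The allocated resources are therefore not left idle, and reducing the target cluster's network bandwidth lowers bandwidth consumption in the computing power network.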
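The decision rule in the two preceding paragraphs compares the estimated first duration with the observed third duration against a preset value. A minimal Python sketch follows. The `Adjustment` enumeration and the `threshold` parameter name are hypothetical. The rule only decides the direction of the adjustment, not its size.

```python
from enum import Enum


class Adjustment(Enum):
    INCREASE = "increase"  # more bandwidth or more network channels
    DECREASE = "decrease"  # less bandwidth or fewer network channels
    NONE = "none"


def decide_adjustment(first_duration: float, third_duration: float, threshold: float) -> Adjustment:
    """Compare the estimated (first) and observed (third) resource-scheduling durations."""
    if third_duration < first_duration and first_duration - third_duration > threshold:
        # Resources are ready earlier than predicted: speed up the data transfer.
        return Adjustment.INCREASE
    if third_duration > first_duration and third_duration - first_duration > threshold:
        # Resources are ready later than predicted: the transfer can use less bandwidth.
        return Adjustment.DECREASE
    # Within the preset tolerance: leave the network configuration unchanged.
    return Adjustment.NONE
```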
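In a possible implementation, the management module and the agent module may be deployed on one computing device, and the computing device is connected to the plurality of clusters so as to manage them.
In a possible implementation, the management module is deployed on the computing device and the agent module is deployed on the target cluster. The management module can therefore use the agent module on the target cluster to make the target cluster perform data acquisition and resource scheduling in parallel.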
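According to a second aspect, this application provides a job scheduling method applied to a scheduler, where the scheduler includes a management module and an agent module. Specifically, the management module obtains a to-be-processed job, determines from a plurality of clusters a target cluster for processing the job, and delivers the job to the agent module. The agent module instructs the target cluster to schedule resources for the job and to obtain target data while resources are being scheduled for the job, where the target data is data required for executing the job.
In a possible implementation, the management module determines the target cluster from the plurality of clusters as follows. It estimates a first duration for each cluster to schedule resources for the job and calculates a second duration for each cluster to obtain the target data. It then takes the larger of the two durations as the waiting duration for that cluster to process the job, and uses the cluster with the smallest waiting duration as the target cluster.
Optionally, the management module may instead schedule the job, based on the load of each cluster, to the cluster with the smallest load, thereby achieving load balancing in the computing power network.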
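In a possible implementation, before the agent module instructs the target cluster to obtain the target data while scheduling resources for the job, the management module does the following: it estimates the first duration for the target cluster to schedule resources for the job, calculates the second duration for the target cluster to obtain the target data, and sends both durations to the agent module. When the first duration is greater than the second duration, the agent module then instructs the target cluster to schedule resources for the job and to obtain the target data in the process.
Optionally, after the management module delivers the job to the agent module, the agent module may itself estimate the first duration and calculate the second duration. Based on the two durations, it then instructs the target cluster to obtain the target data while scheduling resources for the job.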
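In a possible implementation, before the agent module instructs the target cluster to obtain the target data while scheduling resources for the job, the management module does the following: it estimates the first duration for the target cluster to schedule resources for the job, calculates the second duration for the target cluster to obtain the target data, and sends both durations to the agent module. When the first duration is not greater than the second duration, the agent module instructs the target cluster at a first moment, before instructing it to schedule resources for the job, to start obtaining the target data. It then instructs the target cluster at a second moment to schedule resources for the job, where the interval between the first moment and the second moment is the second duration minus the first duration.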
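In a possible implementation, the management module may estimate the first duration for the target cluster to schedule resources for the job as follows. It determines the target category to which the job belongs, calculates the average duration for the target cluster to schedule resources for historical jobs of that category, and obtains the current and historical available resource proportions of the target cluster:
- The current available resource proportion is the ratio of the amount of currently available resources of the target cluster to the amount of its total resources.
- The historical available resource proportion is the ratio of the amount of available resources of the target cluster over a past period of time to the amount of its total resources.

The management module then estimates the first duration based on the average duration, the current available resource proportion, and the historical available resource proportion.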
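When calculating the current available resource proportion, if the available resources include a single type of resource, such as computing resources, the proportion may be the ratio of the number of available processors (or processor cores) in the target cluster to the total number of processors (or processor cores). If the available resources include multiple types, such as computing, storage, and network resources, the proportion may be obtained as a weighted sum of the per-type ratios of available amount to total amount, or may be calculated in other ways.
When calculating the historical available resource proportion, the management module may record the amount of available resources of the target cluster at multiple moments over a past period when it allocated resources to historical jobs. It then averages these amounts and calculates the ratio of that average to the amount of total resources of the target cluster. Alternatively, the management module may use a single past moment: the ratio of the amount of available resources when the target cluster allocated resources to a historical job to the amount of total resources of the target cluster serves as the historical available resource proportion.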
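In a possible implementation, the target category is determined based on one or more of the following: the resource application category, the number of resources applied for, the volume of data the job depends on, the application to which the job belongs, the job priority, the queue in which the job is located, and the computational case to which the job belongs. Optionally, the target category may also be determined based on the job owner.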
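In a possible implementation, the agent module may further monitor a third duration for the target cluster to schedule resources for the job. Based on the first duration and the third duration, it instructs the management module to send an adjustment instruction to the network controller of the target cluster. The adjustment instruction instructs the network controller to adjust the network bandwidth of the target cluster or the number of its network channels.
In a possible implementation, when the third duration is less than the first duration and the difference between them is greater than a preset value, the adjustment instruction instructs the network controller to increase the network bandwidth of the target cluster or the number of its network channels.
In a possible implementation, when the third duration is greater than the first duration and the difference between them is greater than the preset value, the adjustment instruction instructs the network controller to reduce the network bandwidth of the target cluster or the number of its network channels.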
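In a possible implementation, the management module and the agent module are deployed on one computing device, and the computing device is connected to the plurality of clusters.
In a possible implementation, the management module is deployed on the computing device, and the agent module is deployed on the target cluster.
The job scheduling method provided in the second aspect corresponds to the scheduler provided in the first aspect. For the technical effects of the method in the second aspect or any of its possible implementations, refer to the technical effects of the first aspect or its corresponding implementations. Details are not described herein again.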
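According to a third aspect, this application provides a scheduler including a processor and a memory. The memory is configured to store instructions. When the scheduler runs, the processor executes the stored instructions, so that the scheduler performs the steps performed by the management module and the agent module in the second aspect or any possible implementation of the second aspect. It should be noted that the memory may be integrated into the processor or may be independent of it. The scheduler may further include a bus, through which the processor is connected to the memory. The memory may include a read-only memory and a random access memory.
According to a fourth aspect, this application provides a scheduler including a processor and a memory. The memory is configured to store instructions. When the scheduler runs, the processor executes the stored instructions, so that the scheduler performs the steps performed by the management module in the second aspect or any possible implementation of the second aspect. It should be noted that the memory may be integrated into the processor or may be independent of it. The scheduler may further include a bus, through which the processor is connected to the memory. The memory may include a read-only memory and a random access memory.
According to a fifth aspect, this application provides a scheduler including a processor and a memory. The memory is configured to store instructions. When the scheduler runs, the processor executes the stored instructions, so that the scheduler performs the steps performed by the agent module in the second aspect or any possible implementation of the second aspect. It should be noted that the memory may be integrated into the processor or may be independent of it. The scheduler may further include a bus, through which the processor is connected to the memory. The memory may include a read-only memory and a random access memory.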
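According to a sixth aspect, this application provides a computer-readable storage medium storing instructions that, when run on a scheduler, cause the scheduler to perform the method according to the second aspect or any implementation of the second aspect.
According to a seventh aspect, this application provides a computer program product including instructions that, when run on a scheduler, cause the scheduler to perform the method according to the second aspect or any implementation of the second aspect.
Based on the implementations provided in the foregoing aspects, this application may further combine them to provide more implementations.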
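Brief Description of Drawings
To describe the technical solutions in the embodiments of this application more clearly, the following briefly introduces the accompanying drawings used in describing the embodiments. The accompanying drawings in the following description show merely some embodiments of this application, and a person of ordinary skill in the art may still derive other drawings from them.
FIG. 1 is a schematic diagram of an example computing power network according to an embodiment of this application;
FIG. 2 is a schematic diagram of another example computing power network according to an embodiment of this application;
FIG. 3 is a schematic diagram of still another example computing power network according to an embodiment of this application;
FIG. 4 is a schematic flowchart of a job scheduling method according to an embodiment of this application;
FIG. 5 is a schematic flowchart of another job scheduling method according to an embodiment of this application;
FIG. 6 is a schematic flowchart of a network bandwidth adjustment method according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of a scheduler according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a scheduler according to an embodiment of this application.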
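Detailed Description of Embodiments
In the specification, claims, and accompanying drawings of this application, the terms "first", "second", and the like are used to distinguish between similar objects and do not necessarily describe a specific order or sequence. The terms used in this way are interchangeable where appropriate. They are merely a way of distinguishing objects with the same attributes when describing the embodiments of this application.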
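Currently, different data centers in a computing power network may have several types of organizational relationships. These include a strongly controlled superior-subordinate relationship, a non-hierarchical peer relationship, or a mix of the two. For example:
- In the computing power network 100 shown in FIG. 1, multiple clusters form the network in a centralized networking mode. Cluster A, cluster B, and cluster C have a strongly controlled relationship, that is, cluster A strongly controls cluster B and cluster C.
- In the computing power network 200 shown in FIG. 2, multiple clusters form the network in a decentralized networking mode. Cluster X, cluster Y, and cluster Z are peers of one another.
- In the computing power network 300 shown in FIG. 3, multiple clusters form the network in a hybrid mode that combines strongly controlled and peer relationships. The network includes multiple regions: clusters within each region have a strongly controlled relationship, and clusters in different regions are peers of one another.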
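In each kind of computing power network, cross-cluster resource scheduling (and data scheduling) is usually implemented by a cross-cluster scheduler working with the cluster schedulers deployed in the individual clusters. Take resource scheduling in the computing power network 100 shown in FIG. 1 as an example; other types of computing power networks are similar. Cluster B and cluster C may each be configured with one or more devices. The computing power network 100 further includes a scheduler, which includes a management module 101 and multiple agent modules 102. As shown in FIG. 1, an agent module 102 is deployed on each of cluster A to cluster C. In this case, the management module 101 may be deployed on a computing device in cluster A, or on a computing device in the computing power network 100 that is independent of all the clusters. In other implementations, the management module 101 and the multiple agent modules 102 may be deployed on the same computing device. For example, this may be a computing device that is independent of the clusters and connected to all of them, and the scheduler performs job scheduling for each cluster over these connections.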
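After the scheduler receives a job from a user 201 (for example, a job for training an AI model), the management module 101 in the scheduler may schedule the job to cluster B, whose available resources satisfy the resources required to process the job. Alternatively, it may schedule the job to cluster A or cluster C. Specifically, the job may first be scheduled to the agent module 102 on cluster B. The agent module 102 then instructs cluster B to schedule corresponding resources (for example, computing, storage, and network resources) for the job, and cluster B may use its internal cluster scheduler to do so. The data required for executing the job (hereinafter referred to as target data) may be stored in cluster C, so the scheduler may further instruct cluster C to transmit the target data to cluster B. Alternatively, the user 201 may upload the target data to the scheduler, which then delivers it to cluster B. As another alternative, the scheduler may send cluster B the storage address of the target data, such as a uniform resource locator (URL). Cluster B can then access cluster C at the received storage address and obtain the target data from it.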
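Suppose, in this process, that cluster B first queues to schedule resources for the job and only then obtains the target data from cluster C or the scheduler. The time from job submission to execution is then mainly the sum of two parts: the time the job waits in cluster B's queue to be allocated resources, and the time to transmit the target data from cluster C to cluster B. As a result, the job of the user 201 waits a long time between submission and execution, and the user experience suffers because the job remains unexecuted for a long time.
In view of this, the embodiments of this application provide a scheduler that works as follows. The management module 101 determines cluster B as the cluster for processing the job and delivers the job to the agent module 102 on cluster B. The agent module 102 then instructs this target cluster to schedule resources for the job and, while doing so, to obtain the target data required for executing the job. Because cluster B obtains the target data and schedules resources for the job in parallel, the waiting time of the job from submission to execution is effectively reduced. This improves job processing efficiency and thus user experience. It also prevents the resources allocated by cluster B to the job from sitting idle for a long time, alleviating resource waste.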
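It should be noted that the foregoing example illustrates deployment of the management module 101 and the agent modules 102 in the computing power network shown in FIG. 1. A corresponding scheduler may also be deployed in the computing power networks shown in FIG. 2 and FIG. 3 and used to schedule jobs to the corresponding clusters for execution.
In addition, the computing power networks shown in FIG. 1 to FIG. 3 are merely examples, in which the management module 101 and the agent modules 102 of the scheduler are deployed at different locations. In other computing power networks, the management module 101 and the agent modules 102 may be deployed together. For example, they may run on one computing device that is connected to the multiple clusters through a wired or wireless network, with different agent modules 102 responsible for job scheduling on different clusters. Alternatively, in other computing power networks, clusters within each region may be peers of one another while different regions have a strongly controlled relationship, and so on. This is not limited in this embodiment.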
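For ease of understanding, the process in which the scheduler schedules a user's job is described in detail below with reference to the accompanying drawings.
FIG. 4 is a schematic flowchart of a job scheduling method according to an embodiment of this application. The method may be applied to the computing power networks shown in FIG. 1 to FIG. 3, or to other computing power networks. As shown in FIG. 4, the method may specifically include the following steps.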
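S401: The management module 101 in the scheduler obtains a to-be-processed job.
In practice, a user may submit a job to the computing power network and request the network to process it. For example, the user may remotely log in to the scheduler through a terminal or through a client provided by the management module 101, and submit the job there. The terminal or client generates a data processing request containing the to-be-processed job and sends it to the management module 101, which parses the request to obtain the job. The submitted job may be, for example, an HPC job or an AI model training job. In this way, the computing power network receives the job through the scheduler, which dispatches it to a corresponding cluster in the network for processing.
Further, when submitting the job, the user may also specify the resources required to process it, including resource types and resource specifications. The specified resources may include one or more of the following:
- Computing resources: resources with computing power, such as a central processing unit (CPU), a data processing unit (DPU), or an infrastructure processing unit (IPU).
- Storage resources: resources with storage capability, such as caches, memory, or storage devices.
- Network resources: network transmission resources, such as uplink bandwidth, downlink bandwidth, or network interface card type.

Alternatively, the scheduler may automatically determine the required resources based on information such as the job type and the volume of data the job depends on. This is not limited in this embodiment.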
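S402: The management module 101 determines, from the plurality of clusters, a target cluster for processing the job.
In this embodiment, the computing power network may include multiple clusters, the management module 101, and the agent modules 102, as shown for example in FIG. 1 to FIG. 3. Each cluster may include the devices (such as computing devices) of one data center. In practice, one cluster may also be built from multiple data centers. For example, when the physical distance between two data centers does not exceed a preset range, the two data centers may form one cluster. A job submitted to the computing power network is executed by one of its clusters. Therefore, after obtaining the job submitted by the user, the scheduler may determine, from the multiple clusters, a cluster for processing the job (hereinafter referred to as the target cluster for ease of distinction). This embodiment provides the following example implementations for determining the target cluster.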
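In a first possible implementation, the management module 101 may determine the target cluster based on a load balancing policy. Specifically, the management module 101 may obtain the current load of each cluster, or predict each cluster's load over a future period (for example, the next 2 seconds). It then determines the cluster with the smallest load as the target cluster for processing the job submitted by the user.
In a second possible implementation, the management module 101 estimates the waiting duration for each of the multiple clusters to process the job. Each cluster's waiting duration is determined from two estimates: the first duration for the cluster to allocate resources to the job, and the second duration for the cluster to receive the target data on which the job depends. The management module 101 can then determine the cluster with the smallest waiting duration as the target cluster that actually processes the job.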
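Specifically, each cluster may receive multiple jobs, which usually queue for the cluster to allocate the resources they require. The first duration for a cluster to schedule resources for the job submitted by the user is therefore the sum of the job's queuing time and the time the cluster takes to allocate resources to the job. As shown in FIG. 4, the management module 101 may perform step S4021 to estimate this first duration for each cluster. The estimate is based on the time the cluster has taken to schedule resources for jobs in a past period (hereinafter referred to as historical jobs).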
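In addition, the target data on which the job depends (such as the training samples for an AI model training job) may be stored only in some clusters of the computing power network. A cluster that does not store the target data can obtain it in one of several ways:
- From another cluster, for example the nearest cluster that stores the target data.
- From the management module 101: the user may upload the target data when submitting the job, and the management module 101 sends it on to the cluster.
- From the network at a storage address: the management module 101 may send the cluster the storage address of the target data in the computing power network, and the cluster obtains the data from that address.

As shown in FIG. 4, the management module 101 may perform step S4022 to calculate the second duration for each cluster to obtain the target data. The second duration may be, for example, the ratio of the data volume of the target data to the transmission bandwidth. For a cluster that already stores the target data, the second duration is 0, because no data transmission is needed. If the cluster currently has other jobs transmitting data, its second duration is the sum of two parts: the time to transmit the target data between the two clusters, and the time the user's job queues on the cluster waiting to transmit data.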
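The second-duration calculation of step S4022 can be illustrated with a small helper. The units are assumptions: data volume in megabytes and bandwidth in megabits per second, hence the factor 8. `transfer_queue_s` stands for the time spent queuing behind other jobs' transfers, which the text adds when the link is busy.

```python
def second_duration(data_size_mb: float, bandwidth_mbps: float, holds_data: bool,
                    transfer_queue_s: float = 0.0) -> float:
    """Estimated time in seconds for a cluster to obtain the target data."""
    if holds_data:
        return 0.0  # data already stored locally: no transfer needed
    # Transfer time = data volume / bandwidth; megabytes -> megabits (x8).
    transfer_s = data_size_mb * 8 / bandwidth_mbps
    # If other jobs are already transferring, add the time spent queuing for the link.
    return transfer_s + transfer_queue_s
```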
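The management module 101 may then perform steps S4023 and S4024 to determine the target cluster. For each cluster, the management module 101 compares the cluster's first duration with its second duration. The larger of the two is the time the job would wait to be processed if delivered to that cluster, that is, the foregoing waiting duration. The management module 101 then selects, from the multiple clusters, the target cluster with the smallest waiting duration. For example, it may sort the clusters' waiting durations in ascending order and choose the cluster with the smallest value as the target cluster.
In practice, the remaining available resources on some clusters may not satisfy the resource conditions for executing the job, for example because the resource type or resource amount is insufficient. When determining the target cluster, the management module 101 may therefore first filter out those clusters and then choose the target cluster from the clusters that remain. This reduces the amount of computation required to determine the target cluster.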
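Steps S4023 and S4024, together with the resource pre-filtering just described, can be sketched as follows. The `ClusterEstimate` record and the representation of resources as a name-to-amount mapping are assumptions made for illustration. The core rule comes from the text: a cluster's waiting duration is max(first duration, second duration), and the cluster with the smallest waiting duration is chosen.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ClusterEstimate:
    name: str
    first_duration: float   # estimated time to schedule resources (including queuing)
    second_duration: float  # estimated time to obtain the target data
    free_resources: Dict[str, float]  # e.g. {"cpu": 64, "gpu": 4}


def select_target_cluster(clusters: List[ClusterEstimate],
                          required: Dict[str, float]) -> Optional[str]:
    # Filter out clusters whose available resources cannot satisfy the job.
    eligible = [c for c in clusters
                if all(c.free_resources.get(k, 0) >= v for k, v in required.items())]
    if not eligible:
        return None
    # Scheduling and data acquisition run in parallel, so the waiting duration
    # is the larger of the two; pick the cluster where it is smallest.
    best = min(eligible, key=lambda c: max(c.first_duration, c.second_duration))
    return best.name
```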
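For example, when determining the target cluster according to the second implementation, the management module 101 may estimate each cluster's first duration from the time that cluster took to allocate resources to historical jobs over a past period (for example, the past month).
In a specific implementation, the management module 101 may divide the jobs in the computing power network into multiple categories in advance, using one or more of the following pieces of information:
- the resource application category and the number of resources applied for
- the volume of data the job depends on
- the application to which the job belongs
- the job priority
- the job owner (such as the user or tenant to which the job belongs)
- the queue in which the job is located
- the computational case to which the job belongs

In practice, technical personnel may configure these job categories in the management module 101 in advance. For each cluster, the management module 101 then calculates the average time the cluster took over a past period to schedule resources for the historical jobs in each category. If a category contains only one historical job, the average is simply the time the cluster took to schedule resources for that job. In this way, the management module 101 obtains, for each cluster, the average scheduling duration for each job category.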
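Computing the per-cluster, per-category average scheduling duration described above amounts to a grouped mean. In this sketch, each historical record is assumed to be a `(cluster, category, seconds)` tuple. How a category label is derived from job attributes (queue, priority, and so on) is left to configuration, as the text says.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def average_scheduling_durations(history: Iterable[Tuple[str, str, float]]
                                 ) -> Dict[Tuple[str, str], float]:
    """Map (cluster, job category) -> mean resource-scheduling time of historical jobs.

    Each history record is (cluster, category, seconds spent scheduling resources).
    """
    buckets: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for cluster, category, seconds in history:
        buckets[(cluster, category)].append(seconds)
    # With a single historical job in a bucket, the mean is just that job's duration.
    return {key: mean(values) for key, values in buckets.items()}
```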
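In addition, for each of the multiple clusters, the management module 101 obtains the current available resource proportion and the historical available resource proportion for each category of historical jobs on that cluster. The current available resource proportion is the ratio of the amount of currently available resources of the cluster to the amount of its total resources. If the available resources include a single type of resource, such as computing resources, the proportion may be the ratio of the number of available processors (or processor cores) in the cluster to the total number of processors (or processor cores). If the available resources include multiple types, such as computing, storage, and network resources, the proportion may be obtained as a weighted sum of the per-type ratios of available amount to total amount, or may be calculated in other ways. This is not limited in this embodiment.
The historical available resource proportion is defined over a period before the cluster schedules resources for the job submitted by the user. During that period, the cluster allocated resources to multiple historical jobs in each category. The proportion is the ratio of the average amount of available resources at those allocation times (hereinafter referred to as the average available resource amount) to the amount of the cluster's total resources. For example, the management module 101 may record the amount of available resources at multiple moments over a past period when the cluster allocated resources to historical jobs. It then calculates the average available resource amount from these values and divides it by the amount of the cluster's total resources. In other embodiments, the proportion may instead be based on a single moment before the cluster schedules resources for the job. It is then the ratio of the amount of available resources when the cluster allocated resources to historical jobs in each category at that moment to the cluster's total resources. This is not limited in this embodiment.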
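The two resource proportions can be sketched as shown below. The per-type weights in `current_available_ratio` are hypothetical configuration values, since the text only says a weighted sum may be used. With a single resource type and a weight of 1.0, it reduces to the processor-count ratio. `historical_available_ratio` implements the averaging variant: it averages the available amounts sampled at past allocation moments, then divides by the total.

```python
from typing import Dict, Sequence


def current_available_ratio(available: Dict[str, float], total: Dict[str, float],
                            weights: Dict[str, float]) -> float:
    """Weighted sum of per-type available/total ratios (weights assumed to sum to 1)."""
    return sum(weights[k] * available[k] / total[k] for k in weights)


def historical_available_ratio(samples: Sequence[float], total: float) -> float:
    """Mean available amount sampled at past allocation moments, relative to the total."""
    return (sum(samples) / len(samples)) / total
```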
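In practice, the management module 101 may calculate, from each cluster's resource usage, the current and historical available resource proportions for each category of historical jobs on that cluster.
In this way, after receiving the job submitted by the user, the management module 101 may do the following for each cluster. It determines, from the job's target category, the average duration for the cluster to schedule resources for historical jobs of that category. It then estimates, from that average duration and the current and historical available resource proportions, the first duration for which the job will queue on the cluster waiting for resources. This gives an estimate of the first duration for every cluster. In one example, the management module 101 may calculate the first duration using the following formula (1).
Here, Tq is the estimated first duration for the cluster to schedule resources for the job submitted by the user; Tqx is the average duration for the cluster to allocate resources to historical jobs of the target category; RPx is the historical available resource proportion; and RPcurrent is the current available resource proportion. Through a similar process, the management module 101 can determine the first duration for each cluster to schedule resources for jobs of each category.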
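It should be noted that the foregoing way of estimating the first duration is merely an example; in other embodiments, the management module 101 may estimate the first duration in other ways. For example, the scheduler may first determine how long the target cluster took over a past period to allocate resources to multiple historical jobs. It may then estimate the first duration from the median of those durations together with the current and historical available resource proportions. In practice, whether the management module 101 uses the median or the mean of the historical job durations may be set in advance by technical personnel through its configuration.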
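Formula (1) is not reproduced in this text, so its exact form is unknown here. Purely as an illustration, the sketch below assumes a proportional correction, Tq = Tqx × RPx / RPcurrent. Under this form, the historical average is lengthened when fewer resources are free now than historically, and shortened otherwise. Treat it as an assumption, not as the patent's formula. The `use_median` switch mirrors the configurable mean/median choice just described.

```python
from statistics import mean, median
from typing import Sequence


def estimate_first_duration(history_durations: Sequence[float], rp_hist: float,
                            rp_current: float, use_median: bool = False) -> float:
    """Estimate Tq for one cluster and one job category.

    ASSUMED form of formula (1): Tq = Tqx * RPx / RPcurrent.
    """
    # Tqx: mean (or, if so configured, median) scheduling time of historical jobs.
    tqx = median(history_durations) if use_median else mean(history_durations)
    if rp_current <= 0:
        return float("inf")  # nothing free right now: no meaningful finite estimate
    return tqx * rp_hist / rp_current
```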
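S403: The management module 101 dispatches the job to the agent module 102 corresponding to the target cluster.
For example, after determining the target cluster, the management module 101 may send an execution instruction containing the user's job to the agent module 102 of the target cluster. The instruction tells the agent module 102 to have the target cluster execute the job.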
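In practice, the target cluster may not store the target data needed to execute the job. In that case, the management module 101 may also send the target data to the target cluster. For example, the target data may be uploaded by the user, and the management module 101 then dispatches the job and the target data together once the target cluster is determined. Alternatively, the target data may be stored on another cluster in the computing power network. The management module 101 may then instruct the target cluster to request the data from that cluster. It may also send that cluster a data sharing request instructing it to share the data with the target cluster. This embodiment does not limit how the target cluster obtains the target data from the management module 101 or from another cluster.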
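S404: The agent module 102 instructs the target cluster to schedule resources for the job and to acquire the target data while doing so. The target data is the data required to execute the job.
If the target cluster acquired the target data and scheduled resources one after the other, the time from job submission to execution would be too long, lowering job processing efficiency. In this embodiment, the agent module 102 therefore instructs the target cluster to acquire the target data while it schedules resources for the job. Running the two in parallel lets the target cluster process the job faster. The trigger depends on where the agent module 102 is deployed:

- If the management module 101 and the agent module 102 are on the same computing device, the agent module 102 forwards the dispatched job to the cluster scheduler of the target cluster. This triggers the cluster scheduler to schedule resources for the job.
- If the agent module 102 is deployed on the target cluster, it triggers the target cluster's cluster scheduler to schedule resources for the dispatched job.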
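In a specific implementation, as shown in FIG. 4, the agent module 102 may perform step S4041. In this step, it receives two values from the management module 101: the estimated first duration Tq for the target cluster to schedule resources for the job, and the second duration Tt for the target cluster to acquire the target data. For example, the management module 101 may have chosen the target cluster based on each cluster's first and second durations. It can then send the Tq and Tt it has already calculated for the target cluster to that cluster's agent module 102.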
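Then, in steps S4042, S4043, and S4044, the agent module 102 compares Tq with Tt. If Tq is greater than Tt, the agent module 102 instructs the target cluster to acquire the target data while it schedules resources for the job. Specifically, the agent module 102 submits the job to the cluster scheduler inside the target cluster, which starts allocating the target cluster's resources to the job. At the same time, it instructs the target cluster to acquire the target data, for example by receiving it from the management module 101 or from another cluster. Resource scheduling and data acquisition thus run in parallel, which speeds up job processing.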
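If the first duration Tq is not greater than the second duration Tt, the agent module 102 calculates the difference Tt - Tq. From this difference it sets two moments: moment 1, at which the target cluster is triggered to schedule resources for the job, and moment 2, at which the target cluster is told to start acquiring the target data. Moment 2 is earlier than moment 1 by exactly the difference. The agent module 102 instructs the target cluster to start acquiring the target data at moment 2. While the data is still arriving, it instructs the target cluster to schedule resources for the job when moment 1 is reached. Resource scheduling and data acquisition again run in parallel, which speeds up job processing. Because data acquisition starts first, both activities finish at about the same time. This prevents resources from sitting idle after they have been scheduled for the job, which would waste them.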
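The agent module's timing rule can be sketched as follows. Times are offsets in seconds from a hypothetical reference time `now`. `fetch_at` corresponds to moment 2 and `schedule_at` to moment 1. All names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class StartPlan:
    fetch_at: float     # moment 2: the cluster starts acquiring the target data
    schedule_at: float  # moment 1: the cluster starts scheduling resources

def plan_start(tq: float, tt: float, now: float = 0.0) -> StartPlan:
    """Decide when to trigger data acquisition and resource scheduling.

    If scheduling is expected to take longer (tq > tt), both start now and
    overlap. Otherwise data acquisition starts now and scheduling is deferred
    by tt - tq, so both are expected to finish at about the same time."""
    if tq > tt:
        return StartPlan(fetch_at=now, schedule_at=now)
    return StartPlan(fetch_at=now, schedule_at=now + (tt - tq))
```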
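The implementation above assumes that the management module 101 calculates Tq and Tt and sends them to the agent module 102. In other embodiments, the agent module 102 may calculate both values itself. For example, the management module 101 may choose the target cluster based on each cluster's current load, without calculating first and second durations. In that case, the agent module 102 estimates Tq and Tt for the target cluster after receiving the job from the management module 101. For details, see the relevant descriptions above.
By instructing the target cluster to acquire the target data and schedule resources for the job in parallel, the agent module 102 effectively shortens the wait between job submission and execution. This improves job processing efficiency and the user experience.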
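The embodiment in FIG. 4 mainly describes how the scheduler determines the target cluster and instructs it to acquire data and schedule resources for the user's job in parallel. In other possible embodiments, the estimated first duration Tq may differ considerably from the time the target cluster actually takes to schedule resources for the job. For example, some jobs queued on the target cluster may be suspended, or the target cluster may serve higher-priority jobs first. The actual scheduling time for the user's job may therefore be shorter or longer than estimated. In this case, the scheduler can adjust the network bandwidth the target cluster uses to receive the target data, and with it the time needed to receive that data.
The following describes in detail, with reference to FIG. 5, how the scheduler adjusts the network bandwidth of the target cluster.
Refer to FIG. 5, which is a schematic flowchart of another job scheduling method according to an embodiment of this application. As shown in FIG. 5, the method may include the following steps.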
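S501: The management module 101 obtains the target job to be processed.
S502: For each cluster in the computing power network, the management module 101 estimates the first duration for that cluster to schedule resources for the user's job.
S503: For each cluster, the management module 101 calculates the second duration for that cluster to acquire the target data.
S504: For each cluster, the management module 101 takes the larger of the first and second durations as that cluster's waiting duration for the job.
S505: The management module 101 selects the cluster with the smallest waiting duration as the target cluster.
For the implementation of steps S501 to S505, see the descriptions of steps S401 to S402 in the embodiment shown in FIG. 4.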
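S506: The management module 101 dispatches the job to the agent module 102 corresponding to the target cluster.
S507: The agent module 102 obtains the first duration Tq for the target cluster to schedule resources for the job and the second duration Tt for the target cluster to acquire the target data.
In a first implementation example, the management module 101 has already calculated every cluster's first and second durations in steps S502 and S503. It can therefore send the target cluster's Tq and Tt to the agent module 102, for example together with the job.
In a second implementation example, the agent module 102 calculates Tq and Tt for the target cluster itself. For the calculation, see the descriptions above of how the management module 101 calculates each cluster's first and second durations.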
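S508: The agent module 102 compares the first duration Tq with the second duration Tt. If Tq is greater than Tt, step S509 is performed; otherwise, step S510 is performed.
S509: The agent module 102 instructs the target cluster to schedule resources for the job and to acquire the target data while doing so.
S510: The agent module 102 instructs the target cluster to start acquiring the target data at moment 2 and, when moment 1 is reached, to schedule resources for the job. Moment 2 is earlier than moment 1 by the difference between the two durations (Tt - Tq).
S511: The agent module 102 monitors the third duration Tq', which is the time the target cluster actually takes to schedule resources for the job.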
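S512: Determine whether the difference between the first duration Tq and the third duration Tq' is greater than a preset value. If so, step S513 is performed; otherwise, no action is taken.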
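S513: The agent module 102 instructs the management module 101 to send an adjustment instruction to the network controller of the target cluster. The instruction tells the network controller to adjust the target cluster's network bandwidth or its number of network channels.
In this embodiment, the estimated first duration Tq may differ considerably from the third duration Tq' that scheduling actually takes. The scheduler can then adjust the bandwidth the target cluster uses to receive the target data, and thereby the time needed to receive it. The goal is for resource scheduling and data acquisition to finish at the same time or nearly the same time.
Specifically, the agent module 102 may monitor in real time the third duration Tq' for the target cluster to schedule resources for the user's job. Suppose Tq' is much shorter than the estimated Tq, specifically when Tq - Tq' exceeds the preset value. This means the target cluster has scheduled resources for the job much faster than estimated. At its current bandwidth, the target cluster might finish scheduling before the target data has fully arrived, and the job could not start until then. The scheduled resources would sit idle for a long time and be wasted.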
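This embodiment may therefore increase the network bandwidth of the target cluster. Specifically, the agent module 102 sends a bandwidth adjustment request to the management module 101, based on the difference between Tq and Tq'. The management module 101 in turn sends an adjustment instruction to the network controller. The instruction requests one of two changes: increase the target cluster's network bandwidth (that is, the bandwidth of its existing network channels), or add network channels that speed up reception of the target data. The higher bandwidth shortens the second duration Tt for receiving the target data. The target cluster has then finished receiving the data by the time it finishes scheduling resources, so no resources are wasted.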
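In a possible implementation, the network controller first raises the bandwidth of the target cluster's existing network channels. If the requirement is met before these channels reach their upper limit, the controller ends the adjustment without creating new channels. If the existing channels reach their upper limit and the bandwidth still falls short of the management module 101's request, the controller creates new network channels and allocates them to the target cluster. The target cluster then receives the target data faster over both existing and new channels. In other possible implementations, the network controller may first create new network channels. Once the number of channels reaches its upper limit, it then increases the bandwidth of each channel. This embodiment does not limit how the network controller increases the target cluster's bandwidth.
After the target cluster has received the target data, it may have no other data transmission tasks, or only ones that need little bandwidth. The agent module 102 may then instruct the management module 101 to ask the network controller to reduce the target cluster's network bandwidth. This reduces wasted bandwidth in the computing power network.
Conversely, the third duration Tq' may exceed the estimated first duration Tq by more than the preset value. This means the target cluster is taking longer than estimated to schedule resources for the job. The agent module 102 may then instruct the management module 101 to ask the network controller for one of two changes: reduce the bandwidth the target cluster uses to receive the target data, or reduce its number of network channels. Receiving the target data then takes correspondingly longer. The target cluster still finishes receiving the data at about the time it finishes scheduling resources for the job. Reducing the bandwidth in this way lowers bandwidth consumption in the computing power network without leaving the job's resources idle.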
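The decision described in the preceding paragraphs can be sketched as below: the estimated first duration Tq is compared with the monitored third duration Tq' against a preset threshold. Only the direction of the adjustment is computed. The text does not specify how the requested amount is derived from the difference, so the sketch leaves it out. The enum and function names are hypothetical.

```python
from enum import Enum

class Adjustment(Enum):
    INCREASE = "increase"  # more bandwidth or more network channels
    DECREASE = "decrease"  # less bandwidth or fewer network channels
    NONE = "none"

def decide_adjustment(tq_est: float, tq_actual: float, threshold: float) -> Adjustment:
    """Compare estimated Tq with monitored Tq' and decide how the target
    cluster's network bandwidth should change."""
    if tq_est - tq_actual > threshold:
        return Adjustment.INCREASE  # scheduling finished much sooner than estimated
    if tq_actual - tq_est > threshold:
        return Adjustment.DECREASE  # scheduling is taking much longer than estimated
    return Adjustment.NONE
```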
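Adjusting the target cluster's network bandwidth flexibly in this way improves job processing efficiency and reduces wasted network resources.
The embodiment in FIG. 5 mainly describes how the scheduler adjusts the target cluster's bandwidth dynamically for a single job. In other embodiments, several jobs may be submitted to the target cluster, each with different bandwidth requirements. The scheduler may then weigh all of these requirements when deciding how much bandwidth to request from the network controller for the target cluster. The following describes in detail how the scheduler adjusts the target cluster's network bandwidth for a target job among several jobs.
Refer to FIG. 6, which is a schematic flowchart of a network bandwidth adjustment method. The method may include the following steps.
S601: The management module 101 obtains multiple bandwidth adjustment requests from the agent module 102, each for a different job on the target cluster. Each request includes indication information for adjusting the target cluster's bandwidth.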
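In this embodiment, the target cluster may host several jobs and transfer data for them in parallel, receiving the data each job needs at the same time. Jobs need different amounts of data, so their bandwidth requirements differ. For example:

- Job A may need 10 megabytes (MB) of data and request a lower bandwidth to avoid wasting it, while job B may need 1 gigabyte (GB) and request a higher bandwidth to shorten its transfer time.
- Job P may request a lower bandwidth once its data transfer is complete, while job Q may request a higher bandwidth before its transfer begins.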
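The indication information in a bandwidth adjustment request may be, for example, a signed value. A positive value means the target cluster's bandwidth is to be increased by that amount. A negative value means it is to be decreased by the absolute value of that amount. In other embodiments, the indication information may instead consist of an adjustment direction and an adjustment magnitude.
As an implementation example, the agent module 102 on the target cluster may generate a separate bandwidth adjustment request for each job and send them to the management module 101. The management module 101 can then ask the network controller to adjust the target cluster's bandwidth accordingly.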
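S602: The management module 101 calculates the update amount of the target cluster's network bandwidth from the bandwidth adjustment requests.
Specifically, the management module 101 uses the indication information to sort the requests into two groups: bandwidth expansion and bandwidth reduction. It sums the expansion requests to get the total expansion and sums the reduction requests to get the total reduction. The total expansion minus the total reduction is the update amount of the target cluster's network bandwidth. If the update amount is greater than 0, the management module 101 increases the bandwidth by that amount. If it is less than 0, the management module 101 decreases the bandwidth by its absolute value.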
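With the signed-value encoding described above (positive to expand, negative to shrink), the update amount can be computed as in the minimal sketch below. The function name is hypothetical, and bandwidth is assumed to be in Mbps.

```python
from typing import Iterable

def bandwidth_update(deltas_mbps: Iterable[float]) -> float:
    """Net bandwidth update for one cluster from signed request values.

    Positive values are expansion requests and negative values are reduction
    requests. A result > 0 means expand by that amount; < 0 means shrink by
    its absolute value."""
    deltas = list(deltas_mbps)
    total_expand = sum(d for d in deltas if d > 0)
    total_shrink = sum(-d for d in deltas if d < 0)
    return total_expand - total_shrink
```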
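In practice, the management module 101 may receive several bandwidth adjustment requests for the same job. For example, if the network between the agent module 102 and the management module 101 suffers delays or other faults, the agent module 102 may resend a request several times to make sure it arrives. The management module 101 therefore first filters out duplicate requests for the same job. It then calculates the update amount from the remaining requests.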
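The management module 101 usually schedules jobs across the entire computing power network, so it may receive bandwidth adjustment requests for jobs on different clusters. It may therefore group the requests by cluster and adjust the target cluster's bandwidth using only the requests that belong to the target cluster.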
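The de-duplication and per-cluster grouping steps can be sketched as follows. Two choices here are assumptions made for illustration: a request counts as a duplicate when it has the same cluster and job identifiers, and the first copy received is kept. The `BandwidthRequest` type is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List

@dataclass(frozen=True)
class BandwidthRequest:
    cluster_id: str
    job_id: str
    delta_mbps: float  # signed: > 0 expand, < 0 shrink

def group_requests(requests: Iterable[BandwidthRequest]) -> Dict[str, List[BandwidthRequest]]:
    """Drop repeated requests for the same job on the same cluster (e.g.
    resent by the agent after a network delay) and group the rest by cluster."""
    seen = set()
    by_cluster: Dict[str, List[BandwidthRequest]] = defaultdict(list)
    for req in requests:
        key = (req.cluster_id, req.job_id)
        if key in seen:
            continue  # keep only the first request received for this job
        seen.add(key)
        by_cluster[req.cluster_id].append(req)
    return dict(by_cluster)
```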
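The management module 101 then adjusts the target cluster's network bandwidth according to the calculated update amount. Specifically, this embodiment may further include the following steps.
S603: The management module 101 uses the update amount to decide whether to expand or reduce the target cluster's network bandwidth. For expansion, step S604 is performed; for reduction, step S610 is performed.
S604: The management module 101 determines whether the bandwidth of the target cluster's current network channels has reached the upper limit. If so, step S607 is performed; if not, step S605 is performed.
S605: Based on the update amount, the management module 101 asks the network controller to increase the bandwidth of the target cluster's current network channels.
S606: The management module 101 determines whether the size of this bandwidth adjustment matches the update amount. If it does, the adjustment ends; if not, step S607 is performed.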
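In some scenarios, one or more of the target cluster's current channels are below the upper limit, but raising them all to the limit would still expand the bandwidth by less than the update amount. The existing channels then cannot meet the adjustment target, so step S607 is performed to cover the rest of the expansion.
S607: The management module 101 asks the network controller whether a network channel can be added for the target cluster. If so, step S608 is performed; if not, step S609 is performed.
S608: The management module 101 asks the network controller to create a new network channel for the target cluster, and the process returns to step S606.
S609: The management module 101 determines that the bandwidth adjustment for the target cluster has reached its upper limit.
In practice, the management module 101 may notify the agent module 102 on the target cluster that the cluster's bandwidth is already at its maximum and cannot be expanded further.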
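Steps S604 to S609 can be sketched as a single loop under simplifying assumptions. Every channel shares the same hypothetical per-channel cap. The network controller's check in S607 is modeled as a maximum channel count, and each new channel gets at most the cap. The function returns the resulting channel bandwidths and any shortfall. A positive shortfall corresponds to S609.

```python
from typing import List, Tuple

def expand_bandwidth(channels: List[float], update: float, channel_cap: float,
                     max_channels: int) -> Tuple[List[float], float]:
    """Expand a cluster's bandwidth by `update` Mbps.

    First raise existing channels toward the per-channel cap (S604/S605);
    if that is not enough, add new channels while the channel count allows
    (S606-S608). Returns the new channel list and the unmet amount."""
    channels = list(channels)
    remaining = update
    for i, bw in enumerate(channels):
        if remaining <= 0:
            break
        step = min(channel_cap - bw, remaining)
        if step > 0:
            channels[i] = bw + step
            remaining -= step
    while remaining > 0 and len(channels) < max_channels:
        new_bw = min(channel_cap, remaining)
        channels.append(new_bw)
        remaining -= new_bw
    return channels, max(remaining, 0.0)  # > 0 means the limit was reached (S609)
```

For example, take channels of 100 and 200 Mbps, an expansion of 500 Mbps, a 300 Mbps cap, and room for three channels. Both existing channels rise to 300 Mbps, and a third channel of 200 Mbps is added.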
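S610: Based on the update amount, the management module 101 determines which network channels to release.
For example, suppose the target cluster currently has network channels 1, 2, and 3, with bandwidths of 100 megabits per second (Mbps), 200 Mbps, and 500 Mbps, and the update amount is a reduction of 100 Mbps. The scheduler may then release network channel 1, which lowers the target cluster's bandwidth by 100 Mbps.
S611: The management module 101 asks the network controller to release the selected network channels.
S612: The management module 101 determines whether the size of this bandwidth adjustment matches the update amount. If it does, the adjustment ends; if not, step S613 is performed.
S613: The management module 101 determines whether the bandwidth of the target cluster's remaining network channels has reached the lower limit. If so, the adjustment ends; if not, step S614 is performed.
When this lower limit is reached, the target cluster cannot release any more channels, and every remaining channel is at its minimum bandwidth.
S614: The management module 101 asks the network controller to reduce the target cluster's bandwidth further, covering the rest of the reduction.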
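For example, suppose the target cluster again has network channels 1, 2, and 3, with bandwidths of 100 Mbps, 200 Mbps, and 500 Mbps, but the update amount is a reduction of 180 Mbps. Releasing network channel 1 reduces the bandwidth by only 100 Mbps, short of the 180 Mbps target. The scheduler therefore asks the network controller to reduce the bandwidth further. For example, channel 2 can be lowered from 200 Mbps to 120 Mbps, or channel 3 from 500 Mbps to 420 Mbps. Either way, the total reduction reaches 180 Mbps.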
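Steps S610 to S614 can be sketched as below, using the channel bandwidths from the example above. The text does not prescribe which channels to release or shrink first. The policy here (release the smallest channels first, always keep at least one channel, and never shrink a channel below a per-channel floor) is an assumption made for illustration.

```python
from typing import Dict, List, Tuple

def shrink_bandwidth(channels: Dict[str, float], update: float,
                     channel_floor: float) -> Tuple[Dict[str, float], List[str], float]:
    """Reduce a cluster's bandwidth by `update` Mbps.

    Returns the surviving channels, the released channel names, and the
    amount that could not be removed."""
    survivors = dict(channels)
    released: List[str] = []
    remaining = update
    # S610/S611: release whole channels, smallest first, while each fits in the reduction
    for name in sorted(channels, key=channels.get):
        if len(survivors) == 1 or channels[name] > remaining:
            break
        remaining -= survivors.pop(name)
        released.append(name)
    # S613/S614: shrink the surviving channels, never below the floor
    for name in sorted(survivors, key=survivors.get):
        if remaining <= 0:
            break
        step = min(survivors[name] - channel_floor, remaining)
        if step > 0:
            survivors[name] -= step
            remaining -= step
    return survivors, released, max(remaining, 0.0)
```

Take channels {ch1: 100, ch2: 200, ch3: 500} Mbps, a 180 Mbps reduction, and a hypothetical 50 Mbps floor. The sketch releases ch1 and lowers ch2 from 200 Mbps to 120 Mbps, one of the two outcomes given in the example. A 100 Mbps reduction simply releases ch1.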
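The job scheduling method provided in this application has been described in detail above with reference to FIG. 1 to FIG. 6. The scheduler provided in this application is described below with reference to FIG. 7 and FIG. 8.
FIG. 7 is a schematic structural diagram of a scheduler according to this application. As shown in FIG. 7, the scheduler 700 manages multiple clusters and includes a management module 701 and an agent module 702. The management module 701 and the agent module 702 may implement the methods performed by the management module 101 and the agent module 102 in the embodiments shown in FIG. 4 to FIG. 6.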
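Specifically, the management module 701 is configured to:
obtain a job to be processed;
determine, from the multiple clusters, a target cluster for processing the job; and
dispatch the job to the agent module 702.
The agent module 702 is configured to instruct the target cluster to schedule resources for the job and to acquire target data while scheduling resources for the job, where the target data is data required to execute the job.
In a possible implementation, when determining, from the multiple clusters, the target cluster for processing the job, the management module 701 is specifically configured to:
estimate, for each of the multiple clusters, a first duration for that cluster to schedule resources for the job;
calculate, for each of the multiple clusters, a second duration for that cluster to acquire the target data;
take the larger of the first duration and the second duration as the waiting duration for that cluster to process the job; and
take the cluster with the smallest waiting duration among the multiple clusters as the target cluster.
Optionally, the management module 701 may instead dispatch the job to the cluster with the lowest load, thereby balancing load across the computing power network.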
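In a possible implementation, the management module 701 is further configured to:
estimate a first duration for the target cluster to schedule resources for the job;
calculate a second duration for the target cluster to acquire the target data; and
send the first duration and the second duration to the agent module 702.
When instructing the target cluster to schedule resources for the job and to acquire the target data while scheduling resources for the job, the agent module 702 is specifically configured to:
when the first duration is greater than the second duration, instruct the target cluster to schedule resources for the job and to acquire the target data while scheduling resources for the job.
Optionally, after receiving the job from the management module 701, the agent module 702 may itself estimate the first duration for the target cluster to schedule resources for the job and calculate the second duration for the target cluster to acquire the target data. Based on the two durations, it then instructs the target cluster to acquire the target data while scheduling resources for the job.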
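In a possible implementation, the management module 701 is further configured to:
estimate a first duration for the target cluster to schedule resources for the job;
calculate a second duration for the target cluster to acquire the target data; and
send the first duration and the second duration to the agent module 702.
When instructing the target cluster to schedule resources for the job and to acquire the target data while scheduling resources for the job, the agent module 702 is specifically configured to:
when the first duration is not greater than the second duration, instruct the target cluster to start acquiring the target data at a first moment before instructing it to schedule resources for the job, and instruct the target cluster to schedule resources for the job at a second moment, where the interval between the first moment and the second moment equals the second duration minus the first duration.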
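In a possible implementation, when estimating the first duration for the target cluster to schedule resources for the job, the management module 701 is specifically configured to:
determine a target category to which the job belongs;
calculate an average duration for the target cluster to schedule resources for historical jobs of the target category;
obtain a current available resource ratio and a historical available resource ratio of the target cluster, where the current available resource ratio is the ratio of the amount of currently available resources of the target cluster to the amount of its total resources, and the historical available resource ratio is the ratio of the average amount of available resources of the target cluster during a period before resources are scheduled for the job to the amount of its total resources; and
estimate, based on the average duration, the current available resource ratio, and the historical available resource ratio, the first duration for the target cluster to schedule resources for the job.
In a possible implementation, the target category is determined based on one or more of the following: the resource request category, the requested resource quantity, the amount of data the job depends on, the application to which the job belongs, the job priority, the queue in which the job is located, and the computation case to which the job belongs.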
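In a possible implementation, the agent module 702 is further configured to:
monitor a third duration for the target cluster to schedule resources for the job and, based on the third duration and the first duration, instruct the management module 701 to send an adjustment instruction to a network controller of the target cluster, where the adjustment instruction instructs the network controller to adjust the network bandwidth of the target cluster or the number of network channels of the target cluster.
In a possible implementation, when the third duration is less than the first duration and the difference between the first duration and the third duration is greater than a preset value, the adjustment instruction instructs the network controller to increase the network bandwidth of the target cluster or the number of network channels of the target cluster.
In a possible implementation, when the third duration is greater than the first duration and the difference between the third duration and the first duration is greater than a preset value, the adjustment instruction instructs the network controller to reduce the network bandwidth of the target cluster or the number of network channels of the target cluster.
In a possible implementation, the management module 701 and the agent module 702 are deployed on one scheduler, and the scheduler is connected to the multiple clusters.
In a possible implementation, the management module 701 is deployed on a scheduler, and the agent module 702 is deployed on the target cluster.
The scheduler 700 shown in FIG. 7 corresponds to the job scheduling methods in the embodiments shown in FIG. 4 to FIG. 6. For the functions and technical effects of the scheduler 700, see the relevant descriptions of those embodiments.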
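FIG. 8 is a schematic structural diagram of a scheduler according to this application. As shown in FIG. 8, the scheduler 800 includes a processor 801, a memory 802, a communication interface 803, and a bus 804. The processor 801, the memory 802, and the communication interface 803 communicate with each other through the bus 804. The bus 804 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on. For ease of illustration, FIG. 8 shows the bus as a single thick line, but this does not mean that there is only one bus or only one type of bus. The communication interface 803 is used for external communication, for example to receive jobs (and target data) that users submit through a terminal or client.
It should be understood that, in the embodiments of this application, the processor 801 may be a CPU. It may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 802 may include a read-only memory and a random access memory, and provides instructions and data to the processor 801. The memory 802 may further include a non-volatile random access memory. For example, the memory 802 may also store device type information.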
The memory 802 may be a volatile memory, a non-volatile memory, or both.

- The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory.
- The volatile memory may be a random access memory (RAM) used as an external cache. By way of example rather than limitation, many forms of RAM are available: static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
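The memory 802 stores executable code. By executing it, the processor 801 performs the method of the management module 101, the method of the agent module 102, or the methods of both.
Specifically, this applies when the embodiments shown in FIG. 4 to FIG. 6 are implemented and their management module 101 and agent module 102 are implemented in software. The software or program code that provides the functions of these modules is then stored in the memory 802. The processor 801 executes the instructions in the memory 802 to implement the method of the management module 101, the method of the agent module 102, or the methods of both.
It should be understood that the scheduler 800 according to the embodiments of this application may correspond to the scheduler 700. It may also correspond to the management module 101 and agent module 102 that perform the methods shown in FIG. 4 to FIG. 6. The above and other operations and/or functions of the scheduler 800 implement the corresponding procedures of the methods in FIG. 4 to FIG. 6; for brevity, they are not described again here.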
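In addition, an embodiment of this application provides a computer-readable storage medium storing instructions. When run on a scheduler, the instructions cause the scheduler to perform the methods of the management module 101 and the agent module 102 in the foregoing embodiments.
In addition, an embodiment of this application provides a computer program product. When the product is executed by one or more schedulers, they perform any one of the foregoing job scheduling methods. The computer program product may be a software installation package, which can be downloaded and executed on a computer whenever one of these methods is needed.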
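It should also be noted that the apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units. That is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected as needed to achieve the objectives of the embodiments. In addition, in the drawings of the apparatus embodiments provided in this application, a connection between modules indicates a communication connection between them. Such a connection may be implemented as one or more communication buses or signal lines.
In the specification, claims, and drawings of this application, the terms "first", "second", and the like distinguish between similar objects and do not necessarily indicate a specific order or sequence. Terms used in this way are interchangeable where appropriate. They merely distinguish objects with the same attributes in the description of the embodiments of this application.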
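From the foregoing description, a person skilled in the art will understand that this application may be implemented by software plus the necessary general-purpose hardware. It may also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, and dedicated components. In general, any function performed by a computer program can easily be implemented by corresponding hardware, and the hardware used for a given function can take many forms, such as analog circuits, digital circuits, or dedicated circuits. For this application, however, a software implementation is preferable in most cases. On this understanding, the technical solutions of this application, or the part that contributes to the prior art, may be embodied as a software product. The software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc. It includes instructions that cause a computer device (such as a personal computer, a training device, or a network device) to perform the methods described in the embodiments of this application.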
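All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented fully or partly as a computer program product. The computer program product includes one or more computer instructions. When these instructions are loaded or executed on a computer, the procedures or functions described in the embodiments of this application are generated in full or in part. The computer may be a general-purpose computer, a dedicated computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, they may be transmitted from one website, computer, server, or data center to another over a wired connection (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless connection (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (such as a floppy disk, hard disk, or magnetic tape), an optical medium (such as a DVD), or a semiconductor medium, such as a solid-state drive (SSD).
The foregoing descriptions are merely specific implementations of this application and do not limit its protection scope. Any variation or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed in this application shall fall within that scope. The protection scope of this application shall therefore be subject to the protection scope of the claims.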

Claims (23)

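  1. A scheduler, wherein the scheduler comprises a management module and an agent module, and the scheduler is configured to manage multiple clusters;
    the management module is configured to:
    obtain a job to be processed;
    determine, from the multiple clusters, a target cluster for processing the job; and
    dispatch the job to the agent module; and
    the agent module is configured to instruct the target cluster to schedule resources for the job and to acquire target data while scheduling resources for the job, wherein the target data is data required to execute the job.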
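  2. The scheduler according to claim 1, wherein when determining, from the multiple clusters, the target cluster for processing the job, the management module is specifically configured to:
    estimate a first duration for each of the multiple clusters to schedule resources for the job;
    calculate a second duration for each of the multiple clusters to acquire the target data;
    take the larger of the first duration and the second duration as a waiting duration for that cluster to process the job; and
    take the cluster with the smallest waiting duration among the multiple clusters as the target cluster.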
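  3. The scheduler according to claim 1, wherein the management module is further configured to:
    estimate a first duration for the target cluster to schedule resources for the job;
    calculate a second duration for the target cluster to acquire the target data; and
    send the first duration and the second duration to the agent module; and
    when instructing the target cluster to schedule resources for the job and to acquire the target data while scheduling resources for the job, the agent module is specifically configured to:
    when the first duration is greater than the second duration, instruct the target cluster to schedule resources for the job and to acquire the target data while scheduling resources for the job.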
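  4. The scheduler according to claim 1, wherein the management module is further configured to:
    estimate a first duration for the target cluster to schedule resources for the job;
    calculate a second duration for the target cluster to acquire the target data; and
    send the first duration and the second duration to the agent module; and
    when instructing the target cluster to schedule resources for the job and to acquire the target data while scheduling resources for the job, the agent module is specifically configured to:
    when the first duration is not greater than the second duration, instruct the target cluster to start acquiring the target data at a first moment before instructing the target cluster to schedule resources for the job, and instruct the target cluster to schedule resources for the job at a second moment, wherein the interval between the first moment and the second moment is the second duration minus the first duration.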
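  5. The scheduler according to claim 4, wherein when estimating the first duration for the target cluster to schedule resources for the job, the management module is specifically configured to:
    determine a target category to which the job belongs;
    calculate an average duration for the target cluster to schedule resources for historical jobs of the target category;
    obtain a current available resource ratio and a historical available resource ratio of the target cluster, wherein the current available resource ratio is the ratio of the amount of currently available resources of the target cluster to the amount of its total resources, and the historical available resource ratio is the ratio of the average amount of available resources of the target cluster during a period before resources are scheduled for the job to the amount of its total resources; and
    estimate, based on the average duration, the current available resource ratio, and the historical available resource ratio, the first duration for the target cluster to schedule resources for the job.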
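  6. The scheduler according to claim 1 or 3, wherein the agent module is further configured to:
    monitor a third duration for the target cluster to schedule resources for the job and, based on the third duration and the first duration, instruct the management module to send an adjustment instruction to a network controller of the target cluster, wherein the adjustment instruction instructs the network controller to adjust the network bandwidth of the target cluster or the number of network channels of the target cluster.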
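  7. The scheduler according to claim 6, wherein when the third duration is less than the first duration and the difference between the first duration and the third duration is greater than a preset value, the adjustment instruction instructs the network controller to increase the network bandwidth of the target cluster or the number of network channels of the target cluster.
  8. The scheduler according to claim 6, wherein when the third duration is greater than the first duration and the difference between the third duration and the first duration is greater than a preset value, the adjustment instruction instructs the network controller to reduce the network bandwidth of the target cluster or the number of network channels of the target cluster.
  9. The scheduler according to any one of claims 1 to 8, wherein the management module and the agent module are deployed on one computing device, and the computing device is connected to the multiple clusters.
  10. The scheduler according to any one of claims 1 to 8, wherein the management module is deployed on a computing device, and the agent module is deployed on the target cluster.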
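  11. A job scheduling method, wherein the method is applied to a scheduler, the scheduler comprises a management module and an agent module, and the method comprises:
    obtaining, by the management module, a job to be processed;
    determining, by the management module from the multiple clusters, a target cluster for processing the job;
    dispatching, by the management module, the job to the agent module; and
    instructing, by the agent module, the target cluster to schedule resources for the job and to acquire target data while scheduling resources for the job, wherein the target data is data required to execute the job.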
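  12. The method according to claim 11, wherein the determining, by the management module from the multiple clusters, a target cluster for processing the job comprises:
    estimating, by the management module, a first duration for each of the multiple clusters to schedule resources for the job;
    calculating, by the management module, a second duration for each of the multiple clusters to acquire the target data;
    taking, by the management module, the larger of the first duration and the second duration as a waiting duration for that cluster to process the job; and
    taking, by the management module, the cluster with the smallest waiting duration among the multiple clusters as the target cluster.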
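  13. The method according to claim 11, wherein the method further comprises:
    estimating, by the management module, a first duration for the target cluster to schedule resources for the job;
    calculating, by the management module, a second duration for the target cluster to acquire the target data; and
    sending, by the management module, the first duration and the second duration to the agent module; and
    the instructing, by the agent module, the target cluster to schedule resources for the job and to acquire target data while scheduling resources for the job comprises:
    when the first duration is greater than the second duration, instructing, by the agent module, the target cluster to schedule resources for the job and to acquire the target data while scheduling resources for the job.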
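  14. The method according to claim 11, wherein the method further comprises:
    estimating, by the management module, a first duration for the target cluster to schedule resources for the job;
    calculating, by the management module, a second duration for the target cluster to acquire the target data; and
    sending, by the management module, the first duration and the second duration to the agent module; and
    the instructing, by the agent module, the target cluster to schedule resources for the job and to acquire target data while scheduling resources for the job comprises:
    when the first duration is not greater than the second duration, instructing, by the agent module, the target cluster to start acquiring the target data at a first moment before instructing the target cluster to schedule resources for the job, and instructing the target cluster to schedule resources for the job at a second moment, wherein the interval between the first moment and the second moment is the second duration minus the first duration.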
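  15. The method according to claim 14, wherein the estimating, by the management module, a first duration for the target cluster to schedule resources for the job comprises:
    determining, by the management module, a target category to which the job belongs;
    calculating, by the management module, an average duration for the target cluster to schedule resources for historical jobs of the target category;
    obtaining, by the management module, a current available resource ratio and a historical available resource ratio of the target cluster, wherein the current available resource ratio is the ratio of the amount of currently available resources of the target cluster to the amount of its total resources, and the historical available resource ratio is the ratio of the average amount of available resources of the target cluster during a period before resources are scheduled for the job to the amount of its total resources; and
    estimating, by the management module based on the average duration, the current available resource ratio, and the historical available resource ratio, the first duration for the target cluster to schedule resources for the job.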
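  16. The method according to claim 11 or 13, wherein the method further comprises:
    monitoring, by the agent module, a third duration for the target cluster to schedule resources for the job; and
    instructing, by the agent module based on the third duration and the first duration, the management module to send an adjustment instruction to a network controller of the target cluster, wherein the adjustment instruction instructs the network controller to adjust the network bandwidth of the target cluster or the number of network channels of the target cluster.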
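  17. The method according to claim 16, wherein when the third duration is less than the first duration and the difference between the first duration and the third duration is greater than a preset value, the adjustment instruction instructs the network controller to increase the network bandwidth of the target cluster or the number of network channels of the target cluster.
  18. The method according to claim 16, wherein when the third duration is greater than the first duration and the difference between the third duration and the first duration is greater than a preset value, the adjustment instruction instructs the network controller to reduce the network bandwidth of the target cluster or the number of network channels of the target cluster.
  19. The method according to any one of claims 11 to 18, wherein the management module and the agent module are deployed on one scheduler, and the scheduler is connected to the multiple clusters.
  20. The method according to any one of claims 11 to 18, wherein the management module is deployed on a scheduler, and the agent module is deployed on the target cluster.
  21. A scheduler, wherein the scheduler comprises a processor and a memory, the memory stores program instructions, and the processor runs the program instructions to perform the steps performed by the management module and the agent module in the method according to any one of claims 11 to 20.
  22. A computer-readable storage medium, wherein the computer-readable storage medium stores instructions that, when run on a scheduler, cause the scheduler to perform the method according to any one of claims 11 to 20.
  23. A computer program product comprising instructions that, when run on a scheduler, cause the scheduler to perform the method according to any one of claims 11 to 20.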
PCT/CN2023/101278 2022-08-17 2023-06-20 Scheduler, job scheduling method, and related device WO2024037173A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210988472.7 2022-08-17
CN202210988472.7A CN117632398A (zh) 2022-08-17 2022-08-17 Scheduler, job scheduling method, and related device

Publications (1)

Publication Number Publication Date
WO2024037173A1 true WO2024037173A1 (zh) 2024-02-22

Family

ID=89940596

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101278 WO2024037173A1 (zh) 2022-08-17 2023-06-20 Scheduler, job scheduling method, and related device

Country Status (2)

Country Link
CN (1) CN117632398A (zh)
WO (1) WO2024037173A1 (zh)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018206994A1 (en) * 2017-05-11 2018-11-15 Bull Sas Method of managing resource providing in a computers cluster running jobs
CN110247979A (zh) * 2019-06-21 2019-09-17 北京邮电大学 一种调度方案确定方法、装置及电子设备
CN110457397A (zh) * 2019-08-16 2019-11-15 深圳前海微众银行股份有限公司 一种数据同步的方法及装置
CN111405055A (zh) * 2020-03-23 2020-07-10 北京达佳互联信息技术有限公司 多集群管理方法、系统、服务器、存储介质
CN112486658A (zh) * 2020-12-17 2021-03-12 华控清交信息科技(北京)有限公司 一种任务调度方法、装置和用于任务调度的装置
US20210096996A1 (en) * 2019-10-01 2021-04-01 Microsoft Technology Licensing, Llc Cache and i/o management for analytics over disaggregated stores
CN114490002A (zh) * 2022-02-17 2022-05-13 上海阵量智能科技有限公司 数据处理系统、任务调度方法、装置、芯片、及电子设备
CN114661462A (zh) * 2022-03-04 2022-06-24 阿里巴巴(中国)有限公司 资源分配方法、系统、计算机可读存储介质及电子设备


Also Published As

Publication number Publication date
CN117632398A (zh) 2024-03-01


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23854078

Country of ref document: EP

Kind code of ref document: A1