CN114868112A - Variable job resource representation and scheduling for cloud computing - Google Patents


Info

Publication number
CN114868112A
CN114868112A (application number CN201980101440.7A)
Authority
CN
China
Prior art keywords
resource usage
job
jobs
host computing
run
Prior art date
Legal status
Pending
Application number
CN201980101440.7A
Other languages
Chinese (zh)
Inventor
丁晓宁 (Ding Xiaoning)
Current Assignee
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Cloud Computing Technologies Co Ltd filed Critical Huawei Cloud Computing Technologies Co Ltd
Publication of CN114868112A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5072 Grid computing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing

Abstract

A data center scheduling system that schedules workloads for a plurality of host computing systems in a data center includes a processing unit that receives a resource usage profile for each of a plurality of jobs. Each resource usage profile has a plurality of resource usage entries for the job over a plurality of respective time intervals. The processing unit schedules a first job of the plurality of jobs to run on a first host computing system of the plurality of host computing systems according to a negative correlation between the resource usage profile of the first job and a combination of the resource usage profiles of a first set of jobs scheduled to run on the first host computing system. The system monitors the resources used by the jobs as they execute to generate updated resource usage profiles, and reschedules the jobs among the host computing systems using the updated profiles.

Description

Variable job resource representation and scheduling for cloud computing
Cross reference to related applications
None
Technical Field
A method for scheduling a plurality of jobs on a host computer is disclosed. In particular, these jobs are scheduled to achieve better quality of service (QoS) and more consistent resource usage over time.
Background
Data centers for network-connected (e.g., "cloud") data processing services may use thousands or even hundreds of thousands of servers to provide data processing services to many types of clients. These services may include, but are not limited to, email systems, e-commerce systems, back-end processing for brick-and-mortar retail systems, social media systems, video streaming systems, and systems implementing web application services. These systems use different data center resources at different times, resulting in varying loads on the servers.
Data centers use virtual machine (VM) orchestration systems or container orchestration systems (e.g., Kubernetes, Azure Container Instances (ACI), and Amazon Elastic Container Service (ECS)) to manage their workloads and to automatically deploy, scale, and manage jobs in a network-connected data processing system. These systems allow a subscriber of a network-connected data processing service to describe how jobs are executed on one or more servers of the data processing service. The orchestration system packages related applications and data into containers or VMs, which are grouped into jobs. As used herein, a "job" is a grouping of containers or VMs to be hosted on a host; a job is also referred to as a pod. A container is a collection of applications and libraries directed to a particular function or set of interrelated functions. A "job file" is a job description for a container orchestration system or VM orchestration system. A host may include one or more servers for running the applications and libraries in each container of a job. A data center may include a number of hosts, and each host may be used to run multiple jobs simultaneously.
For clarity, the examples described below describe jobs, each job including one or more containers. However, it is contemplated that a job may include one or more VMs in addition to one or more containers, or may include only VMs. Thus, any reference to a container may be interpreted as a reference to a container or a VM.
Disclosure of Invention
The following examples describe a method for scheduling jobs for execution by multiple host computing systems in a data center that takes into account relative resource usage of the jobs such that jobs having complementary resource usage over time are scheduled for the same host computing system. Further, these examples describe a method for monitoring actual resource usage of each job at runtime to detect patterns of actual resource usage and generate an updated resource usage profile for each job based on the monitored usage. The method then reschedules jobs between the plurality of host computing systems according to a negative correlation between the updated resource usage profiles to better utilize the host computing systems in the data center by balancing resource usage over time.
These examples are contained in the features of the independent claims. Other embodiments are apparent from the dependent claims, the description and the drawings.
According to a first aspect, a processing unit implements a computer-implemented method for scheduling workloads for a plurality of host computing systems in a data center by receiving a resource usage profile for each of a plurality of jobs. Each resource usage profile has a plurality of resource usage entries for the job over a plurality of respective time intervals. The processing unit schedules a first job to run on a first host computing system according to a negative correlation between the resource usage profile of the first job and a combination of the resource usage profiles of a first set of jobs scheduled to run on the first host computing system.
In a first implementation form of the method of the first aspect, the method comprises: executing the plurality of jobs on the plurality of host computing systems, including executing the first job on the first host computing system while monitoring respective resource usage of the plurality of jobs. The method further comprises the following steps: in response to the determined actual resource usage, respective updated resource usage profiles are generated for the plurality of jobs, including an updated resource usage profile for the first job. The method further comprises the following steps: rescheduling the first job to run on a second one of the plurality of host computing systems according to a negative correlation between the updated resource usage profile of the first job and a combination of updated resource usage profiles of a second set of jobs scheduled to run on the second host computing system.
In a second implementation form of the method according to the first aspect, the method comprises: rescheduling the plurality of jobs to run on respective ones of the plurality of host computing systems according to respective negative correlations between the updated resource usage profile of each job of the plurality of jobs and respective combinations of updated resource usage profiles of respective sets of jobs scheduled to run on the plurality of host computing systems.
In a third implementation form of the first aspect, the method includes: monitoring respective performance of the plurality of jobs to generate an updated resource usage profile by: for each job, recording the actual resource usage data for the job at each of a plurality of time instants within a predetermined time interval, and grouping the actual resource usage data into a plurality of time windows, each time window comprising at least one of the plurality of time instants. The method further comprises the following steps: normalizing the grouped resource usage data for each job to a plurality of discrete usage levels to obtain a time series of resource usage, and auto-correlating the time series of resource usage with respective different offsets of the series to identify a repeating pattern of the resource usage for the job. The method further comprises the following steps: generating the updated resource usage profile for each job in accordance with the identified repeating pattern of the resource usage of the job.
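A minimal sketch of this pattern-detection pipeline (grouping samples into time windows, normalizing window averages to discrete levels, and auto-correlating the series against shifted copies of itself) is shown below. The window size, the number of levels, the match-count scoring, and the function name are illustrative assumptions, not the claimed implementation.

```python
def detect_period(samples, window=4, levels=4, max_lag=None):
    """Find a repeating usage pattern: group raw samples into fixed-size
    time windows, quantize window averages into discrete usage levels,
    and auto-correlate the resulting series at each candidate offset."""
    # Group samples into time windows (mean value per window).
    grouped = [sum(samples[i:i + window]) / len(samples[i:i + window])
               for i in range(0, len(samples), window)]
    # Normalize window averages into discrete levels 0..levels-1.
    lo, hi = min(grouped), max(grouped)
    span = (hi - lo) or 1.0
    series = [min(levels - 1, int((g - lo) / span * levels)) for g in grouped]
    # Auto-correlate: fraction of positions whose level matches at each lag.
    n = len(series)
    max_lag = max_lag or n // 2
    best_lag, best_score = None, -1.0
    for lag in range(1, max_lag + 1):
        score = sum(1 for i in range(n - lag)
                    if series[i] == series[i + lag]) / (n - lag)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, series
```

With hourly samples and a daily workload cycle, the best lag would correspond to a 24-hour period, and the updated resource usage profile could then be built from one period of the series.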
In a fourth implementation form of the first aspect, each of the resource usage profiles includes resource usage entries of at least one of memory occupancy, Central Processing Unit (CPU) occupancy, network bandwidth, disk input/output (I/O) bandwidth, CPU cache occupancy, or memory bandwidth.
In a fifth implementation form of the first aspect, the scheduling the first job to run on the first host computing system further comprises: comparing each entry in the resource usage profile of the first job to a corresponding resource availability value of the first host; scheduling the first job to run on the first host computing system when none of the resource usage profile entries exceeds the corresponding resource availability value.
In a sixth implementation of the first aspect, the scheduling the first job to run on the first host computing system further comprises: determining that the negative correlation between the resource usage profile of the first job and a combination of the resource usage profiles of the first set of jobs is greater than a threshold.
In a seventh implementation of the first aspect, the scheduling the first job to run on the first host computing system further comprises: comparing negative correlations between the resource usage profiles of the first job and respective combinations of resource usage profiles of the jobs scheduled for all other host computing systems, and scheduling the first job to run on the host computing system having the largest negative correlation with the resource usage profile of the first job.
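The selection rule of the fifth through seventh implementation forms (a per-interval capacity check combined with choosing the host whose combined profile is most negatively correlated with the job's profile) might be sketched as follows; the data shapes, the single capacity value per host, and the function names are assumptions for illustration.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def pick_host(job_profile, hosts):
    """hosts: list of (combined_profile, capacity) pairs, one per host.
    Return the index of the feasible host whose combined profile is most
    negatively correlated with the job's profile, or None if none fits."""
    best, best_r = None, float('inf')
    for i, (combined, capacity) in enumerate(hosts):
        # Feasibility: no time interval may exceed the host's capacity.
        if any(j + c > capacity for j, c in zip(job_profile, combined)):
            continue
        r = pearson(job_profile, combined)
        if r < best_r:
            best, best_r = i, r
    return best
```

A single capacity value per host is a simplification; in practice each resource type (CPU, memory, bandwidth) would be checked against its own availability value, as in the fifth implementation form.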
According to a second aspect, an apparatus for scheduling workloads of a plurality of host computing systems in a data center, the apparatus comprising a memory storing program instructions and a processing unit coupled to the memory. The program instructions cause the processing unit to receive a resource usage profile for each job of a plurality of jobs, wherein each resource usage profile has a plurality of resource usage entries for the job over a plurality of respective time intervals. The program instructions also cause the processing unit to schedule a first job to run on a first one of the host computing systems based on a negative correlation between the resource usage profile of the first job and a combination of the resource usage profiles of a first set of jobs scheduled to run on the first host computing system.
In a first implementation of the apparatus of the second aspect, the program instructions further configure the processing system to cause the plurality of jobs to be executed on the plurality of host computing systems, including causing the first job to be executed on the first host computing system. The apparatus monitors respective resources used by the plurality of jobs as the jobs are executed by the host computing system to obtain respective actual resource usage data for each of the plurality of jobs to generate respective updated resource usage profiles for the plurality of jobs, including the first job. The apparatus reschedules the first job to run on a second host computing system according to a negative correlation between the updated resource usage profile of the first job and a combination of updated resource usage profiles of a second set of jobs scheduled to run on the second host computing system.
In a second implementation of the second aspect, the rescheduling of the first job to run on the second host computing system comprises: rescheduling the plurality of jobs to run on the plurality of host computing systems according to respective negative correlations between the updated resource usage profile of each job of the plurality of jobs and respective combinations of updated resource usage profiles of respective sets of jobs scheduled to run on the plurality of host computing systems.
In a third implementation form of the second aspect, the monitoring the resources used by the plurality of jobs comprises, for each job of the plurality of jobs: recording the actual resource usage data for the job at each of a plurality of time instants over a predetermined time interval; grouping the actual resource usage data into a plurality of time windows, each time window including at least one of the plurality of time instants; normalizing the grouped resource usage data into a plurality of discrete usage levels to obtain a time series of resource usage. The monitoring the resources used by the plurality of jobs further comprises: auto-correlating each of the time series with respective different offsets of the series to identify a repeating pattern of the resource usage of the job; generating the updated resource usage profile for the job in accordance with the identified repeating pattern of resource usage for the job.
In a fourth implementation of the second aspect, each of the resource usage profiles includes resource usage entries for at least one of memory occupancy, CPU occupancy, network bandwidth, disk I/O bandwidth, CPU cache occupancy, or memory bandwidth.
In a fifth implementation of the second aspect, the scheduling the first job to run on the first host computing system comprises: comparing each entry in the resource usage profile of the first job to a corresponding resource availability value of the first host; scheduling the first job to run on the first host computing system when none of the resource usage profile entries exceeds the corresponding resource availability value.
In a sixth implementation of the second aspect, scheduling the first job to run on the first host computing system comprises: determining that the negative correlation between the resource usage profile of the first job and a combination of the resource usage profiles of the first set of jobs is greater than a threshold.
In a seventh implementation of the second aspect, the scheduling the first job to run on the first host computing system further comprises: determining that the negative correlation between the resource usage profile of the first job and a combination of the resource usage profiles of the first set of jobs running on the first host computing system is greater than a respective negative correlation between the resource usage profile of the first job and a respective combination of resource usage profiles of other sets of jobs scheduled for other ones of the host computing systems.
According to a third aspect, a non-transitory computer-readable medium includes instructions for scheduling workloads for a plurality of host computing systems in a data center. The instructions, when executed by a processing unit, cause the processing unit to receive a resource usage profile for each of a plurality of jobs, each resource usage profile having a plurality of resource usage entries over a plurality of respective time intervals. The instructions also cause the processing unit to schedule a first job of the plurality of jobs to run on a first host computing system of the plurality of host computing systems according to a negative correlation between the resource usage profile of the first job and a combination of the resource usage profiles of a first set of jobs of the plurality of jobs scheduled to run on the first host computing system.
In a first implementation of the third aspect, the instructions cause the processing unit to execute the plurality of jobs on the plurality of host computing systems, including executing the first job on the first host computing system, and monitoring resources used by the plurality of jobs to obtain respective actual resource usage data for each of the plurality of jobs. In response to the actual resource usage data, the instructions further cause the processing unit to generate respective updated resource usage profiles for the plurality of jobs, including an updated resource usage profile for the first job; and rescheduling the first job to run on a second host computing system according to a negative correlation between the updated resource usage profile of the first job and a combination of updated resource usage profiles of a second set of jobs scheduled to run on the second host computing system.
In a second implementation of the third aspect, the rescheduling of the first job to run on the second host computing system includes: rescheduling the plurality of jobs to run on respective ones of the host computing systems according to respective negative correlations between the updated resource usage profiles of the jobs and respective combinations of the updated resource usage profiles of respective sets of jobs scheduled to run on the host computing systems.
In a third implementation of the third aspect, the monitoring the respective performances of the plurality of jobs comprises, for each job: recording the actual resource usage data for the job at each of a plurality of time instants over a predetermined time interval; grouping the actual resource usage data into a plurality of time windows, each time window including at least one of the plurality of time instants; normalizing the grouped resource usage data into a plurality of discrete usage levels to obtain a time series of resource usage. The monitoring further comprises: auto-correlating each of the time series with respective different offsets of the series to identify a repeating pattern of the resource usage of the job; generating the updated resource usage profile for the job in accordance with the identified repeating pattern of resource usage for the job.
Drawings
FIGS. 1A and 1B are diagrams of resource usage by a host versus time provided by exemplary embodiments.
FIG. 2 is a block diagram of a data center environment provided by an exemplary embodiment.
Fig. 3A and 3B are block diagrams of hosts in a data center provided by an exemplary embodiment.
FIG. 4 is a functional block diagram of the operation of a data center controller provided by an exemplary embodiment.
Fig. 5, 6A, and 6B are flowcharts of exemplary methods performed by a data center controller provided by exemplary embodiments.
FIG. 7 is a block diagram of a server computing device provided by exemplary embodiments.
Detailed Description
In the following description, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific embodiments which may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosed subject matter, and it is to be understood that other embodiments may be utilized and that structural, logical and electrical changes may be made without departing from the scope of the appended claims. The following description of the exemplary embodiments is, therefore, not to be taken in a limiting sense.
A data center may run jobs for many different services simultaneously. Each job may include multiple containers, and the job file entry for each container may request the resources to be used by the container. The data center may consolidate the requested resources listed for all containers of the job to generate a sum of the requested resources. The data center controller may schedule the job to run on a host based on the total resource request for the job and the available resources in the host. Table 1 shows a portion of an exemplary Kubernetes job file.
TABLE 1
[Table 1 appears as an image in the original publication: a fragment of a Kubernetes job file whose last lines request memory and CPU resources for a container.]
The last two rows of Table 1 show the container requesting 64 megabytes of memory and one quarter of a CPU core.
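The consolidation step described above (summing the requests of all containers in a job) might look as follows. The quantity suffixes follow Kubernetes conventions ("64Mi" for 64 mebibytes, "250m" for a quarter CPU); the function names are illustrative assumptions.

```python
def parse_quantity(q):
    """Parse a Kubernetes-style quantity string: '64Mi' -> bytes,
    '250m' -> fractional CPUs, bare numbers pass through unchanged."""
    units = {'Ki': 2 ** 10, 'Mi': 2 ** 20, 'Gi': 2 ** 30, 'm': 1e-3}
    for suffix, factor in units.items():
        if q.endswith(suffix):
            return float(q[:-len(suffix)]) * factor
    return float(q)

def consolidate(containers):
    """Sum each requested resource across all containers of a job."""
    total = {}
    for requests in containers:
        for name, quantity in requests.items():
            total[name] = total.get(name, 0.0) + parse_quantity(quantity)
    return total
```

For a job with two containers each requesting "64Mi" of memory and "250m" of CPU, the consolidated job resource profile would be 128 MiB of memory and half a CPU.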
The resource requests shown in Table 1 do not occupy all of the resources of the host. For example, a host may include multiple petabytes of memory and multiple multi-core processing units, where each core may implement a CPU. Thus, the data center controller may schedule several jobs to run on a single host based on the resources requested by the jobs. One method of scheduling jobs to hosts is to group jobs according to their resource requests to ensure that the resources requested by the grouped jobs do not exceed the resources available on the host. However, this approach may result in underutilization of host resources when maximum resource usage occurs only during a relatively short time. Furthermore, the resources specified for a job are typically estimates that may be exceeded during execution of an application in a job container. Thus, multiple jobs running simultaneously on a host may exceed the resources of the host, causing the host to at least temporarily suspend one or more applications in one or more jobs.
An exemplary job may use other shared resources that are difficult to estimate, such as cache memory occupancy, I/O cache occupancy, and memory bandwidth. When a combination of jobs running on the host exceeds one of these resources, the host may need to pause one of the jobs, typically a low priority background job. Once sufficient resources are available, the suspended low priority jobs may need to be restarted.
FIGS. 1A and 1B are diagrams of resource usage by a host versus time provided by exemplary embodiments. FIGS. 1A and 1B illustrate a single resource, such as CPU utilization or memory utilization. However, the examples described below may monitor a plurality of resources including, but not limited to, CPU occupancy, memory occupancy, network bandwidth, disk I/O bandwidth, cache memory occupancy, I/O cache occupancy, and memory bandwidth. It is contemplated that the examples described below may monitor any host resource having a usage that may be quantified (e.g., represented by a number) and may be specified in a job file. In FIGS. 1A and 1B, dashed line 120 represents a resource threshold for a corresponding resource of a host. Each monitored resource has a corresponding threshold. When the combined resource usage of all jobs running on the host exceeds one or more of these thresholds, the host and/or the data center controller may pause one or more applications in one or more currently running jobs. To ensure that the data center controller and/or the host can gracefully suspend one or more applications, each threshold may be set to a value that is less than the maximum value of the resource on the host.
FIG. 1A shows resource usage profile 102 for job 1 and resource usage profile 104 for job 2. Profiles 102 and 104 may correspond to services that exhibit similar variations, such as video streaming services in the same time zone. As shown in FIG. 1A, the resource usage profiles exhibit similar temporal variations, such that the combined resource usage profile 106 has a maximum value 110 that exceeds host resource threshold 120. Further, combined resource usage profile 106 includes periods 112 and 114 during which host resources are underutilized.
The examples described below address this problem by supporting a time component in the resource usage profile specification and by grouping jobs with complementary resource usage profiles to run on the same host. In some examples, a data center controller compares the usage profiles of a plurality of jobs and groups jobs according to negative correlations between the job usage profiles. For example, as shown in FIG. 1B, job 3 has resource usage profile 130 with high values 136 and 138 at about the same times that resource usage profile 102 has low values 132 and 134. For example, job 3 may represent a manufacturing enterprise's database application that exhibits greater usage during working hours, complementary to a video streaming application that exhibits greater usage before and after working hours. Grouping job 1 and job 3 results in a combined resource usage profile 140 for the host that does not exceed the resource threshold 120 and varies less than the host usage profile 106 shown in FIG. 1A. Analyzing usage profiles over time and combining profiles that exhibit at least some negative correlation may reduce instances of suspended applications and improve host utilization relative to techniques that merely group jobs so that their combined requested usage does not exceed a threshold.
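The benefit of grouping by negative correlation can be shown with a small numeric example mirroring FIGS. 1A and 1B; the profile values below are invented for illustration.

```python
# Hourly usage profiles for one resource, in arbitrary units.
threshold = 10              # host resource threshold (dashed line 120)
job1 = [2, 3, 8, 9, 8, 3]   # e.g., video streaming with evening peaks
job2 = [2, 3, 8, 8, 7, 3]   # similar service in the same time zone
job3 = [8, 7, 2, 1, 2, 7]   # e.g., database busy during working hours

similar = [a + b for a, b in zip(job1, job2)]        # FIG. 1A grouping
complementary = [a + b for a, b in zip(job1, job3)]  # FIG. 1B grouping

assert max(similar) > threshold         # peaks coincide: host overcommitted
assert max(complementary) <= threshold  # peaks interleave: stays under limit
```

The complementary grouping also varies far less over time (here the combined profile is flat), which is the more consistent resource usage the disclosure aims for.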
Further, the examples below monitor host resources used by each of a plurality of jobs running on a host to generate a resource usage profile for the job, including resources that are difficult to estimate. These resource usage profiles specify usage profiles for different resources that vary over time. Combining jobs in the host according to these measured resource usage profiles may provide better, more consistent resource usage for jobs running on the host. In addition, background jobs running on the host may experience fewer pauses.
FIG. 2 is a block diagram of a data center environment 200 provided by an exemplary embodiment. The environment 200 shown in fig. 2 includes a data center 202 coupled to two networks 208 and 210, the two networks 208 and 210 connecting client devices 218, 220, 222, and 228 to the data center 202. The exemplary data center 202 includes a plurality of servers 204A-204N and one or more databases 205. Servers 204A through 204N may be connected to each other and to one or more databases 205 through one or more high-speed distribution paths 206. The distribution path 206 may include a collection of routers and switches that shuttle packet data among the plurality of servers 204A through 204N in the data center 202. Although one or more databases 205 are shown coupled directly to the distribution path 206, in an exemplary embodiment, the databases 205 may be one or more data storage volumes coupled to the distribution path 206 through one or more of the servers 204A-204N. The exemplary data center 202 is greatly simplified because a real data center may include hundreds or thousands of servers and data storage volumes.
In the exemplary environment 200, the network 210 may be a private, high-speed network connection of an enterprise and the network 208 may be a global information network (e.g., the Internet). One or more Internet Service Providers (ISPs) 212 and one or more cellular communication systems 214 are coupled to network 208 to couple client devices 218, 220, and 222 to data center 202. In an example embodiment, client devices 218 and 220 may be located in one or more Wireless Local Area Networks (WLANs) coupled to one or more ISPs 212 through router 216. Client device 218 may be a laptop computer and client device 220 may be a networked television receiver. The client device 222 may be a mobile device (e.g., a smartphone or tablet) that accesses services hosted by the data center through one or more cellular communication systems 214. The mobile device 222 can also connect to services hosted by the data center 202 through the router 216. In one exemplary embodiment, client device 218 is conducting an internet search, client device 220 is streaming video, and client device 222 is accessing a social media account. All of these services may be accessed directly or indirectly using jobs running on hosts 204A through 204N of data center 202.
Exemplary enterprise servers 224 and 226 may be coupled to the private network 210 to access the hosts 204A through 204N of the data center 202. In one exemplary embodiment, the client device 228 accesses a word processor and/or enterprise database hosted by the data center 202. Server 226 may maintain an enterprise database.
As described above, applications used by client devices 218, 220, 222, 228 and server 226 may have resource usage profiles that vary over time in a similar or complementary manner. The embodiments described below group these jobs to take advantage of any complementary resource usage profiles between jobs. Some embodiments modify the job file to provide a time-varying resource profile for each container in the job, as shown in Table 2.
TABLE 2
[Table 2 appears as an image in the original publication: a job file fragment in which each container's memory and CPU requests are specified per time interval.]
This modification adds time-based resource profiles for memory occupancy and CPU occupancy, defining three intervals. In the first interval, the container requests 64 megabytes of memory and one quarter of a CPU for four hours. In the second interval, the container requests 512 megabytes of memory and two CPUs for two hours. In the third interval, the container requests 64 megabytes of memory and one quarter of a CPU for four hours. An exemplary job file may include timed requests for resources spanning the container's expected run time. For example, a continuously running job may request resources for a 24-hour period. While each of the resource requests shown in Table 2 lasts for a fixed period of time, other embodiments may request resources for particular time-of-day values. Furthermore, as described above, the example shown in Table 2 requests only memory and CPU resources. In addition to CPU and memory, other exemplary job files may request resources for one or more of network bandwidth, disk I/O bandwidth, cache memory occupancy, disk cache occupancy, and memory bandwidth.
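For scheduling, a timed request of this form can be expanded into a per-interval time series that the correlation computation operates on. A sketch, assuming (duration, value) pairs taken from the Table 2 description above; the function name and data layout are illustrative, not part of the disclosed job file format.

```python
def expand_profile(intervals):
    """Expand (duration_hours, value) pairs into an hourly time series."""
    series = []
    for hours, value in intervals:
        series.extend([value] * hours)
    return series

# Profiles from the Table 2 description: 4 h at 64 MB / 0.25 CPU,
# 2 h at 512 MB / 2 CPUs, then 4 h at 64 MB / 0.25 CPU.
cpu = expand_profile([(4, 0.25), (2, 2.0), (4, 0.25)])
memory_mb = expand_profile([(4, 64), (2, 512), (4, 64)])
```

Once every job's request is expanded onto a common time base, the hourly series can be summed and correlated directly, as in the grouping examples above.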
As described below with reference to fig. 4-6A, a rescheduler application, which may be running on one of the hosts 204A-204N of the data center 202, may parse each job file that has been provided to the data center to extract and consolidate resource usage requests from all containers in the job file. The consolidated resource usage request may be stored as a job resource profile in a resource allocation information database accessible by the host 204A, 204B, 204C, 204D, …, 204N executing the rescheduler application.
The rescheduler application periodically or aperiodically retrieves the job resource profiles and compares the retrieved profiles to identify negative correlations (i.e., job files with complementary resource usage requests). The rescheduler then regroups the job files having complementary resource requests and schedules the corresponding regrouped jobs to run on the hosts, taking into account the resource usage requested by the grouped job files and the available resources on each host.
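One way to quantify the negative correlation the rescheduler looks for is an ordinary Pearson correlation over per-interval usage values; this is an illustrative sketch, not a measure the patent prescribes:

```python
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation coefficient; values near -1 indicate
    complementary (anti-correlated) resource usage profiles."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Two hypothetical jobs with hourly CPU demand over the same six intervals:
job_a = [0.25, 0.25, 2.0, 2.0, 0.25, 0.25]   # busy mid-day
job_b = [2.0, 2.0, 0.25, 0.25, 2.0, 2.0]     # busy off-hours
assert pearson(job_a, job_b) < 0             # complementary: group together
```

Grouping two such jobs on one host lets the pair's combined demand stay near the sum of their averages rather than the sum of their peaks.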
Some job files may not specify resource requests for all of their containers, or may specify requests that do not conform to an estimate of the resources actually used during execution of the containers. As described below with reference to fig. 4, 5, and 6B, in some embodiments, one or more hosts may monitor actual resources used to execute a job and report actual resource usage to a pattern detector. The pattern detector may then analyze the actual resource usage to generate an actual resource usage profile and replace the stored consolidated resource request with the actual usage profile. The rescheduler may then periodically or aperiodically reevaluate the jobs according to the stored usage profile, regroup the jobs, and reschedule the regrouped jobs to improve the overall efficiency of the data center 202.
Fig. 3A and 3B are block diagrams of hosts in a data center provided by an exemplary embodiment. Fig. 3A is a block diagram of a host computer 300 for operating a data center controller 318, the data center controller 318 including a pattern detector 320 and a rescheduler 322. The host 300 includes a server 302 (which may be, for example, a rack server in a large server farm) and an interface to a data store 340. The exemplary server 302 includes one or more CPUs 304 coupled to a memory controller 308, an I/O controller 312, and a disk cache memory 314. The example memory controller 308 may be coupled to the memory 306 directly and/or through a cache memory 310. The example memory 306 may include a data center controller 318, the data center controller 318 including a pattern detector 320 and a rescheduler 322. Memory 306 also includes a job 324 scheduled by rescheduler 322 to run on host 300, an application and Operating System (OS) 326 used by server 302, monitoring software 328 for monitoring resource usage of job 324 executing on server 302, and a temporary data store 330 used by data center controller 318, OS/application 326, and monitoring software 328. Monitoring software 328 (e.g., cAdvisor and/or Prometheus) monitors the resource usage of job 324 during execution to adjust the usage profile used by rescheduler 322. The I/O controller 312 is also coupled to one or more network interfaces 316 connected to the networks 208 and 210 described above, as well as to routers and switches that make up the high-speed distribution path 206 between the server 302 and other servers in the data center.
The resources used by the host 300 include the number of fractional CPUs 304, the amount of memory 306, the bandwidth of the memory controller 308, the bandwidth of the cache memory 310, the bandwidth of the I/O controller 312, the bandwidth of the disk cache memory 314, and the bandwidth of the respective network interfaces 316 to the networks 208 and 210.
FIG. 3B illustrates an exemplary host 350, which may be one of the hosts 204A through 204N shown in FIG. 2 that does not act as the data center controller. The host 350 includes a server 352 and an interface to the data store 340. The exemplary server 352 includes one or more CPUs 354 coupled to a memory controller 358, an I/O controller 362, and a disk cache memory 364. The example memory controller 358 may be coupled to the memory 356 directly and/or through a cache memory 360. Exemplary memory 356 may include an OS/application 370 used by server 352, jobs 372 scheduled by rescheduler 322 (shown in FIG. 3A) to run on the server, monitoring software 374 (e.g., cAdvisor and/or Prometheus) for monitoring resource usage of jobs 372 executing on server 352, and temporary data storage 376 used by OS/application 370 and monitoring software 374. The I/O controller 362 is further coupled to one or more network interfaces 366 that connect with the networks 208 and 210, as well as routers and switches that make up the high-speed distribution path 206 between the server 352 and the servers of the other hosts 204A through 204N of the data center.
FIG. 4 is a functional block diagram of the operation of a portion of a data center controller provided by an exemplary embodiment. FIG. 4 illustrates a job scheduler 400 of a data center controller. Job scheduler 400 includes a rescheduler 402, a resource pattern detector 404, and a job/host resource allocation database 406. The rescheduler 402 and the resource pattern detector 404 may be the exemplary rescheduler 322 and pattern detector 320 shown in fig. 3A, and the data center controller may be the exemplary data center controller 318 shown in fig. 3A. Job/host resource allocation database 406 may reside in temporary data storage 330 and/or data storage 340 shown in fig. 3A. Job scheduler 400 interacts with hosts 204A through 204N to schedule and reschedule jobs processed by data center 202.
In an exemplary embodiment, data center controller 318 receives jobs for processing by data center 202. As each new job is processed, the rescheduler 402 extracts and merges the resource requests from the containers in each job file and stores the merged resource requests as the respective resource profile for the job. The exemplary rescheduler stores the resource profile in job/host resource allocation database 406. As described above, each extracted resource request is defined over a plurality of time intervals. The rescheduler 402 then analyzes the resource profiles stored in the database 406 to identify jobs having requested resources that exhibit at least some negative correlation between their respective time intervals. Jobs identified as having a mutual negative correlation are grouped together, and each group is assigned to (e.g., scheduled to run on) one of the hosts 204A-204N. Rescheduler 402 groups jobs according to available resources in each of hosts 204A through 204N and schedules the grouped jobs to run on the hosts. For example, prior to scheduling the group job set to run on host 204A, rescheduler 402 combines the resource usage profiles of all jobs in the group to ensure that the resources used by the group job do not exceed the available resources of host 204A.
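The per-interval capacity check described above (combining the resource usage profiles of all jobs in a group before scheduling the group on host 204A) can be sketched as follows; the data layout and function name are assumptions, not the patent's implementation:

```python
def can_schedule(group_profiles, candidate_profile, host_capacity):
    """Return True if adding `candidate_profile` to the jobs already grouped
    on a host keeps every per-interval resource total within capacity.
    Each profile is a list of {resource: value} dicts, one per time interval."""
    for t, interval in enumerate(candidate_profile):
        for resource, value in interval.items():
            total = value + sum(p[t][resource] for p in group_profiles)
            if total > host_capacity[resource]:
                return False
    return True

host = {"memory_mb": 2048, "cpu": 4.0}
grouped = [[{"memory_mb": 512, "cpu": 2.0}, {"memory_mb": 64, "cpu": 0.25}]]
new_job = [{"memory_mb": 1024, "cpu": 1.0}, {"memory_mb": 1024, "cpu": 1.0}]
assert can_schedule(grouped, new_job, host)      # fits in both intervals
too_big = [{"memory_mb": 2048, "cpu": 1.0}, {"memory_mb": 64, "cpu": 0.25}]
assert not can_schedule(grouped, too_big, host)  # 2048 + 512 exceeds memory
```

The check is per interval, so a job whose peak coincides with an existing trough can fit even when the sum of peak demands would not.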
Rescheduler 402 repeats this process for all job resource profiles and all hosts 204A through 204N in job/host resource allocation database 406. As hosts 204A through 204N execute jobs, job monitoring software 328 and/or 374, shown in FIGS. 3A and 3B, running on each host monitors the resources actually used by the jobs and reports this resource usage information to resource pattern detector 404. As described below with reference to FIG. 6B, job monitoring software 328 and/or 374 and/or pattern detector 404 analyzes this data to determine whether each job exhibits a pattern in its resource usage. When a pattern is detected, it is converted to a format compatible with the job resource usage profile and stored in job/host resource allocation database 406. The rescheduler 402 continuously re-evaluates the grouping of jobs and the scheduling of the grouped jobs to run on the hosts 204A-204N, re-groups the jobs, and re-schedules the re-grouped jobs to run on the hosts 204A-204N according to the job resource usage profiles generated from the detected patterns. In some embodiments, rescheduler 402 evaluates and reschedules jobs when the workload of the data center is expected to be relatively low to minimize any delay resulting from transferring one or more jobs between hosts 204A-204N. The rescheduler 402 may also perform the evaluation and rescheduling as part of the data center recovery operation after the data center controller 318 detects a failure of one or more of the hosts 204A through 204N.
Fig. 5, 6A, and 6B are flowcharts of exemplary methods performed by data center controller 318 according to exemplary embodiments. FIG. 5 illustrates an example method 500 performed by the rescheduler 322 and/or 402 and the pattern detector 320 and/or 404 of the data center controller 318 provided by example embodiments. Fig. 6A illustrates an exemplary method 600 performed by the rescheduler 322 and/or 402, and fig. 6B illustrates an exemplary method 650 performed by the resource pattern detector 320 and/or 404. In some embodiments, portions of the method illustrated in FIG. 6B may be performed by monitoring software 328 and/or 374.
In FIG. 5, in operation 502, the method 500 receives jobs to be run by the data center 202 and parses the job files of the jobs to extract consolidated resource requests for all containers in each job. Operation 502 stores the consolidated resource request in job/host resource allocation database 406 as the job's corresponding resource profile. As described above, each job may have requests for multiple resources (e.g., memory, CPU, network bandwidth, disk I/O bandwidth, cache memory, memory bandwidth, disk cache, and/or other quantifiable resource values). The resource requests and resource profiles described below include one or more of these resources.
Operation 504 correlates the resource profiles of the jobs and groups together jobs that exhibit at least some negative correlation in their resource usage. Operation 504 also schedules the grouped jobs to run on respective ones of hosts 204A through 204N.
Operation 506 collects runtime resource usage data for each job from the hosts 204A-204N as the jobs execute. In an exemplary embodiment, this resource usage data is collected by monitoring software 328 shown in FIG. 3A and/or monitoring software 374 shown in FIG. 3B. The resource usage data is analyzed by operation 508 to identify stable resource usage patterns in the data. When these patterns are identified, operation 510 formats them into corresponding job resource usage profiles and stores them in the job/host resource allocation database 406 shown in FIG. 4. The method 500 then regroups the jobs using the updated job resource profiles and reschedules the regrouped jobs among the hosts 204A-204N.
Fig. 5 shows operation 510 occurring after operation 504. In some embodiments, these operations may run concurrently, such that operation 510 continuously updates the resource usage patterns of the jobs and hosts stored in job/host resource allocation database 406, while operation 504 continuously checks the jobs and host resource profiles in job/host resource allocation database 406 to regroup and reschedule the jobs to run on hosts 204A through 204N.
Fig. 6A illustrates a method 600 for implementing operation 504 provided by an exemplary embodiment. Operation 602 of method 600 normalizes the time intervals of the job resource profiles stored in job/host resource allocation database 406 such that all jobs define resource usage over the same set of time intervals. Some jobs in the exemplary job/host resource allocation database are unallocated jobs that have not previously been executed on a host; job profiles for these jobs are not generated by resource pattern detector 404 but are provided by the user. A user may not be able to provide detailed resource usage information over multiple time intervals and therefore may provide only a single value for each resource. In these cases, the value of the requested resource is the same for all time intervals.
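Operation 602's interval normalization can be sketched as follows, assuming hours as the common granularity; a user-supplied single-value request simply repeats in every interval:

```python
def normalize(profile, total_hours):
    """Normalize a job resource profile to one value per hour.
    `profile` is either a single scalar request (user-supplied jobs that give
    one value per resource) or a list of (value, duration_hours) segments."""
    if isinstance(profile, (int, float)):       # single value: repeat everywhere
        return [profile] * total_hours
    hourly = []
    for value, hours in profile:
        hourly.extend([value] * hours)
    return hourly[:total_hours]

assert normalize(512, 4) == [512, 512, 512, 512]
assert normalize([(64, 2), (512, 2)], 4) == [64, 64, 512, 512]
```

Once every profile covers the same set of intervals, correlation and capacity checks can compare values position by position.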
Operation 604 then schedules an initial job to run on each of the hosts 204A through 204N. In one exemplary embodiment, the jobs initially scheduled to run on hosts 204A-204N are those jobs with the largest resource requests. The resources requested by these jobs are then stored as the initial host resource profile for the corresponding host. Operation 604 then selects one of the hosts 204A-204N as the first host.
Operation 606 begins a loop that traverses all hosts and all jobs to group jobs with negatively correlated resource usage and schedule the grouped jobs to run on the respective hosts. Operation 606 determines whether the job/host resource allocation database 406 has any jobs that are not scheduled to run on one of the hosts 204A-204N. When all jobs are scheduled, the method 600 ends at operation 626, which transfers control, for example, to operation 506 of FIG. 5.
When operation 606 determines that some jobs have not been scheduled to run on one of the hosts 204A-204N, operation 608 selects one of the unscheduled jobs as the next job. Operation 610 then determines whether the selected job can be scheduled to run on the currently selected host. Operation 610 adds the value in each time interval of the job resource profile to the corresponding value in the host profile and compares the result to the threshold T1 for the resource in the time interval. The threshold T1 represents the maximum value of the resource on the selected host. Operation 610 uses a threshold for each resource when the job resource profile includes requests for multiple resources. As described above, resource threshold T1 may be less than the maximum value of the host's resources so that the host may gracefully suspend one or more applications of one or more jobs when one of resource thresholds T1 is exceeded during execution.
When operation 610 determines that the combination of job resource profile and host resource profile will exceed one of the current host's thresholds T1 during one of the time intervals, operation 610 returns the job to the list of unscheduled jobs and branches back to operation 606 to select another job. When operation 610 determines that adding the selected job to a job already in the host group does not exceed any resource threshold T1, operation 610 passes control to operation 612, which operation 612 calculates a negative correlation between the resource usage profile of the selected job and the resource usage profile of the selected host.
In one exemplary embodiment, operation 612 may invert the resource value in each time interval of the job resource usage profile and correlate the inverted job resource profile with the host profile to generate a negative correlation value. Alternatively, operation 612 may compute the absolute differences between corresponding resource values in each time interval and sum them to generate a negative correlation value. Operation 614 then compares the generated negative correlation value to a threshold T2 to determine whether to add the selected job to the group of jobs currently scheduled to run on the selected host.
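The second scoring alternative described above (summing the absolute differences between corresponding per-interval values) can be sketched as follows; example values are hypothetical:

```python
def negative_correlation_score(job_profile, host_profile):
    """Sum of absolute differences between corresponding per-interval values;
    a larger score suggests the job's demand peaks where the host's existing
    load dips, and vice versa."""
    return sum(abs(j - h) for j, h in zip(job_profile, host_profile))

host_load = [0.25, 0.25, 2.0, 2.0]      # existing group's CPU per interval
complementary = [2.0, 2.0, 0.25, 0.25]  # peaks where the host is idle
similar = [0.25, 0.25, 2.0, 2.0]        # peaks with the host
assert negative_correlation_score(complementary, host_load) > \
       negative_correlation_score(similar, host_load)
```

Operation 614's threshold T2 would then be applied to this score to decide whether the job joins the host's group.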
In another exemplary embodiment, the method 600 does not compare the negative correlation value to a threshold, but rather calculates the negative correlation value for each of the hosts 204A-204N that may receive the job and schedules the job to run on the host with the largest negative correlation value. Some jobs may not be strongly negatively correlated with some hosts. When operation 614 determines that the negative correlation value for the selected job is less than the threshold, operation 614 returns the currently selected job to the list of jobs that are not scheduled to run on any host and branches back to operation 606 to select the next job. Operation 614 identifies a job that is not strongly negatively correlated with any of hosts 204A-204N by determining that the job has been compared to all hosts 204A-204N and that the job is still unscheduled. In this case, operation 614 schedules the job to run on the current host because operation 610 has determined that the current host can receive the job without exceeding any T1 resource threshold for the current host.
When operation 614 determines that the current job may be scheduled to run on the current host, operation 616 adds the current job to the job group of the current host, removes the current job from the unscheduled job list, and updates the host resource profile of the current host by adding the resource profile of the current job to the host resource profile. Operation 616 stores the updated host profile in job/host resource allocation database 406.
Operation 618 then determines whether the current host can receive any more jobs. In one exemplary embodiment, this determination may be made by associating each resource entry in each time interval of the current host profile with a value equal to a certain percentage (e.g., 90% to 98%) of the corresponding threshold T1 for the resource. When operation 618 determines that the current host is not full, it transfers control to operation 606 to select the next job. When operation 618 determines that the current host is full, operation 620 deletes the current host from the host list. Following operation 620, operation 622 determines whether the host list is empty, and if so, the method 600 ends at operation 626. When operation 622 determines that the host list is not empty, operation 624 selects the next host in the list as the current host, and passes control to operation 606 to select the next job from the job list as the current job.
In some exemplary embodiments, the method 600 illustrated in fig. 6A may be performed periodically by clearing the host profiles stored in job/host resource allocation database 406 and restarting method 600 at operation 602 to reschedule the jobs to run on the hosts according to the updated job profiles. In this embodiment, the jobs may continue to run using their existing scheduled host allocations until new host allocations are determined. In other embodiments, method 600 may run continuously and update the host profiles based on updates to the job profiles. In one exemplary embodiment, when an updated host profile indicates that one of the T1 resource thresholds has been exceeded, or the rescheduler 402 has found a better host (e.g., another host with sufficient resources and a negative correlation value greater than that of the current host), a job executing on that host may be deleted from the host group and added to the job list. When an updated host profile indicates that additional resources are available, method 600 may add the host with the additional resources to the host list and schedule a newly received job or other jobs from the job list to run on hosts 204A through 204N according to the newly available resources, including the host with the updated host profile, as described above.
Fig. 6B is a flowchart of an exemplary method 650 performed by operation 508 of fig. 5. In an example embodiment, method 650 implements pattern detector 320 shown in FIG. 3A and resource pattern detector 404 shown in FIG. 4. Method 650 collects and analyzes job resource usage to identify patterns of the resource usage and converts any identified patterns to job resource profiles for storage in job/host resource allocation database 406. The method 650 determines a single usage pattern for a single resource. In some embodiments, multiple copies of method 650 may be run simultaneously to obtain resource usage patterns for multiple resources and multiple jobs. In other embodiments, method 650 may be modified to determine multiple resource patterns for a job simultaneously. While the exemplary embodiments shown in fig. 3A and 4 show the pattern detector 320 and the resource pattern detector 404 implemented in a data center controller, in other embodiments, the pattern detector 320 or the resource pattern detector 404 may be implemented as components of the monitoring software 328 shown in fig. 3A and/or the monitoring software 374 shown in fig. 3B. Implementing the resource pattern detector in the monitoring software may facilitate transmitting less data from hosts 204A through 204N to host 300 shown in fig. 3A running a data center controller.
The example method 650 begins when operation 652 receives a time series of data points that describe resource usage of the job within a target time interval. In an exemplary embodiment, the target time interval includes an amount of time that the job is expected to exhibit the resource usage pattern. For example, when a job is expected to have an hourly or daily resource usage pattern, the time series of data points may span two or more hours or days, respectively.
Similarly, when a job is expected to have a weekly, monthly, or yearly resource usage pattern, the time series of data points may span two or more weeks, months, or years, respectively. When there is no expected resource usage pattern, the target time interval may span a longer time interval (e.g., two weeks or two months) to ensure that patterns within a shorter time interval may be detected. When the method 650 determines multiple patterns simultaneously, the time series of data points may be a two-dimensional array with resource types in one dimension and time in the other dimension.
Next, operation 654 of method 650 groups the data points into time windows. In an exemplary embodiment, a time window may be a sub-interval of the target time interval. For example, when the target time interval is one day, the time windows may each be one hour. Further, since resource usage is driven by users accessing the services provided by the job, in some examples the time windows are aligned with a clock and/or calendar such that the first hourly time window begins at midnight and the first daily time window begins on Monday.
After grouping the data points into time windows at operation 654, operation 656 of the example method 650 normalizes the grouped data points to a resource usage range. Operation 656 assigns a single value from a set of predetermined values to each window. In one exemplary embodiment, the value is generated by rounding up the values in the time window. In some embodiments, the value may be a value used to quantify the resource. For example, when the resource is memory occupancy, the value may be a multiple of 1024 (2^10). Thus, if the memory occupancy in a time window is (800, 830), the assigned value of that time window would be 1024. For example, a time series of data points representing memory occupancy collected at half-hour intervals may be 800, 830, 1200, 1300, 1555, 1590, 1600, 1650, 920, 940, 1820, 1800, 1830, 1920, 1890, 1900, where each value represents the number of megabytes of memory used by a job in the corresponding time interval. This sequence can be grouped into eight one-hour groups of two data points each. The sequence of values is truncated to help illustrate the method 650; an actual sequence of values may have a finer granularity and span multiple days to ensure that any determined correlation is stable. As described above, the groupings may be associated with a user's activities, and thus each group may include data points that span an hour.
An exemplary grouping may be represented as follows: (800, 830), (1200, 1300), (1555, 1590), (1600, 1650), (920, 940), (1820, 1800), (1830, 1920), (1890, 1900). Each group may then be assigned a single normalized value (in this example, a multiple of 1024) to generate a normalized sequence. This example normalizes each group to the smallest multiple of 1024 that is greater than or equal to every value in the group. In this example, the normalized sequence is 1024, 2048, 2048, 2048, 1024, 2048, 2048, 2048.
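The grouping and rounding-up steps of operations 654 and 656 can be sketched with the example data above; the window size and rounding quantum are parameters here, not values fixed by the patent:

```python
import math

def normalize_usage(samples, window_size, quantum=1024):
    """Group a time series of usage samples into fixed-size windows and assign
    each window the smallest multiple of `quantum` that is >= every sample
    in the window (the rounding-up step described above)."""
    windows = [samples[i:i + window_size]
               for i in range(0, len(samples), window_size)]
    return [math.ceil(max(w) / quantum) * quantum for w in windows]

# The half-hourly memory samples from the example (megabytes):
samples = [800, 830, 1200, 1300, 1555, 1590, 1600, 1650,
           920, 940, 1820, 1800, 1830, 1920, 1890, 1900]
normalized = normalize_usage(samples, window_size=2)
# -> [1024, 2048, 2048, 2048, 1024, 2048, 2048, 2048]
```

Quantizing to a small set of values makes the later autocorrelation step insensitive to minor sample-to-sample noise.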
Once operation 656 generates the normalized sequence of values, operation 658 sets the initial offset to be used in the autocorrelation operation 660 and sets the Boolean variable PATTERN FOUND to FALSE. In this example, the initial offset is one hour. Autocorrelation operation 660 then correlates the normalized sequence with a copy of itself shifted by the current offset. An exemplary autocorrelation operation may find the difference between corresponding pairs of values in the original sequence and the offset sequence and sum the results. Operation 662 then compares the result of the autocorrelation operation to a threshold. A result greater than the threshold indicates a relatively strong correlation between the sequence and the offset sequence, indicating that a pattern is present. If the current offset N does not generate an autocorrelation value greater than the threshold, the next offset (N+1) is checked. This process repeats until an autocorrelation value greater than the threshold is found or a predetermined maximum number of attempts is reached.
In the above example, the autocorrelation result generated by operation 660 is less than the threshold, so operation 664 compares the offset to the maximum offset. In this example, the maximum offset may be 24 hours. Since the one-hour offset is less than 24 hours, operation 666 increases the offset to two hours and returns to operation 660 to calculate the autocorrelation with the new offset. The result of this operation is also less than the threshold, so operations 664 and 666 increase the offset to three hours and then to four hours. In this example, operation 662 determines that the autocorrelation generated at an offset of four hours is greater than the threshold and passes control to operation 668 to extract a pattern and output the pattern as a resource profile. Operation 668 also sets the Boolean variable PATTERN FOUND to TRUE. In this example, the extracted profile is {{1024m, 1hr}, {2048m, 3hr}}, indicating a repeating four-hour usage pattern: 1024 megabytes of memory are used for one hour, followed by 2048 megabytes of memory for the next three hours. After operation 668, or after operation 664 if the offset is greater than or equal to the maximum offset, the method 650 terminates at operation 670. When the method 650 terminates after operation 664, no pattern has been detected and the Boolean variable PATTERN FOUND remains FALSE.
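The offset search of operations 658 through 668 can be sketched as follows. The patent leaves the exact autocorrelation measure open, so this sketch uses the fraction of matching positions between the sequence and its shifted copy, with run-length encoding to extract the profile:

```python
def find_period(seq, max_offset, threshold):
    """Return the smallest offset at which the sequence matches a shifted copy
    of itself; the score here is the fraction of equal positions, which is
    only one possible autocorrelation measure."""
    for offset in range(1, max_offset + 1):
        pairs = list(zip(seq, seq[offset:]))
        if not pairs:
            break
        score = sum(a == b for a, b in pairs) / len(pairs)
        if score >= threshold:
            return offset
    return None  # PATTERN FOUND stays false

def extract_pattern(seq, period):
    """Run-length encode one period into (value, duration) segments."""
    pattern = []
    for value in seq[:period]:
        if pattern and pattern[-1][0] == value:
            pattern[-1] = (value, pattern[-1][1] + 1)
        else:
            pattern.append((value, 1))
    return pattern

normalized = [1024, 2048, 2048, 2048, 1024, 2048, 2048, 2048]
period = find_period(normalized, max_offset=24, threshold=1.0)
assert period == 4
assert extract_pattern(normalized, period) == [(1024, 1), (2048, 3)]
```

The extracted (value, duration) segments map directly onto the {{1024m, 1hr}, {2048m, 3hr}} profile format described above.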
As described above, any patterns found by method 650 are stored in job/host resource allocation database 406 shown in FIG. 4 as resource profiles for jobs processed by method 650. The rescheduler 402 uses this resource profile for the job when executing the next reschedule operation.
FIG. 7 is a block diagram of a server computing device 700 provided by an exemplary embodiment. Similar components may be used for other types of computing devices. For example, the client, server, and network resources may each use a different set of components shown in fig. 7 and/or computing components not shown in fig. 7 (e.g., components shown in fig. 3A and 3B).
One exemplary server computing device 700 may include a processor (e.g., one or more CPUs) 702, a memory 703, a removable memory 710, and a non-removable memory 712 communicatively coupled by a bus 701. Other types of computing devices include smartphones, tablets, smartwatches, laptops, workstations, or other computing devices. Devices such as smartphones, tablets, and smartwatches are commonly referred to collectively as mobile devices or user devices. Further, although the various data storage elements are shown as components of the server computing device 700, the removable memory 710 may additionally or alternatively include memory in other servers of the data center 202, accessible through the distribution path 206 as shown in fig. 2.
The memory 703 may include volatile memory 714 and non-volatile memory 708. The server computing device 700 may include or have access to a computing environment that includes a variety of computer-readable media, such as volatile 714 and non-volatile 708 memory, removable 710 and non-removable 712 memory. Computer memory includes Random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium capable of storing computer-readable instructions.
The server computing device 700 may include or have access to a computing environment that includes an input interface 706, an output interface 704, and a communication interface 716. The output interface 704 may provide an interface to a display device such as a touch screen, which may also serve as an input device. The input interface 706 may provide an interface to one or more of a touch screen, a touch pad, a mouse, a keyboard, a camera, one or more device-specific buttons, one or more sensors, and/or other input devices integrated within the server computing device 700 or coupled to the server computing device 700 through a wired or wireless data connection. The server computing device 700 may operate in a networked environment, using the communication interface 716 to connect through the distribution path 206 to one or both of the networks 208 or 210 and/or to one or more other servers 204A-204N in the data center 202. The communication interface may include one or more interfaces to a Local Area Network (LAN), a Wide Area Network (WAN), a cellular network, and/or a WLAN network.
Computer readable instructions stored in a computer readable medium may be executed by the processor 702 of the server computing device 700. The computer readable instructions may include application programs 718, such as pattern detector 320, rescheduler 322, jobs 324 and/or 372, monitoring software 328 and/or 374, and/or OS/applications 326 and 370, stored in memory 703. Hard disks, CD-ROMs, RAMs, and flash memory are some examples of articles of manufacture that include a non-transitory computer-readable medium, such as a storage device. The terms computer-readable medium and storage device do not include a carrier wave because a carrier wave is deemed too transitory.
In one embodiment, the functions or algorithms described herein may be implemented using software. The software may include computer-executable instructions stored in a computer-readable medium or computer-readable storage device (e.g., one or more non-transitory memories or other types of hardware-based local or network storage devices), such as in application programs 718. The exemplary apparatus provided by the embodiments described herein implement software or computer instructions to perform query processing, including database management system (DBMS) query processing. Further, these functions correspond to modules, which may be software, hardware, firmware, or any combination thereof. Multiple functions may be performed in one or more modules as desired, and the described embodiments are merely examples. Software may be executed on one or more CPUs, including single or multi-core processors, digital signal processors, Application Specific Integrated Circuits (ASICs), microprocessors, or other types of processors running on a computer system, turning such computer system into a specially programmed machine.
A server computing device may serve as server 300 and/or 350. In some examples, server computing device 300, 350, and/or 700 includes memory 306, 356, and/or 703, including applications 318, 320, 322, 324, 326, 328, 370, 372, 374, and/or 718. Server computing devices 300, 350, and/or 700 include processors 304, 354, and/or 702, which may include one or more single cores in a multi-core processor.
Although several embodiments have been described in detail above, other modifications may be made. For example, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. Other steps may be provided to, or steps may be deleted from, the described flows, and other components may be added to, or removed from, the described systems. Other embodiments may be within the scope of the following claims.

Claims (20)

1. A computer-implemented method for scheduling workloads for a plurality of host computing systems in a data center, the method comprising:
receiving, by a processing unit, a resource usage profile for each of a plurality of jobs, each resource usage profile having a plurality of resource usage entries for the job in a plurality of respective time intervals; and
scheduling, by the processing unit, a first job of the plurality of jobs to run on a first host computing system of the plurality of host computing systems according to a negative correlation between a resource usage profile of the first job and a combination of resource usage profiles of a first set of jobs of the plurality of jobs scheduled to run on the first host computing system.
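The negative-correlation criterion of claim 1 can be illustrated with a small sketch. All names here, and the use of the Pearson coefficient as the correlation measure, are assumptions for illustration, not the patent's mandated method: the candidate job's per-interval usage entries are compared with the element-wise sum of the profiles already placed on a host, and a strongly negative coefficient means the job is busy precisely when the host's existing load is idle.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # Undefined for a constant series; treat as uncorrelated.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def correlation_with_host(job_profile, host_job_profiles):
    """Correlate a candidate job's per-interval resource usage entries with
    the combined (element-wise summed) usage of the jobs already scheduled
    on the host. A negative value indicates complementary usage patterns."""
    combined = [sum(vals) for vals in zip(*host_job_profiles)]
    return pearson(job_profile, combined)

# The candidate peaks exactly where the host's existing load dips.
job = [1.0, 4.0, 1.0, 4.0]
host = [[4.0, 1.0, 4.0, 1.0], [3.0, 1.0, 3.0, 1.0]]
print(correlation_with_host(job, host))  # → -1.0
```

A scheduler built on this idea would favor the host whose combined load is most negatively correlated with the candidate, packing complementary jobs together rather than stacking simultaneous peaks.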
2. The computer-implemented method of claim 1, further comprising:
executing the plurality of jobs on the plurality of host computing systems, including executing the first job on the first host computing system;
monitoring respective resource usage of the plurality of jobs as they are executed on the host computing system to obtain respective actual resource usage data for each of the plurality of jobs;
generating respective updated resource usage profiles for the plurality of jobs, including an updated resource usage profile for the first job, in response to the obtained actual resource usage;
rescheduling, by the processing unit, the first job to run on a second host computing system of the plurality of host computing systems according to a negative correlation between the updated resource usage profile of the first job and a combination of updated resource usage profiles of a second set of jobs of the plurality of jobs scheduled to run on the second host computing system.
3. The computer-implemented method of claim 2, wherein said rescheduling the first job to run on the second host computing system comprises: rescheduling the plurality of jobs to run on respective ones of the plurality of host computing systems according to respective negative correlations between the updated resource usage profile of each job of the plurality of jobs and respective combinations of updated resource usage profiles of respective sets of jobs scheduled to run on the plurality of host computing systems.
4. The computer-implemented method of claim 2, wherein the monitoring the respective resource usage of the plurality of jobs comprises, for each job of the plurality of jobs:
recording the actual resource usage data for the job at each of a plurality of times over a predetermined time interval;
grouping the actual resource usage data into a plurality of time windows, each time window including at least one of the plurality of times;
normalizing the grouped resource usage data into a plurality of discrete usage levels to obtain a time sequence of resource usage;
auto-correlating the time sequence of resource usage with respective differently-offset copies of the time sequence to identify a repeating pattern of the resource usage of the job; and
generating the updated resource usage profile for the job in accordance with the identified recurring pattern of the resource usage of the job.
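The monitoring pipeline of claim 4 (record, group into windows, normalize to discrete levels, auto-correlate at different offsets) can be sketched as follows. The function name, the per-window averaging, the four-level quantization, and the simple match-based correlation score are all illustrative assumptions, not the claimed implementation:

```python
def detect_period(samples, window, levels=4):
    """Illustrative sketch of the claim-4 monitoring steps. Returns the
    smallest lag at which the discretized usage series best repeats,
    i.e. the detected period measured in samples per chosen window."""
    # Steps 1-2: group the recorded samples into fixed-size time windows
    # (here, by averaging the samples inside each window).
    grouped = [sum(samples[i:i + window]) / window
               for i in range(0, len(samples) - window + 1, window)]
    # Step 3: normalize the windowed values into discrete usage levels.
    lo, hi = min(grouped), max(grouped)
    span = (hi - lo) or 1.0
    series = [round((v - lo) / span * (levels - 1)) for v in grouped]
    # Step 4: auto-correlate the series with offset copies of itself and
    # keep the offset (lag) whose overlap agrees best.
    best_lag, best_score = None, float("-inf")
    for lag in range(1, len(series) // 2 + 1):
        pairs = list(zip(series, series[lag:]))
        score = sum(1 if a == b else -1 for a, b in pairs) / len(pairs)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

print(detect_period([0, 0, 9, 9] * 6, window=1))  # → 4
```

The detected period can then seed the job's updated resource usage profile: one usage entry per interval across a single repetition of the pattern.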
5. The computer-implemented method of claim 1, wherein each of the resource usage profiles comprises resource usage entries for at least one of memory occupancy, Central Processing Unit (CPU) occupancy, network bandwidth, disk input/output (I/O) bandwidth, CPU cache occupancy, or memory bandwidth.
6. The computer-implemented method of claim 1, wherein said scheduling the first job to run on the first host computing system further comprises:
comparing each entry in the resource usage profile of the first job to a corresponding resource availability value of the first host computing system; and
scheduling the first job to run on the first host computing system when none of the resource usage profile entries exceeds the corresponding resource availability value.
7. The computer-implemented method of claim 1, wherein said scheduling the first job to run on the first host computing system further comprises:
determining that the negative correlation between the resource usage profile of the first job and a combination of the resource usage profiles of the first set of jobs is greater than a threshold.
8. The computer-implemented method of claim 1, wherein said scheduling the first job to run on the first host computing system further comprises:
determining that the negative correlation between the resource usage profile of the first job and a combination of the resource usage profiles of the first set of jobs running on the first host computing system is greater than the respective negative correlation between the resource usage profile of the first job and a respective combination of resource usage profiles of other sets of jobs scheduled for other host computing systems of the plurality of host computing systems.
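Claims 6 through 8 layer three tests onto the placement decision: per-interval capacity feasibility, a correlation threshold, and selection of the most negatively correlated host. A combined sketch follows; every name, the dictionary layout of `hosts`, and the Pearson measure are illustrative assumptions:

```python
from math import sqrt

def _pearson(xs, ys):
    """Pearson correlation; 0.0 for a constant (undefined) series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def pick_host(job_profile, hosts, threshold=0.0):
    """Sketch combining the placement tests of claims 6-8.
    `hosts` maps a host id to (per-interval capacity, list of job profiles
    already scheduled there). Returns the chosen host id, or None."""
    best_host, best_corr = None, None
    for host_id, (capacity, profiles) in hosts.items():
        combined = [sum(vals) for vals in zip(*profiles)]
        # Claim 6: every projected interval must fit the host's capacity.
        if any(j + c > cap for j, c, cap in zip(job_profile, combined, capacity)):
            continue
        corr = _pearson(job_profile, combined)
        # Claim 7: require the correlation to beat a threshold;
        # claim 8: among feasible hosts, the most negative correlation wins.
        if corr < threshold and (best_corr is None or corr < best_corr):
            best_host, best_corr = host_id, corr
    return best_host

job = [1.0, 4.0, 1.0, 4.0]
hosts = {
    "h1": ([10.0] * 4, [[4.0, 1.0, 4.0, 1.0], [3.0, 1.0, 3.0, 1.0]]),
    "h2": ([10.0] * 4, [[1.0, 4.0, 1.0, 4.0]]),
}
print(pick_host(job, hosts))  # → h1
```

Host h2 is feasible but positively correlated with the candidate, so the anti-correlated h1 is selected; shrinking h1's capacity below the combined peak would make it infeasible under the claim-6 test.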
9. An apparatus for scheduling workloads for a plurality of host computing systems in a data center, the apparatus comprising:
a memory comprising program instructions;
a processing unit coupled to the memory, wherein the program instructions cause the processing unit to perform operations comprising:
receiving a resource usage profile for each job of a plurality of jobs, each resource usage profile having a plurality of resource usage entries for the job in a plurality of respective time intervals;
scheduling a first job of the plurality of jobs to run on a first host computing system of the plurality of host computing systems according to a negative correlation between the resource usage profile of the first job and a combination of the resource usage profiles of a first set of jobs of the plurality of jobs scheduled to run on the first host computing system.
10. The apparatus of claim 9, wherein the operations further comprise:
executing the plurality of jobs on the plurality of host computing systems, including executing the first job on the first host computing system;
monitoring respective resource usage of the plurality of jobs as they are executed on the host computing system to obtain respective actual resource usage data for each of the plurality of jobs;
generating respective updated resource usage profiles for the plurality of jobs, including an updated resource usage profile for the first job, in response to the obtained actual resource usage;
rescheduling the first job to run on a second one of the plurality of host computing systems according to a negative correlation between the updated resource usage profile of the first job and a combination of updated resource usage profiles of a second set of jobs of the plurality of jobs scheduled to run on the second one of the plurality of host computing systems.
11. The apparatus of claim 10, wherein the operation of rescheduling the first job to run on the second host computing system comprises: rescheduling the plurality of jobs to run on respective ones of the plurality of host computing systems according to respective negative correlations between the updated resource usage profile of each job of the plurality of jobs and respective combinations of updated resource usage profiles of respective sets of jobs scheduled to run on the plurality of host computing systems.
12. The apparatus of claim 10, wherein the operation of monitoring the respective resource usage of the plurality of jobs comprises, for each job of the plurality of jobs:
recording the actual resource usage data for the job at each of a plurality of times over a predetermined time interval;
grouping the actual resource usage data into a plurality of time windows, each time window including at least one of the plurality of times;
normalizing the grouped resource usage data into a plurality of discrete usage levels to obtain a time sequence of resource usage;
auto-correlating the time sequence of resource usage with respective differently-offset copies of the time sequence to identify a repeating pattern of the resource usage of the job; and
generating the updated resource usage profile for the job in accordance with the identified recurring pattern of the resource usage of the job.
13. The apparatus of claim 9, wherein each of the resource usage profiles comprises resource usage entries for at least one of memory occupancy, Central Processing Unit (CPU) occupancy, network bandwidth, disk input/output (I/O) bandwidth, CPU cache occupancy, or memory bandwidth.
14. The apparatus of claim 9, wherein the operation of scheduling the first job to run on the first host computing system further comprises:
comparing each entry in the resource usage profile of the first job to a corresponding resource availability value of the first host computing system; and
scheduling the first job to run on the first host computing system when none of the resource usage profile entries exceeds the corresponding resource availability value.
15. The apparatus of claim 9, wherein the operation of scheduling the first job to run on the first host computing system further comprises:
determining that the negative correlation between the resource usage profile of the first job and a combination of the resource usage profiles of the first set of jobs is greater than a threshold.
16. The apparatus of claim 9, wherein the operation of scheduling the first job to run on the first host computing system further comprises:
determining that the negative correlation between the resource usage profile of the first job and a combination of the resource usage profiles of the first set of jobs running on the first host computing system is greater than the respective negative correlation between the resource usage profile of the first job and a respective combination of resource usage profiles of other sets of jobs scheduled for other host computing systems of the plurality of host computing systems.
17. A computer-readable medium storing instructions for scheduling workloads for a plurality of host computing systems in a data center, the instructions, when executed by a processing unit, causing the processing unit to perform operations comprising:
receiving a resource usage profile for each job of a plurality of jobs, each resource usage profile having a plurality of resource usage entries for the job in a plurality of respective time intervals;
scheduling a first job of the plurality of jobs to run on a first host computing system of the plurality of host computing systems according to a negative correlation between the resource usage profile of the first job and a combination of the resource usage profiles of a first set of jobs of the plurality of jobs scheduled to run on the first host computing system.
18. The computer-readable medium of claim 17, wherein the operations further comprise:
executing the plurality of jobs on the plurality of host computing systems, including executing the first job on the first host computing system;
monitoring respective resource usage of the plurality of jobs as they are executed on the host computing system to obtain respective actual resource usage data for each of the plurality of jobs;
generating respective updated resource usage profiles for the plurality of jobs, including an updated resource usage profile for the first job, in response to the obtained actual resource usage;
rescheduling the first job to run on a second one of the plurality of host computing systems according to a negative correlation between the updated resource usage profile of the first job and a combination of updated resource usage profiles of a second set of jobs of the plurality of jobs scheduled to run on the second one of the plurality of host computing systems.
19. The computer-readable medium of claim 18, wherein the operation of rescheduling the first job to run on the second host computing system comprises: rescheduling the plurality of jobs to run on respective ones of the plurality of host computing systems according to respective negative correlations between the updated resource usage profile of each job of the plurality of jobs and respective combinations of the updated resource usage profiles of respective sets of jobs scheduled to run on the plurality of host computing systems.
20. The computer-readable medium of claim 19, wherein the operation of monitoring the respective resource usage of the plurality of jobs comprises, for each job of the plurality of jobs:
recording the actual resource usage data for the job at each of a plurality of times over a predetermined time interval;
grouping the actual resource usage data into a plurality of time windows, each time window including at least one of the plurality of times;
normalizing the grouped resource usage data into a plurality of discrete usage levels to obtain a time sequence of resource usage;
auto-correlating the time sequence of resource usage with respective differently-offset copies of the time sequence to identify a repeating pattern of the resource usage of the job; and
generating the updated resource usage profile for the job in accordance with the identified recurring pattern of the resource usage of the job.
CN201980101440.7A 2019-10-17 2019-10-17 Variable job resource representation and scheduling for cloud computing Pending CN114868112A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2019/056781 WO2021076140A1 (en) 2019-10-17 2019-10-17 Variable job resource representation and scheduling for cloud computing

Publications (1)

Publication Number Publication Date
CN114868112A 2022-08-05

Family

Family ID: 68502016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980101440.7A Pending CN114868112A (en) 2019-10-17 2019-10-17 Variable job resource representation and scheduling for cloud computing

Country Status (2)

Country Link
CN (1) CN114868112A (en)
WO (1) WO2021076140A1 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8806015B2 (en) * 2011-05-04 2014-08-12 International Business Machines Corporation Workload-aware placement in private heterogeneous clouds

Also Published As

Publication number Publication date
WO2021076140A1 (en) 2021-04-22

Similar Documents

Publication Publication Date Title
US10346203B2 (en) Adaptive autoscaling for virtualized applications
US20180198855A1 (en) Method and apparatus for scheduling calculation tasks among clusters
CN112162865B (en) Scheduling method and device of server and server
US9703591B2 (en) Workload distribution management apparatus and control method
US20150295970A1 (en) Method and device for augmenting and releasing capacity of computing resources in real-time stream computing system
EP3920490B1 (en) Adaptive resource allocation method and apparatus
US20180091588A1 (en) Balancing workload across nodes in a message brokering cluster
US20180091586A1 (en) Self-healing a message brokering cluster
US20230004436A1 (en) Container scheduling method and apparatus, and non-volatile computer-readable storage medium
US20160179560A1 (en) CPU Overprovisioning and Cloud Compute Workload Scheduling Mechanism
US9870269B1 (en) Job allocation in a clustered environment
US20120084414A1 (en) Automatic replication of virtual machines
Xu et al. Adaptive task scheduling strategy based on dynamic workload adjustment for heterogeneous Hadoop clusters
Chard et al. Cost-aware cloud provisioning
US20120221730A1 (en) Resource control system and resource control method
US20160321331A1 (en) Device and method
Copil et al. Advise–a framework for evaluating cloud service elasticity behavior
WO2017020742A1 (en) Load balancing method and device
WO2012048014A2 (en) Automatic selection of secondary backend computing devices for virtual machine image replication
Shi et al. Characterizing and orchestrating VM reservation in geo-distributed clouds to improve the resource efficiency
US8650571B2 (en) Scheduling data analysis operations in a computer system
CN108536525B (en) Host machine scheduling method and device
CN109450672B (en) Method and device for identifying bandwidth demand burst
CN114868112A (en) Variable job resource representation and scheduling for cloud computing
CN114090201A (en) Resource scheduling method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination