WO2019193570A1 - Batch jobs execution time prediction using distinct service demand of threads and instantaneous cpu utilization - Google Patents


Info

Publication number
WO2019193570A1
Authority
WO
WIPO (PCT)
Prior art keywords
threads
job
batch jobs
threaded
cpu
Prior art date
Application number
PCT/IB2019/052828
Other languages
French (fr)
Inventor
Dheeraj Chahal
Benny Mathew
Manoj Karunakaran Nambiar
Original Assignee
Tata Consultancy Services Limited
Priority date
Filing date
Publication date
Application filed by Tata Consultancy Services Limited filed Critical Tata Consultancy Services Limited
Publication of WO2019193570A1 publication Critical patent/WO2019193570A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/48Indexing scheme relating to G06F9/48
    • G06F2209/483Multiproc
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5019Workload prediction

Definitions

  • the disclosure herein generally relates to execution time prediction of batch jobs based upon service demand of one or more threads and instantaneous Central Processing Unit (CPU) utilization, and, more particularly, to systems and methods for execution time prediction of multi-threaded batch jobs based upon a distinct service demand of one or more threads and instantaneous Central Processing Unit (CPU) utilization.
  • CPU Central Processing Unit
  • Batch jobs perform automated, complex processing of large volumes of data without human intervention.
  • Parallel processing allows multiple batch jobs to run concurrently to minimize total completion time. However, this may result in one or more jobs exceeding their individual completion deadline due to resource sharing.
  • the batch jobs are normally grouped by streams based on business functions such that a batch job may have a predecessor and successor jobs. Finding a valid sequence of batch jobs such that their completion time is minimized, may be challenging and requires estimating completion time of an individual batch job in presence of other jobs running concurrently.
  • Generally resources used for the batch jobs are managed using time sharing.
  • the completion time of each individual batch job, when it runs concurrently with other jobs, may be derived simply from its clock time or total execution time in isolation. This traditional way of execution time computation may generate useful results when the number of cores available is more than the cores required by the one or more concurrent jobs.
  • a method for execution time prediction of multi-threaded batch jobs based upon a distinct service demand of one or more threads and instantaneous Central Processing Unit (CPU) utilization comprising: identifying, based upon a concurrency level of one or more multi-threaded batch jobs, a set of multi-threaded batch jobs executing in parallel on a set of resources, wherein the set of resources comprise of a memory, a server and a disk; clustering, the one or more threads from the identified set of multi-threaded batch jobs, wherein the one or more threads are clustered based upon the distinct service demand of the one or more threads by implementing a K-means clustering technique, and wherein the distinct service demand comprises a distinct CPU utilization of the one or more threads; deriving, by one or more hardware processor
  • a system for execution time prediction of multi-threaded batch jobs based upon a distinct service demand of one or more threads and instantaneous Central Processing Unit (CPU) utilization comprising a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: identify, based upon a concurrency level of one or more multi-threaded batch jobs, a set of multi-threaded batch jobs executing in parallel on a set of resources, wherein the set of resources comprise of a memory, a server and a disk; cluster, the one or more threads from the identified set of multi -threaded batch jobs, wherein the one or more threads are clustered based upon the distinct service demand of the one or more threads by implementing a K-means clustering technique, and wherein the distinct service demand comprises a distinct CPU utilization of the one or more threads;
  • CPU Central Processing Unit
  • one or more non-transitory machine readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors causes the one or more hardware processors to perform a method for execution time prediction of multi-threaded batch jobs based upon a distinct service demand of one or more threads and instantaneous Central Processing Unit (CPU) utilization, the method comprising: identifying, based upon a concurrency level of one or more multi-threaded batch jobs, a set of multi-threaded batch jobs executing in parallel on a set of resources, wherein the set of resources comprise of a memory, a server and a disk; clustering, the one or more threads from the identified set of multi-threaded batch jobs, wherein the one or more threads are clustered based upon the distinct service demand of the one or more threads by implementing a K-means clustering technique, and wherein the distinct service demand comprises a distinct CPU utilization of the one or more threads; deriving, by the one or more hardware processors
  • FIG. 1 illustrates a block diagram of a system for predicting execution time of multi-threaded batch jobs based upon a distinct service demand of one or more threads and instantaneous Central Processing Unit (CPU) utilization, in accordance with some embodiments of the present disclosure.
  • CPU Central Processing Unit
  • FIGS. 2A through 2B depict a flow diagram illustrating the steps involved in the process of predicting the execution time of the multi-threaded batch jobs based upon the distinct service demand of the one or more threads and the instantaneous CPU utilization, in accordance with some embodiments of the present disclosure.
  • FIG. 3 shows a graphical representation of an identified set of multi-threaded batch jobs executing in parallel and the one or more threads clustered, in accordance with some embodiments of the present disclosure.
  • FIG. 4 shows a graphical representation of a total execution time of each thread amongst the one or more threads clustered, in accordance with some embodiments of the present disclosure.
  • FIG. 5 shows a graphical representation of the instantaneous utilization of the CPU by one or more multi-threaded batch jobs amongst the set of multi-threaded batch jobs identified, in accordance with some embodiments of the present disclosure.
  • FIG. 6 shows a graphical representation of instantaneous values of the CPU derived for the one or more threads clustered for predicting the execution time of the multi-threaded batch jobs, in accordance with some embodiments of the present disclosure.
  • FIG. 7 shows a graphical representation of a job execution model auto-designed based upon the one or more threads clustered and the instantaneous value of the CPU, in accordance with some embodiments of the present disclosure.
  • FIGS. 8(a) through 13(b) show graphical representations of a plurality of results corresponding to experimental data and predicted data, wherein the experimental data and the predicted data are obtained by implementing traditional systems and methods and the proposed methodology respectively, in accordance with some embodiments of the present disclosure.
  • the embodiments of the present disclosure provide systems and methods for predicting execution time of multi-threaded batch jobs based upon a distinct service demand of threads and instantaneous Central Processing Unit (CPU) utilization, according to some embodiments of the present disclosure.
  • Batch processing constitutes a large portion of complex and large data processing in many organizations. For example, banks generate reports at the end of the day using the batch processing and financial institutions run computationally intensive workloads to model stock performance. Batch jobs may be multi-threaded and threads can have distinct CPU requirements. Even batch jobs with high Input / Output (IO) requests saturate the CPU due to oversubscription, that is, if one job or a thread is waiting for the IO, the CPU will be used by other jobs or threads.
  • IO Input / Output
  • batch jobs are generally scheduled to run during off business hours, known as a batch window, but the batch jobs may overrun long enough in the presence of other batch jobs and thus impact the critical transactions during business hours. Hence it is very important to estimate the completion time of a job a priori in the presence of other batch jobs.
  • HT Hyper-Threading
  • the HT results in execution of two threads on a single core and exploits the latencies due to data access.
  • the HT does not provide the advantage of multi-core CPU but offers advantage over a single core by executing two threads simultaneously and filling unused stages in the functional pipeline. Since HT processor(s) behave differently than the single core or multi-core systems, predicting the thread or job execution behavior based on physical core data becomes challenging.
  • FIGS 1 through 13(b) where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
  • FIG. 1 illustrates an exemplary block diagram of a system 100 for predicting execution time of multi-threaded batch jobs based upon a distinct service demand of one or more threads and instantaneous Central Processing Unit (CPU) utilization, in accordance with an embodiment of the present disclosure.
  • the system 100 includes one or more processors 104, communication interface device(s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 102 operatively coupled to the one or more processors 104.
  • I/O input/output
  • the one or more processors 104 that are hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions.
  • the processor(s) is configured to fetch and execute computer-readable instructions stored in the memory 102.
  • the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
  • the I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite.
  • the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
  • the memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
  • FIGS. 2A through 2B illustrate an exemplary flow diagram of a method for predicting the execution time of the multi-threaded batch jobs based upon the distinct service demand of the one or more threads and the instantaneous CPU utilization, in accordance with an embodiment of the present disclosure.
  • the system 100 comprises one or more data storage devices of the memory 102 operatively coupled to the one or more hardware processors 104 and is configured to store instructions for execution of steps of the method by the one or more processors 104.
  • the steps of the method of the present disclosure will now be explained with reference to the components of the system 100 as depicted in FIG. 1 and the flow diagram.
  • the hardware processors 104, when configured with the instructions, perform one or more methodologies described herein.
  • the one or more hardware processors 104 identify, based upon a concurrency level of one or more multi-threaded batch jobs, a set of multi-threaded batch jobs executing in parallel on a set of resources, wherein the set of resources comprise, inter alia, a memory (other than the memory 102), a server and a disk.
  • multi-threading is a type of execution model that allows multiple threads to exist within the context of a process.
  • a multi-threaded architecture supports not only multiple processors but also multiple streams (or batch streams) executing simultaneously in each processor.
  • the processor(s) of the multi-threaded architecture computer are interconnected via an interconnection network.
  • the concurrency level of a processing stage refers to a maximum number of batch jobs that are executing concurrently on a computing system. For example, when the concurrency level at a processing stage is one, all batch jobs at that stage are processed in sequential order.
  • the set of multi-threaded batch jobs executing in parallel may be referred.
  • each batch job amongst the set of multi-threaded batch jobs, that is Job1, Job2, Job3 and Job4, comprises multiple threads. Job1 has 28 threads, Job2 has 28 threads, while Job3 and Job4 have 56 threads each.
  • the concurrency level is 4, as four multi-threaded batch jobs (that is, Job1, Job2, Job3 and Job4) are executing in parallel.
  • the one or more hardware processors 104 cluster the one or more threads from the identified set of multi-threaded batch jobs.
  • the one or more threads are clustered based upon the distinct service demand of the one or more threads by implementing a K-means clustering technique.
  • the distinct service demand may be measured as a distinct CPU utilization of the one or more threads.
  • the completion time of each batch job J_i may be predicted, wherein the batch job J_i is running concurrently with all other batch jobs amongst the set of multi-threaded batch jobs.
  • the distinct service demand of each thread j in a batch job i
  • It may be assumed that the number of threads in the set of multi-threaded batch jobs differ.
  • the resource requirement of each multi-threaded batch job amongst the set of multi-threaded batch jobs may be measured till the completion time t when a multi-threaded batch job runs in isolation.
  • the set of multi-threaded batch jobs run on a server with, suppose, m cores.
  • the number of threads in a batch job Ji may be less than, equal to or more than the number of cores m.
  • the service demand as an input comprises a critical input for predictive modelling of job execution.
  • the one or more threads in the set of multi-threaded batch jobs may be assigned a unique work and hence the service demands of the one or more threads are non-identical or distinct within a job.
  • some of the threads amongst the one or more threads may finish early while others may run slowly.
  • the variation in the service demand of the one or more threads in a batch job affects the completion time of all batch jobs (amongst the set of multi-threaded batch jobs) running concurrently.
  • the proposed disclosure provides for measuring the service demand of each thread (amongst the one or more threads) in a job (amongst the set of multi-threaded batch jobs) in isolation. Threads having a similar demand may be clustered for the ease of simulation.
  • the embodiment of the present disclosure provides for the clustering of the one or more threads based upon similarity in the service demands of the one or more threads.
  • the proposed disclosure implements the K-means clustering technique for clustering the service demands of the one or more threads of an individual batch job amongst the set of multi-threaded batch jobs executing concurrently.
  • the cluster centers are initialized using the k-means++ algorithm.
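The clustering step described above can be sketched as a simple one-dimensional K-means over per-thread CPU service demands, with a k-means++-style seeding as mentioned. This is an illustrative sketch under assumed names and demand values, not the patented implementation:

```python
import random

def kmeans_1d(demands, k, iters=100, seed=0):
    """Cluster 1-D per-thread CPU service demands into k groups using
    k-means++-style initialization followed by Lloyd iterations."""
    rng = random.Random(seed)
    # k-means++-style seeding: later centers are drawn with probability
    # proportional to squared distance from the nearest chosen center.
    centers = [rng.choice(demands)]
    while len(centers) < k:
        d2 = [min((x - c) ** 2 for c in centers) for x in demands]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for x, w in zip(demands, d2):
            acc += w
            if acc >= r:
                centers.append(x)
                break
    for _ in range(iters):
        # assign each demand to its nearest center
        clusters = [[] for _ in range(k)]
        for x in demands:
            nearest = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            clusters[nearest].append(x)
        new_centers = [sum(c) / len(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, clusters

# Illustrative per-thread CPU demands (seconds): two natural groups
demands = [290, 295, 300, 305, 440, 450, 455, 460]
centers, clusters = kmeans_1d(demands, k=2)
```

Threads with similar service demand end up in the same cluster, which is what makes the later per-cluster simulation tractable.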
  • the one or more threads clustered may be referred.
  • the distinct service demand comprises the distinct CPU utilization of the one or more threads and may be measured accordingly.
  • the CPU utilization of the one or more threads corresponding to the batch jobs jobl and job2 is close to 300 while the CPU utilization of the one or more threads corresponding to the batch jobs job3 and job4 is close to 450.
  • the one or more threads have been clustered as per the distinct service demands (or the CPU utilization) of the one or more threads.
  • the one or more hardware processors 104 derive an instantaneous value of the CPU for the one or more threads clustered.
  • the instantaneous value is derived based upon the instantaneous utilization of the CPU by the one or more multi-threaded batch jobs amongst the set of multi-threaded batch jobs.
  • the instantaneous CPU utilization may be derived as a function of time for a set of intervals (that is, for small intervals) for predicting the batch execution time.
  • varying CPU utilization due to caching comprises a major challenge in predictive modelling.
  • the traditional systems and methods do not provide for deriving the instantaneous value of the CPU by measuring the instantaneous utilization of the CPU.
  • the variation in the CPU can be a function of time or a distribution of some other kind.
  • the varying CPU utilization by a batch job during different stages of execution affects the completion time of other batch jobs.
  • the aberrations in predicting the completion time of a batch job may be mitigated by simulating the execution behavior of the one or more threads for smaller intervals of time.
  • the proposed disclosure provides for capturing the CPU utilization value of a batch job (or more specifically, of a multi-threaded batch job amongst the set of multi-threaded batch jobs) at the set of intervals, and more specifically, at a set of small intervals in isolation, and fitting the captured CPU utilization values into a regression function or distribution (for example, linear or non-linear, exponential, polynomial of degree n, etc.).
  • the instantaneous value(s) of the CPU may be derived with the function of time or the distribution of time for each of the smaller intervals during simulation.
  • a thread is in an active state when it acquires CPU core for execution.
  • the thread then releases the server core(s) and switches to a blocked state for some other work, for example, waiting for an input/output to complete or making a remote procedure call.
  • the thread switches to a ready state and competes for the server core(s) before returning to the active state.
  • the completion time of any thread is determined by obtaining a sum of the time spent in the active state (that is, while using the CPU) and in the blocked or ready state, while the completion time of a batch job may be represented by the time of the slowest running thread in the batch job.
  • C_t is the CPU utilization of the batch job (amongst the set of multi-threaded batch jobs) defined at time t of execution.
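In symbols (the notation here is reconstructed for illustration, since the original equations are not reproduced in this text), the completion-time relations described above take roughly the form:

```latex
% T_{ij}: completion time of thread j in batch job i (reconstructed notation)
% t^{a}_{ij}: time spent in the active state; t^{b}_{ij}: time in blocked/ready states
T_{ij} = t^{a}_{ij} + t^{b}_{ij}, \qquad T_i = \max_{j} T_{ij}
```

with the instantaneous utilization C_t governing how much active time a thread accrues within each small interval.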
  • the instantaneous utilization of the CPU by the one or more multi-threaded batch jobs amongst the set of multi-threaded batch jobs may be referred.
  • the instantaneous CPU utilization of a multi-threaded job may be observed at regular intervals of time when the job is running in isolation. The utilization can be represented with the help of a uniform distribution in the interval of approximately [72%, 78%].
  • Further, in an example implementation of the step 203, referring to FIG. 6, the instantaneous value of the CPU for the one or more threads clustered may be referred, wherein the instantaneous value is derived based upon the instantaneous utilization of the CPU by the one or more multi-threaded batch jobs.
  • the instantaneous value of the CPU for the one or more threads clustered varies between [68%, 78%] for different set of intervals.
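The derivation of per-interval instantaneous CPU values can be sketched as below; the uniform [72%, 78%] band follows the example above, while the function name and parameters are illustrative assumptions (other fitted functions such as linear or polynomial regressions could be substituted):

```python
import random

def instantaneous_cpu(mean_util, half_width, n_intervals, seed=0):
    """Sample a per-interval instantaneous CPU utilization trace (percent).

    The job's utilization measured in isolation is modelled here as a
    uniform distribution around its mean, one draw per small interval.
    """
    rng = random.Random(seed)
    return [rng.uniform(mean_util - half_width, mean_util + half_width)
            for _ in range(n_intervals)]

# e.g. a job whose isolated utilization fits Uniform[72%, 78%]
trace = instantaneous_cpu(mean_util=75.0, half_width=3.0, n_intervals=10)
```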
  • the one or more hardware processors 104 auto-design, based upon the one or more threads clustered and the instantaneous value of the CPU, a job execution model, wherein the job execution model comprises a plurality of idle threads and a plurality of threads ready for execution amongst the one or more threads clustered.
  • the job execution model is auto-designed to be simulated in a simulation environment for facilitating the execution time prediction of each job amongst the set of multi-threaded batch jobs (discussed in step 205).
  • the proposed disclosure facilitates modelling the execution time of a thread (amongst the one or more threads clustered) in the set of small intervals of time based upon the time spent in the CPU and the idle time outside the CPU.
  • the job execution model auto-designed based upon the one or more threads clustered and the instantaneous value of the CPU may be referred.
  • at step 205(i), the job execution model is simulated in a Predicting the Runtime of Batch Workloads (PROWL) simulation environment, wherein the simulation is performed based upon the one or more threads clustered and the CPU utilization of the one or more multi-threaded batch jobs.
  • the PROWL comprises a discrete event simulation environment for facilitating simulations of batch load executions, and thus, facilitating simulating the job execution model.
  • the simulation in the PROWL is performed based upon the one or more threads clustered and the CPU utilization of the one or more multi-threaded batch jobs.
  • a set of inputs to the PROWL simulation environment comprises the number of threads in each job (amongst the set of multi-threaded batch jobs), the number of cores (or the server cores) available in the computing system (that is, the system on which the proposed methodology is being tested and implemented, and other than the system 100), the length of each small interval amongst the set of small intervals and the number of simulations to perform.
  • Upon completing the simulations, the PROWL simulation environment predicts the minimum and maximum completion time of each job amongst the set of multi-threaded batch jobs identified.
  • the process of simulating the job execution model in the PROWL simulation environment may now be considered in detail by referring to PROWL algorithm below.
  • gapTime ← Uniform(gapDistList)
  • the PROWL simulation environment comprises a set of programming functions corresponding to at least either of the one or more threads or the one or more multi-threaded batch jobs for executing a plurality of tasks (for example, shuffling the one or more threads) corresponding to the batch execution time prediction.
  • the PROWL algorithm comprises a set of five major functions (or programming functions), namely interval_end, schedule_job_execution, interval_start, arrival and departure. The task executed by each programming function amongst the set of programming functions is discussed below in paragraph 52.
  • the PROWL simulation environment simulates the job execution model auto-designed in the step 204 above.
  • the step of predicting the batch execution time of each job is preceded by defining each job amongst the set of multi-threaded batch jobs in the PROWL simulation environment, and wherein each job is defined based upon a total number of threads, a distinct service demand of each thread amongst the total number of threads and CPU utilization of the job to be defined.
  • each job may be defined based upon the total number of threads, the distinct service demand of each thread and the CPU utilization of the job to be defined as below:
  • each job defined in the PROWL simulation environment may execute one or more functions to initialize one or more variables corresponding to the set of multi-threaded batch jobs, wherein the one or more variables comprise the distinct service demand of each of the thread amongst the total number of threads, the CPU utilization of the defined job and a job identification metric of the defined job.
  • Job1 defined in the PROWL simulation environment may execute the function arrival only once to initialize the one or more variables, for example the job identification metric of Job1 as 6543.
  • the instantaneous value of the CPU may be derived for each interval amongst the set of intervals for predicting the batch execution time using the equations (3) and (4).
  • suppose an average CPU utilization of a batch job is 35% and it varies uniformly in a range of ±1.75%. If the set of intervals are of 1 second each, then the CPU utilization for each of these intervals will be randomly selected in the range of 33.25 to 36.75, that is, [35 − 1.75, 35 + 1.75].
  • the CPU idle time in the 1 second interval is also randomly selected from the range of 63.25 to 66.75, that is, [65 − 1.75, 65 + 1.75].
  • a thread remains idle for CPU idle time and executes for the remaining time.
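The busy/idle split in the worked example above can be sketched as follows; the function name and parameters are illustrative, not from the disclosure:

```python
import random

def interval_busy_idle(avg_util, half_width, interval=1.0, seed=0):
    """Split one simulation interval into CPU-busy and CPU-idle time.

    With an average utilization of 35% varying uniformly by +/-1.75%,
    each 1-second interval draws a utilization in [33.25, 36.75]; a
    thread executes for the busy fraction and stays idle for the rest.
    """
    rng = random.Random(seed)
    util = rng.uniform(avg_util - half_width, avg_util + half_width)  # percent
    busy = interval * util / 100.0
    idle = interval - busy
    return busy, idle

busy, idle = interval_busy_idle(35.0, 1.75)
```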
  • the PROWL simulation environment comprises a set of queues for the one or more threads, and wherein the set of queues comprise either of at least one ready thread or of at least one active thread amongst the one or more threads.
  • the embodiments of the proposed disclosure facilitate maintaining of two queues by the PROWL simulation environment, that is, a job_queue and an execution_queue (referred to in the PROWL algorithm), wherein the job_queue comprises thread(s) amongst the one or more threads that are in the ready state and the execution_queue stores thread(s) amongst the one or more threads that are in the active state or run state.
  • the interval_start and interval_end functions are used to shuffle the one or more threads between the set of queues, that is, the job_queue and the execution_queue, upon start or completion of busy time in an interval.
  • the schedule_job_execution function moves a batch job (defined in the PROWL simulation environment) from the job_queue to the execution_queue upon receiving request(s) from the interval_start and the interval_end functions.
  • the departure function is invoked when the one or more threads exit from the computing system on completion of the service demand.
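Putting the queues and functions together, a heavily simplified, hypothetical sketch of a PROWL-style interval simulation might look like the following; all names, job parameters and the round-robin scheduling policy are illustrative assumptions rather than the disclosed algorithm:

```python
import random
from collections import deque

def simulate_batch(jobs, cores, interval=1.0, seed=0):
    """Minimal interval-driven sketch of a PROWL-style simulation.

    `jobs` maps a job name to (thread_demands, mean_util, half_width):
    per-thread CPU service demands in seconds plus the job's uniform
    per-interval CPU utilization (percent) measured in isolation.
    Threads wait in a ready queue (job_queue); at most `cores` of them
    occupy the execution queue at once. Within each interval a running
    thread is busy for the job's sampled utilization fraction, echoing
    the interval_start / interval_end / schedule shuffle above.
    """
    rng = random.Random(seed)
    # one [job, remaining_demand] entry per thread, starting in ready state
    job_queue = deque([name, d] for name, (ds, _, _) in jobs.items() for d in ds)
    remaining_threads = {name: len(ds) for name, (ds, _, _) in jobs.items()}
    completion, executing, t = {}, [], 0.0
    while job_queue or executing:
        # schedule_job_execution: fill free cores from the ready queue
        while job_queue and len(executing) < cores:
            executing.append(job_queue.popleft())
        # one interval: each running thread is busy for util% of it
        for th in executing:
            _, mean_u, half_w = jobs[th[0]]
            util = rng.uniform(mean_u - half_w, mean_u + half_w)
            th[1] -= interval * util / 100.0
        t += interval
        # departure: threads whose service demand is met leave the system
        still = []
        for th in executing:
            if th[1] <= 0:
                remaining_threads[th[0]] -= 1
                if remaining_threads[th[0]] == 0:
                    completion[th[0]] = t  # slowest thread fixes the job time
            else:
                still.append(th)
        # interval_end: rotate unfinished threads back through the ready queue
        job_queue.extend(still)
        executing = []
    return completion

jobs = {
    "job1": ([10.0] * 4, 75.0, 3.0),  # 4 threads, ~10 s CPU demand each
    "job2": ([5.0] * 4, 75.0, 3.0),
}
times = simulate_batch(jobs, cores=4)
```

With equal-rate time sharing, the job whose threads carry half the service demand completes first, which is the qualitative behaviour the disclosure's queue model captures.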
  • the one or more hardware processors 104 predict based upon the simulation, the execution time of each job amongst the set of multi-threaded batch jobs executing in parallel.
  • the execution time predicted for each of the job amongst the set of multi-threaded batch jobs identified by implementing the proposed methodology may be referred.
  • The 'experimental time' refers to the job completion time obtained using the traditional systems and methods.
  • Job1, Job2, Job3 and Job4 have the predicted completion time of 1693 seconds, 1699 seconds, 1302 seconds, and 1285 seconds respectively.
  • the experimental analysis to highlight the accuracy of the proposed disclosure as compared to the traditional systems and methods may now be considered.
  • the experiments were performed using two synthetic benchmarks namely, Flexible Input / Output Tester (FIO) and Likwid-bench.
  • FIO Flexible Input / Output Tester
  • Likwid-bench is a micro-benchmarking framework that provides for a set of assembly language kernels.
  • the read and write operations of the FIO were redirected to a set of temporary files stored by using the temporary file system tmpfs.
  • the FIO behaves as a CPU intensive benchmark since all the IO operations may be carried out in a memory (other than the memory 102).
  • the FIO allows both forked and threaded jobs.
  • the proposed disclosure used the threaded jobs to replicate behavior of a multi-threaded batch workload.
  • the FIO provides for 19 different kinds of IO engines, for example, mmap, sync etc.
  • a set of batch jobs of distinct characteristics may be created by changing the FIO parameters as below (taking an example scenario):
  • Randomness: four levels of randomness in the access pattern were used (25%, 50%, 75%, 100%). Block sizes: IO requests of sizes 16K, 32K, 64K and 128K were generated.
  • Thinktime: the time duration for which a job is stalled between two IO operations. The thinktime was changed to vary the CPU utilization of a job in the range 10%-100%.
  • a sample FIO job is as shown below:
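The sample job itself is not reproduced in this text. A hypothetical FIO job file of the kind described (threaded, tmpfs-backed, with randomness, block size and thinktime parameters) might look like the following; every path and value here is illustrative, not taken from the disclosure:

```ini
[global]
ioengine=sync          ; one of FIO's IO engines (mmap, sync, ...)
directory=/tmp/fio     ; a tmpfs-backed directory keeps the IO in memory
thread=1               ; threaded (not forked) jobs
size=256m

[batchjob1]
rw=randrw              ; mixed read/write access pattern
percentage_random=50   ; 50% randomness level
bs=64k                 ; 64K IO requests
numjobs=28             ; 28 threads in the job
thinktime=2000         ; stall (usec) between two IO operations
```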
  • the FIO jobs of distinct configurations were used to create a batch of jobs for evaluating the proposed methodology (disclosure).
  • the Likwid-bench suite has 89 benchmarks and comprises a set of features like thread parallelism and thread placement. While executing the Likwid benchmarks, the total number of threads was equally divided between two sockets.
  • both the benchmarks were executed on 8 physical (16 logical) core and 28 physical (56 logical) core Intel® Xeon machines respectively. Both machines were running the CentOS 6.0 Linux system. The benchmarks were run on both machines with HT on and off.
  • time-series data was obtained from the run-history of individual job which included the service demand, the CPU utilization, and the number of threads spawned by the job.
  • the data was collected using low overhead Linux ps and mpstat commands every five seconds during the job run in isolation.
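The five-second collection loop described above could be sketched as follows. The `ps -eLo pid,tid,pcpu` column layout, the sample values and the helper names are assumptions for illustration, not from the source:

```python
# Minimal sketch (assumed `ps -eLo pid,tid,pcpu` column layout): snapshots
# like the one below would be captured every five seconds while a job runs
# in isolation; each snapshot maps thread id -> instantaneous %CPU.

SAMPLE = """\
  PID   TID %CPU
 4321  4321 97.0
 4321  4322 45.5
 4321  4323 12.0
"""

def parse_thread_cpu(sample: str) -> dict[int, float]:
    """Parse one thread-level ps snapshot into {tid: %CPU}."""
    usage = {}
    for line in sample.splitlines()[1:]:   # skip the header row
        pid, tid, pcpu = line.split()
        usage[int(tid)] = float(pcpu)
    return usage

def accumulate(series: list[dict[int, float]], interval: float = 5.0) -> dict[int, float]:
    """Approximate per-thread service demand (CPU-seconds) from a time series
    of snapshots taken `interval` seconds apart."""
    demand: dict[int, float] = {}
    for snapshot in series:
        for tid, pcpu in snapshot.items():
            demand[tid] = demand.get(tid, 0.0) + (pcpu / 100.0) * interval
    return demand
```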
  • the batches of jobs were formed from the pool of distinct FIO and Likwid jobs available. The batch jobs were executed in parallel.
  • the job characteristics of four distinct FIO jobs used for experimental purposes may be referred.
  • the four distinct Likwid jobs that were used for experimental purposes comprise 224, 112, 56 and 28 threads and 2500, 5K, 10K and 20K iterations respectively, and each was executing on a vector of size 1GB.
  • the prediction error in each experiment was computed as:
  • E_e and E_p represent the experimental and predictive completion times respectively for a batch job when executing concurrently with other batch jobs.
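The exact expression for the prediction error is not reproduced in this text; a standard relative-error form consistent with the E_e and E_p definitions would be:

```python
def prediction_error(e_exp: float, e_pred: float) -> float:
    """Percentage prediction error between the experimental completion time
    E_e and the predictive completion time E_p. The exact expression is not
    reproduced in this text; a standard relative-error form is assumed."""
    return abs(e_exp - e_pred) / e_exp * 100.0
```

For example, with Job1's experimental time of 1668 seconds and predicted time of 1693 seconds, this form would give an error of roughly 1.5%.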
  • Case 1 - No Hyper-Threading (HT): a set of four FIO and Likwid benchmark batch jobs were executed in isolation initially and then concurrently on the 8 and 28 physical core machines. The CPU was oversubscribed when four jobs ran concurrently with 168 threads.
  • FIO - In an embodiment, one set of experiments was conducted, wherein a set of FIO batch jobs resulting in 100% CPU utilization in isolation were considered on the 8 physical core machine. Referring to FIG. 8(a), a comparison of experimental and predictive completion time of the set of FIO batch jobs comprising four concurrently running batch jobs may be referred, wherein each FIO job when running in isolation required 100% CPU utilization. In another embodiment, another set of experiments was conducted, wherein the idle time of the CPU was obtained by introducing think-time(s) in the IO operations of each job amongst the set of FIO batch jobs, wherein each of the jobs was running in isolation. Referring to FIG. 8(b), a comparison of experimental and predictive completion time may be referred. Both sets of experiments were repeated on the 28 physical core machine. Referring to FIG. 9(a), results may be referred when each job has 100% CPU utilization. Further referring to FIG. 9(b), a comparison may be observed when each job under-utilized the CPU in isolation.
  • Likwid - In an embodiment, similar to the FIO jobs, a set of Likwid benchmark batch jobs were executed on two different machines with no HT first. Referring to FIGS. 10(a) and (b), a comparison of experimental and predictive execution time on 8 core and 28 core machines respectively may be referred, wherein the set of Likwid benchmark batch jobs comprise four Likwid benchmark batch jobs running concurrently.
  • the proposed disclosure, by implementing the PROWL simulation environment and the PROWL algorithm, predicts the completion time for the set of FIO batch jobs and the set of Likwid benchmark batch jobs with a very minimal error. A minimal error may be observed due to variations in the concurrency level vis-a-vis the service demand, which the proposed disclosure does not consider.
  • FIO - In an embodiment, one set of experiments was conducted, wherein the set of FIO batch jobs resulted in 100% CPU utilization in isolation. Referring to FIG. 11(a), a comparison of an individual completion time with a predictive set of values for the set of FIO batch jobs comprising concurrently running batch jobs may be referred. In another embodiment, another set of experiments was conducted, and referring to FIG. 11(b), the idle time of the CPU was obtained by introducing the think-time(s) in the IO operations of each job amongst the set of FIO batch jobs, wherein each of the jobs was running in isolation. Referring to FIGS. 12(a) and (b), results obtained by repeating both sets of experiments (conducted in the case of HT) on the 56 core machine may be referred.
  • the technical advantages of the proposed disclosure may now be considered.
  • the proposed disclosure provides for predicting the execution time of batch jobs correctly while considering the set of multi-threaded batch jobs with the distinct service demand(s) amongst the one or more threads and the time varying CPU utilization. None of the traditional systems and methods provide for clustering the one or more threads from the identified set of multi-threaded batch jobs based upon the distinct service demand of the one or more threads. Further, none of the traditional systems and methods provide for computing or deriving the instantaneous value of the one or more threads based upon the instantaneous utilization of the CPU. Referring to FIGS. 8(a) through 13(b) once again, it may be noted that the proposed disclosure provides for a very high level of accuracy in predicting the execution time of each job amongst the set of multi-threaded batch jobs by implementing the proposed methodology.
  • Job1, Job2, Job3 and Job4 have experimental times of 1668 seconds, 1721 seconds, 1369 seconds, and 1358 seconds respectively, which represent the completion time prediction using traditional systems and methods.
  • Job1, Job2, Job3 and Job4 have the predicted completion times of 1693 seconds, 1699 seconds, 1302 seconds, and 1285 seconds respectively.
  • the proposed disclosure has a very high level of accuracy in predicting the execution completion time of the multi-threaded batch jobs.
  • the proposed disclosure also provides for the PROWL simulation environment for predicting the execution time of batch jobs, wherein the PROWL simulation environment simulates the auto-designed job execution model.
  • the PROWL comprises a discrete event simulation environment and has capabilities to perform what-if scenarios for capacity planning purpose(s) corresponding to batch processing environments.
  • the memory 102 can be configured to store any data that is associated with predicting the execution time of the multi-threaded batch jobs based upon the distinct service demand of the one or more threads and the instantaneous CPU utilization.
  • the information pertaining to the set of multi-threaded batch jobs, the one or more threads clustered, the instantaneous value of the CPU derived for the one or more threads clustered, etc., and all information pertaining to predicting the execution time of the multi-threaded batch jobs is stored in the memory 102.
  • all information (inputs, outputs and so on) pertaining to predicting the execution time of the multi-threaded batch jobs based upon the distinct service demand of the one or more threads and the instantaneous CPU utilization may also be stored in the database, as history data, for reference purpose.
  • the hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof.
  • the device may also include means which could be e.g. hardware means like e.g. an application- specific integrated circuit (ASIC), a field- programmable gate array (FPGA), or a combination of hardware and software means, e.g.
  • the means can include both hardware means and software means.
  • the method embodiments described herein could be implemented in hardware and software.
  • the device may also include software means.
  • the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
  • the embodiments herein can comprise hardware and software elements.
  • the embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc.
  • the functions performed by various modules described herein may be implemented in other modules or combinations of other modules.
  • a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • a computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored.
  • a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein.
  • the term "computer-readable medium" should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Systems and methods for execution time prediction of batch jobs based upon service demand of threads and instantaneous Central Processing Unit (CPU) utilization. The traditional systems and methods provide for predicting the execution time of the batch jobs based upon clock time or previously logged execution times of the batch jobs. Embodiments of the present disclosure provide for predicting the execution time of each multi-threaded batch job amongst a set of concurrently executing multi-threaded batch jobs by clustering one or more threads from a set of multi-threaded batch jobs based upon a distinct service demand of the one or more threads; deriving an instantaneous value of the CPU for the one or more threads clustered; auto-designing a job execution model; simulating the job execution model based upon the one or more threads clustered and the instantaneous CPU utilization of one or more multi-threaded batch jobs to predict the execution time.

Description

BATCH JOBS EXECUTION TIME PREDICTION USING DISTINCT SERVICE DEMAND OF THREADS AND INSTANTANEOUS CPU UTILIZATION
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
[001] This patent application claims priority to India Patent Application 201821013348, filed on April 07, 2018.
TECHNICAL FIELD
[002] The disclosure herein generally relates to execution time prediction of batch jobs based upon service demand of one or more threads and instantaneous Central Processing Unit (CPU) utilization, and, more particularly, to systems and methods for execution time prediction of multi-threaded batch jobs based upon a distinct service demand of one or more threads and instantaneous Central Processing Unit (CPU) utilization.
BACKGROUND
[003] Many applications in enterprise domain require batch processing to perform business critical operations. Batch jobs perform automated, complex processing of large volumes of data without human intervention. Parallel processing allows multiple batch jobs to run concurrently to minimize total completion time. However, this may result in one or more jobs exceeding their individual completion deadline due to resource sharing. The batch jobs are normally grouped by streams based on business functions such that a batch job may have a predecessor and successor jobs. Finding a valid sequence of batch jobs such that their completion time is minimized, may be challenging and requires estimating completion time of an individual batch job in presence of other jobs running concurrently.
[004] Generally, resources used for the batch jobs are managed using time sharing. The completion time of each individual batch job, when it runs concurrently with other jobs, may be derived simply from its clock time or total execution time in isolation. This traditional way of execution time computation may generate useful results when the number of cores available is more than the cores required by the one or more concurrent jobs.
[005] However, when the number of cores is less than the total requirement of the jobs, cores are generally shared between jobs or threads in accordance with the operating system policy. This requires an advanced job execution model for predicting the execution time of the batch jobs, especially in the case of a set of multi-threaded batch jobs executing in parallel.
SUMMARY
[006] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method for execution time prediction of multi-threaded batch jobs based upon a distinct service demand of one or more threads and instantaneous Central Processing Unit (CPU) utilization is provided, the method comprising: identifying, based upon a concurrency level of one or more multi-threaded batch jobs, a set of multi-threaded batch jobs executing in parallel on a set of resources, wherein the set of resources comprise of a memory, a server and a disk; clustering, the one or more threads from the identified set of multi-threaded batch jobs, wherein the one or more threads are clustered based upon the distinct service demand of the one or more threads by implementing a K-means clustering technique, and wherein the distinct service demand comprises a distinct CPU utilization of the one or more threads; deriving, by one or more hardware processors, an instantaneous value of the CPU for the one or more threads clustered, wherein the instantaneous value is derived based upon the instantaneous utilization of the CPU by the one or more multi-threaded batch jobs amongst the set of multi-threaded batch jobs, and wherein the instantaneous CPU utilization is derived as a function of time for a set of intervals for predicting the batch execution time; auto-designing, based upon the one or more threads clustered and the instantaneous value of the CPU, a job execution model, wherein the job execution model comprises a plurality of idle threads and a plurality of threads ready for execution amongst the one or more threads clustered; predicting, by a Predicting the Runtime of Batch Workloads (PROWL) simulation environment, the batch execution time for each job amongst the set of multi-threaded batch jobs by performing a plurality of steps, 
wherein the plurality of steps comprise: (i) simulating the job execution model in the PROWL simulation environment, wherein the simulation is performed based upon the one or more threads clustered and the CPU utilization of the one or more multi-threaded batch jobs; and (ii) predicting, based upon the simulation, the execution time of each job amongst the set of multi-threaded batch jobs executing in parallel; predicting the batch execution time based upon the simulation by defining each of the job amongst the set of multi-threaded batch jobs in the PROWL simulation environment, and wherein each of the job is defined based upon a total number of threads, a distinct service demand of each thread amongst the total number of threads and CPU utilization of the job to be defined; executing one or more functions via each of the job defined in the PROWL simulation environment to initialize one or more variables corresponding to the set of multi-threaded batch jobs; and deriving the instantaneous value of the CPU for each interval amongst the set of intervals for predicting the batch execution time.
[007] In another aspect, there is provided a system for execution time prediction of multi-threaded batch jobs based upon a distinct service demand of one or more threads and instantaneous Central Processing Unit (CPU) utilization, the system comprising a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: identify, based upon a concurrency level of one or more multi-threaded batch jobs, a set of multi-threaded batch jobs executing in parallel on a set of resources, wherein the set of resources comprise of a memory, a server and a disk; cluster, the one or more threads from the identified set of multi -threaded batch jobs, wherein the one or more threads are clustered based upon the distinct service demand of the one or more threads by implementing a K-means clustering technique, and wherein the distinct service demand comprises a distinct CPU utilization of the one or more threads; derive an instantaneous value of the CPU for the one or more threads clustered, wherein the instantaneous value is derived based upon the instantaneous utilization of the CPU by the one or more multi-threaded batch jobs amongst the set of multi-threaded batch jobs, and wherein the instantaneous CPU utilization is derived as a function of time for a set of intervals for predicting the batch execution time; auto-design, based upon the one or more threads clustered and the instantaneous value of the CPU, a job execution model, wherein the job execution model comprises a plurality of idle threads and a plurality of threads ready for execution amongst the one or more threads clustered; predict, by a Predicting the Runtime of Batch Workloads (PROWL) simulation environment, the batch execution time for each job amongst the set of multi-threaded batch jobs by performing a plurality of steps, 
wherein the plurality of steps comprise: (i) simulate the job execution model in the PROWL simulation environment, wherein the simulation is performed based upon the one or more threads clustered and the CPU utilization of the one or more multi-threaded batch jobs; and (ii) predict, based upon the simulation, the execution time of each job amongst the set of multi-threaded batch jobs executing in parallel; predict the batch execution time based upon the simulation by defining each of the job amongst the set of multi-threaded batch jobs in the PROWL simulation environment, and wherein each of the job is defined based upon a total number of threads, a distinct service demand of each thread amongst the total number of threads and CPU utilization of the job to be defined; execute one or more functions via each of the job defined in the PROWL simulation environment to initialize one or more variables corresponding to the set of multi-threaded batch jobs; and derive the instantaneous value of the CPU for each interval amongst the set of intervals for predicting the batch execution time.
[008] In yet another aspect, there is provided one or more non-transitory machine readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors causes the one or more hardware processors to perform a method for execution time prediction of multi-threaded batch jobs based upon a distinct service demand of one or more threads and instantaneous Central Processing Unit (CPU) utilization, the method comprising: identifying, based upon a concurrency level of one or more multi-threaded batch jobs, a set of multi-threaded batch jobs executing in parallel on a set of resources, wherein the set of resources comprise of a memory, a server and a disk; clustering, the one or more threads from the identified set of multi-threaded batch jobs, wherein the one or more threads are clustered based upon the distinct service demand of the one or more threads by implementing a K-means clustering technique, and wherein the distinct service demand comprises a distinct CPU utilization of the one or more threads; deriving, by one or more hardware processors, an instantaneous value of the CPU for the one or more threads clustered, wherein the instantaneous value is derived based upon the instantaneous utilization of the CPU by the one or more multi-threaded batch jobs amongst the set of multi-threaded batch jobs, and wherein the instantaneous CPU utilization is derived as a function of time for a set of intervals for predicting the batch execution time; auto-designing, based upon the one or more threads clustered and the instantaneous value of the CPU, a job execution model, wherein the job execution model comprises a plurality of idle threads and a plurality of threads ready for execution amongst the one or more threads clustered; predicting, by a Predicting the Runtime of Batch Workloads (PROWL) simulation environment, the batch execution time for each job amongst the set of multi-threaded batch jobs by performing a 
plurality of steps, wherein the plurality of steps comprise: (i) simulating the job execution model in the PROWL simulation environment, wherein the simulation is performed based upon the one or more threads clustered and the CPU utilization of the one or more multi-threaded batch jobs; and (ii) predicting, based upon the simulation, the execution time of each job amongst the set of multi threaded batch jobs executing in parallel; predicting the batch execution time based upon the simulation by defining each of the job amongst the set of multi-threaded batch jobs in the PROWL simulation environment, and wherein each of the job is defined based upon a total number of threads, a distinct service demand of each thread amongst the total number of threads and CPU utilization of the job to be defined; executing one or more functions via each of the job defined in the PROWL simulation environment to initialize one or more variables corresponding to the set of multi-threaded batch jobs; and deriving the instantaneous value of the CPU for each interval amongst the set of intervals for predicting the batch execution time.
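The PROWL algorithm itself is not reproduced in this text. Purely as an illustration of the kind of interval-based job execution model the above aspects describe, a minimal sketch is given below, under simplifying assumptions stated in the comments: equal processor sharing among ready threads, a fixed simulation interval, and no hyper-threading effects. All names and values are hypothetical:

```python
def simulate_batch(jobs: dict[str, list[float]], cores: int, dt: float = 1.0) -> dict[str, float]:
    """Predict the completion time of each concurrently executing
    multi-threaded batch job. `jobs` maps a job name to the per-thread
    service demands (CPU-seconds). Equal processor sharing is assumed:
    in every interval dt, the cores are shared by all unfinished threads,
    each receiving at most one core's worth of service (a simplification
    of the disclosed job execution model)."""
    remaining = {name: list(demands) for name, demands in jobs.items()}
    completion: dict[str, float] = {}
    now = 0.0
    while len(completion) < len(jobs):
        # threads still ready for execution (non-zero remaining demand)
        ready = [(name, i) for name, ds in remaining.items() if name not in completion
                 for i, d in enumerate(ds) if d > 0]
        if not ready:                     # guard against zero-demand jobs
            for name in jobs:
                completion.setdefault(name, now)
            break
        share = min(1.0, cores / len(ready)) * dt   # CPU-seconds per ready thread
        for name, i in ready:
            remaining[name][i] = max(0.0, remaining[name][i] - share)
        now += dt
        for name, ds in remaining.items():
            if name not in completion and all(d <= 0 for d in ds):
                completion[name] = now
    return completion
```

For example, `simulate_batch({"Job1": [1.0, 3.0], "Job2": [2.0, 2.0]}, cores=4)` illustrates how an early-finishing thread frees capacity: Job2 completes at t = 2.0 and Job1 at t = 3.0, even though both jobs have the same total service demand.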
[009] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[010] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[011] FIG. 1 illustrates a block diagram of a system for predicting execution time of multi-threaded batch jobs based upon a distinct service demand of one or more threads and instantaneous Central Processing Unit (CPU) utilization, in accordance with some embodiments of the present disclosure.
[012] FIG. 2A through 2B is a flow diagram illustrating the steps involved in the process of predicting the execution time of the multi-threaded batch jobs based upon the distinct service demand of the one or more threads and the instantaneous CPU utilization, in accordance with some embodiments of the present disclosure.
[013] FIG. 3 shows a graphical representation of an identified set of multi-threaded batch jobs executing in parallel and the one or more threads clustered, in accordance with some embodiments of the present disclosure.
[014] FIG. 4 shows a graphical representation of a total execution time of each thread amongst the one or more threads clustered, in accordance with some embodiments of the present disclosure.
[015] FIG. 5 shows a graphical representation of the instantaneous utilization of the CPU by one or more multi-threaded batch jobs amongst the set of multi-threaded batch jobs identified, in accordance with some embodiments of the present disclosure.
[016] FIG. 6 shows a graphical representation of instantaneous values of the CPU derived for the one or more threads clustered for predicting the execution time of the multi-threaded batch jobs, in accordance with some embodiments of the present disclosure.
[017] FIG. 7 shows a graphical representation of a job execution model auto-designed based the one or more threads clustered and the instantaneous value of the CPU, in accordance with some embodiments of the present disclosure.
[018] FIGS. 8(a) through 13(b) show graphical representations of a plurality of results corresponding to experimental data and predicted data, wherein the experimental data and the predicted data are obtained by implementing traditional systems and methods and the proposed methodology respectively, in accordance with some embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
[019] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the spirit and scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope and spirit being indicated by the following claims.
[020] The embodiments of the present disclosure provide systems and methods for predicting execution time of multi-threaded batch jobs based upon a distinct service demand of threads and instantaneous Central Processing Unit (CPU) utilization, according to some embodiments of the present disclosure. Batch processing constitutes a large portion of complex and large data processing in many organizations. For example, banks generate reports at the end of the day using batch processing, and financial institutions run computationally intensive workloads to model stock performance. Batch jobs may be multi-threaded, and threads can have distinct CPU requirements. Even batch jobs with high Input / Output (IO) requests saturate the CPU due to oversubscription; that is, if one job or a thread is waiting for the IO, the CPU will be used by other jobs or threads.
[021] Although batch jobs are generally scheduled to run during off-business hours, known as a batch window, the batch jobs may overrun long enough in the presence of other batch jobs and thus impact critical transactions during business hours. Hence, it is very important to estimate the completion time of a job a priori in the presence of other batch jobs.
[022] Predicting the completion time of the batch jobs becomes more challenging if the batch jobs are multi-threaded with distinct service demands of threads. Different threads constituting a batch job may have different service demands, which may directly affect the job completion time. Early finishing threads do not compete for the resources, and unused computing resources are available for the remaining threads. This sometimes results in faster overall execution of the remaining threads in the system.
[023] Additionally, new architectures from Intel® and other organizations provide for Hyper-Threading (HT) in their processors. HT results in execution of two threads on a single core and leverages the latencies due to data access. Although HT does not provide the advantage of a multi-core CPU, it offers an advantage over a single core by executing two threads simultaneously and filling unused stages in the functional pipeline. Since HT processor(s) behave differently from single core or multi-core systems, predicting the thread or job execution behavior based on physical core data becomes challenging.
[024] Hence, there is a need for a technology that provides for simulating and determining distinct service demand(s) of threads in the batch jobs for predicting the overall completion time of batch jobs. The technology must provide for considering the CPU utilization of a job as a function of time or any distribution while simulating a thread on a server, and predict a job's completion time using the instantaneous CPU utilization value in small intervals of execution time when multiple batch jobs are running concurrently.
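The interval-based use of instantaneous CPU utilization described above could be sketched as follows. The `util_profile` callable is an assumed interface (not from the source) returning the fraction of a core available to the thread during the interval starting at time t:

```python
def completion_time(service_demand: float, util_profile, dt: float = 1.0) -> float:
    """Predict a thread's completion time when its CPU allocation varies over
    time. In each small interval dt, the thread consumes util_profile(t) * dt
    CPU-seconds of its remaining service demand. A non-zero utilization
    profile is assumed (otherwise the loop would never terminate)."""
    t, remaining = 0.0, service_demand
    while remaining > 0.0:
        remaining -= util_profile(t) * dt
        t += dt
    return t
```

With a constant 50% allocation, a thread with a 4 CPU-second service demand would finish in 8 seconds; a time-varying profile simply changes how fast the remaining demand is drained in each interval.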
[025] Referring now to the drawings, and more particularly to FIGS 1 through 13(b), where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.
[026] FIG. 1 illustrates an exemplary block diagram of a system 100 for predicting execution time of multi-threaded batch jobs based upon a distinct service demand of one or more threads and instantaneous Central Processing Unit (CPU) utilization, in accordance with an embodiment of the present disclosure. In an embodiment, the system 100 includes one or more processors 104, communication interface device(s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 102 operatively coupled to the one or more processors 104. The one or more processors 104 that are hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) is configured to fetch and execute computer-readable instructions stored in the memory 102. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
[027] The I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
[028] The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
[029] FIG. 2A through 2B, with reference to FIG. 1, illustrates an exemplary flow diagram of a method for predicting the execution time of the multi-threaded batch jobs based upon the distinct service demand of the one or more threads and the instantaneous CPU utilization, in accordance with an embodiment of the present disclosure. In an embodiment, the system 100 comprises one or more data storage devices of the memory 102 operatively coupled to the one or more hardware processors 104 and is configured to store instructions for execution of steps of the method by the one or more processors 104. The steps of the method of the present disclosure will now be explained with reference to the components of the system 100 as depicted in FIG. 1 and the flow diagram. In the embodiments of the present disclosure, the hardware processors 104, when configured with the instructions, perform one or more methodologies described herein.
[030] According to an embodiment of the present disclosure, at step 201, the one or more hardware processors 104 identify, based upon a concurrency level of one or more multi-threaded batch jobs, a set of multi-threaded batch jobs executing in parallel on a set of resources, wherein the set of resources comprise, inter alia, a memory (other than the memory 102), a server and a disk. In general, multi-threading is a type of execution model that allows multiple threads to exist within the context of a process. A multi-threaded architecture supports not only multiple processors but also multiple streams (or batch streams) executing simultaneously in each processor. The processor(s) of the multi-threaded architecture computer are interconnected via an interconnection network. Each processor can communicate with every other processor through the interconnection network. Further, the concurrency level of a processing stage (corresponding to batch stream(s)) refers to a maximum number of batch jobs that are executing concurrently on a computing system at the same time. For example, when the concurrency level at a processing stage is one, all batch jobs at that stage are processed in sequential order.
[031] In an example implementation of step 201, referring to FIG. 3, the set of multi-threaded batch jobs executing in parallel (identified based upon the concurrency level) may be referred. Referring to FIG. 3 again, it may be noted that each of the batch jobs (amongst the set of multi-threaded batch jobs), that is Job1, Job2, Job3 and Job4, comprises multiple threads. Job1 has 28 threads, Job2 has 28 threads, while Job3 and Job4 have 56 threads each. The concurrency level is 4, as four multi-threaded batch jobs (that is, Job1, Job2, Job3 and Job4) are executing in parallel.
[032] According to an embodiment of the present disclosure, at step 202, the one or more hardware processors 104 cluster the one or more threads from the identified set of multi-threaded batch jobs. In an embodiment, the one or more threads are clustered based upon the distinct service demand of the one or more threads by implementing a K-means clustering technique. The distinct service demand may be measured as a distinct CPU utilization of the one or more threads. The step 202 may now be discussed in detail.
[033] In an embodiment, given n multi-threaded batch jobs (J1, J2, ..., Jn) amongst the set of multi-threaded batch jobs, the completion time Ti of each batch job Ji may be predicted, wherein the batch job Ji is running concurrently with all other batch jobs amongst the set of multi-threaded batch jobs. In an embodiment, it may be assumed that the distinct service demand of each thread j in a batch job i is measurable and may be denoted by Sij. Also, it may be assumed that the number of threads differs across the set of multi-threaded batch jobs.
[034] In an embodiment, the resource requirement of each multi-threaded batch job amongst the set of multi-threaded batch jobs may be measured till its completion time when the multi-threaded batch job runs in isolation. As is known in the art, the set of multi-threaded batch jobs runs on a server with, say, m cores. The number of threads in a batch job Ji may be less than, equal to or more than the number of cores m.
[035] The method and importance of simulating the distinct service demand of the one or more threads may now be considered in detail. In general, the service demand comprises a critical input for predictive modelling of job execution. However, the one or more threads in the set of multi-threaded batch jobs may each be assigned unique work, and hence the service demands of the one or more threads are non-identical or distinct within a job. Referring to FIG. 3 yet again, it may be noted that some of the threads amongst the one or more threads may finish early while others may run slowly. The variation in the service demand of the one or more threads in a batch job affects the completion time of all batch jobs (amongst the set of multi-threaded batch jobs) running concurrently. Fast running threads in a job with low service demand finish early and, on completion, do not compete for resources with slow running threads of the same batch job or other batch jobs. Hence, it is very important to measure the service demand of each thread (amongst the one or more threads) in a batch job for an accurate simulation and prediction of the job completion time of the set of multi-threaded batch jobs running concurrently.
[036] In view of the limitations of the traditional systems and methods discussed in the preceding paragraph, the proposed disclosure provides for measuring the service demand of each thread (amongst the one or more threads) in a job (amongst the set of multi-threaded batch jobs) in isolation. Threads having a similar demand may be clustered for the ease of simulation. The embodiments of the present disclosure provide for the clustering of the one or more threads based upon similarity in the service demands of the one or more threads. The proposed disclosure implements the K-means clustering technique for clustering the service demands of the one or more threads of an individual batch job amongst the set of multi-threaded batch jobs executing concurrently. The cluster centers are initialized using the k-means++ algorithm.
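The clustering of threads by service demand may be sketched as follows. This is a minimal, illustrative one-dimensional k-means in Python, not the disclosed implementation: the per-thread service demands are hypothetical, and plain random seeding is used here for brevity in place of the k-means++ initialization mentioned above.

```python
import random

def kmeans_1d(values, k, iters=20, seed=0):
    """Plain 1-D k-means over per-thread service demands.
    (k-means++ seeding, used in the disclosure, is replaced by
    random seeding for brevity.)"""
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    for _ in range(iters):
        # Assign each thread's demand to the nearest cluster center.
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[idx].append(v)
        # Recompute centers as cluster means.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical per-thread CPU service demands (seconds) of one batch job,
# measured in isolation: four "fast" threads and four "slow" threads.
demands = [10.1, 10.3, 9.8, 10.0, 24.9, 25.2, 25.0, 24.7]
centers, clusters = kmeans_1d(demands, k=2)
for c, members in zip(centers, clusters):
    print(f"cluster centre {c:.2f}s with {len(members)} threads")
```

Threads within one cluster can then be simulated with a single representative service demand, which is the "ease of simulation" motivation given above.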
[037] In an example implementation of the step 202, referring to FIG. 3 yet again, the one or more threads clustered may be referred. As mentioned above, the distinct service demand comprises the distinct CPU utilization of the one or more threads and may be measured accordingly. Referring to FIG. 3 yet again, it may be noted that the CPU utilization of the one or more threads corresponding to the batch jobs Job1 and Job2 is close to 300, while the CPU utilization of the one or more threads corresponding to the batch jobs Job3 and Job4 is close to 450. Hence, by implementing the K-means clustering technique, the one or more threads have been clustered as per the distinct service demands (or the CPU utilization) of the one or more threads.
[038] According to an embodiment of the present disclosure, at step 203, the one or more hardware processors 104 derive an instantaneous value of the CPU for the one or more threads clustered. In an embodiment, the instantaneous value is derived based upon the instantaneous utilization of the CPU by the one or more multi-threaded batch jobs amongst the set of multi-threaded batch jobs. The instantaneous CPU utilization may be derived as a function of time for a set of intervals (that is, for small intervals) for predicting the batch execution time. The process and technical importance of deriving the instantaneous value of the CPU for the one or more threads clustered may now be considered in detail. [039] In general, varying CPU utilization due to caching comprises a major challenge in predictive modelling. The traditional systems and methods do not provide for deriving the instantaneous value of the CPU by measuring the instantaneous utilization of the CPU. The variation in CPU utilization can be a function of time or a distribution of some other kind. The varying CPU utilization by a batch job during different stages of execution affects the completion time of other batch jobs. The aberrations in predicting the completion time of a batch job may be mitigated by simulating the execution behavior of the one or more threads for smaller intervals of time.
[040] The proposed disclosure provides for capturing the CPU utilization value of a batch job (or more specifically, of a multi-threaded batch job amongst the set of multi-threaded batch jobs) at the set of intervals, and more specifically, at a set of small intervals in isolation, and fitting the captured CPU utilization values to a regression function or distribution (for example, linear or non-linear, exponential, or polynomial of degree n). Thus, the CPU utilization of the one or more multi-threaded batch jobs may be represented with the help of an appropriate function of time or by some distribution. Further, instead of using a constant CPU utilization for the entire run of a thread (amongst the one or more threads clustered), the instantaneous value(s) of the CPU may be derived with the function of time or the distribution of time for each of the smaller intervals during simulation.
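As an illustration of representing utilization as a function of time, the following sketch fits a simple least-squares line to hypothetical utilization samples taken at small intervals; any of the regression forms named above (exponential, polynomial of degree n) could be substituted. All sample values are assumptions, not data from the disclosure.

```python
def fit_line(ts, us):
    """Ordinary least-squares fit of utilization u(t) = a + b*t."""
    n = len(ts)
    mt, mu = sum(ts) / n, sum(us) / n
    b = (sum((t - mt) * (u - mu) for t, u in zip(ts, us))
         / sum((t - mt) ** 2 for t in ts))
    return mu - b * mt, b

# Hypothetical samples: CPU utilization (%) every 5 s, drifting as caches warm.
times = [0, 5, 10, 15, 20, 25]
utils = [78.0, 77.0, 75.5, 74.5, 73.0, 72.0]

a, b = fit_line(times, utils)
print(f"u(t) ~ {a:.2f} + ({b:.3f}) * t")
print(f"instantaneous utilization at t = 12 s: {a + b * 12:.1f}%")
```

The fitted function then supplies an instantaneous utilization value for any small interval of the simulation, rather than one constant value for the whole run.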
[041] In general, a thread is in an active state when it acquires a CPU core for execution. The thread then releases the server core(s) and switches to a blocked state for some other work, for example, waiting for an input/output to complete or making a remote procedure call. Again, the thread switches to a ready state and competes for the server core(s) before returning to the active state. The completion time Ti of any thread is determined by summing the time spent in the active state (that is, while using the CPU) and in the blocked or ready states, while the completion time of a batch job may be represented by the completion time of the slowest running thread in the batch job.
[042] In an example scenario where the CPU utilization of a batch job may be represented using a uniform distribution, referring to FIG. 4, let the total execution time of each thread (amongst the one or more threads clustered) i be divided into the set of small intervals of time n of size T such that n x T = Ti. In an embodiment, within each interval, the execution time te and the idle time td of each thread (amongst the one or more threads clustered) may be determined as below:
te = T x Ct ... equation (1)
td = T x (1 - Ct) ... equation (2)
wherein Ct is the CPU utilization of the batch job (amongst the set of multi-threaded batch jobs) defined at time t of execution. The proposed disclosure provides for selecting the idle time and execution time of the one or more threads from the uniform distribution with averages td and te, for considering the fluctuations in the idle and execution times, as below:
te = Uniform[(1 - s) x te, (1 + s) x te] ... equation (3)
td = Uniform[(1 - s) x td, (1 + s) x td] ... equation (4)
wherein s represents the variation or range around the mean CPU utilization of a batch job (amongst the set of multi-threaded batch jobs) measured at the set of small intervals of time.
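Equations (1) through (4) can be sketched directly in code. The interval length, utilization and variation s below are illustrative assumptions, not values from the disclosure.

```python
import random

def interval_times(T, c_t, s, rng):
    """Busy/idle split of one simulation interval per equations (1)-(4)."""
    te = T * c_t          # equation (1): execution (busy) time in the interval
    td = T * (1 - c_t)    # equation (2): idle time in the interval
    # equations (3) and (4): sample uniformly within +/- s around the mean
    te = rng.uniform((1 - s) * te, (1 + s) * te)
    td = rng.uniform((1 - s) * td, (1 + s) * td)
    return te, td

rng = random.Random(42)
T, c_t, s = 1.0, 0.35, 0.05   # 1 s interval, 35% utilization, 5% relative variation
te, td = interval_times(T, c_t, s, rng)
print(f"busy {te:.3f} s, idle {td:.3f} s")
```

Drawing a fresh sample for every small interval, rather than reusing the mean, is what models the fluctuation in CPU utilization during a run.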
[043] In an example implementation, referring to FIG. 5, the instantaneous utilization of the CPU by the one or more multi-threaded batch jobs amongst the set of multi-threaded batch jobs may be referred. Referring to FIG. 5 again, the instantaneous CPU utilization of a multi-threaded job may be observed at regular intervals of time for a job when it is running in isolation. The utilization can be represented with the help of the uniform distribution in the interval of approximately [72%, 78%]. Further, in an example implementation of the step 203, referring to FIG. 6, the instantaneous value of the CPU for the one or more threads clustered may be referred, wherein the instantaneous value is derived based upon the instantaneous utilization of the CPU by the one or more multi-threaded batch jobs. Referring to FIG. 6 again, it may be noted that the instantaneous value of the CPU for the one or more threads clustered varies between [68%, 78%] for different sets of intervals.
[044] According to an embodiment of the present disclosure, at step 204, the one or more hardware processors 104 auto-design, based upon the one or more threads clustered and the instantaneous value of the CPU, a job execution model, wherein the job execution model comprises a plurality of idle threads and a plurality of threads ready for execution amongst the one or more threads clustered. The job execution model is auto-designed to be simulated in a simulation environment for facilitating the execution time prediction of each job amongst the set of multi-threaded batch jobs (discussed in step 205).
[045] Referring to step 203 above with equations (1) to (4), it may be noted that the proposed disclosure facilitates modelling the execution time of a thread (amongst the one or more threads clustered) in the set of small intervals of time based upon the time spent in the CPU and the idle time outside the CPU. In an example implementation of the step 204, referring to FIG. 7, the job execution model auto-designed based upon the one or more threads clustered and the instantaneous value of the CPU may be referred.
[046] According to an embodiment of the present disclosure, at step 205(i), the one or more hardware processors 104 simulate the job execution model in a Predicting the Runtime of Batch Workloads (PROWL) simulation environment, wherein the simulation is performed based upon the one or more threads clustered and the CPU utilization of the one or more multi-threaded batch jobs. The PROWL comprises a discrete event simulation environment for facilitating simulations of batch load executions, and thus, facilitating simulating the job execution model.
[047] According to an embodiment of the present disclosure, the simulation in the PROWL is performed based upon the one or more threads clustered and the CPU utilization of the one or more multi-threaded batch jobs. In addition to the one or more threads clustered and the CPU utilization of the one or more multi-threaded batch jobs, a set of inputs to the PROWL simulation environment comprises the number of threads in each job (amongst the set of multi-threaded batch jobs), the number of cores (or the server cores) available in the computing system (that is, the system on which the proposed methodology is being tested and implemented, and other than the system 100), the length of each small interval amongst the set of small intervals, and the number of simulations to perform. Upon completing simulations, the PROWL simulation environment predicts the minimum and maximum completion time of each job amongst the set of multi-threaded batch jobs identified. The process of simulating the job execution model in the PROWL simulation environment may now be considered in detail by referring to the PROWL algorithm below.
PROWL Algorithm
Data (Inputs) - The one or more threads clustered and the CPU utilization of the one or more multi-threaded batch jobs (along with the set of inputs to the PROWL simulation environment referred to in paragraph 46 above)
Result (Output) - Job completion time prediction of each job in the batch (amongst the set of multi-threaded batch jobs)
Initialization:
nCPU <- Number of CPUs, cpu_idle <- CPU idle time
interval_time <- Time interval
/* First arrival of the job */
Function arrival
for (each_job_in_the_batch) {
    Calculate the cpu_idle and cpu_busy time distributions for the interval interval_time
    gapDist <- interval_time x cpu_idle x (1 ± s)
    busyDist <- interval_time x (1 - cpu_idle) x (1 ± s)
    Generate uniform distributions of the idle time and busy time for any interval:
    gapDistList(gapDist)
    busyDistList(busyDist)
    for (i = 0; i < job_threads; i = i + 1) {
        rem_time <- service demand of the thread
        Add job to job_queue;
        if execution_queue.size() < nCPU then
            schedule_job_execution();
        end
    }
    total_threads = total_threads + 1
}
/* Start of a new interval of a thread */
Function interval_start
    Add job to job_queue;
    if execution_queue.size() < nCPU then
        schedule_job_execution();
    end
/* Completion of an interval of a thread */
Function interval_end
    Remove job from execution_queue;
    gapTime <- (Uniform) gapDistList.sample();
    schedule_after(gapTime);
    if execution_queue.size() < nCPU then
        schedule_job_execution();
    end
/* Start executing job */
Function schedule_job_execution
    Pick one thread from the job_queue();
    Get residual time rem_time of the thread;
    Add thread to execution_queue;
    proc_time <- (Uniform) busyDistList.sample()
    if proc_time >= rem_time then
        departure();
        rem_time <- 0;
    end
    else
        rem_time <- (rem_time - proc_time);
        interval_end();
    end
/* Removal of thread from the system on completion */
Function departure
    Release job from execution_queue;
    Print job execution time and other stats
    if job_queue.size() > 0 then
        schedule_job_execution();
    end
[048] According to an embodiment of the present disclosure, referring to the PROWL algorithm above, it may be noted that the PROWL simulation environment comprises a set of programming functions corresponding to at least either of the one or more threads or the one or more multi-threaded batch jobs for executing a plurality of tasks (for example, shuffling the one or more threads) corresponding to the batch execution time prediction. The PROWL algorithm comprises five major programming functions, namely, arrival, interval_start, interval_end, schedule_job_execution and departure. The task executed by each programming function amongst the set of programming functions is discussed below in paragraph 52. As mentioned above, the PROWL simulation environment simulates the job execution model auto-designed in the step 204 above.
[049] According to an embodiment of the present disclosure, the step of predicting the batch execution time of each job is preceded by defining each job amongst the set of multi-threaded batch jobs in the PROWL simulation environment, and wherein each of the jobs is defined based upon a total number of threads, a distinct service demand of each thread amongst the total number of threads and a CPU utilization of the job to be defined. In an example scenario, each of the jobs may be defined based upon the total number of threads, the distinct service demand of each thread and the CPU utilization of the job to be defined as below:
job 1: No. of threads=2, service demand thread 1=10s, service demand thread 2=12s, average CPU utilization of the job=25%
job 2: No. of threads=3, service demand thread 1=25s, service demand thread 2=22s, service demand thread 3=27s, average CPU utilization of the job=35%
[050] In an embodiment, each job defined in the PROWL simulation environment may execute one or more functions to initialize one or more variables corresponding to the set of multi-threaded batch jobs, wherein the one or more variables comprise the distinct service demand of each of the thread amongst the total number of threads, the CPU utilization of the defined job and a job identification metric of the defined job. In an example scenario, the job 1 defined in the PROWL simulation environment may execute the function arrival only once to initialize the one or more variables like the job identification metric of the job 1 as 6543.
[051] According to an embodiment of the present disclosure, the instantaneous value of the CPU may be derived for each interval amongst the set of intervals for predicting the batch execution time using the equations (3) and (4). In an example scenario, consider a case where the average CPU utilization of a batch job is 35% and it varies uniformly in a range of ±1.75%. If the set of intervals are of 1 second each, then the CPU utilization for each of this set of intervals will be randomly selected in the range of 33.25 to 36.75, that is, [35 - 1.75, 35 + 1.75]. The CPU idle time in the 1 second interval is also randomly selected from the range of 63.25 to 66.75, that is, [65 - 1.75, 65 + 1.75]. Within each interval, a thread remains idle for the CPU idle time and executes for the remaining time.
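The numeric example above can be sketched as follows, sampling a fresh utilization value for each 1-second interval. The helper name and the number of intervals shown are illustrative assumptions.

```python
import random

def sample_interval_utilization(avg_util, s_abs, rng):
    """Draw one interval's CPU utilization (%) from [avg - s, avg + s]."""
    return rng.uniform(avg_util - s_abs, avg_util + s_abs)

rng = random.Random(1)
for t in range(3):
    u = sample_interval_utilization(35.0, 1.75, rng)   # 33.25% .. 36.75%
    # Within each 1 s interval the thread executes u/100 s and idles the rest.
    print(f"interval {t}: utilization {u:.2f}%, "
          f"busy {u / 100:.4f} s, idle {1 - u / 100:.4f} s")
```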
[052] According to an embodiment of the present disclosure, the PROWL simulation environment comprises a set of queues for the one or more threads, and wherein the set of queues comprises either of at least one ready thread or of at least one active thread amongst the one or more threads. The embodiments of the proposed disclosure facilitate the maintaining of two queues by the PROWL simulation environment, that is, a job_queue and an execution_queue (referred to in the PROWL algorithm), wherein the job_queue comprises thread(s) amongst the one or more threads that are in the ready state and the execution_queue stores thread(s) amongst the one or more threads that are in the active state or run state.
[053] In an embodiment, referring to the PROWL algorithm again, it may be noted that the interval_start and interval_end functions are used to shuffle the one or more threads between the set of queues, that is, the job_queue and the execution_queue, upon start or completion of the busy time in an interval. The schedule_job_execution function moves a batch job (defined in the PROWL simulation environment) from the job_queue to the execution_queue upon receiving request(s) from the interval_start and the interval_end functions. The departure function is invoked when the one or more threads exit from the computing system on completion of the service demand.
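The PROWL-style event loop described above may be sketched, in highly simplified and hypothetical form, as a heap-driven discrete event simulation. The job parameters, function names and dispatch order below are assumptions for illustration only, not the disclosed PROWL implementation; threads alternate perturbed busy intervals (competing for a fixed core pool) and idle gaps until their service demand is spent.

```python
import heapq
import random
from collections import deque

def simulate(jobs, n_cpu, interval=1.0, s=0.05, seed=0):
    """Predict per-job completion times; jobs = {name: {threads, demand, util}}."""
    rng = random.Random(seed)
    rem = {}                      # remaining CPU service demand per thread
    events, seq = [], 0           # (time, seq, kind, thread_id, busy_len) heap
    for name, spec in jobs.items():
        for i in range(spec["threads"]):          # arrival analogue
            rem[(name, i)] = spec["demand"]
            heapq.heappush(events, (0.0, seq, "ready", (name, i), 0.0))
            seq += 1
    ready, free = deque(), n_cpu  # job_queue analogue and free core count
    done = {name: 0.0 for name in jobs}

    def dispatch(now):
        nonlocal free, seq        # schedule_job_execution analogue
        while free and ready:
            tid = ready.popleft()
            free -= 1
            util = jobs[tid[0]]["util"]
            # eq. (1) + (3): perturbed busy time, capped by residual demand
            te = min(interval * util * rng.uniform(1 - s, 1 + s), rem[tid])
            heapq.heappush(events, (now + te, seq, "busy_end", tid, te))
            seq += 1

    while events:
        now, _, kind, tid, te = heapq.heappop(events)
        if kind == "ready":       # interval_start analogue
            ready.append(tid)
        else:                     # interval_end / departure analogue
            free += 1
            rem[tid] -= te
            if rem[tid] <= 1e-9:  # thread departs; job done when last thread does
                done[tid[0]] = max(done[tid[0]], now)
            else:                 # eq. (2) + (4): idle gap before next interval
                td = interval * (1 - jobs[tid[0]]["util"]) * rng.uniform(1 - s, 1 + s)
                heapq.heappush(events, (now + td, seq, "ready", tid, 0.0))
                seq += 1
        dispatch(now)
    return done

# Two hypothetical jobs contending for 4 cores.
jobs = {"job1": {"threads": 4, "demand": 10.0, "util": 0.8},
        "job2": {"threads": 8, "demand": 6.0, "util": 0.5}}
print(simulate(jobs, n_cpu=4))
```

Repeating the simulation with different seeds would yield the minimum and maximum completion times per job that the text attributes to PROWL.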
[054] According to an embodiment of the present disclosure, at step 205(ii), the one or more hardware processors 104 predict, based upon the simulation, the execution time of each job amongst the set of multi-threaded batch jobs executing in parallel. Referring to 'predicted job completion time' in FIGS. 8(a) through 13(b), the execution time predicted for each of the jobs amongst the set of multi-threaded batch jobs identified by implementing the proposed methodology may be referred. The 'experimental time' refers to the job completion time obtained using the traditional systems and methods.
[055] Considering one of the figures amongst FIGS. 8(a) through 13(b), say FIG. 9(a), it may be noted that Job1, Job2, Job3 and Job4 have the predicted completion times of 1693 seconds, 1699 seconds, 1302 seconds, and 1285 seconds respectively.
[056] According to an embodiment of the present disclosure, the experimental analysis to highlight the accuracy of the proposed disclosure as compared to the traditional systems and methods may now be considered. The experiments were performed using two synthetic benchmarks namely, Flexible Input / Output Tester (FIO) and Likwid-bench. As is known in the art, the FIO is a flexible Input / Output (IO) benchmark or a workload generator that repeatedly performs read and write operations on a set of files, while the Likwid-bench is a micro-benchmarking framework that provides for a set of assembly language kernels.
[057] The read and write operations of the FIO were redirected to a set of temporary files stored using the temporary file system tmpfs. Thus, the FIO behaves as a CPU intensive benchmark since all the IO operations may be carried out in a memory (other than the memory 102). Although the FIO allows both forked and threaded jobs, the proposed disclosure used the threaded jobs to replicate the behavior of a multi-threaded batch workload. The FIO provides for 19 different kinds of IO engines, for example, mmap, sync etc. In an embodiment, a set of batch jobs of distinct characteristics may be created by changing the FIO parameters as below (taking an example scenario):
No. of threads: the multi-threading among the jobs was controlled and distinct jobs were generated with 16, 28, 56 and 128 threads.
Randomness: Four levels of randomness in the access pattern were used (25%, 50%, 75%, 100%).
Block sizes: IO requests of sizes 16K, 32K, 64K and 128K were generated.
Thinktime: Thinktime is defined as the time duration for which a job is stalled between two IO operations. The thinktime was changed to vary the CPU utilization of a job in the range 10%-100%.
A sample FIO job is as shown in the below:
Sample benchmark (FIO) job file with 56 thread job
[global]
ioengine=sync
filesize=8gb
sync=0
[jobl]
loops=8
numjobs=56
thread=1
#group_reporting=1
filename=/D/myfile/user20
percentage_random=25
rw=randread
thinktime=200
[058] In an embodiment, referring to Tables 1 through 8 below, it may be noted that the FIO jobs of distinct configurations were used to create a batch of jobs for evaluating the proposed methodology (disclosure). The Likwid-bench suite has 89 benchmarks and comprises a set of features like thread parallelism and thread placement. While executing the Likwid benchmarks, the total number of threads was equally divided between two sockets. [059] Referring to Tables 1 through 8 below again, it may be noted that both the benchmarks were executed on 8 physical (16 logical) and 28 physical (56 logical) core Intel® Xeon machines respectively. Both the machines were running the CentOS 6.0 Linux system. The benchmarks were run on both the machines with HT on and off. To incorporate the repetitive nature of batch workloads, the job configuration was reiterated multiple times in a single run of the FIO and the Likwid-bench. Further, time-series data was obtained from the run-history of each individual job, which included the service demand, the CPU utilization, and the number of threads spawned by the job. The data was collected using the low overhead Linux ps and mpstat commands every five seconds during the job run in isolation. Further, the batches of jobs were formed from the pool of distinct FIO and Likwid jobs available. The batch jobs were executed in parallel.
[060] According to an embodiment of the present disclosure, the results obtained from the experimental analysis discussed in paragraphs 54 to 57 above may now be considered in detail. One job each from the sets of FIO jobs and Likwid jobs was run in isolation on two different servers. The service demand of each thread or cluster of threads of each job, along with the CPU utilization of the job, was observed for a standalone run of the job on both systems with and without HT. Further, all participating jobs were executed in the batch concurrently and each job's execution time was recorded. It may be noted that the execution time of each of the jobs in the batch may be predicted using the PROWL simulation environment (or the PROWL algorithm) and compared with the experimental data.
[061] Referring to Tables 1 through 8 again, the job characteristics of the four distinct FIO jobs used for experimental purposes may be referred. The four distinct Likwid jobs that were used for experimental purposes comprise 224, 112, 56 and 28 threads and 2500, 5K, 10K, 20K iterations respectively, each executing on a vector size of 1GB. The prediction error in each experiment was computed as:
Error (%) = (|Ee - Ep| / Ee) x 100
wherein Ee and Ep represent the experimental and predictive completion times respectively for a batch job when executing concurrently with other batch jobs.
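The error metric may be sketched as a one-line computation; the values plugged in are the FIG. 9(a) Job1 experimental and predicted completion times quoted in the text.

```python
def prediction_error(e_exp, e_pred):
    """Percentage error between experimental (Ee) and predicted (Ep) times."""
    return abs(e_exp - e_pred) / e_exp * 100.0

# FIG. 9(a) values for Job1: experimental 1668 s, predicted 1693 s.
print(f"{prediction_error(1668, 1693):.2f}%")   # → 1.50%
```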
[062] According to an embodiment of the present disclosure, the analysis of the results may now be considered in detail.
Case 1 - No Hyper-Threading (HT)
In an embodiment, a set of four FIO and Likwid benchmark batch jobs were executed in isolation initially and then concurrently on 8 and 28 physical core machines. The CPU was oversubscribed when four jobs ran concurrently with 168 threads.
FIO - In an embodiment, one set of experiments was conducted, wherein a set of FIO batch jobs resulting in 100% CPU utilization in isolation was considered on the 8 physical core machine. Referring to FIG. 8(a), a comparison of the experimental and predictive completion times of the set of FIO batch jobs comprising four concurrently running batch jobs may be referred, wherein each FIO job, when running in isolation, required 100% CPU utilization. In another embodiment, another set of experiments was conducted, wherein the idle time of the CPU was obtained by introducing think-time(s) in the IO operations of each job amongst the set of FIO batch jobs, wherein each of the jobs was running in isolation. Referring to FIG. 8(b), a comparison of the experimental and predictive completion times may be referred. Both sets of experiments were repeated on the 28 physical core machine. Referring to FIG. 9(a), results may be referred when each job has 100% CPU utilization. Further, referring to FIG. 9(b), a comparison may be observed when each job under-utilized the CPU in isolation.
Likwid - In an embodiment, similar to the FIO jobs, a set of Likwid benchmark batch jobs was executed on two different machines with no HT first. Referring to FIGS. 10(a) and (b), a comparison of the experimental and predictive execution times on the 8 core and 28 core machines respectively may be referred, wherein the set of Likwid benchmark batch jobs comprises four Likwid benchmark batch jobs running concurrently.
[063] It may be noted that the proposed disclosure, by implementing the PROWL simulation environment and the PROWL algorithm, predicts the completion time for the set of FIO batch jobs and the set of Likwid benchmark batch jobs with a very minimal error. The minimal error that remains may be attributed to variations in the concurrency level vis-a-vis the service demand, which the proposed disclosure does not consider.
Case 2 - Hyper-Threading (HT) - Similar sets of experiments as performed in the case of no hyper-threading were repeated by turning on the HT, wherein the HT resulted in 16 and 56 logical cores on the respective machines.
FIO - In an embodiment, one set of experiments was conducted, wherein the set of FIO batch jobs resulted in 100% CPU utilization in isolation. Referring to FIG. 11(a), a comparison of the individual completion times with the predictive set of values for the set of FIO batch jobs comprising concurrently running batch jobs may be referred. In another embodiment, another set of experiments was conducted, and referring to FIG. 11(b), the idle time of the CPU was obtained by introducing the think-time(s) in the IO operations of each job amongst the set of FIO batch jobs, wherein each of the jobs was running in isolation. Referring to FIGS. 12(a) and (b), results obtained by repeating both sets of experiments (conducted in the case of HT) on the 56 core machine may be referred.
Likwid - Referring to FIGS. 13(a) and (b), results obtained by executing the set of Likwid benchmark batch jobs with HT on two different machines may be referred. Referring to FIGS. 13(a) and (b) again, it may be noted that the proposed disclosure predicts the execution time with significant accuracy in the HT (environment) as well.
Tables 1 through 8
(The contents of Tables 1 through 8 are reproduced as images in the original publication.)
[064] According to an embodiment of the present disclosure, the technical advantages of the proposed disclosure (methodology) may now be considered. The proposed disclosure provides for predicting the execution time of batch jobs accurately while considering the set of multi-threaded batch jobs with the distinct service demand(s) amongst the one or more threads and the time varying CPU utilization. None of the traditional systems and methods provide for clustering the one or more threads from the identified set of multi-threaded batch jobs based upon the distinct service demand of the one or more threads. Further, none of the traditional systems and methods provide for computing or deriving the instantaneous value of the CPU for the one or more threads based upon the instantaneous utilization of the CPU. Referring to FIGS. 8(a) through 13(b) once again, it may be noted that the proposed disclosure provides for a very high level of accuracy in predicting the execution time of each of the jobs amongst the set of multi-threaded batch jobs by implementing the proposed methodology.
[065] Considering one of the figures amongst FIGS. 8(a) through 13(b), say FIG. 9(a), it may be noted that Job1, Job2, Job3 and Job4 have experimental times of 1668 seconds, 1721 seconds, 1369 seconds, and 1358 seconds respectively, which represent the completion time prediction using the traditional systems and methods. Using the proposed methodology, Job1, Job2, Job3 and Job4 have the predicted completion times of 1693 seconds, 1699 seconds, 1302 seconds, and 1285 seconds. Thus, the proposed disclosure has a very high level of accuracy in predicting the execution completion time of the multi-threaded batch jobs.
[066] The proposed disclosure also provides for the PROWL simulation environment for predicting the execution time of batch jobs, wherein the PROWL simulation environment simulates the auto-designed job execution model. The PROWL comprises a discrete event simulation environment and has capabilities to perform what-if scenarios for capacity planning purpose(s) corresponding to batch processing environments.
[067] In an embodiment, the memory 102 can be configured to store any data that is associated with predicting the execution time of the multi-threaded batch jobs based upon the distinct service demand of the one or more threads and the instantaneous CPU utilization. In an embodiment, the information pertaining to the set of multi-threaded batch jobs, the one or more threads clustered, the instantaneous value of the CPU derived for the one or more threads clustered etc. and all information pertaining to predicting the execution time of the multi threaded batch jobs is stored in the memory 102. Further, all information (inputs, outputs and so on) pertaining to predicting the execution time of the multi-threaded batch jobs based upon the distinct service demand of the one or more threads and the instantaneous CPU utilization may also be stored in the database, as history data, for reference purpose.
[068] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[069] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[070] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various modules described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[071] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments. Also, the words "comprising," "having," "containing," and "including," and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise.
[072] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term "computer-readable medium" should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD-ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[073] It is intended that the disclosure and examples be considered as exemplary only, with a true scope and spirit of disclosed embodiments being indicated by the following claims.

Claims

WE CLAIM:
1. A method of multi-threaded batch jobs execution time prediction by simulating instantaneous utilization of a Central Processing Unit (CPU) and a distinct service demand of one or more threads, the method comprising the processor-implemented steps of:
identifying, based upon a concurrency level of one or more multi-threaded batch jobs, a set of multi-threaded batch jobs executing in parallel on a set of resources, wherein the set of resources comprises a memory, a server and a disk (201);
clustering the one or more threads from the identified set of multi-threaded batch jobs, wherein the one or more threads are clustered based upon the distinct service demand of the one or more threads by implementing a K-means clustering technique, and wherein the distinct service demand comprises a distinct CPU utilization of the one or more threads (202);
deriving, by one or more hardware processors, an instantaneous value of the CPU for the one or more threads clustered, wherein the instantaneous value is derived based upon the instantaneous utilization of the CPU by the one or more multi-threaded batch jobs amongst the set of multi-threaded batch jobs, and wherein the instantaneous CPU utilization is derived as a function of time for a set of intervals for predicting the batch execution time (203);
auto-designing, based upon the one or more threads clustered and the instantaneous value of the CPU, a job execution model, wherein the job execution model comprises a plurality of idle threads and a plurality of threads ready for execution amongst the one or more threads clustered (204); and
predicting, by a Predicting the Runtime of Batch Workloads (PROWL) simulation environment, the batch execution time for each job amongst the set of multi-threaded batch jobs by performing a plurality of steps, wherein the plurality of steps comprise:
(i) simulating the job execution model in the PROWL simulation environment, wherein the simulation is performed based upon the one or more threads clustered and the CPU utilization of the one or more multi-threaded batch jobs (205(i)); and
(ii) predicting, based upon the simulation, the execution time of each job amongst the set of multi-threaded batch jobs executing in parallel (205(ii)).
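For illustration only, and not part of the claims: the K-means clustering of threads by distinct CPU service demand recited in claim 1 can be sketched in a few lines. The function name, cluster count and demand values below are hypothetical and do not appear in the specification; in practice, the per-thread demands would come from profiling the identified batch jobs.

```python
import random

def kmeans_1d(demands, k, iters=50, seed=0):
    """Group 1-D thread service demands (CPU seconds) into k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(demands, k)      # pick k distinct initial centroids
    clusters = []
    for _ in range(iters):
        # assign each demand to its nearest centroid
        clusters = [[] for _ in range(k)]
        for d in demands:
            nearest = min(range(k), key=lambda i: abs(d - centroids[i]))
            clusters[nearest].append(d)
        # recompute each centroid as the mean of its cluster
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:            # converged
            break
        centroids = updated
    return centroids, clusters

# Hypothetical per-thread CPU service demands (seconds):
demands = [0.9, 1.1, 1.0, 4.8, 5.0, 5.2, 9.9, 10.1]
centroids, clusters = kmeans_1d(demands, k=3)
```

Each resulting cluster groups threads whose service demands are similar, so a single representative demand per cluster can feed the job execution model instead of one value per thread.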
2. The method of claim 1, wherein the step of predicting the batch execution time based upon the simulation is preceded by defining each job amongst the set of multi-threaded batch jobs in the PROWL simulation environment, and wherein each job is defined based upon a total number of threads, a distinct service demand of each thread amongst the total number of threads and CPU utilization of the job to be defined.
3. The method of claim 2, wherein each job defined in the PROWL simulation environment executes one or more functions to initialize one or more variables corresponding to the set of multi-threaded batch jobs, wherein the one or more variables comprise the distinct service demand of each thread amongst the total number of threads, the CPU utilization of the defined job and a job identification metric of the defined job.
4. The method of claim 1, wherein the PROWL simulation environment comprises a set of queues for the one or more threads, and wherein the set of queues comprise either of at least one ready thread or of at least one active thread amongst the one or more threads.
5. The method of claim 1, wherein the PROWL simulation environment comprises a set of programming functions corresponding to at least either of the one or more threads or the one or more multi-threaded batch jobs for executing a plurality of tasks corresponding to the batch execution time prediction.
6. The method of claim 1, wherein the instantaneous value of the CPU is derived for each interval amongst the set of intervals for predicting the batch execution time.
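For illustration only: the per-interval derivation of instantaneous CPU utilization in claim 6 amounts to reducing a sampled utilization trace to one value per fixed-width interval. The sketch below is hypothetical; the function name and the (time, utilization) samples are invented for this example and are not taken from the specification.

```python
def per_interval_utilization(samples, interval):
    """Reduce (time, CPU %) samples to one average utilization value per
    consecutive fixed-width interval, i.e. CPU utilization as a function
    of time over the set of intervals."""
    if not samples:
        return []
    horizon = max(t for t, _ in samples)
    buckets = [[] for _ in range(int(horizon // interval) + 1)]
    for t, util in samples:
        buckets[int(t // interval)].append(util)   # place sample in its interval
    return [sum(b) / len(b) if b else 0.0 for b in buckets]

# Hypothetical (time in seconds, CPU %) samples for one batch job:
samples = [(0.5, 40.0), (1.5, 60.0), (2.5, 80.0), (3.5, 80.0)]
trace = per_interval_utilization(samples, interval=2.0)
# intervals [0, 2) and [2, 4) average to 50.0 and 80.0 respectively
```

The resulting per-interval trace is the time-varying CPU availability that the simulation can consume when advancing the job execution model interval by interval.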
7. A system (100) for multi-threaded batch jobs execution time prediction by simulating instantaneous utilization of a Central Processing Unit (CPU) and a distinct service demand of one or more threads, the system (100) comprising:
a memory (102) storing instructions;
one or more communication interfaces (106); and
one or more hardware processors (104) coupled to the memory (102) via the one or more communication interfaces (106), wherein the one or more hardware processors (104) are configured by the instructions to:
identify, based upon a concurrency level of one or more multi-threaded batch jobs, a set of multi-threaded batch jobs executing in parallel on a set of resources, wherein the set of resources comprises a memory, a server and a disk;
cluster the one or more threads from the identified set of multi-threaded batch jobs, wherein the one or more threads are clustered based upon the distinct service demand of the one or more threads by implementing a K-means clustering technique, and wherein the distinct service demand comprises a distinct CPU utilization of the one or more threads;
derive an instantaneous value of the CPU for the one or more threads clustered, wherein the instantaneous value is derived based upon the instantaneous utilization of the CPU by the one or more multi-threaded batch jobs amongst the set of multi-threaded batch jobs, and wherein the instantaneous CPU utilization is derived as a function of time for a set of intervals for predicting the batch execution time;
auto-design, based upon the one or more threads clustered and the instantaneous value of the CPU, a job execution model, wherein the job execution model comprises a plurality of idle threads and a plurality of threads ready for execution amongst the one or more threads clustered; and
predict, by a Predicting the Runtime of Batch Workloads (PROWL) simulation environment, the batch execution time for each job amongst the set of multi-threaded batch jobs by performing a plurality of steps, wherein the plurality of steps comprise:
(i) simulate the job execution model in the PROWL simulation environment, wherein the simulation is performed based upon the one or more threads clustered and the CPU utilization of the one or more multi-threaded batch jobs; and
(ii) predict, based upon the simulation, the execution time of each job amongst the set of multi-threaded batch jobs executing in parallel.
8. The system (100) of claim 7, wherein the one or more hardware processors (104) are configured to predict the batch execution time based upon the simulation by defining each job amongst the set of multi-threaded batch jobs in the PROWL simulation environment, and wherein each job is defined based upon a total number of threads, a distinct service demand of each thread amongst the total number of threads and CPU utilization of the job to be defined.
9. The system (100) of claim 8, wherein each job defined in the PROWL simulation environment executes one or more functions to initialize one or more variables corresponding to the set of multi-threaded batch jobs, wherein the one or more variables comprise the distinct service demand of each thread amongst the total number of threads, the CPU utilization of the defined job and a job identification metric of the defined job.
10. The system (100) of claim 7, wherein the PROWL simulation environment comprises a set of queues for the one or more threads, and wherein the set of queues comprise either of at least one ready thread or of at least one active thread amongst the one or more threads.
11. The system (100) of claim 7, wherein the PROWL simulation environment comprises a set of programming functions corresponding to at least either of the one or more threads or the one or more multi-threaded batch jobs for executing a plurality of tasks corresponding to the batch execution time prediction.
12. The system (100) of claim 7, wherein the one or more hardware processors (104) are configured to derive the instantaneous value of the CPU for each interval amongst the set of intervals for predicting the batch execution time.
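For illustration only: the specification does not set out the PROWL simulation environment in code, but the underlying idea of simulating threads with known service demands that share CPU capacity, in order to predict when a batch completes, can be pictured with a toy processor-sharing model. Everything below (function name, core count, demand values) is a hypothetical sketch, not the claimed implementation.

```python
def predict_makespan(thread_demands, cores):
    """Toy processor-sharing simulation: at every instant, all unfinished
    threads share `cores` CPUs equally; returns the predicted completion
    time of the last thread (the batch makespan, in seconds)."""
    remaining = sorted(thread_demands)      # remaining CPU work per thread
    clock = 0.0
    while remaining:
        share = min(cores, len(remaining)) / len(remaining)  # CPU per thread
        step = remaining[0] / share         # shortest remaining thread finishes first
        clock += step
        # advance every other thread by the work it completed in `step`
        remaining = [r - step * share for r in remaining[1:]]
    return clock

# Hypothetical thread demands (CPU seconds) of jobs run together on 2 cores:
makespan = predict_makespan([1.0, 3.0, 2.0, 2.0], cores=2)
```

A full simulator along the lines of the claims would additionally move threads between ready and active queues and rescale the available capacity each interval using the derived instantaneous CPU utilization; this sketch only shows the contention effect that makes the makespan longer than any single thread's demand.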
PCT/IB2019/052828 2018-04-07 2019-04-05 Batch jobs execution time prediction using distinct service demand of threads and instantaneous cpu utilization WO2019193570A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201821013348 2018-04-07
IN201821013348 2018-04-07

Publications (1)

Publication Number Publication Date
WO2019193570A1 true WO2019193570A1 (en) 2019-10-10

Family

ID=68100167

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2019/052828 WO2019193570A1 (en) 2018-04-07 2019-04-05 Batch jobs execution time prediction using distinct service demand of threads and instantaneous cpu utilization

Country Status (1)

Country Link
WO (1) WO2019193570A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140047342A1 (en) * 2012-08-07 2014-02-13 Advanced Micro Devices, Inc. System and method for allocating a cluster of nodes for a cloud computing system based on hardware characteristics
US20160232036A1 (en) * 2012-01-13 2016-08-11 Accenture Global Services Limited Performance interference model for managing consolidated workloads in qos-aware clouds

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021206711A1 (en) * 2020-04-08 2021-10-14 Hewlett-Packard Development Company, L.P. Execution prediction for compute clusters with multiple cores
CN111815146A (en) * 2020-07-02 2020-10-23 上海微亿智造科技有限公司 Quality inspection machine simulation test data method and system
CN111782379A (en) * 2020-08-01 2020-10-16 中国人民解放军国防科技大学 Data center job scheduling method and system based on completion efficiency
CN111782379B (en) * 2020-08-01 2023-01-31 中国人民解放军国防科技大学 Data center job scheduling method and system based on completion efficiency
CN114938339A (en) * 2022-05-19 2022-08-23 中国农业银行股份有限公司 Data processing method and related device


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19781322; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19781322; Country of ref document: EP; Kind code of ref document: A1)