CN117608800A - Intelligent scheduling backfill strategy mixed gradient lifting decision tree optimization method - Google Patents

Intelligent scheduling backfill strategy mixed gradient lifting decision tree optimization method

Info

Publication number
CN117608800A
Authority
CN
China
Prior art keywords
job
decision tree
gradient lifting
backfill
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311666485.3A
Other languages
Chinese (zh)
Inventor
王文逾
吕俊哲
赵欢
吴睿
郭威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi Data Technology Co ltd
Shanxi Yunshidai Technology Co ltd
Original Assignee
Shanxi Data Technology Co ltd
Shanxi Yunshidai Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanxi Data Technology Co ltd and Shanxi Yunshidai Technology Co ltd
Priority to CN202311666485.3A
Publication of CN117608800A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485 Task life-cycle, e.g. stopping, restarting, resuming execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/4887 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues involving deadlines, e.g. rate based, periodic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of scheduling algorithms and discloses an optimization method for an intelligent-scheduling backfill strategy based on hybrid gradient boosting decision trees (rendered "mixed gradient lifting decision tree" in the title). The specific technical scheme is as follows: a user submits a job script that contains the resources required by the job and the run script; according to the job features submitted by the user, a CatBoost classification algorithm searches the historical job library of the user who owns the job for a set of jobs similar to the job to be predicted; within that similar job set, a gradient boosting decision tree regression algorithm combines the job features to predict the run time the job requires; the predicted resource value is then used to check whether the current environment meets the job start condition, and if so the job is started, otherwise it is placed in the queue. By using the CatBoost classification algorithm and the gradient boosting decision tree regression algorithm to find the similar job set in the historical job library and predict the resources the job requires, the method improves the performance of the backfill scheduling strategy and increases the number of jobs that can be backfilled.

Description

Intelligent scheduling backfill strategy mixed gradient lifting decision tree optimization method
Technical Field
The invention belongs to the technical field of scheduling algorithms, and particularly relates to an optimization method for an intelligent-scheduling backfill strategy using hybrid gradient boosting decision trees, in the setting of cross-domain supercomputing interconnection.
Background
High performance computing (HPC) uses supercomputers, cluster systems, and related high-performance technologies to solve large-scale, complex scientific, engineering, and business problems. The scale of industrial, scientific, and intelligent computing workloads continues to grow, and cross-domain supercomputing interconnection is gradually coming into play.
An efficient job scheduling scheme is important for a supercomputer center because it improves both system metrics (e.g., utilization) and user metrics (e.g., turnaround time). The default approach used by job schedulers in current supercomputer centers is essentially first come, first served. Used alone, however, first come, first served can cause serious fragmentation and waste processor resources. Most schedulers therefore add a backfill policy: if the next queued job cannot run because there are not enough processors, the scheduler continues to scan the queue and selects smaller jobs that can use the available resources, thereby increasing the utilization of system resources.
The backfill policy, in turn, depends on knowing job run times, both to determine when the resources required by the waiting job will become available and to decide whether a candidate job is eligible for backfilling. Backfilling therefore requires users to provide a run time estimate when submitting a job, and a job that exceeds its user-estimated time is terminated. However, ample evidence shows that user estimates are often inaccurate. In particular, users tend to overestimate run time at submission so that their jobs are not terminated before completion, which is very detrimental to the backfill strategy. How to provide the scheduling system with more accurate job run times is therefore the central question for current backfill scheduling strategies.
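The effect of run time estimates on backfill eligibility can be illustrated with a minimal sketch. It assumes a simplified single-reservation backfill in which only the job at the head of the queue holds a reservation; the names Job, shadow_time, and backfill are hypothetical and are not taken from the patent or from any scheduler.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    procs: int       # processors requested
    runtime: float   # estimated or predicted run time, in seconds

def shadow_time(head, free, running, now=0.0):
    """Earliest time the head job can start, plus processors left over then.

    running: list of (end_time, procs) for jobs currently executing.
    """
    avail = free
    if avail >= head.procs:
        return now, avail - head.procs
    for end, procs in sorted(running):
        avail += procs
        if avail >= head.procs:
            return end, avail - head.procs
    raise ValueError("head job can never fit on this machine")

def backfill(queue, free, running, now=0.0):
    """Names of queued jobs that may start now without delaying the head job."""
    head, rest = queue[0], queue[1:]
    shadow, extra = shadow_time(head, free, running, now)
    started = []
    for job in rest:
        if job.procs > free:
            continue  # does not fit in the currently idle processors
        ends_in_time = now + job.runtime <= shadow
        # Eligible if it finishes before the head job's reserved start,
        # or if it only uses processors the head job will not need.
        if ends_in_time or job.procs <= extra:
            started.append(job.name)
            free -= job.procs
            if not ends_in_time:
                extra -= job.procs
    return started

running = [(100.0, 4)]  # one running job frees 4 processors at t=100
# Job "a" really needs about 90 s, but the user asked for 300 s.
print(backfill([Job("head", 6, 500), Job("a", 2, 300)], 2, running))
print(backfill([Job("head", 6, 500), Job("a", 2, 90)], 2, running))
```

With the overestimated 300 s request, job "a" is rejected because it appears to overlap the head job's reserved start; with a 90 s prediction it is backfilled into the two idle processors.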
Gaussier et al. use an online regression algorithm to predict job run time and redesign the loss function of the learning model. Lamar et al. propose a Top percentage predictor that uses a hierarchical classification scheme to provide run time predictions more accurate than the time requested by the user, while handling multimodal job profiles by basing the prediction on the outlier with the longest duration. Smith et al. use a genetic algorithm to dynamically determine which job features yield the best definition of similarity, then categorize jobs by these features and build a model for each class to predict execution time.
Disclosure of Invention
To solve the technical problems in the prior art, the invention provides an optimization method for an intelligent-scheduling backfill strategy using hybrid gradient boosting decision trees.
To achieve the above purpose, the technical scheme adopted by the invention is as follows. The method comprises the following specific steps:
Step S1: the user submits a job script, which contains the resources required by the job and the run script.
Step S2: according to the job features submitted by the user, a CatBoost classification algorithm searches the historical job library of the user who owns the job for a set of jobs similar to the job to be predicted.
Step S3: within the similar job set, a gradient boosting decision tree regression algorithm combines the job features to predict the run time the job requires. For a job that is actually to be run, the category to which the job belongs is determined first, and the predicted run time is then obtained from the gradient boosting decision tree model for that category.
Step S4: the predicted resource value is used to check whether the current environment meets the job start condition; if so, the job is started, otherwise the job is placed in the queue.
In step S4, once the job has been enqueued, it is only necessary to check, using the predicted resource value, whether the current environment meets the job start condition.
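The control flow of steps S1 to S4 can be sketched as follows. This is only an outline under stated assumptions: the nearest-neighbour classifier and the per-class mean are simple placeholders standing in for the CatBoost classifier and the gradient boosting decision tree regressor, the start condition is reduced to a free-processor check, and all function names and feature tuples are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def train(history):
    """history: (features, job_class, runtime) tuples from one user's past jobs."""
    by_class = defaultdict(list)
    for feats, cls, runtime in history:
        by_class[cls].append((feats, runtime))
    return dict(by_class)

def classify(model, feats):
    """S2 placeholder: choose the class holding the closest historical job."""
    def dist(cls):
        return min(sum((a - b) ** 2 for a, b in zip(feats, f))
                   for f, _ in model[cls])
    return min(model, key=dist)

def predict_runtime(model, cls):
    """S3 placeholder: a per-class regressor, here just the class mean run time."""
    return mean(r for _, r in model[cls])

def schedule(job_feats, procs, model, free_procs, queue):
    """S4: start the job if the (assumed) start condition holds, else enqueue it."""
    cls = classify(model, job_feats)          # S2: find the similar job set
    runtime = predict_runtime(model, cls)     # S3: predict within that set
    if procs <= free_procs:                   # assumed start condition
        return "start", runtime
    queue.append((job_feats, procs, runtime))
    return "queued", runtime

# Hypothetical history: features are (processors requested, input size in GB).
history = [((4, 1.0), "short", 60), ((4, 1.2), "short", 80),
           ((64, 10.0), "long", 3600)]
model = train(history)
queue = []
print(schedule((8, 1.1), 8, model, 16, queue))
print(schedule((8, 1.1), 32, model, 16, queue))
```

Training one regressor per class, rather than a single global model, mirrors the description's two-stage design: the classifier narrows the history to similar jobs before any run time is predicted.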
In step S3, the gradient boosting decision tree regression algorithm is:
$$h_t = \arg\min_{h \in H} \mathcal{L}(F_{t-1} + h) = \arg\min_{h \in H} \mathbb{E}\, L(y, F_{t-1} + h)$$
where $h_t$ is the weak learner selected from $H$ in round $t$, $F_{t-1}$ is the strong learner obtained in the previous round, $\mathcal{L}$ is the loss function, $L(\cdot,\cdot)$ is a specific loss function, $\arg\min_{h \in H}$ denotes the $h$ that minimizes the expression, $H$ is the hypothesis space of weak learners, $h$ is one weak learner in $H$, $y$ is the true value, and $\mathbb{E}\, L(\cdot,\cdot)$ is the expected value of $L(\cdot,\cdot)$;
the strong learner $F_t$ of round $t$ is updated as:
$$F_t = F_{t-1} + \alpha h_t$$
where $\alpha$ is the learning rate.
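The update F_t = F_{t-1} + αh_t can be made concrete with a minimal sketch. It assumes squared-error loss, for which choosing the weak learner that minimizes the loss is the same as fitting it to the current residuals, and it uses one-feature decision stumps as the hypothesis space H. The function names are hypothetical, and this is not the patent's implementation.

```python
def fit_stump(xs, residuals):
    """Weak learner h: the single-threshold split with least squared error."""
    best = None
    for thr in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, thr, lm, rm)
    if best is None:  # all x identical: fall back to a constant learner
        const = sum(residuals) / len(residuals)
        return lambda x: const
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm

def gradient_boost(xs, ys, rounds=50, alpha=0.1):
    """Build F_t = F_{t-1} + alpha * h_t, starting from the mean of ys."""
    f0 = sum(ys) / len(ys)
    learners = []

    def F(x):
        return f0 + alpha * sum(h(x) for h in learners)

    for _ in range(rounds):
        # With squared loss, the residuals are the negative gradient of the loss.
        residuals = [y - F(x) for x, y in zip(xs, ys)]
        learners.append(fit_stump(xs, residuals))
    return F

# Hypothetical data: run time jumps once the node count exceeds 3.
nodes = [1, 2, 3, 4, 5, 6]
runtimes = [10, 10, 10, 50, 50, 50]
F = gradient_boost(nodes, runtimes)
print(round(F(2), 1), round(F(5), 1))
```

A small learning rate α shrinks each correction, so the strong learner approaches the targets gradually over many rounds instead of fitting them in one step.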
CatBoost avoids target leakage by using ordered boosting, and it adds a new algorithm for processing categorical features. One popular technique for handling categorical features in boosted trees is one-hot encoding, which for high-cardinality features (e.g., a "user ID" feature) produces an impractically large number of new features. To address this, categories can be grouped into a limited number of clusters before one-hot encoding is applied; a popular way to group them is by target statistics (TS), which estimate the expected target value in each category. CatBoost uses a strategy based on the ordering principle, inspired by online learning algorithms that receive training examples sequentially in time. It introduces an artificial "time", i.e. a random permutation σ of the training examples, and computes the TS of each example from all of the "history" available before it in that permutation.
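The ordered target statistic described above can be sketched in a few lines. This is an illustrative reimplementation, not the CatBoost library's code: it assumes the usual smoothed form (sum of earlier targets + a·prior) / (count of earlier examples + a), and the function name, the smoothing weight a, and the sample data are hypothetical.

```python
import random

def ordered_target_stats(categories, targets, prior, a=1.0, seed=0):
    """Encode each example using only examples that precede it in a random order."""
    n = len(categories)
    perm = list(range(n))
    random.Random(seed).shuffle(perm)   # the artificial "time" sigma
    sums, counts = {}, {}
    ts = [0.0] * n
    for i in perm:
        c = categories[i]
        s, k = sums.get(c, 0.0), counts.get(c, 0)
        # Only the "history" seen so far contributes, never the example's own target.
        ts[i] = (s + a * prior) / (k + a)
        sums[c] = s + targets[i]
        counts[c] = k + 1
    return ts

# Hypothetical history: user IDs as the categorical feature, run times as targets.
users = ["u1", "u1", "u2"]
runtimes = [100.0, 200.0, 50.0]
print(ordered_target_stats(users, runtimes, prior=120.0))
```

Because an example's own target never enters its encoding, the statistic cannot leak the label it is later used to predict, which is the target leakage the paragraph refers to.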
The invention provides a backfill scheduling optimization method based on a hybrid gradient boosting decision tree algorithm. By predicting the run time of a job before the job executes, it supplies the backfill scheduling strategy with more accurate scheduling parameters, which serve as the basis for job resource allocation and job backfill scheduling, without affecting the original function of backfill scheduling. The gradient boosting decision tree algorithm can model nonlinear relationships, is suitable for many data types, is robust to outliers and missing values, offers high prediction accuracy and efficiency, and can effectively analyze the features in a data set so as to classify and predict accurately. The method uses the CatBoost classification algorithm and the gradient boosting decision tree regression algorithm to find, in the historical job library, a set of jobs similar to the job to be predicted and to predict the resources the job requires, which improves the performance of the backfill scheduling strategy and increases the number of jobs that can be backfilled.
Drawings
Fig. 1 is a flow chart of the operation of the present invention.
Detailed Description
To make the technical problems to be solved, the technical schemes, and the beneficial effects clearer, the invention is described in further detail below with reference to the accompanying drawing and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As shown in Fig. 1, the optimization method for the intelligent-scheduling backfill strategy using hybrid gradient boosting decision trees comprises the following specific steps:
Step S1: the user submits a job script, which contains the resources required by the job and the run script;
Step S2: according to the job features submitted by the user, a CatBoost classification algorithm searches the historical job library of the user who owns the job for a set of jobs similar to the job to be predicted;
Step S3: within the similar job set, a gradient boosting decision tree regression algorithm combines the job features to predict the run time the job requires;
Step S4: the predicted resource value is used to check whether the current environment meets the job start condition; if so, the job is started, otherwise the job is placed in the queue.
In step S4, once the job has been enqueued, it is only necessary to check, using the predicted resource value, whether the current environment meets the job start condition.
The job features used in steps S2 and S3 are shown in Table 1:
Table 1. Job features used in this method
The job features above can be obtained in the SLURM job scheduling system with the job information query command scontrol show job <jobid>, together with the results of log analysis.
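As a rough illustration of how such query output might be turned into features, the sketch below splits Key=Value pairs out of text in the style printed by scontrol show job. The sample text and its values are made up for the example, the choice of fields is an assumption (the contents of Table 1 are not reproduced here), and the helper names are hypothetical.

```python
import re

def parse_scontrol(text):
    """Split Key=Value tokens into a dictionary of strings.

    Limitation: values containing spaces and keys containing "/" are not
    handled by this simple pattern.
    """
    return dict(re.findall(r"(\w+)=(\S*)", text))

def to_seconds(stamp):
    """Convert a [days-]HH:MM:SS duration into seconds."""
    days, _, clock = stamp.rpartition("-")
    h, m, s = (int(p) for p in clock.split(":"))
    return int(days or 0) * 86400 + h * 3600 + m * 60 + s

# Illustrative sample only; real output contains many more fields.
sample = """JobId=1234 JobName=train
   UserId=alice(1001) GroupId=hpc(100)
   RunTime=00:10:21 TimeLimit=02:00:00
   NumNodes=2 NumCPUs=64
"""

fields = parse_scontrol(sample)
features = {
    "user": fields["UserId"].split("(")[0],
    "cpus": int(fields["NumCPUs"]),
    "nodes": int(fields["NumNodes"]),
    "requested_seconds": to_seconds(fields["TimeLimit"]),
}
print(features)
```

The user-requested time limit is kept as a feature here because the gap between requested and actual run time is exactly what the prediction is meant to correct.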
The invention provides a backfill scheduling optimization method based on a hybrid gradient boosting decision tree algorithm. By predicting the run time of a job before the job executes, it supplies the backfill scheduling strategy with more accurate scheduling parameters, which serve as the basis for job resource allocation and job backfill scheduling, without affecting the original function of backfill scheduling. The gradient boosting decision tree algorithm can model nonlinear relationships, is suitable for many data types, is robust to outliers and missing values, offers high prediction accuracy and efficiency, and can effectively analyze the features in a data set so as to classify and predict accurately. The method uses the CatBoost classification algorithm and the gradient boosting decision tree regression algorithm to find, in the historical job library, a set of jobs similar to the job to be predicted and to predict the resources the job requires, which improves the performance of the backfill scheduling strategy and increases the number of jobs that can be backfilled.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it; all modifications, equivalent substitutions, and improvements made within the spirit and principles of the invention fall within its scope of protection.

Claims (3)

1. An optimization method for an intelligent-scheduling backfill strategy using hybrid gradient boosting decision trees, characterized by comprising the following specific steps:
Step S1: the user submits a job script, which contains the resources required by the job and the run script;
Step S2: according to the job features submitted by the user, a CatBoost classification algorithm searches the historical job library of the user who owns the job for a set of jobs similar to the job to be predicted;
Step S3: within the similar job set, a gradient boosting decision tree regression algorithm combines the job features to predict the run time the job requires;
Step S4: the predicted resource value is used to check whether the current environment meets the job start condition; if so, the job is started, otherwise the job is placed in the queue.
2. The optimization method for an intelligent-scheduling backfill strategy using hybrid gradient boosting decision trees according to claim 1, wherein in step S4, once the job has been enqueued, it is only necessary to check, using the predicted resource value, whether the current environment meets the job start condition.
3. The optimization method for an intelligent-scheduling backfill strategy using hybrid gradient boosting decision trees according to claim 1, wherein in step S3, the gradient boosting decision tree regression algorithm is:
$$h_t = \arg\min_{h \in H} \mathcal{L}(F_{t-1} + h) = \arg\min_{h \in H} \mathbb{E}\, L(y, F_{t-1} + h)$$
where $h_t$ is the weak learner selected from $H$ in round $t$, $F_{t-1}$ is the strong learner obtained in the previous round, $\mathcal{L}$ is the loss function, $L(\cdot,\cdot)$ is a specific loss function, $\arg\min_{h \in H}$ denotes the $h$ that minimizes the expression, $H$ is the hypothesis space of weak learners, $h$ is one weak learner in $H$, $y$ is the true value, and $\mathbb{E}\, L(\cdot,\cdot)$ is the expected value of $L(\cdot,\cdot)$;
the strong learner $F_t$ of round $t$ is updated as:
$$F_t = F_{t-1} + \alpha h_t$$
where $\alpha$ is the learning rate.
CN202311666485.3A 2023-12-06 2023-12-06 Intelligent scheduling backfill strategy mixed gradient lifting decision tree optimization method Pending CN117608800A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311666485.3A CN117608800A (en) 2023-12-06 2023-12-06 Intelligent scheduling backfill strategy mixed gradient lifting decision tree optimization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311666485.3A CN117608800A (en) 2023-12-06 2023-12-06 Intelligent scheduling backfill strategy mixed gradient lifting decision tree optimization method

Publications (1)

Publication Number Publication Date
CN117608800A (en) 2024-02-27

Family

ID=89946125

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311666485.3A Pending CN117608800A (en) 2023-12-06 2023-12-06 Intelligent scheduling backfill strategy mixed gradient lifting decision tree optimization method

Country Status (1)

Country Link
CN (1) CN117608800A (en)

Similar Documents

Publication Publication Date Title
CN108182115B (en) Virtual machine load balancing method in cloud environment
CN108108233B (en) Cluster job scheduling method and system for task multi-copy execution
Nadeem et al. Optimizing execution time predictions of scientific workflow applications in the grid through evolutionary programming
CN113127173B (en) Heterogeneous sensing cluster scheduling method and device
CN114217930A (en) Accelerator system resource optimization management method based on mixed task scheduling
Wang et al. Prophet: Fine-grained Load Balancing for Parallel Training of Large-scale MoE Models
CN111930485B (en) Job scheduling method based on performance expression
CN110928659B (en) Numerical value pool system remote multi-platform access method with self-adaptive function
CN109271295B (en) Abnormal operation prediction method in cloud cluster environment
CN115145709B (en) Low-carbon big data artificial intelligence method and medical health state system
CN117608800A (en) Intelligent scheduling backfill strategy mixed gradient lifting decision tree optimization method
Yi et al. Research on scheduling of two types of tasks in multi-cloud environment based on multi-task optimization algorithm
CN116360921A (en) Cloud platform resource optimal scheduling method and system for electric power Internet of things
CN113010288B (en) Scheduling method and device of cloud resources and computer storage medium
Dong et al. A general analysis framework for soft real-time tasks
Wei et al. Composite rules selection using reinforcement learning for dynamic job-shop scheduling
Jiang et al. An optimized resource scheduling strategy for Hadoop speculative execution based on non-cooperative game schemes
Kareem et al. Optimal CPU Jobs Scheduling Method Based on Simulated Annealing Algorithm
CN118733283B (en) Intelligent optimization method and system for data acquisition task of big data system
Martínez et al. Using accurate AIC-based performance models to improve the scheduling of parallel applications
CN118133969B (en) Large language model reasoning acceleration method and system
CN109298921B (en) Distributed computing task scheduling algorithm based on Bayesian network
Samal et al. CPU Burst-Time Estimation using Machine Learning
CN115454645A (en) Job runtime prediction method, apparatus, system, and computer-readable storage medium
CN114138095B (en) Power consumption processing method and device for internet data center IDC and readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination