CN117608800A - Intelligent scheduling backfill strategy mixed gradient lifting decision tree optimization method - Google Patents
Intelligent scheduling backfill strategy mixed gradient lifting decision tree optimization method
- Publication number
- CN117608800A
- Authority
- CN
- China
- Prior art keywords
- job
- decision tree
- gradient lifting
- backfill
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/485—Task life-cycle, e.g. stopping, restarting, resuming execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2148—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/4887—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues involving deadlines, e.g. rate based, periodic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention belongs to the technical field of scheduling algorithms and discloses a hybrid gradient boosting decision tree method for optimizing an intelligent scheduling backfill strategy. The technical scheme is as follows: a user submits a job script, which comprises the resources required by the job and a run script; according to the job features submitted by the user, a CatBoost classification algorithm searches the historical job library of the user to whom the job belongs for a set of jobs similar to the job to be predicted; within the similar job set, a gradient boosting decision tree regression algorithm is combined with the job features to predict the run time required by the job; the predicted resource value is used to check whether the current environment meets the job start condition, and if so the job is started, otherwise the job is placed in the queue. By using the CatBoost classification algorithm and the gradient boosting decision tree regression algorithm to find the similar job set in the historical job library and to predict the resources required by the job, the method improves the performance of the backfill scheduling strategy and increases the number of jobs that can be backfilled.
Description
Technical Field
The invention belongs to the technical field of scheduling algorithms, and particularly relates to a hybrid gradient boosting decision tree method for optimizing an intelligent scheduling backfill strategy in cross-domain supercomputing interconnection.
Background
High performance computing (HPC) uses supercomputers, cluster systems and other high-performance computing technologies to address large-scale, complex scientific, engineering and business problems. As the scale of industrial computing, scientific computing, intelligent computing and similar workloads continues to expand, cross-domain supercomputing interconnection is gradually coming into play.
Efficient job scheduling is important for supercomputer centers to improve system metrics (e.g., utilization) and user metrics (e.g., turnaround time). The default approach used by job schedulers at current supercomputer centers is essentially first come, first served (FCFS). However, using FCFS alone can cause serious fragmentation and waste processor resources. Most schedulers therefore add a backfill policy: if the next queued job cannot run because there are not enough processors, the scheduler continues to scan the queue and selects smaller jobs that can use the available resources, thereby increasing system resource utilization.
The backfill policy in turn requires the run time of each job, both to know when the resources required by the current job will become available and to determine whether a job is eligible for backfilling. Backfilling therefore requires the user to provide a run time estimate when submitting a job, and a job that exceeds its estimate is terminated. However, extensive evidence shows that user estimates are often inaccurate. In particular, users tend to overestimate the run time so that their jobs are not terminated before completion, which is very detrimental to the backfill strategy. How to provide a more accurate job run time to the scheduling system is therefore the issue of greatest interest in current backfill scheduling strategies.
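The dependence of backfilling on run time can be illustrated with a minimal Python sketch. It assumes an EASY-style rule, which the text does not name: a waiting job may start early only if it fits in the free processors and either finishes before the head job's reserved start time or uses only processors the head job will not need. The names `Job`, `can_backfill`, `shadow_time` and `extra_procs` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int      # processors requested
    runtime: float  # estimated or predicted run time, in seconds


def can_backfill(job, now, free_procs, shadow_time, extra_procs):
    """Return True if `job` may start now without delaying the queue head.

    shadow_time: earliest time at which the head-of-queue job can start.
    extra_procs: processors that remain free even after the head job starts.
    (Both are assumptions of this sketch, not terms from the source text.)
    """
    if job.procs > free_procs:
        return False
    finishes_in_time = now + job.runtime <= shadow_time
    fits_beside_head = job.procs <= extra_procs
    return finishes_in_time or fits_beside_head


# The same 2-processor job, first with an overestimated run time,
# then with a tighter predicted run time.
overestimated = Job("sim", procs=2, runtime=300.0)
predicted = Job("sim", procs=2, runtime=80.0)
blocked = can_backfill(overestimated, now=0.0, free_procs=4,
                       shadow_time=100.0, extra_procs=1)
allowed = can_backfill(predicted, now=0.0, free_procs=4,
                       shadow_time=100.0, extra_procs=1)
```

With the overestimate the job is refused, while the tighter prediction lets it run in the gap, which is why run time accuracy directly affects how many jobs can be backfilled.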
Gaussier et al. use an online regression algorithm to predict job run time and redesign the loss function of the learning model. Lamar et al. propose a Top percentage predictor that uses a hierarchical classification scheme to provide job run time predictions that are more accurate than the time requested by the user, while overcoming multimodal job distributions by basing its prediction on the outlier with the longest duration. Smith et al. use a genetic algorithm to dynamically determine which job features yield the best definition of similarity, then categorize jobs by these features and build a model for each class to predict execution time.
Disclosure of Invention
To solve the technical problems in the prior art, the invention provides a hybrid gradient boosting decision tree method for optimizing an intelligent scheduling backfill strategy.
To achieve the above purpose, the technical scheme adopted by the invention is as follows. The hybrid gradient boosting decision tree method for optimizing an intelligent scheduling backfill strategy comprises the following steps:
Step S1: a user submits a job script, which comprises the resources required by the job and a run script.
Step S2: according to the job features submitted by the user, a CatBoost classification algorithm searches the historical job library of the user to whom the job belongs for a set of jobs similar to the job to be predicted.
Step S3: within the similar job set, a gradient boosting decision tree regression algorithm is combined with the job features to predict the run time required by the job. For a job that is actually run, the category to which the job belongs is determined first, and the predicted run time is then obtained with the gradient boosting decision tree model for that category.
Step S4: the predicted resource value is used to check whether the current environment meets the job start condition; if so, the job is started, otherwise the job is placed in the queue.
In step S4, once the job has been enqueued, it is only necessary to use the predicted resource value to check whether the current environment meets the job start condition.
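The two-stage structure of steps S2 and S3 (classify first, then use the regressor for that class) can be sketched as a small dispatch function. This is only an illustration of the control flow: the classifier and the per-class regressors are passed in as plain callables, the toy stand-ins below are not CatBoost or gradient boosting models, and falling back to the user's requested time for an unseen class is an assumption of this sketch.

```python
def predict_runtime(features, classify, regressors, fallback):
    """Two-stage prediction: pick a similar-job class, then regress within it.

    classify:   callable mapping job features to a class label (step S2).
    regressors: dict mapping class label to a callable that returns a
                predicted run time for the given features (step S3).
    fallback:   value returned when no regressor exists for the class
                (hypothetical policy, not specified in the source text).
    """
    label = classify(features)
    model = regressors.get(label)
    if model is None:
        return fallback
    return model(features)


# Toy stand-ins for trained models, for illustration only.
def toy_classify(features):
    return "large" if features["cpus"] >= 64 else "small"


toy_regressors = {
    "small": lambda f: 60.0 * f["cpus"],
    "large": lambda f: 3600.0 + 10.0 * f["cpus"],
}

job = {"cpus": 128, "requested_seconds": 86400.0}
runtime = predict_runtime(job, toy_classify, toy_regressors,
                          fallback=job["requested_seconds"])
```

Keeping one regressor per class means each model is trained only on jobs that the classifier considers similar, which is the point of searching for a similar job set before predicting.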
In step S3, the gradient boosting decision tree regression algorithm is:
$$h_t = \arg\min_{h \in H} \mathcal{L}(F_{t-1} + h) = \arg\min_{h \in H} \mathbb{E}\, L(y, F_{t-1} + h)$$
where $h_t$ is the weak learner selected from $H$ in round $t$, $F_{t-1}$ is the strong learner obtained in the previous round, $\mathcal{L}(\cdot)$ is the loss function, $L(\cdot,\cdot)$ is a specific loss function, $\arg\min_{h \in H}$ denotes the $h$ that minimizes $\mathcal{L}(F_{t-1} + h)$, $H$ is the hypothesis space of the weak learners, $h$ is one weak learner in $H$, $y$ is the true value, and $\mathbb{E}\, L(\cdot,\cdot)$ is the expected value of $L(\cdot,\cdot)$;
the strong learner $F_t$ of round $t$ is updated as:
$$F_t = F_{t-1} + \alpha h_t$$
where $\alpha$ is the learning rate.
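The update rule can be made concrete with a minimal pure-Python sketch. It assumes squared-error loss and one-feature regression stumps as the weak learners; under squared error, the stump that minimizes the loss of F_{t-1} + h is the least-squares fit to the current residuals. The data and the names `fit_stump`, `boost` and `mse` are hypothetical, and a production system would use a library implementation instead.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump (one split) on a single numeric feature."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not right:
            continue  # splitting at the maximum leaves one side empty
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    if best is None:  # all feature values identical: predict the mean
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, thr, lmean, rmean = best
    return lambda x: lmean if x <= thr else rmean


def boost(xs, ys, rounds=50, alpha=0.1):
    """Build F_t = F_{t-1} + alpha * h_t, starting from the mean of ys."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)          # h_t for squared-error loss
        stumps.append(h)
        preds = [p + alpha * h(x) for p, x in zip(preds, xs)]
    return lambda x: base + alpha * sum(h(x) for h in stumps)


def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Illustrative data: processors requested vs. run time in seconds.
XS = [1, 2, 4, 8, 16, 32]
YS = [60.0, 110.0, 200.0, 420.0, 800.0, 1650.0]
baseline = boost(XS, YS, rounds=0)
model = boost(XS, YS, rounds=50)
```

Each round shrinks the training error a little because only a fraction `alpha` of the fitted correction is added; smaller learning rates need more rounds but tend to generalize better.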
CatBoost avoids target leakage by using ordered boosting and adds a new algorithm for processing categorical features. One popular technique for handling categorical features in boosted trees is one-hot encoding, which for high-cardinality features (e.g., a "user ID" feature) produces an impractically large number of new features. A common remedy is to group the categories into a limited number of clusters and then apply one-hot encoding; a popular way to do this grouping is by target statistics (TS), which estimate the expected target value in each category. CatBoost uses a strategy based on the ordering principle, inspired by online learning algorithms that receive training examples sequentially in time. It introduces an artificial "time", i.e., a random permutation σ of the training examples, and for each example computes the TS from all of the available "history", that is, the examples that precede it in the permutation.
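A minimal sketch of ordered target statistics may help here. It assumes the smoothed form described in the CatBoost paper, in which the TS of an example is (sum of targets of earlier same-category examples + a·p) / (count of such examples + a), with prior p and weight a. The function and parameter names are hypothetical and this is not the CatBoost library API.

```python
import random


def ordered_target_statistics(categories, targets, prior, a=1.0,
                              seed=0, permutation=None):
    """Encode a categorical feature using only each example's 'history'.

    Examples are visited in the order given by `permutation` (a random
    permutation by default). The value for an example is computed from the
    targets of same-category examples visited before it, so the example's
    own target never leaks into its encoding.
    """
    n = len(categories)
    if permutation is None:
        permutation = list(range(n))
        random.Random(seed).shuffle(permutation)
    sums, counts = {}, {}
    ts = [0.0] * n
    for idx in permutation:
        cat = categories[idx]
        s = sums.get(cat, 0.0)
        k = counts.get(cat, 0)
        ts[idx] = (s + a * prior) / (k + a)
        # Only now add this example to the history for later examples.
        sums[cat] = s + targets[idx]
        counts[cat] = k + 1
    return ts


# Illustrative data: user ID as the category, a binary target.
users = ["alice", "bob", "alice", "alice"]
labels = [1, 0, 1, 0]
encoded = ordered_target_statistics(users, labels, prior=0.5,
                                    permutation=[0, 1, 2, 3])
```

The first example of each category receives only the prior, and later examples see progressively more history, which is what prevents the leakage that a plain per-category mean would introduce.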
The invention provides a backfill scheduling optimization method based on a hybrid gradient boosting decision tree algorithm. By predicting the run time of a job before it is executed, the method supplies more accurate scheduling parameters to the backfill scheduling strategy as the basis for job resource allocation and backfill scheduling, without affecting the original function of backfill scheduling. Gradient boosting decision trees can model nonlinear relationships, handle a variety of data types, are robust to outliers and missing values, and offer high prediction accuracy and efficiency, so they can effectively analyze the features in a data set for accurate classification and prediction. By using the CatBoost classification algorithm and the gradient boosting decision tree regression algorithm to find the set of jobs similar to the job to be predicted in the historical job library and to predict the resources required by the job, the method improves the performance of the backfill scheduling strategy and increases the number of jobs that can be backfilled.
Drawings
Fig. 1 is a flow chart of the operation of the present invention.
Detailed Description
To make the technical problems to be solved, the technical schemes and the beneficial effects clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for illustration only and are not intended to limit the scope of the invention.
As shown in Fig. 1, the hybrid gradient boosting decision tree method for optimizing an intelligent scheduling backfill strategy comprises the following steps:
Step S1: a user submits a job script, which comprises the resources required by the job and a run script;
Step S2: according to the job features submitted by the user, a CatBoost classification algorithm searches the historical job library of the user to whom the job belongs for a set of jobs similar to the job to be predicted;
Step S3: within the similar job set, a gradient boosting decision tree regression algorithm is combined with the job features to predict the run time required by the job;
Step S4: the predicted resource value is used to check whether the current environment meets the job start condition; if so, the job is started, otherwise the job is placed in the queue.
In step S4, once the job has been enqueued, it is only necessary to use the predicted resource value to check whether the current environment meets the job start condition.
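Step S4 and the re-check of queued jobs can be sketched as follows. The text does not define the job start condition, so this sketch assumes a simple one: the job's processors fit in the free processors and its predicted run time fits in the currently free time window. The names `meets_start_condition`, `submit`, `recheck_queue` and `free_window` are hypothetical.

```python
from collections import deque


def meets_start_condition(job, free_procs, free_window):
    """Assumed start condition: enough processors and enough time."""
    return (job["procs"] <= free_procs
            and job["predicted_runtime"] <= free_window)


def submit(job, free_procs, free_window, queue):
    """Step S4: start the job if the condition holds, otherwise enqueue it."""
    if meets_start_condition(job, free_procs, free_window):
        return "started"
    queue.append(job)
    return "queued"


def recheck_queue(queue, free_procs, free_window):
    """Re-test every queued job once, keeping the order of those left waiting."""
    started = []
    for _ in range(len(queue)):
        job = queue.popleft()
        if meets_start_condition(job, free_procs, free_window):
            started.append(job["name"])
            free_procs -= job["procs"]
        else:
            queue.append(job)
    return started


waiting = deque()
first = submit({"name": "A", "procs": 8, "predicted_runtime": 100.0},
               free_procs=4, free_window=200.0, queue=waiting)
second = submit({"name": "B", "procs": 2, "predicted_runtime": 50.0},
                free_procs=4, free_window=200.0, queue=waiting)
# Later, more processors have been released.
released = recheck_queue(waiting, free_procs=8, free_window=200.0)
```

Because the queued job already carries its predicted value, the re-check is a cheap comparison against the current environment and needs no new prediction.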
The job features used in step S2 and step S3 are shown in Table 1:
Table 1. Job features used in the present method
In the SLURM job scheduling system, the job features used above can be obtained with the job information query command scontrol show job <jobid> and from log analysis results.
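As an illustration of how such features might be collected, the following sketch parses the Key=Value text printed by scontrol show job into a dictionary. The sample text and the fields chosen are illustrative only, since the contents of Table 1 are not reproduced here; the simple whitespace split also assumes that values contain no spaces, which does not hold for every field in real output.

```python
def parse_scontrol_job(text):
    """Split whitespace-separated Key=Value tokens into a dict of strings."""
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def slurm_time_to_seconds(value):
    """Convert '[D-]HH:MM:SS' to seconds.

    Other values (for example 'UNLIMITED') raise ValueError.
    """
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(p) for p in value.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Illustrative excerpt in the layout printed by `scontrol show job <jobid>`.
SAMPLE = """JobId=1234 JobName=cfd_run
   UserId=alice(1001) GroupId=hpc(2001)
   JobState=COMPLETED Partition=batch
   RunTime=00:42:10 TimeLimit=1-00:00:00
   NumNodes=4 NumCPUs=128
"""

job_fields = parse_scontrol_job(SAMPLE)
requested_seconds = slurm_time_to_seconds(job_fields["TimeLimit"])
actual_seconds = slurm_time_to_seconds(job_fields["RunTime"])
```

Here the requested limit is one day while the job ran for about 42 minutes, the kind of gap between user estimate and actual run time that the prediction step is meant to close.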
The invention provides a backfill scheduling optimization method based on a hybrid gradient boosting decision tree algorithm. By predicting the run time of a job before it is executed, the method supplies more accurate scheduling parameters to the backfill scheduling strategy as the basis for job resource allocation and backfill scheduling, without affecting the original function of backfill scheduling. Gradient boosting decision trees can model nonlinear relationships, handle a variety of data types, are robust to outliers and missing values, and offer high prediction accuracy and efficiency, so they can effectively analyze the features in a data set for accurate classification and prediction. By using the CatBoost classification algorithm and the gradient boosting decision tree regression algorithm to find the set of jobs similar to the job to be predicted in the historical job library and to predict the resources required by the job, the method improves the performance of the backfill scheduling strategy and increases the number of jobs that can be backfilled.
The foregoing description of the preferred embodiment is not intended to limit the invention; it is intended to cover all modifications, equivalents and alternatives falling within the spirit and principles of the invention.
Claims (3)
1. A hybrid gradient boosting decision tree method for optimizing an intelligent scheduling backfill strategy, characterized by comprising the following steps:
Step S1: a user submits a job script, which comprises the resources required by the job and a run script;
Step S2: according to the job features submitted by the user, a CatBoost classification algorithm searches the historical job library of the user to whom the job belongs for a set of jobs similar to the job to be predicted;
Step S3: within the similar job set, a gradient boosting decision tree regression algorithm is combined with the job features to predict the run time required by the job;
Step S4: the predicted resource value is used to check whether the current environment meets the job start condition; if so, the job is started, otherwise the job is placed in the queue.
2. The hybrid gradient boosting decision tree method for optimizing an intelligent scheduling backfill strategy according to claim 1, wherein in step S4, once the job has been enqueued, it is only necessary to use the predicted resource value to check whether the current environment meets the job start condition.
3. The hybrid gradient boosting decision tree method for optimizing an intelligent scheduling backfill strategy according to claim 1, wherein in step S3, the gradient boosting decision tree regression algorithm is:
$$h_t = \arg\min_{h \in H} \mathcal{L}(F_{t-1} + h) = \arg\min_{h \in H} \mathbb{E}\, L(y, F_{t-1} + h)$$
where $h_t$ is the weak learner selected from $H$ in round $t$, $F_{t-1}$ is the strong learner obtained in the previous round, $\mathcal{L}(\cdot)$ is the loss function, $L(\cdot,\cdot)$ is a specific loss function, $\arg\min_{h \in H}$ denotes the $h$ that minimizes $\mathcal{L}(F_{t-1} + h)$, $H$ is the hypothesis space of the weak learners, $h$ is one weak learner in $H$, $y$ is the true value, and $\mathbb{E}\, L(\cdot,\cdot)$ is the expected value of $L(\cdot,\cdot)$;
the strong learner $F_t$ of round $t$ is updated as:
$$F_t = F_{t-1} + \alpha h_t$$
where $\alpha$ is the learning rate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311666485.3A CN117608800A (en) | 2023-12-06 | 2023-12-06 | Intelligent scheduling backfill strategy mixed gradient lifting decision tree optimization method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311666485.3A CN117608800A (en) | 2023-12-06 | 2023-12-06 | Intelligent scheduling backfill strategy mixed gradient lifting decision tree optimization method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117608800A true CN117608800A (en) | 2024-02-27 |
Family
ID=89946125
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311666485.3A Pending CN117608800A (en) | 2023-12-06 | 2023-12-06 | Intelligent scheduling backfill strategy mixed gradient lifting decision tree optimization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117608800A (en) |
-
2023
- 2023-12-06 CN CN202311666485.3A patent/CN117608800A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108182115B (en) | Virtual machine load balancing method in cloud environment | |
CN108108233B (en) | Cluster job scheduling method and system for task multi-copy execution | |
Nadeem et al. | Optimizing execution time predictions of scientific workflow applications in the grid through evolutionary programming | |
CN113127173B (en) | Heterogeneous sensing cluster scheduling method and device | |
CN114217930A (en) | Accelerator system resource optimization management method based on mixed task scheduling | |
Wang et al. | Prophet: Fine-grained Load Balancing for Parallel Training of Large-scale MoE Models | |
CN111930485B (en) | Job scheduling method based on performance expression | |
CN110928659B (en) | Numerical value pool system remote multi-platform access method with self-adaptive function | |
CN109271295B (en) | Abnormal operation prediction method in cloud cluster environment | |
CN115145709B (en) | Low-carbon big data artificial intelligence method and medical health state system | |
CN117608800A (en) | Intelligent scheduling backfill strategy mixed gradient lifting decision tree optimization method | |
Yi et al. | Research on scheduling of two types of tasks in multi-cloud environment based on multi-task optimization algorithm | |
CN116360921A (en) | Cloud platform resource optimal scheduling method and system for electric power Internet of things | |
CN113010288B (en) | Scheduling method and device of cloud resources and computer storage medium | |
Dong et al. | A general analysis framework for soft real-time tasks | |
Wei et al. | Composite rules selection using reinforcement learning for dynamic job-shop scheduling | |
Jiang et al. | An optimized resource scheduling strategy for Hadoop speculative execution based on non-cooperative game schemes | |
Kareem et al. | Optimal CPU Jobs Scheduling Method Based on Simulated Annealing Algorithm | |
CN118733283B (en) | Intelligent optimization method and system for data acquisition task of big data system | |
Martínez et al. | Using accurate AIC-based performance models to improve the scheduling of parallel applications | |
CN118133969B (en) | Large language model reasoning acceleration method and system | |
CN109298921B (en) | Distributed computing task scheduling algorithm based on Bayesian network | |
Samal et al. | CPU Burst-Time Estimation using Machine Learning | |
CN115454645A (en) | Job runtime prediction method, apparatus, system, and computer-readable storage medium | |
CN114138095B (en) | Power consumption processing method and device for internet data center IDC and readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||