CN117608800A

CN117608800A - Optimization method for an intelligent scheduling backfill strategy using a hybrid gradient boosting decision tree

Info

Publication number: CN117608800A
Application number: CN202311666485.3A
Authority: CN
Inventors: 王文逾; 吕俊哲; 赵欢; 吴睿; 郭威
Original assignee: Shanxi Data Technology Co ltd; Shanxi Yunshidai Technology Co ltd
Current assignee: Shanxi Data Technology Co ltd; Shanxi Yunshidai Technology Co ltd
Priority date: 2023-12-06
Filing date: 2023-12-06
Publication date: 2024-02-27

Abstract

The invention belongs to the technical field of scheduling algorithms and discloses an optimization method for an intelligent scheduling backfill strategy using a hybrid gradient boosting decision tree. The technical scheme is as follows: a user submits a job script, which comprises the resources required by the job and a run script; based on the job features submitted by the user, a set of jobs similar to the job to be predicted is found in the historical job library of the user who owns the job, using the CatBoost classification algorithm; within the similar job set, a gradient boosting decision tree regression algorithm is combined with the job features to predict the runtime required by the job; the predicted resource value is used to check whether the current environment meets the job start condition, and if so the job is started, otherwise the job waits in the queue. By using the CatBoost classification algorithm and the gradient boosting decision tree regression algorithm to find the similar job set in the historical job library and predict the resources required by the job, the method improves the performance of the backfill scheduling strategy and increases the number of jobs that can be backfilled.

Description

Optimization method for an intelligent scheduling backfill strategy using a hybrid gradient boosting decision tree

Technical Field

The invention belongs to the technical field of scheduling algorithms, and particularly relates to an optimization method for an intelligent scheduling backfill strategy using a hybrid gradient boosting decision tree, based on cross-domain supercomputing interconnection.

Background

High Performance Computing (HPC) is an approach to computing that uses supercomputers, cluster systems, and other high-performance technologies to address large-scale, complex scientific, engineering, and business problems. As the scale of jobs in industrial computing, scientific computing, intelligent computing and similar fields continues to grow, cross-domain supercomputing interconnection is gradually coming into play.

Efficient job scheduling schemes are important for supercomputer centers to improve system metrics (e.g., utilization) and user metrics (e.g., turnaround time). The default approach used by job schedulers at current supercomputer centers is essentially first come, first served. However, using only the first-come-first-served method may cause serious fragmentation and waste processor resources. Most schedulers therefore use a backfill policy: if the next queued job cannot run because there are not enough processors, the scheduler continues to scan the queue and selects smaller jobs that can use the available resources, thereby increasing the utilization of system resources.

The backfill policy, in turn, needs the runtime of each job, both to know when the resources required by the current job will become available and to determine whether a job is eligible for backfilling. Backfilling therefore requires the user to provide a runtime estimate when submitting a job, and jobs that exceed the user's estimate are terminated. However, extensive evidence shows that user estimates are often inaccurate. In particular, users overestimate the runtime at submission to avoid having their jobs terminated before completion, which is very detrimental to the backfill strategy. How to provide a more accurate job runtime to the scheduling system is therefore the issue of greatest interest in current backfill scheduling strategies.
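The dependence of backfilling on runtime estimates can be illustrated with a minimal sketch. It assumes a simplified EASY-style policy with a single reservation for the head of the queue; the names Job, backfill_pass, procs and est_runtime are hypothetical and do not come from the patent or from any scheduler's API.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    procs: int        # processors requested
    est_runtime: int  # estimated runtime, arbitrary time units

def backfill_pass(queue, running, free_procs, now=0):
    """Return the names of the jobs that may start now.

    running is a list of (estimated_end_time, procs) for executing jobs.
    Simplified EASY-style backfill: only the head job holds a reservation.
    """
    started = []
    queue = list(queue)
    # First-come-first-served phase: start head jobs while they fit.
    while queue and queue[0].procs <= free_procs:
        job = queue.pop(0)
        free_procs -= job.procs
        running = running + [(now + job.est_runtime, job.procs)]
        started.append(job.name)
    if not queue:
        return started
    head = queue[0]
    # Shadow time: earliest moment at which the head job is expected to fit.
    avail, shadow_time = free_procs, None
    for end, procs in sorted(running):
        avail += procs
        if avail >= head.procs:
            shadow_time = end
            break
    if shadow_time is None:
        return started  # the head job never fits; not handled in this sketch
    extra = avail - head.procs  # processors left over once the head job starts
    for job in queue[1:]:
        if job.procs > free_procs:
            continue
        in_time = now + job.est_runtime <= shadow_time
        # Backfill only if the head job's reservation is not delayed.
        if in_time or job.procs <= extra:
            started.append(job.name)
            free_procs -= job.procs
            if not in_time:
                extra -= job.procs
    return started
```

With 4 free processors and a 6-processor job expected to end at time 10, a waiting 4-processor job is backfilled behind an 8-processor head job only if its estimate is at most 10. An overestimate of 20 for a job that really needs 8 time units therefore leaves the processors idle.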

Gaussier et al. use an online regression algorithm to predict job runtime and redesign the loss function of the learning model. Lamar et al. propose a Top percentage predictor that uses a hierarchical classification scheme to provide job runtime predictions that are more accurate than the time requested by the user, while overcoming multimodal job profiles by predicting based on the outlier with the longest duration. Smith et al. use genetic algorithms to dynamically determine which job features produce the best definition of similarity, then categorize jobs based on these features and create a model for each class to predict execution time.

Disclosure of Invention

To solve the technical problems in the prior art, the invention provides an optimization method for an intelligent scheduling backfill strategy using a hybrid gradient boosting decision tree.

To achieve the above purpose, the technical scheme adopted by the invention is as follows. The optimization method for an intelligent scheduling backfill strategy using a hybrid gradient boosting decision tree comprises the following specific steps:

Step S1, a user submits a job script, which comprises the resources required by the job and a run script.

Step S2, based on the job features submitted by the user, a set of jobs similar to the job to be predicted is found in the historical job library of the user who owns the job, using the CatBoost classification algorithm.

Step S3, within the similar job set, a gradient boosting decision tree regression algorithm is combined with the job features to predict the runtime required by the job. For a job that is actually run, the category to which the job belongs is determined first, and the predicted job runtime is then obtained with the gradient boosting decision tree method for that category.

Step S4, the predicted resource value is used to check whether the current environment meets the job start condition; if so, the job is started, otherwise the job waits in the queue.

In step S4, once the job has been enqueued and has finished queuing, only the predicted resource value is needed to check whether the current environment meets the job start condition.
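Steps S2 to S4 can be sketched as a small pipeline. This is only an illustration under stated assumptions: the classifier and the per-category regressors are passed in as plain callables standing in for trained CatBoost and gradient boosting models, and the start condition used here (enough free processors, and the predicted runtime fits into an available time window) is an assumption, since the patent does not spell the condition out. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]

@dataclass
class PendingJob:
    job_id: str
    features: Features
    procs: int  # processors requested in the job script (step S1)

def predict_runtime(job: PendingJob,
                    classify: Callable[[Features], str],
                    regressors: Dict[str, Callable[[Features], float]]) -> float:
    category = classify(job.features)           # step S2: similar job set
    return regressors[category](job.features)   # step S3: runtime regression

def can_start(job: PendingJob, runtime: float,
              free_procs: int, window: float) -> bool:
    # Step S4 (assumed condition): the job fits in space and in time.
    return job.procs <= free_procs and runtime <= window

def submit(job: PendingJob, classify, regressors,
           free_procs: int, window: float,
           queue: List[Tuple[PendingJob, float]]) -> str:
    runtime = predict_runtime(job, classify, regressors)
    if can_start(job, runtime, free_procs, window):
        return "started"
    queue.append((job, runtime))  # keep the prediction for later checks
    return "queued"
```

Storing the predicted runtime with the queued job reflects the remark on step S4: once a job is queued, later checks only need the predicted value rather than a new prediction.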

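In step S3, the gradient boosting decision tree regression algorithm is:

$$h^{t} = \underset{h \in H}{\arg\min}\ \mathcal{L}\left(F^{t-1} + h\right) = \underset{h \in H}{\arg\min}\ \mathbb{E}\, L\left(y, F^{t-1}(x) + h(x)\right)$$

where $h^{t}$ is the weak learner selected from $H$ in round $t$, $F^{t-1}$ is the strong learner obtained in the previous round, $\mathcal{L}$ is the loss function, $L(\cdot,\cdot)$ is the specific loss function, $\arg\min_{h \in H}$ denotes the $h$ that minimizes $\mathcal{L}(F^{t-1} + h)$, $H$ is the hypothesis space of weak learners, $h$ is one weak learner in $H$, $x$ is the input, $y$ is the true value, and $\mathbb{E}$ is the expected value of $L(\cdot,\cdot)$.

The strong learner $F^{t}$ of round $t$ is updated as:

$$F^{t} = F^{t-1} + \alpha h^{t}$$

where $\alpha$ is the learning rate.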
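The update rule can be made concrete with a toy implementation. The sketch below assumes a squared-error loss, a single numeric feature, and one-split regression stumps as the hypothesis space H; these choices and the names fit_stump and fit_gbdt are illustrative assumptions, not the configuration used in the patent. Under squared error, the weak learner that minimizes the loss in each round is the least-squares fit to the current residuals.

```python
def fit_stump(xs, residuals):
    """Return the one-split regression stump with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not right:
            continue  # a split must leave samples on both sides
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    if best is None:  # all xs identical: fall back to a constant learner
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean

def fit_gbdt(xs, ys, rounds=50, alpha=0.1):
    """Boosting loop: F^t = F^(t-1) + alpha * h^t, starting from the mean."""
    base = sum(ys) / len(ys)
    learners = []

    def predict(x):
        return base + alpha * sum(h(x) for h in learners)

    for _ in range(rounds):
        # For squared error, the residuals are the negative gradient.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        learners.append(fit_stump(xs, residuals))
    return predict
```

A small learning rate shrinks each weak learner's contribution, so more rounds are needed, but each round corrects only part of the remaining error.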

CatBoost avoids target leakage by using ordered boosting, and adds a new algorithm for processing categorical features. One popular technique for handling categorical features in boosted trees is one-hot encoding, which for high-cardinality features (e.g., a "user ID" feature) can produce an unnecessarily large number of new features. To address this, categories can be grouped into a limited number of clusters before one-hot encoding is applied; a popular way to do this is to group categories by target statistics (TS), which estimate the expected target value in each category. CatBoost uses a strategy based on the ordering principle, inspired by online learning algorithms that receive training examples sequentially in time. It introduces an artificial "time", i.e. a random permutation σ of the training examples, and computes the TS of each example from all of the "history" available before it in that permutation.
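The ordered target statistic can be sketched in a few lines. The smoothed form (sum of earlier targets + a · prior) / (number of earlier examples + a) follows the usual description of CatBoost's ordered TS; the function name, its parameters and the optional explicit order argument are assumptions made for illustration and are not the CatBoost library API.

```python
import random

def ordered_target_stats(categories, targets, prior, a=1.0, order=None, seed=0):
    """Compute an ordered target statistic for each training example.

    Each example only sees the targets of examples with the same category
    that precede it in the permutation, so its own target never leaks in.
    """
    n = len(categories)
    if order is None:
        order = list(range(n))
        random.Random(seed).shuffle(order)  # random permutation sigma
    sums, counts = {}, {}
    ts = [0.0] * n
    for idx in order:
        c = categories[idx]
        s, k = sums.get(c, 0.0), counts.get(c, 0)
        ts[idx] = (s + a * prior) / (k + a)  # uses the history only
        sums[c] = s + targets[idx]           # then add this example to it
        counts[c] = k + 1
    return ts
```

For example, with categories ["u1", "u1", "u2", "u1"], targets [10, 20, 30, 40], prior 25, a = 1 and the identity order, the first occurrence of each user gets the prior 25.0, the second "u1" gets (10 + 25) / 2 = 17.5, and the third gets (30 + 25) / 3.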

The invention provides a backfill scheduling optimization method based on a hybrid gradient boosting decision tree algorithm. By predicting the runtime of a job before it is executed, the method supplies more accurate scheduling parameters to the backfill scheduling strategy, serves as the basis for job resource allocation and job backfill scheduling, and does not affect the original function of backfill scheduling. The gradient boosting decision tree algorithm in machine learning can handle nonlinear relationships, suits a variety of data types, is robust to outliers and missing values, offers high prediction accuracy and efficiency, and can effectively analyze the features in a data set to perform accurate classification and prediction. By using the CatBoost classification algorithm and the gradient boosting decision tree regression algorithm to find the similar job set in the historical job library and predict the resources required by the job, the method improves the performance of the backfill scheduling strategy and increases the number of jobs that can be backfilled.

Drawings

Fig. 1 is a flow chart of the operation of the present invention.

Detailed Description

To make the technical problems to be solved, the technical schemes and the beneficial effects clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

As shown in Fig. 1, the optimization method for an intelligent scheduling backfill strategy using a hybrid gradient boosting decision tree comprises the following specific steps:

Step S1, a user submits a job script, which comprises the resources required by the job and a run script;

Step S2, based on the job features submitted by the user, a set of jobs similar to the job to be predicted is found in the historical job library of the user who owns the job, using the CatBoost classification algorithm;

Step S3, within the similar job set, a gradient boosting decision tree regression algorithm is combined with the job features to predict the runtime required by the job;

The job features used in step S2 and step S3 are shown in Table 1:

Table 1. Job features used in the present method

In the SLURM job scheduling system, the job features above can be obtained with the job information query command scontrol show job <jobid> and from log analysis results.
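Collecting features from the command output could look like the following sketch. It assumes the output consists of whitespace-separated Key=Value tokens and that time values use the [days-]HH:MM:SS form; values containing spaces and special values such as UNLIMITED are not handled. The sample text is an abbreviated, hypothetical record, and the fields shown are only examples, not the contents of Table 1.

```python
def parse_scontrol(text):
    """Parse whitespace-separated Key=Value tokens into a dict of strings."""
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")  # split on the first '=' only
        if sep:
            fields[key] = value
    return fields

def time_to_seconds(value):
    """Convert a [days-]HH:MM:SS string to seconds."""
    days, _, clock = value.rpartition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return (int(days) if days else 0) * 86400 + hours * 3600 + minutes * 60 + seconds

# Hypothetical, abbreviated output of: scontrol show job 1234
sample = """JobId=1234 JobName=sim
   UserId=alice(1001) Partition=cpu JobState=COMPLETED
   RunTime=00:42:10 TimeLimit=1-00:00:00
   NumNodes=2 NumCPUs=64"""

job = parse_scontrol(sample)
features = {
    "user": job["UserId"],
    "num_cpus": int(job["NumCPUs"]),
    "requested_seconds": time_to_seconds(job["TimeLimit"]),
    "actual_seconds": time_to_seconds(job["RunTime"]),
}
```

The gap between requested_seconds and actual_seconds in such records is the overestimation that the runtime predictor is meant to reduce.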

The foregoing description of the preferred embodiment of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims

1. The intelligent scheduling backfill strategy mixed gradient lifting decision tree optimization method is characterized by comprising the following specific steps of:

2. The optimization method for an intelligent scheduling backfill strategy using a hybrid gradient boosting decision tree according to claim 1, wherein in step S4, once the job has been enqueued and has finished queuing, only the predicted resource value is needed to check whether the current environment meets the job start condition.

3. The optimization method for an intelligent scheduling backfill strategy using a hybrid gradient boosting decision tree according to claim 1, wherein in step S3, the gradient boosting decision tree regression algorithm is:

$$h^{t} = \underset{h \in H}{\arg\min}\ \mathcal{L}\left(F^{t-1} + h\right) = \underset{h \in H}{\arg\min}\ \mathbb{E}\, L\left(y, F^{t-1}(x) + h(x)\right)$$

where $h^{t}$ is the weak learner selected from $H$ in round $t$, $F^{t-1}$ is the strong learner obtained in the previous round, $\mathcal{L}$ is the loss function, $L(\cdot,\cdot)$ is the specific loss function, $\arg\min_{h \in H}$ denotes the $h$ that minimizes $\mathcal{L}(F^{t-1} + h)$, $H$ is the hypothesis space of weak learners, $h$ is one weak learner in $H$, $x$ is the input, $y$ is the true value, and $\mathbb{E}$ is the expected value of $L(\cdot,\cdot)$;

the strong learner $F^{t}$ of round $t$ is updated as:

$$F^{t} = F^{t-1} + \alpha h^{t}$$

where $\alpha$ is the learning rate.