WO2021159638A1

WO2021159638A1 - Method, apparatus and device for scheduling cluster queue resources, and storage medium

Info

Publication number: WO2021159638A1
Application number: PCT/CN2020/093185
Authority: WO
Inventors: 张国庆; 贺波; 万书武; 李均
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-02-12
Filing date: 2020-05-29
Publication date: 2021-08-19
Also published as: CN111338791A

Abstract

The present application provides a method, apparatus and device for scheduling cluster queue resources, and a storage medium. The method comprises: determining each sub-task queue to be processed in the cluster system and each sub-task to be processed in said sub-task queue, and obtaining system resource parameters of the cluster system, queue related parameters of said sub-task queue and task related parameters of said sub-task; inputting the system resource parameters, the queue related parameters and the task related parameters into a preset linear regression model, and obtaining sub-task estimated time corresponding to said sub-task by means of the linear regression model; and comparing the sub-task estimated time with preset standard time, and scheduling queue resources and system resources of said sub-task according to the comparison result of the sub-task estimated time and the standard time. According to the present application, the task completion time is reduced, and the resource scheduling efficiency is improved.

Description

Method, device, equipment and storage medium for scheduling resource of cluster queue

Cross-references to related applications

This application claims priority to Chinese patent application No. 202010089180.0, entitled "cluster queue resource scheduling method, device, equipment and storage medium" and filed on February 12, 2020, the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the technical field of task scheduling, and in particular to a method, apparatus, device, and computer-readable storage medium for scheduling cluster queue resources.

Background art

In existing cluster systems, a queue is generally set up for each business user, and processing resources, including CPU and memory, are fixed in advance for each queue. Some business logic depends on the completion of certain computing tasks. The inventors found that the progress of such computing tasks may be delayed for various reasons (such as cluster environment problems or the failure of a preceding job). Because the processing resources of the queue cannot be adjusted in time, tasks tend to accumulate and the computing tasks cannot be completed within the specified time, which reduces the scheduling efficiency of cluster queue resources. How to improve the low scheduling efficiency of existing cluster queue resources has therefore become a technical problem to be solved urgently.

Summary of the invention

The main purpose of this application is to provide a method, apparatus, device, and computer-readable storage medium for scheduling cluster queue resources, aiming to solve the technical problem of low scheduling efficiency of existing cluster queue resources.

In order to achieve the above objective, the present application provides a method for scheduling cluster queue resources. The method for scheduling cluster queue resources is applied to a cluster system. The method for scheduling cluster queue resources includes the following steps:

Determining each subtask queue to be processed in the cluster system and each subtask to be processed in the subtask queue to be processed, and obtaining the system resource parameters of the cluster system, the queue-related parameters of the subtask queue to be processed, and the task-related parameters of the subtask to be processed;

Inputting the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtaining, through the linear regression model, the estimated time of the subtask corresponding to the subtask to be processed;

Comparing the estimated time of the subtask with a preset standard time, and scheduling the queue resources and system resources of the subtask to be processed according to the result of comparing the estimated time of the subtask with the standard time.

In addition, in order to achieve the above object, this application also provides an apparatus for scheduling cluster queue resources. The apparatus is applied to a cluster system and includes:

A resource parameter acquisition module, configured to determine each subtask queue to be processed in the cluster system and each subtask to be processed in the subtask queue to be processed, and to obtain the system resource parameters of the cluster system, the queue-related parameters of the subtask queue to be processed, and the task-related parameters of the subtask to be processed;

An estimated time calculation module, configured to input the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and to obtain, through the linear regression model, the estimated time of the subtask corresponding to the subtask to be processed;

A task resource scheduling module, configured to compare the estimated time of the subtask with a preset standard time, and to schedule the queue resources and system resources of the subtask to be processed according to the result of comparing the estimated time of the subtask with the standard time.

In addition, in order to achieve the above object, this application also provides a device for scheduling cluster queue resources. The device includes a processor, a memory, and a scheduler of cluster queue resources that is stored on the memory and executable by the processor, wherein when the scheduler of cluster queue resources is executed by the processor, the steps of the above method for scheduling cluster queue resources are implemented.

In addition, in order to achieve the above object, this application also provides a computer-readable storage medium on which a scheduler of cluster queue resources is stored, wherein when the scheduler of cluster queue resources is executed by a processor, the steps of the above method for scheduling cluster queue resources are implemented.

This application provides a method for scheduling cluster queue resources, applied to a cluster system. The method determines each subtask queue to be processed in the cluster system and each subtask to be processed in that queue, and obtains the system resource parameters of the cluster system, the queue-related parameters of the subtask queue to be processed, and the task-related parameters of the subtask to be processed; inputs the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtains, through the linear regression model, the estimated time of the subtask corresponding to the subtask to be processed; and compares the estimated time of the subtask with a preset standard time, and schedules the queue resources and system resources of the subtask to be processed according to the comparison result. In this way, the application uses a pre-trained linear regression model, together with the system resource parameters of the cluster system, the queue-related parameters of the subtask queue to be processed, and the task-related parameters of the subtask to be processed, to determine the estimated time of the subtask to be processed. That estimated time is compared with the standard time in which the task completes when its resources are reasonable, so as to determine whether the current resources of the subtask to be processed are reasonable. Scheduling resources on the basis of the comparison result reduces task completion time, improves resource scheduling efficiency, and solves the technical problem of low scheduling efficiency of existing cluster queue resources.

Brief description of the drawings

FIG. 1 is a schematic diagram of the hardware structure of the cluster queue resource scheduling device involved in the solution of the embodiment of the application;

FIG. 2 is a schematic flowchart of a first embodiment of a method for scheduling cluster queue resources according to the application;

FIG. 3 is a schematic flowchart of a second embodiment of a method for scheduling cluster queue resources according to the application;

FIG. 4 is a schematic flowchart of a third embodiment of a method for scheduling cluster queue resources according to the application;

FIG. 5 is a schematic diagram of functional modules of a first embodiment of a scheduling apparatus for cluster queue resources of this application.

Detailed description

It should be understood that the specific embodiments described here are only used to explain the present application, and are not used to limit the present application.

The scheduling method of cluster queue resources involved in the embodiments of the present application is mainly applied to scheduling equipment of cluster queue resources. The scheduling equipment of cluster queue resources may be devices with display and processing functions such as PCs, portable computers, and mobile terminals.

Referring to FIG. 1, FIG. 1 is a schematic diagram of the hardware structure of the cluster queue resource scheduling device involved in the solution of the embodiment of the application. In this embodiment of the present application, the cluster queue resource scheduling device may include a processor 1001 (for example, a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to realize connection and communication between these components; the user interface 1003 may include a display and an input unit such as a keyboard; the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface); the memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a disk memory. The memory 1005 may optionally be a storage device independent of the aforementioned processor 1001.

Those skilled in the art can understand that the hardware structure shown in FIG. 1 does not constitute a limitation on the scheduling device of cluster queue resources, which may include more or fewer components than shown in the figure, a combination of certain components, or a different arrangement of components.

Continuing to refer to FIG. 1, the memory 1005 as a computer-readable storage medium in FIG. 1 may include an operating system, a network communication module, and a cluster queue resource scheduler.

In FIG. 1, the network communication module is mainly used to connect to a server and communicate with it, and the processor 1001 can call the scheduler of cluster queue resources stored in the memory 1005. When the scheduler of cluster queue resources is executed by the processor 1001, the steps of the method for scheduling cluster queue resources described below are implemented.

The embodiment of the present application provides a method for scheduling cluster queue resources.

Referring to FIG. 2, FIG. 2 is a schematic flowchart of a first embodiment of a method for scheduling cluster queue resources according to this application.

In this embodiment, the method for scheduling cluster queue resources is applied to a cluster system, and the method for scheduling cluster queue resources includes the following steps:

Step S10: Determine each subtask queue to be processed in the cluster system and each subtask to be processed in that queue, and obtain the system resource parameters of the cluster system, the queue-related parameters of the subtask queue to be processed, and the task-related parameters of the subtask to be processed;

In existing cluster systems, the progress of a computing task may be delayed for various reasons (such as cluster environment problems or the failure of a preceding job). Because the processing resources of the queue cannot be adjusted in time, tasks tend to accumulate and the computing task cannot be completed within the specified time, which reduces the scheduling efficiency of cluster queue resources. To solve this technical problem, this embodiment uses a pre-trained linear regression model, together with the system resource parameters of the cluster system, the queue-related parameters of the subtask queue to be processed, and the task-related parameters of the subtask to be processed, to determine the estimated time of the subtask to be processed. The estimated time is compared with the standard time in which the task completes when its resources are reasonable, so as to determine whether the current resources of the subtask to be processed are reasonable, and resources are scheduled according to the comparison result, which reduces task completion time and improves resource scheduling efficiency.

Specifically, the cluster system includes a master node and common nodes. The master node splits a computing task submitted by a user into multiple small tasks, submits them to multiple CPUs for execution, and records information such as the start time and completion time of the computing task. The cluster system sets up a queue for each user and allocates corresponding resources, including CPU and memory, to the queue. Each subtask queue to be processed in the cluster system, and each subtask to be processed in that queue, is determined in real time. The following parameters are then obtained:

The system resource parameters of the cluster system, such as the number of CPUs and the amount of memory currently available in the system.

The queue-related parameters of the subtask queue to be processed, such as: the maximum number of tasks that the user can currently submit (each queue is configured with a maximum number of tasks that can be submitted, so this value can be calculated from the number of tasks already submitted to the queue); the number of CPUs remaining in the queue; the priority of the queue, that is, the priority with which the cluster system processes the queue; and the scheduling policy for tasks in the queue, which includes first-in first-out, fair scheduling, capacity scheduling, and so on.

The task-related parameters of the subtask to be processed, such as: the task type, that is, the type of computing engine that processes the task, including computing engines that process data in high-speed memory and computing engines that process data on hard disk; the task language, that is, the language of the task code, such as Java, Python, or C; the size of the input data set of the task; and the execution parameters of the task, including the number of subtasks into which the task is divided, the size of the requested Java heap, and the degree of parallelism of multiple tasks.
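As a rough illustration of how the parameters listed above might be gathered into one numeric input for the regression model, the following Python sketch defines a hypothetical `TaskFeatures` record. The field names, the units, and the integer codes used for the scheduling policy, engine type, and task language are all assumptions made for illustration; the application does not specify an encoding.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical numeric codes for the categorical parameters (assumptions).
POLICY_CODES = {"fifo": 0, "fair": 1, "capacity": 2}
ENGINE_CODES = {"memory": 0, "disk": 1}
LANGUAGE_CODES = {"java": 0, "python": 1, "c": 2}


@dataclass
class TaskFeatures:
    # System resource parameters
    system_free_cpus: int
    system_free_memory_gb: float
    # Queue-related parameters
    queue_max_tasks: int
    queue_submitted_tasks: int
    queue_free_cpus: int
    queue_priority: int
    scheduling_policy: str
    # Task-related parameters
    engine_type: str
    language: str
    input_size_gb: float
    num_subtasks: int
    heap_size_gb: float
    parallelism: int

    def remaining_submittable_tasks(self) -> int:
        # Configured maximum minus the tasks already submitted to the queue.
        return max(self.queue_max_tasks - self.queue_submitted_tasks, 0)

    def to_vector(self) -> List[float]:
        # Order of the independent variables X1..Xn (an arbitrary choice).
        return [
            float(self.system_free_cpus),
            float(self.system_free_memory_gb),
            float(self.remaining_submittable_tasks()),
            float(self.queue_free_cpus),
            float(self.queue_priority),
            float(POLICY_CODES[self.scheduling_policy]),
            float(ENGINE_CODES[self.engine_type]),
            float(LANGUAGE_CODES[self.language]),
            float(self.input_size_gb),
            float(self.num_subtasks),
            float(self.heap_size_gb),
            float(self.parallelism),
        ]
```

Integer codes keep the sketch short; for a linear model, one-hot encoding of the categorical parameters would usually be a better fit.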

In this step, the cluster system is the Yarn system, which is a resource scheduling platform, including the following modules:

1. ResourceManager (RM for short) is a global resource manager responsible for resource management and allocation of the entire system.

2. ApplicationManager (AM for short): each application submitted by a user contains one AM, which is responsible for negotiating with the RM to obtain resources, assigning the obtained resources to internal tasks, communicating with the Nodemanager to start or stop tasks, and monitoring the status of all tasks.

3. Nodemanager, the resource and task manager on each computing node, regularly reports the resource usage of the node, such as CPU and memory, to the RM. In addition, it receives and processes Container start/stop requests from the AM.

4. Container, where computing tasks are actually performed, is the resource abstraction of Yarn. It encapsulates the multi-dimensional resources of a computing node, such as CPU, disk, and network. When the AM applies for resources from the RM, the resources that the RM returns to the AM are represented as Containers. Yarn assigns a Container to each task, and the task can only use the resources described in that Container.

In this step, the ApplicationManager and Nodemanager in Yarn store the aforementioned queue data and task data in the form of logs, and the ResourceManager in Yarn also stores the aforementioned cluster system resource data in the form of logs.

Kafka obtains the queue data, task data, and cluster system resource data required in this step by collecting Yarn logs. Kafka is a distributed publish-subscribe messaging system, a kind of message middleware, and includes the following modules:

1. Broker, the server node of kafka. Broker stores topic data.

2. Topic, each message published to the Kafka cluster has a category, this category is topic, which can be understood as a topic.

3. Producer, the producer and publisher of messages, is a role concept that publishes messages to Kafka topics.

4. Consumer, the consumer of the message, is also a role concept. It reads data from the broker and stores it on the local disk.

In this step, a Yarn broker node is created in Kafka, and a topic is created on that broker node. The topic is used to collect the Yarn log information that records the above task data, queue data, and cluster system data. It should be noted that Yarn supports sending the logs it generates to Kafka through a log4j Appender: configuring the specified Kafka consumer address and topic in the relevant Yarn configuration file causes the logs generated by Yarn to be sent to Kafka in real time, so that Kafka collects the Yarn log information.

In addition, Kafka stores the log information collected from the cluster system Yarn in HBase in real time. HBase is a highly reliable, high-performance, column-oriented, and scalable distributed storage system built on HDFS, and includes the following modules:

1. HMaster: the management service of the HBase cluster, which is mainly used to manage users' add, delete, modify, and query operations on a Table, manage the load balancing of the HRegionservers, adjust the distribution of regions, and handle region splitting, merging, and migration.

2. HRegionserver: the core module of the HBase cluster, which manages a series of HRegion objects allocated by the HMaster, responds to user I/O requests, and reads and writes data to HDFS.

3. HRegion: each HRegion object corresponds to a Region in a Table, which is the result of a horizontal split of the Table. Each HRegion is composed of multiple HStores.

4. HStore: the core of HBase storage, where region data is actually stored. A region is composed of multiple stores. A store includes the memstore in memory and the storefiles on disk. When the memstore reaches a certain threshold, it is written to a storefile on disk, and the storefile is stored in HDFS in HFile format.

5. HLog: stored on HDFS. Data is written to the HLog before being written to the memstore. The main function of the HLog is to prevent data written to the memstore from being lost when the host goes down, so that it can be used for data recovery.

The interaction between Kafka and HBase mainly consists of inserting the data collected by Kafka into HBase in real time. This is implemented by a Java program that calls the Kafka and HBase APIs:

1. Pull the Yarn log data to be consumed from Kafka every 10 s.

2. Split the data that has been read into key: value format and normalize it, for example by normalizing the date format.

3. Open a connection to HBase and insert the processed data into the designed table.
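The second step above (splitting a record into key: value pairs and normalizing the date) can be sketched as follows. This is only an illustration in Python rather than the Java program mentioned in the text, and the comma-separated `key=value` log layout, the `ts` field name, and the timestamp pattern are hypothetical; the application does not describe the actual Yarn log format. The Kafka polling and the HBase insert are deliberately left out.

```python
from datetime import datetime
from typing import Dict

# Assumed timestamp layout of the incoming log line (hypothetical).
INPUT_TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_log_line(line: str) -> Dict[str, str]:
    """Split one log line into key: value pairs and normalize its date."""
    record: Dict[str, str] = {}
    for field in line.strip().split(","):
        if "=" not in field:
            continue  # skip fragments that are not key=value pairs
        key, value = field.split("=", 1)
        record[key.strip()] = value.strip()
    if "ts" in record:
        # Normalize the timestamp to ISO 8601 before it is stored.
        parsed = datetime.strptime(record["ts"], INPUT_TS_FORMAT)
        record["ts"] = parsed.isoformat()
    return record
```

For example, `parse_log_line("ts=2020-02-12 10:00:00,queue=etl,free_cpus=4")` yields a dictionary whose `ts` entry is the ISO 8601 string and whose other entries are the raw string values, ready to be written as one row.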

It should be noted that in an actual environment, the designated ports of the physical machines on which the Yarn, Kafka, and HBase services run must be mutually accessible. Further, in order to minimize network transmission overhead, in this embodiment the physical machines on which Yarn, Kafka, and HBase run are placed on the same network segment and the same switch.

Step S20: Input the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtain the estimated time of the subtask corresponding to the subtask to be processed through the linear regression model;

In this embodiment, for each task currently submitted to a cluster system queue, real-time information about the queue, the task, and the cluster system resources is collected according to a preset period and input into the linear regression model to predict the remaining completion time of the task. That is, after the system resource parameters, queue-related parameters, and task-related parameters are acquired, they are input into the preset linear regression model, that is, the pre-trained linear regression model. The linear regression model estimates the time the subtask to be processed needs to complete its remaining work, which gives the estimated time of the subtask corresponding to the subtask to be processed.

Further, before step S20, the method includes:

Determining the model to be trained and the model training data corresponding to the subtask to be processed in the preset model training data;

Acquiring system resource training parameters, queue-related training parameters, and task-related training parameters in the model training data as independent variable parameters in the model to be trained;

Acquiring the estimated time of the target subtask in the model training data as a dependent variable parameter in the model to be trained;

According to the linear regression formula, the independent variable parameters and the dependent variable parameters, the model to be trained is trained to generate the linear regression model.

Wherein, the independent variable parameters and the dependent variable parameters are input into the linear regression formula to obtain the initial regression parameters after training, wherein the linear regression formula is:

y = b0 + b1X1 + b2X2 + ... + bnXn, where X1, X2, ..., Xn are the independent variable parameters, y is the dependent variable parameter, and b0, b1, ..., bn are the initial regression parameters;

Adjust the initial regression parameters according to the least squares estimation algorithm to generate target regression parameters;

According to the target regression parameter and the model to be trained, the linear regression model is generated.

In this embodiment, training data is collected in advance and input into the linear regression model for training. First, data on system resources, queue parameters, and task parameters is collected and input into the linear regression model as the independent variables. The queue-related and cluster-system resource-related information is collected according to a preset period, for example every 30 seconds, while the task-related information is collected when the task is created. Then, the estimated time of the target subtask in the model training data, that is, the current remaining execution time of the task, is collected as the dependent variable of the linear regression model. Finally, the data collected for the independent variables and the dependent variable is input into the linear regression model. The formula of the linear regression model is as follows, where y is the dependent variable and X1 to Xn are the independent variables:

y = b0 + b1X1 + b2X2 + ... + bnXn;

In the linear regression model, initial estimates of the regression parameters b0, b1, b2, ..., bn are first obtained on the basis of the above linear regression formula, and the least squares estimation algorithm is then used to adjust the regression parameters b0, b1, b2, ..., bn step by step to improve the accuracy of the model.
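To make the fitting step concrete, the following Python sketch estimates b0, b1, ..., bn by ordinary least squares. Note that it solves the normal equations in closed form rather than adjusting initial parameters step by step as described above; both approaches minimize the same sum of squared residuals. The function names `fit_linear_regression` and `predict` are hypothetical, and only the standard library is used.

```python
from typing import List, Sequence


def fit_linear_regression(X: Sequence[Sequence[float]],
                          y: Sequence[float]) -> List[float]:
    """Return [b0, b1, ..., bn] minimizing the sum of squared residuals."""
    # Prepend a constant 1 to each row so that b0 is fitted as the intercept.
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented matrix of the normal equations (A^T A) b = A^T y.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("independent variables are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def predict(coeffs: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate y = b0 + b1*X1 + ... + bn*Xn for one sample."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], x))
```

In this sketch each row of `X` would hold the independent variables collected for one observation and `y` the corresponding remaining execution times; `predict` then gives the estimated time for a new observation.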

Step S30: Compare the estimated time of the subtask with a preset standard time, and schedule the queue resources and system resources of the subtask to be processed according to the result of comparing the estimated time of the subtask with the standard time.

In this embodiment, after the estimated time of the subtask is obtained through the linear regression model, it is compared with a preset standard time, where the standard time is the time in which the subtask to be processed completes when its resources are reasonable. If the estimated time of the subtask is less than the standard time, the resources of the subtask to be processed are reasonable and no scheduling is required. If the estimated time of the subtask is greater than the standard time, the resources of the subtask to be processed are insufficient and resources can be added for it. In a specific embodiment, for a task that has been submitted to the queue, the remaining completion time is estimated repeatedly according to the preset period, so as to obtain a predicted value of its overall execution time. If the predicted value is higher than the historical average level several times in succession, queue resources are added for the task, that is, the number of CPUs of the queue is increased. When the number of CPUs is increased, the corresponding memory resources are automatically increased in proportion, and the administrator can be notified by email at the same time.
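The proportional increase of memory with CPUs can be sketched as below. The function name `scale_queue_resources` and the choice of keeping the queue's current memory-per-CPU ratio are assumptions; the text only states that memory grows in proportion. Email notification is omitted.

```python
from typing import Tuple


def scale_queue_resources(cpus: int, memory_gb: float,
                          extra_cpus: int) -> Tuple[int, float]:
    """Add CPUs to a queue and grow its memory by the same ratio."""
    if cpus <= 0:
        raise ValueError("queue must currently own at least one CPU")
    # Assumption: "in proportion" means preserving memory per CPU.
    memory_per_cpu = memory_gb / cpus
    return cpus + extra_cpus, memory_gb + memory_per_cpu * extra_cpus
```

With this rule, a queue holding 8 CPUs and 32 GB that receives 2 more CPUs also receives 8 GB more memory.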

This embodiment provides a method for scheduling cluster queue resources, applied to a cluster system. The method determines each subtask queue to be processed in the cluster system and each subtask to be processed in that queue, and obtains the system resource parameters of the cluster system, the queue-related parameters of the subtask queue to be processed, and the task-related parameters of the subtask to be processed; inputs the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtains, through the linear regression model, the estimated time of the subtask corresponding to the subtask to be processed; and compares the estimated time of the subtask with a preset standard time, and schedules the queue resources and system resources of the subtask to be processed according to the comparison result. In this way, the application uses a pre-trained linear regression model, together with the system resource parameters of the cluster system, the queue-related parameters of the subtask queue to be processed, and the task-related parameters of the subtask to be processed, to determine the estimated time of the subtask to be processed. That estimated time is compared with the standard time in which the task completes when its resources are reasonable, so as to determine whether the current resources of the subtask to be processed are reasonable. Scheduling resources on the basis of the comparison result reduces task completion time, improves resource scheduling efficiency, and solves the technical problem of low scheduling efficiency of existing cluster queue resources.

Referring to FIG. 3, FIG. 3 is a schematic flowchart of a second embodiment of a method for scheduling cluster queue resources of this application.

Based on the embodiment shown in FIG. 2 above, in this embodiment, the step S20 specifically includes:

Step S21: Obtain the system resource parameters, queue-related parameters, and task-related parameters at each preset period, and calculate, through the linear regression model, multiple subtask estimated times corresponding to the subtask to be processed over the preset periods;

In this embodiment, to reduce the time estimation error, multiple subtask estimated times are calculated according to the preset period, so as to obtain a predicted value of the overall execution time. If the subtask estimated time is higher than the standard time several times in succession, queue resources should be added for the task. Specifically, for each task currently submitted to a cluster system queue, real-time information about the queue, the task, and the cluster system resources (that is, the real-time system resource parameters, real-time queue-related parameters, and real-time task-related parameters) is collected according to the preset period and input into the linear regression model to obtain a prediction of each remaining completion time of the subtask to be processed, that is, the multiple subtask estimated times.

Further, the step S30 specifically includes:

Step S31: Compare each of the multiple subtask estimated times with the standard time;

Step S32: If more than a preset number of the subtask estimated times are higher than the standard time, increase the queue resources and system resources of the subtask to be processed.

In this embodiment, the multiple subtask estimated times are each compared with the standard time to determine whether the estimated time of the subtask to be processed is higher than the standard time several times in succession. The completion times of multiple historical tasks of the subtask to be processed within a preset period are acquired, and the average of these completion times is calculated as the standard time. When the number of subtask estimated times that are higher than the standard time exceeds the preset number, the predicted overall execution time of the subtask to be processed is higher than the reasonable time, and the resources of the subtask to be processed should be increased.
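Steps S31 and S32 can be sketched in a few lines of Python. The function names and the use of seconds as the time unit are assumptions; the standard time is taken as the mean of historical completion times, as the paragraph above describes.

```python
from statistics import mean
from typing import Sequence


def standard_time(historical_completion_times: Sequence[float]) -> float:
    """Standard time = average completion time of historical tasks."""
    return mean(historical_completion_times)


def needs_more_resources(estimated_times: Sequence[float],
                         historical_completion_times: Sequence[float],
                         preset_count: int) -> bool:
    """True when more than preset_count estimates exceed the standard time."""
    std = standard_time(historical_completion_times)
    over = sum(1 for t in estimated_times if t > std)
    return over > preset_count
```

For instance, with historical completion times of 90, 100, and 110 seconds the standard time is 100 seconds; estimates of 120, 130, 95, and 140 seconds contain three values above it, so a preset number of 2 would trigger a resource increase.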

Further, the step of increasing the queue resources and system resources of the subtask to be processed if more than the preset number of subtask estimated times are higher than the standard time specifically includes:

If more than the preset number of subtask estimated times are higher than the standard time, obtaining the average time difference between the preset number of subtask estimated times and the standard time;

Determining the resources to be added for the subtask to be processed according to a preset resource scheduling table and the average time difference, and increasing the queue resources and system resources of the subtask to be processed according to the resources to be added.

In this embodiment, if more than the preset number of subtask estimated times are higher than the standard time, the overall execution time of the subtask to be processed will overrun and resources need to be added for it. The average of the multiple estimated times of the subtask to be processed is obtained, and the difference between this average and the standard time is calculated as the average time difference. To facilitate resource scheduling, a resource scheduling table is set in advance that maps the difference between the actual processing time of the subtask to be processed and the standard time to the corresponding resources to be scheduled. The resource scheduling table can be set automatically on the basis of big data analysis, or manually according to actual needs. After the resources to be added for the subtask to be processed are determined, the maximum number of resources of the task queue to which the subtask belongs is determined first, and it is judged whether the resources to be added exceed that maximum. If the maximum number of resources is not exceeded, it is judged whether the remaining resources in the queue are sufficient to allocate the resources to be added; if they are not, the resources are scheduled from the system resources of the cluster system to which the queue belongs.
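The table lookup and allocation order described above could look like the following Python sketch. The contents of `RESOURCE_TABLE`, the function names, and two behaviours are assumptions not stated in the text: a request larger than the queue maximum is simply refused, and when the queue's remaining resources are insufficient only the shortfall is taken from system resources.

```python
from typing import List, Sequence, Tuple

# Hypothetical resource scheduling table: (average overrun in seconds that
# must be exceeded, CPUs to add), sorted by ascending threshold.
RESOURCE_TABLE: List[Tuple[float, int]] = [(0.0, 1), (60.0, 2), (300.0, 4)]


def cpus_to_add(estimated_times: Sequence[float], standard: float,
                table: Sequence[Tuple[float, int]] = RESOURCE_TABLE) -> int:
    """Look up the CPUs to add from the average time difference."""
    avg_diff = sum(estimated_times) / len(estimated_times) - standard
    extra = 0
    for threshold, cpus in table:
        if avg_diff > threshold:
            extra = cpus  # keep the entry with the largest threshold passed
    return extra


def plan_allocation(extra: int, queue_max: int, queue_free: int,
                    system_free: int) -> Tuple[int, int]:
    """Return (CPUs taken from the queue, CPUs taken from the system)."""
    if extra > queue_max:
        # Assumption: the source does not say what happens in this case.
        return (0, 0)
    from_queue = min(extra, queue_free)
    # Assumption: only the shortfall is drawn from system resources.
    from_system = min(extra - from_queue, system_free)
    return (from_queue, from_system)
```

With estimates averaging 400 seconds against a standard time of 300 seconds, the average time difference is 100 seconds, which the example table maps to 2 additional CPUs; if the queue has only 1 CPU free, the second is drawn from the system.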

Referring to FIG. 4, FIG. 4 is a schematic flowchart of a third embodiment of a method for scheduling cluster queue resources according to this application.

Based on the embodiment shown in FIG. 3, in this embodiment, after the step S30, the method further includes:

Step S40: Determine the current subtask estimated time of the subtask to be processed according to the resource parameters after scheduling and the linear regression model, and start a timer to monitor the execution time of the subtask to be processed after scheduling;

Step S50: When it is detected that the execution time reaches the current subtask estimated time, detect whether the subtask to be processed has been executed successfully;

Step S60: If the subtask to be processed has been executed successfully, release the queue resources and system resources occupied by the subtask to be processed.

In this embodiment, in order to improve resource utilization, after resources are added to the subtask to be processed, a timer is started to monitor its execution status, and task resources are released and recovered according to the monitoring result. That is, the real-time resource parameters after scheduling are obtained and input into the linear regression model to determine the current subtask estimated time of the subtask to be processed. When the timer reaches the current subtask estimated time, it is checked whether the task has completed. If the task has completed, the added queue resources are recovered; if it has not, the added queue resources are not recovered.
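A minimal sketch of steps S40 to S60, assuming Python's standard `threading.Timer` as the timer. The `SubTask` class, its fields, and the `release` callback are hypothetical names introduced for illustration; the description does not say how completion is detected or how resources are handed back.

```python
import threading


class SubTask:
    """Hypothetical record of a scheduled subtask."""

    def __init__(self, name, added_resources):
        self.name = name
        self.added_resources = added_resources  # units added during scheduling
        self.succeeded = False  # set to True by the worker when the subtask finishes


def start_release_timer(task, current_estimated_seconds, release):
    """Start a timer for the current estimated time; when it fires, release the
    added resources only if the subtask has completed successfully."""

    def on_timeout():
        if task.succeeded:
            release(task)
        # Otherwise the added resources stay with the subtask, as described above.

    timer = threading.Timer(current_estimated_seconds, on_timeout)
    timer.start()
    return timer
```

The returned timer is a thread, so a caller can `join()` it to wait for the check, or `cancel()` it if the subtask is rescheduled before the estimate elapses.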

In addition, the embodiment of the present application also provides an apparatus for scheduling cluster queue resources.

Referring to FIG. 5, FIG. 5 is a schematic diagram of functional modules of a first embodiment of a cluster queue resource scheduling apparatus of this application.

In this embodiment, the device for scheduling cluster queue resources is applied to a cluster system, and the device for scheduling cluster queue resources includes:

The resource parameter acquisition module 10 is configured to determine each pending subtask queue in the cluster system and each pending subtask in the pending subtask queue, and obtain the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the subtask to be processed;

The estimated time calculation module 20 is configured to input the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtain the estimated time of the subtask corresponding to the subtask to be processed through the linear regression model;

The task resource scheduling module 30 is configured to compare the estimated time of the subtask with a preset standard time, and schedule the queue resources and system resources of the subtask to be processed according to the comparison result of the estimated time of the subtask and the standard time.

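Further, the device for scheduling cluster queue resources further includes a model training module, and the model training module is configured to:

Determine, in the preset model training data, the model to be trained and the model training data corresponding to the subtask to be processed;

Acquire system resource training parameters, queue-related training parameters, and task-related training parameters in the model training data as independent variable parameters in the model to be trained;

Acquire the estimated time of the target subtask in the model training data as a dependent variable parameter in the model to be trained;

Train the model to be trained according to the linear regression formula, the independent variable parameters, and the dependent variable parameters to generate the linear regression model.

Further, the model training module is also used for:

Input the independent variable parameters and the dependent variable parameters into the linear regression formula to obtain the initial regression parameters after training, wherein the linear regression formula is:

$y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$, where $X_1, X_2, \dots, X_n$ are the independent variable parameters, $y$ is the dependent variable parameter, and $b_0, b_1, \dots, b_n$ are the initial regression parameters;

Adjust the initial regression parameters according to the least squares estimation algorithm to generate target regression parameters;

Generate the linear regression model according to the target regression parameters and the model to be trained.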
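As a hedged illustration of how regression parameters b0, b1, ..., bn could be obtained by least squares, the sketch below solves the normal equations directly in plain Python. It is an assumption-laden example: the application describes initial parameters that are then adjusted by least squares estimation, whereas this sketch computes the least squares solution in one step, and the function names and sample data are hypothetical.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows: one sequence of independent variable parameters per training sample.
    targets: the dependent variable (observed subtask time) per sample.
    Returns [b0, b1, ..., bn].
    """
    # Prepend a constant 1.0 so that b[0] plays the role of the intercept b0.
    X = [[1.0] + [float(v) for v in r] for r in rows]
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system: parameters are not identifiable")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def estimate_time(b, params):
    """Evaluate y = b0 + b1*X1 + ... + bn*Xn for one parameter vector."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], params))
```

In practice the independent variables would be the system resource, queue-related, and task-related parameters; a numerical library would normally replace the hand-written solver.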

Further, the resource parameter acquisition module 10 is also used for:

Acquiring system resource parameters, queue-related parameters, and task-related parameters in a preset period, and calculating, through the linear regression model, the estimated times of multiple subtasks corresponding to the subtasks to be processed in the preset period;

Further, the task resource scheduling module 30 is also used for:

Comparing the estimated time of the multiple subtasks with the standard time;

If more than a preset number of the subtask estimated times are higher than the standard time, the queue resources and system resources of the subtask to be processed are increased.

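Further, the task resource scheduling module 30 is also used for:

If more than the preset number of the subtask estimated times are higher than the standard time, obtaining the average time difference between the subtask estimated times and the standard time;

Determining the resources to be added for the subtask to be processed according to a preset resource scheduling table and the average time difference, and increasing the queue resources and system resources of the subtask to be processed according to the resources to be added.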

Further, the estimated time calculation module 20 is also used for:

Acquire the completion time of multiple historical tasks of the subtask to be processed within a preset period, and calculate an average value of the completion time of the multiple historical tasks as the standard time.
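The standard time and the threshold comparison handled by these modules can be sketched in a few lines. The function names and the sample values are hypothetical, and the reading that "exceeding the preset number" means strictly more than the preset number is an assumption.

```python
from statistics import mean


def standard_time(historical_completion_times):
    """Standard time: average completion time of the historical tasks in the preset period."""
    return mean(historical_completion_times)


def needs_more_resources(estimated_times, std_time, preset_number):
    """True when more than preset_number of the estimated times exceed the standard time."""
    overruns = sum(1 for t in estimated_times if t > std_time)
    return overruns > preset_number
```

With historical completion times of 90, 100, and 110 seconds the standard time is 100 seconds; if three of four estimates in the period exceed it and the preset number is 2, the subtask qualifies for additional resources.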

Further, the device for scheduling cluster queue resources further includes a resource recovery module, and the resource recovery module is configured to:

According to the scheduled resource parameters and the linear regression model, determine the current estimated time of the subtask to be processed, and start a timer to monitor the execution time of the scheduled subtask;

When it is detected that the execution time reaches the estimated time of the current subtask, detecting whether the execution of the subtask to be processed is successful;

If the subtask to be processed is executed successfully, the queue resources and system resources occupied by the subtask to be processed are released.

Each module in the above cluster queue resource scheduling device corresponds to a step in the above embodiments of the cluster queue resource scheduling method, so its functions and implementation processes are not repeated here.

In addition, the embodiment of the present application also provides a computer-readable storage medium.

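The computer-readable storage medium of the present application stores a scheduler for cluster queue resources, where, when the scheduler for cluster queue resources is executed by a processor, the following steps are implemented:

Determining each pending subtask queue in the cluster system and each pending subtask in the pending subtask queue, and obtaining the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the subtask to be processed;

Inputting the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtaining the estimated time of the subtask corresponding to the subtask to be processed through the linear regression model;

Comparing the estimated time of the subtask with a preset standard time, and scheduling the queue resources and system resources of the subtask to be processed according to the comparison result of the estimated time of the subtask and the standard time.

For the method implemented when the scheduler for cluster queue resources is executed, reference may be made to the embodiments of the cluster queue resource scheduling method of the present application, which are not repeated here.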

It should be noted that in this document the terms "include", "comprise" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or system that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to the process, method, article, or system. Without further limitation, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, method, article, or system that includes the element.

The serial numbers of the foregoing embodiments of the present application are for description only, and do not represent the superiority or inferiority of the embodiments.

Through the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a computer-readable storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disk), which can be non-volatile or volatile, and includes a number of instructions to enable a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the method described in each embodiment of the present application.

The above are only preferred embodiments of this application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.

Claims

A method for scheduling cluster queue resources, wherein the method for scheduling cluster queue resources is applied to a cluster system, and the method for scheduling cluster queue resources includes the following steps:

Determining each pending subtask queue in the cluster system and each pending subtask in the pending subtask queue, and obtaining the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the subtask to be processed;

Inputting the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtaining the estimated time of the subtask corresponding to the subtask to be processed through the linear regression model;

The predicted time of the subtask is compared with a preset standard time, and the queue resources and system resources of the subtask to be processed are scheduled according to the comparison result of the predicted time of the subtask and the standard time.
The method for scheduling cluster queue resources according to claim 1, wherein, before the step of inputting the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model and obtaining the estimated time of the subtask corresponding to the subtask to be processed through the linear regression model, the method further includes:

Determining the model to be trained and the model training data corresponding to the subtask to be processed in the preset model training data;

Acquiring system resource training parameters, queue-related training parameters, and task-related training parameters in the model training data as independent variable parameters in the model to be trained;

Acquiring the estimated time of the target subtask in the model training data as a dependent variable parameter in the model to be trained;

According to the linear regression formula, the independent variable parameters and the dependent variable parameters, the model to be trained is trained to generate the linear regression model.
The method for scheduling cluster queue resources according to claim 2, wherein the step of training the model to be trained according to the linear regression formula, the independent variable parameters, and the dependent variable parameters to generate the linear regression model specifically includes:

Input the independent variable parameters and the dependent variable parameters into the linear regression formula to obtain the initial regression parameters after training, wherein the linear regression formula is:

$y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$, where $X_1, X_2, \dots, X_n$ are the independent variable parameters, $y$ is the dependent variable parameter, and $b_0, b_1, \dots, b_n$ are the initial regression parameters;

Adjust the initial regression parameters according to the least squares estimation algorithm to generate target regression parameters;

According to the target regression parameter and the model to be trained, the linear regression model is generated.
The method for scheduling cluster queue resources according to claim 1, wherein the step of inputting the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model and obtaining the estimated time of the subtask corresponding to the subtask to be processed through the linear regression model specifically includes:

Acquiring system resource parameters, queue-related parameters, and task-related parameters in a preset period, and calculating, through the linear regression model, the estimated times of multiple subtasks corresponding to the subtasks to be processed in the preset period;

The step of comparing the estimated time of the subtask with a preset standard time and scheduling the queue resources and system resources of the subtask to be processed according to the comparison result of the estimated time of the subtask and the standard time specifically includes:

Comparing the estimated times of the multiple subtasks with the standard time;

If more than a preset number of the subtask estimated times are higher than the standard time, increasing the queue resources and system resources of the subtask to be processed.
The method for scheduling cluster queue resources according to claim 4, wherein the step of increasing the queue resources and system resources of the subtask to be processed if more than the preset number of the subtask estimated times are higher than the standard time specifically includes:

If more than the preset number of the subtask estimated times are higher than the standard time, obtaining the average time difference between the subtask estimated times and the standard time;

Determining the resources to be added for the subtask to be processed according to a preset resource scheduling table and the average time difference, and increasing the queue resources and system resources of the subtask to be processed according to the resources to be added.
The method for scheduling cluster queue resources according to claim 4, wherein before the step of comparing the estimated times of the multiple subtasks with the standard time, the method further comprises:

Acquire the completion time of multiple historical tasks of the subtask to be processed within a preset period, and calculate an average value of the completion time of the multiple historical tasks as the standard time.
The method for scheduling cluster queue resources according to any one of claims 1 to 6, wherein, after the step of comparing the estimated time of the subtask with a preset standard time and scheduling the queue resources and system resources of the subtask to be processed according to the comparison result of the estimated time of the subtask and the standard time, the method further includes:

According to the scheduled resource parameters and the linear regression model, determine the current estimated time of the subtask to be processed, and start a timer to monitor the execution time of the scheduled subtask;

When it is detected that the execution time reaches the estimated time of the current subtask, detecting whether the execution of the subtask to be processed is successful;

If the subtask to be processed is executed successfully, the queue resources and system resources occupied by the subtask to be processed are released.
A scheduling device for cluster queue resources, wherein the scheduling device for cluster queue resources is applied to a cluster system, and the scheduling device for cluster queue resources includes:

A resource parameter acquisition module, configured to determine each pending subtask queue in the cluster system and each pending subtask in the pending subtask queue, and obtain the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the subtask to be processed;

An estimated time calculation module, configured to input the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtain the estimated time of the subtask corresponding to the subtask to be processed through the linear regression model;

A task resource scheduling module, configured to compare the estimated time of the subtask with a preset standard time, and schedule the queue resources and system resources of the subtask to be processed according to the comparison result of the estimated time of the subtask and the standard time.
The device for scheduling cluster queue resources according to claim 8, wherein the device further comprises a model training module, and the model training module is used for:

Determining the model to be trained and the model training data corresponding to the subtask to be processed in the preset model training data;

Acquiring system resource training parameters, queue-related training parameters, and task-related training parameters in the model training data as independent variable parameters in the model to be trained;

Acquiring the estimated time of the target subtask in the model training data as a dependent variable parameter in the model to be trained;

According to the linear regression formula, the independent variable parameters and the dependent variable parameters, the model to be trained is trained to generate the linear regression model.
The cluster queue resource scheduling device according to claim 9, wherein the model training module is further used for:

Input the independent variable parameters and the dependent variable parameters into the linear regression formula to obtain the initial regression parameters after training, wherein the linear regression formula is:

$y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$, where $X_1, X_2, \dots, X_n$ are the independent variable parameters, $y$ is the dependent variable parameter, and $b_0, b_1, \dots, b_n$ are the initial regression parameters;

Adjust the initial regression parameters according to the least squares estimation algorithm to generate target regression parameters;

According to the target regression parameter and the model to be trained, the linear regression model is generated.
The cluster queue resource scheduling device according to claim 8, wherein the estimated time calculation module is specifically configured to:

Acquiring system resource parameters, queue-related parameters, and task-related parameters in a preset period, and using the linear regression model to calculate the estimated time of multiple subtasks corresponding to the subtasks to be processed in the preset period;

When comparing the estimated time of the subtask with a preset standard time and scheduling the queue resources and system resources of the subtask to be processed according to the comparison result of the estimated time of the subtask and the standard time, the task resource scheduling module is specifically configured to:

Compare the estimated times of the multiple subtasks with the standard time;

If more than a preset number of the subtask estimated times are higher than the standard time, increase the queue resources and system resources of the subtask to be processed.
The cluster queue resource scheduling device according to claim 11, wherein, when implementing the function of increasing the queue resources and system resources of the subtask to be processed if more than the preset number of the subtask estimated times are higher than the standard time, the estimated time calculation module is specifically configured to:

If more than the preset number of the subtask estimated times are higher than the standard time, obtain the average time difference between the subtask estimated times and the standard time;

Determine the resources to be added for the subtask to be processed according to a preset resource scheduling table and the average time difference, and increase the queue resources and system resources of the subtask to be processed according to the resources to be added.
The cluster queue resource scheduling device according to claim 11, wherein the estimated time calculation module is further configured to:

Acquire the completion time of multiple historical tasks of the subtask to be processed within a preset period, and calculate an average value of the completion time of the multiple historical tasks as the standard time.
The device for scheduling cluster queue resources according to any one of claims 8 to 13, wherein the device further comprises a resource recovery module, and the resource recovery module is configured to:

According to the scheduled resource parameters and the linear regression model, determine the current estimated time of the subtask to be processed, and start a timer to monitor the execution time of the scheduled subtask;

When it is detected that the execution time reaches the estimated time of the current subtask, detecting whether the execution of the subtask to be processed is successful;

If the subtask to be processed is executed successfully, the queue resources and system resources occupied by the subtask to be processed are released.
A scheduling device for cluster queue resources, wherein the scheduling device for cluster queue resources includes a processor, a memory, and a cluster queue resource scheduler stored on the memory and executable by the processor, wherein, when the cluster queue resource scheduler is executed by the processor, the following steps are implemented:

Determining each pending subtask queue in the cluster system and each pending subtask in the pending subtask queue, and obtaining the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the subtask to be processed;

Inputting the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtaining the estimated time of the subtask corresponding to the subtask to be processed through the linear regression model;

The predicted time of the subtask is compared with a preset standard time, and the queue resources and system resources of the subtask to be processed are scheduled according to the comparison result of the predicted time of the subtask and the standard time.
The cluster queue resource scheduling device according to claim 15, wherein, before the step of inputting the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model and obtaining the estimated time of the subtask corresponding to the subtask to be processed through the linear regression model, the following steps are also implemented when the cluster queue resource scheduler is executed by the processor:

Determining the model to be trained and the model training data corresponding to the subtask to be processed in the preset model training data;

Acquiring system resource training parameters, queue-related training parameters, and task-related training parameters in the model training data as independent variable parameters in the model to be trained;

Acquiring the estimated time of the target subtask in the model training data as a dependent variable parameter in the model to be trained;

According to the linear regression formula, the independent variable parameters and the dependent variable parameters, the model to be trained is trained to generate the linear regression model.
The cluster queue resource scheduling device according to claim 16, wherein the step of training the model to be trained according to the linear regression formula, the independent variable parameters, and the dependent variable parameters to generate the linear regression model specifically includes:

Input the independent variable parameters and the dependent variable parameters into the linear regression formula to obtain the initial regression parameters after training, wherein the linear regression formula is:

$y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$, where $X_1, X_2, \dots, X_n$ are the independent variable parameters, $y$ is the dependent variable parameter, and $b_0, b_1, \dots, b_n$ are the initial regression parameters;

Adjust the initial regression parameters according to the least squares estimation algorithm to generate target regression parameters;

According to the target regression parameter and the model to be trained, the linear regression model is generated.
A computer-readable storage medium, wherein a cluster queue resource scheduler is stored on the computer-readable storage medium, and when the cluster queue resource scheduler is executed by a processor, the following steps are implemented:

Determining each pending subtask queue in the cluster system and each pending subtask in the pending subtask queue, and obtaining the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the subtask to be processed;

Inputting the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtaining the estimated time of the subtask corresponding to the subtask to be processed through the linear regression model;

The predicted time of the subtask is compared with a preset standard time, and the queue resources and system resources of the subtask to be processed are scheduled according to the comparison result of the predicted time of the subtask and the standard time.
The computer-readable storage medium according to claim 18, wherein, before the step of inputting the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model and obtaining the estimated time of the subtask corresponding to the subtask to be processed through the linear regression model, the following steps are also implemented when the cluster queue resource scheduler is executed by the processor:

Determining the model to be trained and the model training data corresponding to the subtask to be processed in the preset model training data;

Acquiring system resource training parameters, queue-related training parameters, and task-related training parameters in the model training data as independent variable parameters in the model to be trained;

Acquiring the estimated time of the target subtask in the model training data as a dependent variable parameter in the model to be trained;

According to the linear regression formula, the independent variable parameters and the dependent variable parameters, the model to be trained is trained to generate the linear regression model.
The computer-readable storage medium according to claim 19, wherein the step of training the model to be trained according to the linear regression formula, the independent variable parameters, and the dependent variable parameters to generate the linear regression model specifically includes:

Input the independent variable parameters and the dependent variable parameters into the linear regression formula to obtain the initial regression parameters after training, wherein the linear regression formula is:

$y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$, where $X_1, X_2, \dots, X_n$ are the independent variable parameters, $y$ is the dependent variable parameter, and $b_0, b_1, \dots, b_n$ are the initial regression parameters;

Adjust the initial regression parameters according to the least squares estimation algorithm to generate target regression parameters;

According to the target regression parameter and the model to be trained, the linear regression model is generated.