WO2021159638A1 - Method, apparatus and device for scheduling cluster queue resources, and storage medium - Google Patents


Info

Publication number
WO2021159638A1
Authority
WO
WIPO (PCT)
Prior art keywords
subtask
parameters
queue
model
processed
Prior art date
Application number
PCT/CN2020/093185
Other languages
French (fr)
Chinese (zh)
Inventor
张国庆
贺波
万书武
李均
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021159638A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Definitions

  • This application relates to the technical field of task scheduling, and in particular to a method, apparatus, and device for scheduling cluster queue resources, and a computer-readable storage medium.
  • In existing clusters, a queue is generally set up for each business user, and the processing resources of each queue, including CPU and memory, are fixed in advance.
  • Some business logic tasks depend on the completion of certain computing tasks.
  • The inventor found that the progress of these computing tasks may be delayed for various reasons (such as cluster environment problems or the failure of a preceding job). Because the processing resources of the queue cannot be adjusted in time, tasks tend to accumulate.
  • As a result, a computing task may not be completed within the specified time, which reduces the scheduling efficiency of cluster queue resources. How to overcome the low scheduling efficiency of existing cluster queue resources has therefore become an urgent technical problem.
  • The main purpose of this application is to provide a method, apparatus, and device for scheduling cluster queue resources, and a computer-readable storage medium, so as to solve the technical problem of low scheduling efficiency of existing cluster queue resources.
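  • The present application provides a method for scheduling cluster queue resources.
  • The method is applied to a cluster system and includes the following steps:
  • determining each pending subtask queue in the cluster system and each pending subtask in the pending subtask queue, and obtaining the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the pending subtask;
  • inputting the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtaining the estimated subtask time of the pending subtask through the linear regression model; and
  • comparing the estimated subtask time with a preset standard time, and scheduling the queue resources and system resources of the pending subtask according to the comparison result.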
  • This application also provides an apparatus for scheduling cluster queue resources. The apparatus is applied to a cluster system and includes:
  • a resource parameter acquisition module, configured to determine each pending subtask queue in the cluster system and each pending subtask in the pending subtask queue, and to obtain the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the pending subtask;
  • an estimated time calculation module, configured to input the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and to obtain the estimated subtask time of the pending subtask through the linear regression model; and
  • a task resource scheduling module, configured to compare the estimated subtask time with a preset standard time, and to schedule the queue resources and system resources of the pending subtask according to the comparison result.
  • This application also provides a device for scheduling cluster queue resources.
  • The device includes a processor, a memory, and a scheduling program for cluster queue resources that is stored on the memory and executable by the processor. When the scheduling program is executed by the processor, the steps of the above method are implemented, including:
  • comparing the estimated subtask time with a preset standard time, and scheduling the queue resources and system resources of the pending subtask according to the comparison result.
  • The present application also provides a computer-readable storage medium on which a scheduling program for cluster queue resources is stored. When the scheduling program is executed by a processor, the steps of the above method are implemented, including:
  • comparing the estimated subtask time with a preset standard time, and scheduling the queue resources and system resources of the pending subtask according to the comparison result.
  • This application provides a method for scheduling cluster queue resources, applied to a cluster system. The method determines each pending subtask queue in the cluster system and each pending subtask in that queue, and obtains the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the pending subtask;
  • it inputs the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model to obtain the estimated subtask time of the pending subtask; and
  • it compares the estimated subtask time with the preset standard time and schedules the queue resources and system resources of the pending subtask according to the comparison result.
  • Using a pre-trained linear regression model that combines these three groups of parameters, the application estimates the time the pending subtask still needs and compares it with the standard time, i.e. the time the task takes when its resources are appropriate. This indicates whether the current resources of the pending subtask are appropriate, and scheduling resources on the basis of the comparison shortens task completion time, improves resource scheduling efficiency, and solves the technical problem of low scheduling efficiency of existing cluster queue resources.
  • FIG. 1 is a schematic diagram of the hardware structure of the cluster queue resource scheduling device involved in the solutions of the embodiments of this application;
  • FIG. 2 is a schematic flowchart of a first embodiment of the method for scheduling cluster queue resources of this application;
  • FIG. 3 is a schematic flowchart of a second embodiment of the method for scheduling cluster queue resources of this application;
  • FIG. 4 is a schematic flowchart of a third embodiment of the method for scheduling cluster queue resources of this application;
  • FIG. 5 is a schematic diagram of the functional modules of a first embodiment of the apparatus for scheduling cluster queue resources of this application.
  • The method for scheduling cluster queue resources in the embodiments of this application is mainly applied to a device for scheduling cluster queue resources.
  • The device may be any device with display and processing functions, such as a PC, a portable computer, or a mobile terminal.
  • FIG. 1 is a schematic diagram of the hardware structure of the cluster queue resource scheduling device involved in the solutions of the embodiments of this application.
  • The device may include a processor 1001 (for example, a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • The communication bus 1002 implements connection and communication between these components;
  • the user interface 1003 may include a display and an input unit such as a keyboard;
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface);
  • the memory 1005 may be a high-speed RAM or a non-volatile memory, such as disk storage.
  • The memory 1005 may optionally be a storage device independent of the processor 1001.
  • The structure shown in FIG. 1 does not limit the device; it may include more or fewer components than shown, combine certain components, or arrange the components differently.
  • As a computer-readable storage medium, the memory 1005 in FIG. 1 may include an operating system, a network communication module, and a scheduling program for cluster queue resources.
  • The network communication module is mainly used to connect to and communicate with a server. The processor 1001 can call the scheduling program stored in the memory 1005, and when the program is executed by the processor 1001, the steps of the above method are implemented, including:
  • comparing the estimated subtask time with a preset standard time, and scheduling the queue resources and system resources of the pending subtask according to the comparison result.
  • An embodiment of this application provides a method for scheduling cluster queue resources.
  • FIG. 2 is a schematic flowchart of a first embodiment of the method for scheduling cluster queue resources of this application.
  • In this embodiment, the method is applied to a cluster system and includes the following steps:
  • Step S10: determine each pending subtask queue in the cluster system and each pending subtask in the pending subtask queue, and obtain the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the pending subtask.
  • The progress of a computing task may be delayed for various reasons (such as cluster environment problems or the failure of a preceding job). Because the processing resources of the queue cannot be adjusted in time, tasks accumulate and the computing task may fail to complete within the specified time, which reduces the scheduling efficiency of cluster queue resources.
  • To address this, the pre-trained linear regression model is used together with the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the pending subtask to estimate the remaining time of the pending subtask.
  • The cluster system includes a master node and ordinary nodes.
  • The master node splits a computing task submitted by a user into multiple small tasks, submits them to multiple CPUs for execution, and records information such as the start time and completion time of the computing task.
  • The cluster system sets up a queue for each user and allocates corresponding resources, including CPU and memory, to the queue.
  • For each pending subtask queue in the cluster system and each pending subtask in it, the following are obtained: the system resource parameters of the cluster system, such as the number of CPUs and the amount of memory currently available; and the queue-related parameters of the pending subtask queue, such as the maximum number of tasks the user can currently submit (each queue is configured with a maximum number of submittable tasks).
  • The queue-related parameters also include the scheduling strategy applied to tasks in the queue, such as first-in first-out, fair scheduling, or capacity scheduling. The task-related parameters of the pending subtask include:
  • task type: the type of computing engine that processes the task, for example an engine that processes data in high-speed memory or an engine that processes data on hard disk;
  • task language: the language of the task code, such as Java, Python, or C;
  • the size of the task's input data set;
  • the execution parameters of the task: the number of subtasks the task is divided into, the heap size requested in Java, and the parallelism of multiple tasks.
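Before these parameters can serve as the independent variables of a linear regression, they have to become numbers. The sketch below is one way to do that, assuming numeric parameters are used as-is and categorical ones (queue policy, engine type, language) are one-hot encoded; the field names and category vocabularies are hypothetical, since the source lists the parameters but not their encoding.

```python
from dataclasses import dataclass

# Hypothetical category vocabularies; the source names the categories, not their encoding.
QUEUE_POLICIES = ("fifo", "fair", "capacity")
ENGINE_TYPES = ("in_memory", "disk_based")
LANGUAGES = ("java", "python", "c")


@dataclass
class Snapshot:
    """One observation of a pending subtask (illustrative field names)."""
    available_cpus: int          # system resource parameter
    available_memory_gb: float   # system resource parameter
    max_submittable_tasks: int   # queue-related parameter
    queue_policy: str            # queue-related parameter
    engine_type: str             # task-related parameter
    language: str                # task-related parameter
    input_size_gb: float         # task-related parameter
    num_subtasks: int            # execution parameter
    java_heap_gb: float          # execution parameter
    parallelism: int             # execution parameter


def one_hot(value, vocabulary):
    """Encode a categorical value as a list of 0.0/1.0 indicators."""
    if value not in vocabulary:
        raise ValueError(f"unknown category {value!r}")
    return [1.0 if value == v else 0.0 for v in vocabulary]


def to_features(s: Snapshot) -> list[float]:
    """Flatten a snapshot into the independent variables X1..Xn."""
    numeric = [
        float(s.available_cpus), s.available_memory_gb,
        float(s.max_submittable_tasks), s.input_size_gb,
        float(s.num_subtasks), s.java_heap_gb, float(s.parallelism),
    ]
    return (numeric
            + one_hot(s.queue_policy, QUEUE_POLICIES)
            + one_hot(s.engine_type, ENGINE_TYPES)
            + one_hot(s.language, LANGUAGES))
```

If these features feed a regression with an intercept $b_0$, one indicator per categorical group should be dropped (or the model regularized): a full one-hot group always sums to 1, which makes it collinear with the intercept column.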
  • In this embodiment, the cluster system is Yarn, a resource scheduling platform that includes the following modules:
  • ResourceManager (RM): a global resource manager responsible for resource management and allocation across the entire system.
  • ApplicationMaster (AM): each application submitted by a user contains one AM, which negotiates with the RM to obtain resources, assigns the obtained resources to its internal tasks, communicates with the NodeManager to start or stop tasks, and monitors the status of all tasks.
  • NodeManager: the resource and task manager on each computing node. It periodically reports the node's resource usage, such as CPU and memory, to the RM, and receives and processes container start/stop requests from the AM.
  • Container: the place where computing tasks are actually executed, and Yarn's resource abstraction. It encapsulates the multi-dimensional resources of a computing node, such as CPU, disk, and network. When the AM requests resources from the RM, the resources the RM returns to the AM are represented as Containers. Yarn assigns a Container to each task, and the task can only use the resources described in that Container.
  • The ApplicationMaster and NodeManager in Yarn record the above queue data and task data as logs, and the ResourceManager records the above cluster system resource data as logs.
  • Kafka collects these Yarn logs to obtain the queue data, task data, and cluster system resource data required in this step.
  • Kafka is a distributed publish-subscribe messaging system, a form of message middleware, and includes the following components:
  • Broker: a Kafka server node. Brokers store topic data.
  • Topic: every message published to a Kafka cluster belongs to a category, called its topic.
  • Producer: the producer and publisher of messages, a role that publishes messages to Kafka topics.
  • Consumer: the consumer of messages, also a role; it reads data from the broker and stores it on the local disk.
  • A Yarn broker node is created in Kafka, and a topic is created on the Yarn broker node.
  • The topic collects the Yarn log information that records the above task data, queue data, and cluster system data.
  • Yarn can send the logs it generates to Kafka through a log4j appender: the Kafka broker address and topic are specified in Yarn's relevant configuration file, so that logs generated by Yarn are sent to Kafka in real time and Kafka collects the Yarn log information.
  • Kafka stores the log information collected from the cluster system Yarn in HBase in real time.
  • HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system built on HDFS, and includes the following modules:
  • HMaster: the management service of the HBase cluster. It mainly handles users' create, delete, modify, and query operations on tables, manages load balancing across HRegionServers, adjusts region distribution, and handles region splitting, merging, and migration.
  • HRegionServer: the core module of the HBase cluster. It manages a series of HRegion objects assigned by HMaster, responds to user I/O requests, and reads and writes data to HDFS.
  • Each HRegion object corresponds to a region of a table, i.e. a horizontal partition of the table.
  • Each HRegion is composed of multiple HStores.
  • HStore: the core of HBase storage, where region data is actually stored; a region is composed of multiple stores.
  • A store includes a memstore in memory and storefiles on disk. When the memstore reaches a certain threshold, it is flushed to a storefile on disk, and storefiles are stored in HDFS in HFile format.
  • HLog: stored on HDFS; data is written to the HLog before it is written to the memstore.
  • The main function of the HLog is data recovery: it prevents data written to the memstore from being lost if the host goes down.
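To make the write path above concrete, here is a toy, in-memory model of a single store: every put is appended to the HLog before it reaches the memstore, the memstore is flushed to an immutable storefile once it holds `flush_threshold` entries, and a crash is recovered by replaying HLog entries newer than the last flush. It is not the HBase client API. Real HBase flushes by memstore size (configured through `hbase.hregion.memstore.flush.size`), keeps HFiles and the HLog on HDFS, and manages many stores per region.

```python
class ToyStore:
    """Toy model of one HBase store's write path (not the real HBase API)."""

    def __init__(self, flush_threshold: int):
        self.flush_threshold = flush_threshold  # max entries held in the memstore
        self.hlog = []          # write-ahead log: (sequence, row, value)
        self.memstore = {}      # in-memory buffer of recent writes
        self.storefiles = []    # immutable, sorted flushed files
        self.flushed_seq = 0    # last sequence number persisted to a storefile
        self._seq = 0

    def put(self, row: str, value: str) -> None:
        self._seq += 1
        self.hlog.append((self._seq, row, value))  # 1. log first
        self.memstore[row] = value                 # 2. then memstore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Persist the memstore as a new sorted storefile."""
        if self.memstore:
            self.storefiles.append(dict(sorted(self.memstore.items())))
            self.memstore = {}
            self.flushed_seq = self._seq

    def crash_and_recover(self) -> None:
        """Lose the memstore, then rebuild it from unflushed HLog entries."""
        self.memstore = {}
        for seq, row, value in self.hlog:
            if seq > self.flushed_seq:
                self.memstore[row] = value

    def get(self, row: str):
        if row in self.memstore:
            return self.memstore[row]
        for f in reversed(self.storefiles):  # newest storefile wins
            if row in f:
                return f[row]
        return None
```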
  • Kafka and HBase: the interaction between them mainly consists of inserting the data collected by Kafka into HBase in real time, implemented by a Java program that calls the Kafka and HBase APIs.
  • The designated ports of the physical machines hosting the Yarn, Kafka, and HBase services must be mutually accessible. To minimize network transmission, in this embodiment the physical machines hosting Yarn, Kafka, and HBase are placed on the same network segment and connected to the same switch.
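The source implements the Kafka-to-HBase step as a Java program and does not show it. The sketch below shows only the mapping logic, in Python, under loudly stated assumptions: the message format (one JSON object per Yarn log record with `kind`, `queue`, and `ts` fields), the row-key layout, and the column family `d` are all hypothetical, and the Kafka consumer and HBase put call are passed in as a plain iterable and callable rather than tied to any client library. The reversed timestamp in the row key is a common HBase pattern for making the newest rows of a queue sort first.

```python
import json
from typing import Callable, Iterable

MAX_TS = 2**63 - 1  # reversed-timestamp base, mirroring Java's Long.MAX_VALUE


def row_key(record: dict) -> str:
    """Hypothetical row key: kind|queue|reversed timestamp (newest sorts first)."""
    reversed_ts = MAX_TS - int(record["ts"])
    return f'{record["kind"]}|{record["queue"]}|{reversed_ts:019d}'


def sink(messages: Iterable[bytes], put: Callable[[str, dict], None]) -> int:
    """Decode message values and write each as an HBase-style row via `put`.

    `messages` stands in for a Kafka consumer's message values and `put` for an
    HBase client's put call. Returns the number of rows written; malformed
    messages are skipped.
    """
    written = 0
    for raw in messages:
        try:
            record = json.loads(raw)
            key = row_key(record)
        except (ValueError, KeyError, TypeError):
            continue  # not JSON, not an object, or missing key fields
        columns = {f"d:{k}": str(v) for k, v in record.items()}
        put(key, columns)
        written += 1
    return written
```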
  • Step S20: input the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtain the estimated subtask time of the pending subtask through the linear regression model.
  • Real-time information about the queue, the task, and the cluster system resources is collected at a preset period and input into the linear regression model to predict the remaining completion time of the task. That is, after the system resource parameters, queue-related parameters, and task-related parameters are obtained, they are input into the preset (pre-trained) linear regression model, which estimates the time the pending subtask needs to finish its remaining work; this estimate is the estimated subtask time of the pending subtask.
  • Before step S20, the method includes:
  • training a model to be trained on the independent variable parameters and the dependent variable parameters to generate the linear regression model.
  • Specifically, the independent variable parameters and the dependent variable parameters are input into the linear regression formula to obtain the trained initial regression parameters, where the linear regression formula is:
  • $y = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$, where $X_1, X_2, \dots, X_n$ are the independent variable parameters, $y$ is the dependent variable parameter, and $b_0, b_1, \dots, b_n$ are the initial regression parameters;
  • the linear regression model is then generated from these regression parameters.
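One standard way to obtain $b_0, \dots, b_n$ from the training data is ordinary least squares in closed form, i.e. solving the normal equations $(A^\top A)\,b = A^\top y$, where $A$ is the matrix of independent variables with a leading column of ones for $b_0$. The plain-Python sketch below solves them by Gauss-Jordan elimination; it illustrates the technique and is not the patent's training code. For many features or ill-conditioned data, a library solver such as NumPy's `numpy.linalg.lstsq` would be the more robust choice.

```python
def fit_ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Return [b0, b1, ..., bn] minimizing the sum of squared residuals.

    Solves (A^T A) b = A^T y, where A is X with a leading column of ones,
    using Gauss-Jordan elimination with partial pivoting.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] for i in range(p)]


def predict(b: list[float], x: list[float]) -> float:
    """Evaluate y = b0 + b1*x1 + ... + bn*xn."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
```

For example, five observations generated exactly from $y = 2 + 3X_1 - X_2$ give back $b = [2, 3, -1]$ up to floating-point rounding.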
  • Training data is collected in advance and used to train the linear regression model. First, system resource, queue, and task parameter data are collected and input into the model as independent variables.
  • The resource-related information of the queue and the cluster system is collected at a preset period, for example every 30 seconds.
  • The task-related information is collected when the task is created.
  • The target value in the training data, namely the task's current remaining execution time, is collected as the dependent variable of the linear regression model.
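Because the dependent variable is the remaining execution time, each periodic snapshot can be labelled once its task has finished: if a snapshot was taken at time $t$ and the master node recorded the task's completion at time $T$, the label is $T - t$. A sketch of that pairing, using a hypothetical `Observation` record whose features are already encoded as numbers:

```python
from dataclasses import dataclass


@dataclass
class Observation:
    task_id: str
    ts: float              # collection time in seconds (e.g. every 30 s)
    features: list[float]  # system, queue and task parameters at that moment


def label_samples(observations, finish_times):
    """Pair each snapshot with the task's remaining execution time.

    finish_times maps task_id -> completion timestamp (as recorded by the
    master node). Snapshots of unfinished tasks are skipped, since their
    label is not yet known.
    """
    X, y = [], []
    for obs in observations:
        done = finish_times.get(obs.task_id)
        if done is None or done < obs.ts:
            continue
        X.append(obs.features)
        y.append(done - obs.ts)  # dependent variable: remaining time
    return X, y
```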
  • The formula of the linear regression model is as follows, where $y$ is the dependent variable and $X_1, \dots, X_n$ are the independent variables: $y = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$.
  • Initial estimates of the regression parameters $b_0, b_1, b_2, \dots, b_n$ are obtained first, and the least squares estimation algorithm is then used to adjust $b_0, b_1, b_2, \dots, b_n$ step by step to improve the accuracy of the model.
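The source does not say how the step-by-step adjustment of $b_0, \dots, b_n$ is carried out. One common choice, assumed here, is batch gradient descent on the mean squared error $\frac{1}{m}\sum_i (\hat y_i - y_i)^2$, starting from the initial estimates: each step moves every $b_j$ against the gradient $\frac{2}{m}\sum_i (\hat y_i - y_i)X_{ij}$, with $X_{i0} = 1$ for the intercept. The learning rate `lr` and the number of `steps` are illustrative and must be tuned to the scale of the features.

```python
def refine(b, X, y, lr=0.01, steps=1000):
    """Iteratively adjust b = [b0, ..., bn] to reduce the mean squared error.

    Batch gradient descent: every step uses all m training rows.
    """
    b = list(b)
    m = len(y)
    for _ in range(steps):
        grad = [0.0] * len(b)
        for row, yi in zip(X, y):
            err = b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) - yi
            grad[0] += err                      # intercept column is all ones
            for j, xj in enumerate(row, start=1):
                grad[j] += err * xj
        b = [bj - lr * 2 * g / m for bj, g in zip(b, grad)]
    return b
```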
  • Step S30: compare the estimated subtask time with a preset standard time, and schedule the queue resources and system resources of the pending subtask according to the comparison result.
  • The estimated subtask time is compared with a predefined standard time.
  • The standard time is the time the pending subtask needs to complete when its resources are appropriate. If the estimated subtask time is not greater than the standard time, the resources of the pending subtask are appropriate and no scheduling is required; if the estimated subtask time is greater than the standard time, the pending subtask has insufficient resources, and resources can be added for it.
  • In practice, the remaining completion time is estimated repeatedly at the preset period to obtain a prediction of the overall execution time. If the prediction is higher than the standard time several times in a row, queue resources are added for the task.
  • Adding queue resources for the task means increasing the number of CPUs of the queue. While the number of CPUs is increased, the corresponding memory resources are automatically increased in proportion, and the administrators can be notified by email at the same time.
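A minimal sketch of the single-estimate decision and the proportional scaling described above, assuming memory is expressed in GB and grows at the queue's current memory-per-CPU ratio. The function names are hypothetical, and the email notification is omitted.

```python
def needs_more_resources(estimated: float, standard: float) -> bool:
    """Resources are insufficient when the estimate exceeds the standard time."""
    return estimated > standard


def scale_queue(cpus: int, memory_gb: float, extra_cpus: int) -> tuple[int, float]:
    """Add CPUs to a queue and grow its memory in the same proportion."""
    if cpus <= 0 or extra_cpus < 0:
        raise ValueError("cpus must be positive and extra_cpus non-negative")
    mem_per_cpu = memory_gb / cpus
    new_cpus = cpus + extra_cpus
    return new_cpus, new_cpus * mem_per_cpu
```

For instance, a queue with 8 CPUs and 32 GB that receives 4 more CPUs ends up with 12 CPUs and 48 GB.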
  • In this embodiment, the method obtains the system resource parameters, queue-related parameters, and task-related parameters for each pending subtask in the cluster system, uses the pre-trained linear regression model to estimate the subtask's remaining time, and schedules the queue resources and system resources of the pending subtask by comparing that estimate with the standard time, i.e. the completion time under appropriate resources. This shortens task completion time, improves resource scheduling efficiency, and solves the technical problem of low scheduling efficiency of existing cluster queue resources.
  • FIG. 3 is a schematic flowchart of a second embodiment of the method for scheduling cluster queue resources of this application.
  • In this embodiment, step S20 specifically includes:
  • Step S21: obtain the system resource parameters, queue-related parameters, and task-related parameters at a preset period, and calculate, through the linear regression model, multiple estimated subtask times of the pending subtask over successive periods.
  • Estimated subtask times are calculated repeatedly at the preset period to obtain a prediction of the overall execution time; if several consecutive estimates are higher than the standard time, queue resources should be added for the task. Specifically, for each task currently submitted to a queue of the cluster system, real-time information about the queue, the task, and the cluster system resources (the real-time system resource parameters, queue-related parameters, and task-related parameters) is collected at the preset period and input into the linear regression model, yielding a series of remaining-completion-time predictions for the pending subtask, i.e. multiple estimated subtask times.
  • Step S30 specifically includes:
  • Step S31: compare each of the multiple estimated subtask times with the standard time;
  • Step S32: if more than a preset number of the estimated subtask times are higher than the standard time, increase the queue resources and system resources of the pending subtask.
  • Each estimated subtask time is compared with the standard time to determine whether the estimates for the pending subtask have been higher than the standard time several times in a row.
  • The standard time is obtained by acquiring the completion times of multiple historical runs of the pending subtask within a preset period and taking their average.
  • If the number of estimates higher than the standard time exceeds the preset number, the predicted overall execution time of the pending subtask is higher than a reasonable time, and the resources of the pending subtask should be increased.
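The second embodiment's rule can be sketched as follows: the standard time is the mean of historical completion times, and resources are added when the most recent estimates have exceeded it more than `preset_count` times in a row. Counting a trailing streak is one reading of the source's "several consecutive times ... exceeding the preset number"; counting all exceedances in a window would be another. Function names are hypothetical.

```python
from statistics import mean


def standard_time(history: list[float]) -> float:
    """Mean completion time of historical runs within the preset period."""
    if not history:
        raise ValueError("no historical completion times")
    return mean(history)


def should_add_resources(estimates: list[float], standard: float,
                         preset_count: int) -> bool:
    """True when the latest estimates exceed the standard time more than
    preset_count times in a row."""
    streak = 0
    for est in reversed(estimates):  # walk back from the newest estimate
        if est > standard:
            streak += 1
        else:
            break
    return streak > preset_count
```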
  • The step of increasing the queue resources and system resources of the pending subtask specifically includes:
  • If more than the preset number of estimated subtask times are higher than the standard time, the overall execution time of the pending subtask has overrun, and resources need to be added for it.
  • A resource scheduling relationship (a resource scheduling table) is set in advance that maps the difference between the actual task processing time of the pending subtask and the standard time to the resources to be added.
  • The resource scheduling table can be set automatically based on big data analysis, or manually according to actual needs.
  • After the resources to be added for the pending subtask are determined, the maximum amount of resources of the queue to which the pending subtask belongs is determined first, and it is checked whether the resources to be added exceed this maximum. If they do not, it is checked whether the remaining resources in the queue are sufficient for the allocation; if they are not, the resources are scheduled from the system resources of the cluster system.
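The lookup-then-allocate procedure above might look like the following. The table values, the use of CPU counts as the only resource, and the choice to draw idle queue resources first and take only the shortfall from the cluster are all assumptions. The source also does not say what happens when the request exceeds the queue maximum, so the sketch simply returns `None` in that case.

```python
import bisect

# Hypothetical resource scheduling table: (minimum overrun in seconds, extra CPUs).
SCHEDULING_TABLE = [(0, 1), (60, 2), (300, 4), (900, 8)]


def extra_cpus_for(overrun_seconds: float) -> int:
    """Look up the CPUs to add for an overrun (processing time minus standard time)."""
    if overrun_seconds <= 0:
        return 0
    thresholds = [t for t, _ in SCHEDULING_TABLE]
    idx = bisect.bisect_right(thresholds, overrun_seconds) - 1
    return SCHEDULING_TABLE[idx][1]


def allocate(extra: int, queue_max: int, queue_free: int, cluster_free: int):
    """Return (from_queue, from_cluster) CPUs, or None if the request cannot be met."""
    if extra > queue_max:
        return None                      # exceeds the queue's maximum
    from_queue = min(extra, queue_free)  # idle queue resources first (assumed)
    shortfall = extra - from_queue
    if shortfall > cluster_free:
        return None                      # cluster cannot cover the rest
    return from_queue, shortfall
```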
  • FIG. 4 is a schematic flowchart of a third embodiment of the method for scheduling cluster queue resources of this application.
  • In this embodiment, the method further includes:
  • Step S40: determine the current estimated subtask time of the pending subtask according to the scheduled resource parameters and the linear regression model, and start a timer to monitor the execution time of the pending subtask after scheduling;
  • Step S50: when the execution time reaches the current estimated subtask time, check whether the pending subtask has executed successfully;
  • Step S60: if the pending subtask has executed successfully, release the queue resources and system resources it occupies.
  • After scheduling, a timer is started to monitor the execution status of the pending subtask, and task resources are released and reclaimed according to the monitoring result. That is, the real-time resource parameters after scheduling are obtained and input into the linear regression model to determine the current estimated subtask time of the pending subtask. When the timer reaches this estimated time, the system checks whether the task has completed: if it has, the added queue resources are reclaimed; if not, they are kept.
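The monitoring step can be sketched with Python's `threading.Timer`: the timer fires after the current estimated subtask time and releases the added resources only if a status check reports success. `is_succeeded` and `release` are hypothetical hooks (for example, a job-status query and a call that returns CPUs to the queue). What should happen after a failed check, such as re-estimating and re-arming the timer, is left open, as in the source.

```python
import threading


class ResourceReclaimer:
    """Fire after the estimated time; release added resources only on success."""

    def __init__(self, estimated_seconds, is_succeeded, release):
        self._is_succeeded = is_succeeded  # callable -> bool
        self._release = release            # callable that returns resources
        self.released = False
        self._timer = threading.Timer(estimated_seconds, self._on_expiry)

    def start(self):
        self._timer.start()

    def join(self):
        self._timer.join()

    def _on_expiry(self):
        if self._is_succeeded():
            self._release()
            self.released = True
        # Otherwise the added resources are kept.
```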
  • An embodiment of this application also provides an apparatus for scheduling cluster queue resources.
  • FIG. 5 is a schematic diagram of the functional modules of a first embodiment of the apparatus for scheduling cluster queue resources of this application.
  • The apparatus is applied to a cluster system and includes:
  • a resource parameter acquisition module 10, configured to determine each pending subtask queue in the cluster system and each pending subtask in the pending subtask queue, and to obtain the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the pending subtask;
  • an estimated time calculation module 20, configured to input the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and to obtain the estimated subtask time of the pending subtask through the linear regression model; and
  • a task resource scheduling module 30, configured to compare the estimated subtask time with a preset standard time, and to schedule the queue resources and system resources of the pending subtask according to the comparison result.
  • The apparatus further includes a model training module, configured to:
  • train the model to be trained according to the independent variable parameters and the dependent variable parameters, so as to generate the linear regression model.
  • The model training module is further configured to:
  • $y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n$, where $X_1, X_2, \ldots, X_n$ are the independent variable parameters, $y$ is the dependent variable parameter, and $b_0, b_1, \ldots, b_n$ are the initial regression parameters;
  • the linear regression model is generated.
  • The resource parameter acquisition module 10 is further configured to:
  • The task resource scheduling module 30 is further configured to:
  • increase the queue resources and system resources of the subtask to be processed.
  • The task resource scheduling module 30 is further configured to:
  • The estimated time calculation module 20 is further configured to:
  • The apparatus further includes a resource recovery module, configured to:
  • determine the current estimated time of the subtask to be processed according to the scheduled resource parameters and the linear regression model, and start a timer to monitor the execution time of the scheduled subtask;
  • the queue resources and system resources occupied by the subtask to be processed are released.
  • Each module of the above cluster queue resource scheduling apparatus corresponds to a step in the above embodiments of the cluster queue resource scheduling method. Its functions and implementation are not repeated here.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • The computer-readable storage medium of the present application stores a scheduling program for cluster queue resources. When this scheduling program is executed by a processor, the following steps are implemented:
  • Compare the estimated time of the subtask with a preset standard time, and schedule the queue resources and system resources of the subtask to be processed according to the result of that comparison.
  • For the method implemented when the scheduling program for cluster queue resources is executed, refer to the embodiments of the cluster queue resource scheduling method of the present application. It is not repeated here.
  • The technical solution of this application, in essence, or the part of it that contributes to the prior art, can be embodied as a software product. The computer software product is stored in a computer-readable storage medium as described above, such as ROM/RAM, a magnetic disk, or an optical disc, which may be non-volatile or volatile. It includes a number of instructions that cause a terminal device to execute the methods described in the embodiments of the present application. The terminal device may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like.
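Steps S40 to S60 amount to a one-shot timer armed with the re-estimated time. The Python sketch below illustrates that control flow under stated assumptions. The names `monitor_subtask`, `is_successful`, `release_resources`, and `on_unfinished` are hypothetical, and the estimate is assumed to be in seconds. A real scheduler would call the cluster's own status and resource APIs inside these callbacks.

```python
import threading


def monitor_subtask(estimated_seconds, is_successful, release_resources, on_unfinished=None):
    """Arm a one-shot timer for the estimated time (Step S40). When it fires,
    check whether the subtask succeeded (Step S50) and, if so, release the
    queue and system resources it occupies (Step S60).

    All callback names are hypothetical placeholders for cluster-specific calls.
    """

    def on_timeout():
        if is_successful():
            release_resources()      # task finished: recover the added resources
        elif on_unfinished is not None:
            on_unfinished()          # task not finished: keep the added resources

    timer = threading.Timer(estimated_seconds, on_timeout)
    timer.start()
    return timer                     # caller may cancel() or join() it
```

Returning the `Timer` lets the caller cancel monitoring if the subtask is rescheduled before the estimate elapses.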

Abstract

The present application provides a method, apparatus and device for scheduling cluster queue resources, and a storage medium. The method comprises: determining each sub-task queue to be processed in the cluster system and each sub-task to be processed in said sub-task queue, and obtaining system resource parameters of the cluster system, queue related parameters of said sub-task queue and task related parameters of said sub-task; inputting the system resource parameters, the queue related parameters and the task related parameters into a preset linear regression model, and obtaining sub-task estimated time corresponding to said sub-task by means of the linear regression model; and comparing the sub-task estimated time with preset standard time, and scheduling queue resources and system resources of said sub-task according to the comparison result of the sub-task estimated time and the standard time. According to the present application, the task completion time is reduced, and the resource scheduling efficiency is improved.

Description

Method, Apparatus, Device and Storage Medium for Scheduling Cluster Queue Resources
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202010089180.0, entitled "Method, apparatus, device and storage medium for scheduling cluster queue resources" and filed on February 12, 2020. The entire content of that application is incorporated herein by reference.
Technical Field
This application relates to the technical field of task scheduling, and in particular to a method, apparatus, device, and computer-readable storage medium for scheduling cluster queue resources.
Background
In existing cluster systems, a queue is generally set up for each business user, and corresponding processing resources, including CPU and memory, are allocated to each queue in advance in a fixed manner. Some business-logic tasks impose strict completion requirements on certain computing tasks. The inventors found that the progress of such computing tasks may be delayed for various reasons, such as cluster environment problems or the failure of a preceding job. Because the processing resources of the queue cannot be adjusted in time, tasks tend to accumulate, and the computing task cannot be completed within the specified time. This reduces the scheduling efficiency of cluster queue resources. How to overcome the low scheduling efficiency of existing cluster queue resources has therefore become a technical problem that urgently needs to be solved.
Summary of the Invention
The main purpose of this application is to provide a method, apparatus, device, and computer-readable storage medium for scheduling cluster queue resources. It aims to solve the technical problem of the low scheduling efficiency of existing cluster queue resources.
To achieve the above objective, the present application provides a method for scheduling cluster queue resources. The method is applied to a cluster system and includes the following steps:
Determine each pending subtask queue in the cluster system and each pending subtask in the pending subtask queue, and obtain the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the pending subtask;
Input the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtain, through the linear regression model, the estimated time of the subtask corresponding to the pending subtask;
Compare the estimated time of the subtask with a preset standard time, and schedule the queue resources and system resources of the pending subtask according to the result of that comparison.
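The comparison step does not specify the exact scheduling rule. The following Python sketch shows one possible rule. It assumes, consistent with the task resource scheduling module's description of increasing resources, that resources grow only when the estimate exceeds the standard time. Purely for illustration, it further assumes that the increase is proportional to the overrun and capped by the resources the cluster still has free. `Allocation` and `schedule` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    vcores: int
    memory_mb: int


def schedule(estimated_s, standard_s, current, available):
    """Return the allocation to apply to a pending subtask.

    Hypothetical rule: if the estimate is within the standard time, keep the
    current allocation. Otherwise grow it by the overrun ratio
    (estimated / standard - 1), capped by the resources still available.
    """
    if estimated_s <= standard_s:
        return current
    overrun = estimated_s / standard_s - 1.0
    extra_cores = min(math.ceil(current.vcores * overrun), available.vcores)
    extra_mem = min(math.ceil(current.memory_mb * overrun), available.memory_mb)
    return Allocation(current.vcores + extra_cores, current.memory_mb + extra_mem)
```

Under this rule, an estimate of 100 s against a 50 s standard doubles a 4-core, 8192 MB allocation to 8 cores and 16384 MB, provided the cluster has that much free.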
In addition, to achieve the above objective, this application also provides an apparatus for scheduling cluster queue resources. The apparatus is applied to a cluster system and includes:
a resource parameter acquisition module, configured to determine each pending subtask queue in the cluster system and each pending subtask in the pending subtask queue, and to obtain the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the pending subtask;
an estimated time calculation module, configured to input the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and to obtain, through the linear regression model, the estimated time of the subtask corresponding to the pending subtask;
a task resource scheduling module, configured to compare the estimated time of the subtask with a preset standard time, and to schedule the queue resources and system resources of the pending subtask according to the result of that comparison.
In addition, to achieve the above objective, this application also provides a device for scheduling cluster queue resources. The device includes a processor, a memory, and a scheduling program for cluster queue resources that is stored in the memory and executable by the processor. When the scheduling program is executed by the processor, the following steps are implemented:
Determine each pending subtask queue in the cluster system and each pending subtask in the pending subtask queue, and obtain the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the pending subtask;
Input the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtain, through the linear regression model, the estimated time of the subtask corresponding to the pending subtask;
Compare the estimated time of the subtask with a preset standard time, and schedule the queue resources and system resources of the pending subtask according to the result of that comparison.
In addition, to achieve the above objective, this application also provides a computer-readable storage medium. The medium stores a scheduling program for cluster queue resources. When the scheduling program is executed by a processor, the following steps are implemented:
Determine each pending subtask queue in the cluster system and each pending subtask in the pending subtask queue, and obtain the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the pending subtask;
Input the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtain, through the linear regression model, the estimated time of the subtask corresponding to the pending subtask;
Compare the estimated time of the subtask with a preset standard time, and schedule the queue resources and system resources of the pending subtask according to the result of that comparison.
This application provides a method for scheduling cluster queue resources, applied to a cluster system. The method first determines each pending subtask queue in the cluster system and each pending subtask in the pending subtask queue. It obtains the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the pending subtask. It then inputs these parameters into a preset linear regression model and obtains, through the model, the estimated time of the subtask corresponding to the pending subtask. Finally, it compares the estimated time with a preset standard time and schedules the queue resources and system resources of the pending subtask according to the comparison result.
In this way, the present application combines a pre-trained linear regression model with three groups of parameters: the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the pending subtask. From these it determines the estimated time of the pending subtask. It compares this estimate with the standard time in which the pending subtask would complete if its resources were reasonable, to decide whether its current resources are reasonable, and then schedules resources according to the comparison result. This reduces task completion time, improves resource scheduling efficiency, and solves the technical problem of the low scheduling efficiency of existing cluster queue resources.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the hardware structure of the cluster queue resource scheduling device involved in the solutions of the embodiments of this application;
FIG. 2 is a schematic flowchart of a first embodiment of the method for scheduling cluster queue resources of this application;
FIG. 3 is a schematic flowchart of a second embodiment of the method for scheduling cluster queue resources of this application;
FIG. 4 is a schematic flowchart of a third embodiment of the method for scheduling cluster queue resources of this application;
FIG. 5 is a schematic diagram of the functional modules of a first embodiment of the apparatus for scheduling cluster queue resources of this application.
Detailed Description of the Embodiments
It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
The method for scheduling cluster queue resources in the embodiments of the present application is mainly applied to a cluster queue resource scheduling device. This device may be any device with display and processing functions, such as a PC, a portable computer, or a mobile terminal.
Referring to FIG. 1, FIG. 1 is a schematic diagram of the hardware structure of the cluster queue resource scheduling device involved in the solutions of the embodiments of this application. In the embodiments, the device may include a processor 1001 (for example, a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be high-speed RAM or non-volatile memory, such as disk storage, and may optionally be a storage device independent of the processor 1001.
Those skilled in the art will understand that the hardware structure shown in FIG. 1 does not limit the cluster queue resource scheduling device. The device may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
Still referring to FIG. 1, the memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module, and a scheduling program for cluster queue resources.
In FIG. 1, the network communication module is mainly used to connect to a server and exchange data with it. The processor 1001 may call the scheduling program for cluster queue resources stored in the memory 1005. When this program is executed by the processor 1001, the following steps are implemented:
Determine each pending subtask queue in the cluster system and each pending subtask in the pending subtask queue, and obtain the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the pending subtask;
Input the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtain, through the linear regression model, the estimated time of the subtask corresponding to the pending subtask;
Compare the estimated time of the subtask with a preset standard time, and schedule the queue resources and system resources of the pending subtask according to the result of that comparison.
An embodiment of the present application provides a method for scheduling cluster queue resources.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a first embodiment of the method for scheduling cluster queue resources of this application.
In this embodiment, the method for scheduling cluster queue resources is applied to a cluster system and includes the following steps:
Step S10: Determine each pending subtask queue in the cluster system and each pending subtask in the pending subtask queue, and obtain the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the pending subtask;
In existing cluster systems, the progress of a computing task may be delayed for various reasons, such as cluster environment problems or the failure of a preceding job. Because the processing resources of the queue cannot be adjusted in time, tasks tend to accumulate, and the computing task cannot be completed within the specified time. This reduces the scheduling efficiency of cluster queue resources. To solve this problem, this embodiment combines a pre-trained linear regression model with the system resource parameters of the cluster system, the queue-related parameters of the pending subtask queue, and the task-related parameters of the pending subtask to determine the estimated time of the pending subtask. This estimate is compared with the standard time in which the pending subtask would complete if its resources were reasonable, to decide whether its current resources are reasonable. Resources are then scheduled according to the comparison result, which reduces task completion time and improves resource scheduling efficiency.
Specifically, the cluster system includes a master node and ordinary nodes. The master node splits a computing task submitted by a user into multiple small tasks and submits them to multiple CPUs for execution. It also records information such as the start time and completion time of the computing task. The cluster system sets up a queue for each user and allocates corresponding resources, including CPU and memory, to that queue. Each pending subtask queue in the cluster system, and each pending subtask in those queues, is determined in real time. The following parameters are then obtained:
System resource parameters of the cluster system, such as the number of CPUs and the amount of memory currently still available in the system.
Queue-related parameters of the pending subtask queue. These include the maximum number of tasks the user can currently submit: each queue is configured with a maximum number of submittable tasks, so the current maximum can be calculated from the number of tasks the queue has already submitted. They also include the number of CPUs currently still available to the queue, the priority of the queue (that is, the priority with which the cluster system processes the queue), and the scheduling policy for tasks in the queue, such as first-in-first-out, fair scheduling, or capacity scheduling.
Task-related parameters of the pending subtask. These include the task type, that is, the type of compute engine that processes the task (an engine using high-speed in-memory processing or one using hard-disk processing), and the task language, that is, the language in which the task code is written, such as Java, Python, or C. They also include the size of the task's input data set and the task's execution parameters: the number of subtasks the task is split into, the heap size requested in Java, and the parallelism of multiple tasks.
In this step, the cluster system is a YARN system. YARN is a resource scheduling platform that includes the following modules:
1. ResourceManager (RM): a global resource manager responsible for resource management and allocation across the entire system.
2. ApplicationMaster (AM): every application submitted by a user contains one AM. The AM negotiates with the RM to obtain resources, further assigns the obtained resources to its internal tasks, communicates with the NodeManager to start or stop tasks, and monitors the status of all tasks.
3. NodeManager: the resource and task manager on each compute node. It periodically reports the node's resource usage, such as CPU and memory, to the RM. It also receives and handles Container start/stop requests from the AM.
4. Container: the place where computing tasks are actually executed. It is YARN's resource abstraction and encapsulates the multi-dimensional resources of a compute node, such as CPU, disk, and network. When the AM requests resources from the RM, the resources the RM returns to the AM are expressed as Containers. YARN assigns a Container to each task, and the task can only use the resources described in that Container.
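The embodiment collects system resource data from YARN logs. The same free-CPU and free-memory figures are also available another way: the ResourceManager exposes a REST endpoint, `/ws/v1/cluster/metrics`, whose `clusterMetrics` object includes `availableVirtualCores` and `availableMB`. The sketch below reads them with the Python standard library. It is an alternative to, not a description of, the log-based method in the source, and the ResourceManager address is a placeholder.

```python
import json
from urllib.request import urlopen


def parse_cluster_metrics(payload):
    """Extract free vcores and memory from a /ws/v1/cluster/metrics response."""
    metrics = payload["clusterMetrics"]
    return {
        "free_vcores": metrics["availableVirtualCores"],
        "free_memory_mb": metrics["availableMB"],
    }


def fetch_cluster_metrics(rm_address, timeout=5.0):
    """GET the ResourceManager metrics. rm_address is a placeholder such as
    "http://rm-host:8088"."""
    with urlopen(f"{rm_address}/ws/v1/cluster/metrics", timeout=timeout) as resp:
        return parse_cluster_metrics(json.load(resp))
```

Keeping the parsing separate from the HTTP call makes the parsing testable without a running cluster.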
In this step, the AM and NodeManager in YARN store the above queue data and task data as logs, and the ResourceManager in YARN likewise stores the above cluster system resource data as logs.
Kafka obtains the queue data, task data, and cluster system resource data required in this step by collecting YARN's logs. Kafka is a distributed publish-subscribe messaging system, a type of message middleware, and includes the following modules:
1. Broker: a Kafka server node. Brokers store the data of topics.
2. Topic: every message published to a Kafka cluster belongs to a category, called its topic.
3. Producer: the producer, or publisher, of messages. It is a role that publishes messages to Kafka topics.
4. Consumer: the consumer of messages. It is also a role; it reads data from brokers and stores it on local disk.
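The broker/topic/producer/consumer roles can be illustrated with a toy, in-memory model of a single topic. Producers append to an ordered log, and each consumer group tracks the offset of the next message it will read. This is a simplified stand-in for a Kafka topic partition, written for illustration only. `ToyTopic` and its methods are hypothetical and are not the Kafka client API.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for one topic partition: an append-only log that
    consumer groups read by offset. Illustrative only; not the Kafka API."""

    def __init__(self, name):
        self.name = name
        self.log = []                       # messages in publish order
        self.offsets = defaultdict(int)     # consumer group -> next offset to read

    def publish(self, message):
        """Producer role: append a message and return its offset."""
        self.log.append(message)
        return len(self.log) - 1

    def poll(self, group, max_records=100):
        """Consumer role: return unread messages for a group and advance its offset."""
        start = self.offsets[group]
        batch = self.log[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch
```

Each group keeps its own offset, so an HBase writer and any other consumer can read the same YARN log stream independently.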
In this step, a YARN broker node is created in Kafka, and a topic is created on this broker. The topic collects the YARN log information that records the above task data, queue data, and cluster system data. Note that YARN can send the logs it generates to Kafka through a log4j Appender. Configuring the target Kafka address and topic in YARN's relevant configuration files is enough for YARN's logs to be sent to Kafka in real time, so that Kafka collects YARN's log information.
In addition, Kafka stores the log information collected from the cluster system YARN in HBase in real time. HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system built on HDFS. It includes the following modules:
1. HMaster: the management service of the HBase cluster. It mainly manages users' create, delete, update, and query operations on tables, balances load across HRegionServers, adjusts the distribution of regions, and handles region splitting, merging, and migration.
2. HRegionServer: the core module of the HBase cluster. It manages a set of HRegion objects assigned by the HMaster, responds to users' I/O requests, and reads and writes data to HDFS.
3. HRegion: each HRegion object corresponds to one region of a table, which results from splitting the table horizontally. Each HRegion consists of multiple HStores;
4. HStore: the core of HBase storage, where region data is actually stored. A region consists of multiple stores, and each store includes an in-memory MemStore and StoreFiles on disk. When the MemStore reaches a certain threshold, it is flushed to a StoreFile on disk, and StoreFiles are saved on HDFS in HFile format.
5. HLog: stored on HDFS. Data is written to the HLog before it is written to the MemStore. The main purpose of the HLog is data recovery: if a host goes down and data already written to the MemStore is lost, the data can be recovered from the HLog.
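Together, the HStore and HLog descriptions amount to a write-ahead write path. The toy Python class below sketches it. Each write goes to the log first, then to an in-memory buffer, which is flushed to an immutable sorted file once a threshold is reached. After a crash, unflushed writes can be replayed from the log. This is a simplified illustration, not HBase's implementation; in particular, the threshold here counts rows, whereas HBase flushes based on MemStore size. `ToyStore` is a hypothetical name.

```python
class ToyStore:
    """Toy write path: write-ahead log, then memstore, then flush to an
    immutable sorted file. Not HBase's implementation."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold   # rows, for simplicity
        self.wal = []            # HLog role: every write, in order
        self.memstore = {}       # MemStore role: in-memory buffer
        self.storefiles = []     # StoreFile/HFile role: flushed, sorted snapshots
        self.flushed_upto = 0    # WAL position already persisted by a flush

    def put(self, row_key, value):
        self.wal.append((row_key, value))        # 1. log first
        self.memstore[row_key] = value           # 2. then buffer in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Persist the memstore as a sorted, immutable file and mark the WAL position."""
        self.storefiles.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.flushed_upto = len(self.wal)

    def crash_and_recover(self):
        """Simulate losing the memstore, then replay unflushed WAL entries."""
        self.memstore = {}
        for row_key, value in self.wal[self.flushed_upto:]:
            self.memstore[row_key] = value
```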
The interaction between Kafka and HBase consists mainly of inserting the data collected by Kafka into HBase in real time. This is implemented by a Java program that calls the Kafka and HBase APIs:
1. Pull the YARN log data from Kafka every 10 s.
2. Split the data read into key:value format and format it (for example, normalizing date formats).
3. Open a connection to HBase and insert the processed data into the pre-designed table.
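Step 2 (splitting into key:value pairs and formatting dates) is the only step that does not depend on a particular client library, so it is sketched here. The source implements the pipeline in Java. This Python sketch assumes several details that are all hypothetical: records arrive as `key:value,key:value` strings, keys ending in `Time` carry epoch milliseconds, the row key is `queue#appId`, and the column family is named `info`. The Kafka poll (step 1) and the HBase put (step 3) are deliberately left out.

```python
from datetime import datetime, timezone


def parse_yarn_log_record(raw: str) -> dict:
    """Split a 'key:value,key:value' record and normalise epoch-millisecond time fields."""
    record = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue  # tolerate trailing or doubled commas
        key, _, value = pair.partition(":")  # split on the first ':' only
        record[key.strip()] = value.strip()
    for key in list(record):
        value = record[key]
        # Hypothetical convention: keys ending in "Time" carry epoch milliseconds.
        if key.endswith("Time") and value.isdigit():
            ts = datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)
            record[key] = ts.strftime("%Y-%m-%d %H:%M:%S")
    return record


def to_hbase_row(record: dict, family: str = "info") -> tuple:
    """Build a (row key, {family:qualifier -> value}) pair ready for an HBase put."""
    # Hypothetical row-key design: queue name + application id.
    row_key = f"{record.get('queue', 'unknown')}#{record.get('appId', '')}"
    columns = {f"{family}:{k}": v for k, v in record.items()}
    return row_key, columns


rec = parse_yarn_log_record("appId:application_1580000000000_0001,queue:root.etl,startTime:0")
print(to_hbase_row(rec)[0])  # root.etl#application_1580000000000_0001
print(rec["startTime"])      # 1970-01-01 00:00:00
```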
It should be noted that, in the actual environment, the designated ports on the physical machines hosting the YARN, Kafka, and HBase services can reach one another. Further, to minimize network transmission overhead, in this embodiment the physical machines hosting YARN, Kafka, and HBase are placed on the same network segment and the same switch.
Step S20: input the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtain through the linear regression model the estimated subtask time of the subtask to be processed;
In this embodiment, for each task currently submitted to a queue of the cluster system, real-time information about the queue, the task, and the cluster system resources is collected at a preset period and input into the linear regression model to predict the task's remaining completion time. That is, once the system resource parameters, queue-related parameters, and task-related parameters have been acquired, they are input into the preset linear regression model, i.e., a pre-trained linear regression model. The model estimates the time the subtask to be processed needs to complete its remaining work, which gives the estimated subtask time of the subtask to be processed.
Further, before step S20, the method includes:
determining, from preset model training data, the model to be trained and the model training data corresponding to the subtask to be processed;
acquiring the system resource training parameters, queue-related training parameters, and task-related training parameters from the model training data as the independent variables of the model to be trained;
acquiring the target estimated subtask time from the model training data as the dependent variable of the model to be trained;
training the model to be trained according to the linear regression formula, the independent variables, and the dependent variable to generate the linear regression model.
Specifically, the independent variables and the dependent variable are input into the linear regression formula to obtain initial regression parameters, where the linear regression formula is:
$y = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$, where $X_1, X_2, \dots, X_n$ are the independent variables, $y$ is the dependent variable, and $b_0, b_1, \dots, b_n$ are the initial regression parameters;
the initial regression parameters are adjusted according to the least-squares estimation algorithm to generate target regression parameters;
the linear regression model is generated according to the target regression parameters and the model to be trained.
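The text names least-squares estimation but does not state the criterion it minimizes. For reference, the standard ordinary-least-squares formulation is given below. Here $m$ (the number of training samples) and the design matrix $\mathbf{A}$ are notation introduced for illustration. The closed form requires $\mathbf{A}^\top\mathbf{A}$ to be invertible, i.e., at least $n+1$ samples and no collinear features.

```latex
\mathbf{A} =
\begin{bmatrix}
1 & X_{11} & \cdots & X_{1n} \\
\vdots & \vdots & \ddots & \vdots \\
1 & X_{m1} & \cdots & X_{mn}
\end{bmatrix},
\qquad
\mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix},
\qquad
\mathbf{b} = \begin{bmatrix} b_0 \\ \vdots \\ b_n \end{bmatrix}

\hat{\mathbf{b}}
= \arg\min_{\mathbf{b}} \sum_{i=1}^{m} \Bigl( y_i - b_0 - \sum_{j=1}^{n} b_j X_{ij} \Bigr)^2
= (\mathbf{A}^\top \mathbf{A})^{-1} \mathbf{A}^\top \mathbf{y}
```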
In this embodiment, training data is collected in advance and used to train the linear regression model. First, data on three aspects (system resources, queue parameters, and task parameters) are collected as the independent variables of the model. The queue and cluster system resource information is collected at a preset period, for example every 30 seconds, while the task-related information is collected when the task is created. Next, the target estimated subtask time in the model training data, i.e., the task's current remaining execution time, is collected as the dependent variable. Finally, the collected independent and dependent variable data are input into the linear regression model, whose formula is as follows, where $y$ is the dependent variable and $X_1, \dots, X_n$ are the independent variables:
$y = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$
Based on this formula, the linear regression model first obtains initial estimates of the regression parameters $b_0, b_1, b_2, \dots, b_n$, and then refines them step by step using least-squares estimation to improve the model's accuracy.
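The training step can be sketched in a few lines of dependency-free Python. Note the deliberate substitution: instead of the step-by-step adjustment the text describes, this sketch solves the least-squares normal equations $(\mathbf{A}^\top\mathbf{A})\mathbf{b} = \mathbf{A}^\top\mathbf{y}$ directly with Gauss-Jordan elimination. The feature vectors and targets in the usage lines are synthetic and purely illustrative. In practice each row would hold the sampled system, queue, and task parameters, and each target the remaining execution time.

```python
def fit_linear_regression(X, y):
    """Least-squares fit of y = b0 + b1*X1 + ... + bn*Xn; returns [b0, b1, ..., bn]."""
    rows = [[1.0] + [float(v) for v in sample] for sample in X]  # prepend intercept column
    k = len(rows[0])
    # Normal equations (A^T A) b = A^T y, written as an augmented k x (k+1) matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * float(t) for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("A^T A is singular: too few samples or collinear features")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(coeffs, features):
    """Evaluate the fitted model for one feature vector."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], features))


# Synthetic data generated exactly from y = 5 + 2*X1 + 3*X2.
X = [[1, 2], [2, 1], [3, 5], [4, 3]]
y = [13, 12, 26, 22]
coeffs = fit_linear_regression(X, y)
print([round(c, 6) for c in coeffs])      # [5.0, 2.0, 3.0]
print(round(predict(coeffs, [2, 2]), 6))  # 15.0
```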
Step S30: compare the estimated subtask time with a preset standard time, and schedule the queue resources and system resources of the subtask to be processed according to the comparison result.
In this embodiment, once the linear regression model has produced the estimated subtask time, that time is compared with a predesigned standard time. The standard time is the time the subtask to be processed needs to complete when its resources are adequate. Resources are then scheduled according to the comparison result. If the estimated subtask time does not exceed the standard time, the resources of the subtask to be processed are adequate and no scheduling is needed. If it exceeds the standard time, the resources are insufficient and more can be allocated. In a specific embodiment, for a task already submitted to a queue, its remaining completion time is estimated several consecutive times at the preset period to predict its overall execution time. If these consecutive predictions are all above the historical average, queue resources are added for the task, i.e., the number of CPUs for the queue is increased. When the number of CPUs is increased, the corresponding memory resources are automatically increased in proportion, and administrators can be notified by email at the same time.
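The decision in the specific embodiment can be sketched as a pure function. It checks whether the last few predictions all exceed the historical average and, if so, adds vcores while keeping the queue's memory-to-CPU ratio. The window size `consecutive=3`, the `vcore_step=2` increment, and the vcore/MB units are hypothetical choices. The email notification is not shown.

```python
def scale_queue_if_slow(predictions, historical_avg, vcores, memory_mb,
                        consecutive=3, vcore_step=2):
    """Add vcores, with proportional memory, if the last `consecutive` predictions all
    exceed the historical average. Returns (new_vcores, new_memory_mb, scaled)."""
    recent = predictions[-consecutive:]
    if len(recent) < consecutive or not all(p > historical_avg for p in recent):
        return vcores, memory_mb, False
    mem_per_vcore = memory_mb / vcores  # keep the queue's existing memory-to-CPU ratio
    new_vcores = vcores + vcore_step
    return new_vcores, round(new_vcores * mem_per_vcore), True


print(scale_queue_if_slow([100, 130, 140, 150], 120, vcores=8, memory_mb=16384))
# (10, 20480, True)
```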
This embodiment provides a method for scheduling cluster queue resources, applied to a cluster system. The method first determines each queue of subtasks to be processed in the cluster system and each subtask to be processed in those queues. It then acquires the system resource parameters of the cluster system, the queue-related parameters of the subtask queue, and the task-related parameters of the subtask. These parameters are input into a preset linear regression model to obtain the estimated subtask time of the subtask to be processed. Finally, the estimated subtask time is compared with a preset standard time, and the queue resources and system resources of the subtask are scheduled according to the comparison result. In this way, the present application uses a pre-trained linear regression model, combined with the system resource parameters of the cluster system, the queue-related parameters of the subtask queue, and the task-related parameters of the subtask, to determine the estimated subtask time. It compares this estimate with the standard time in which the subtask completes when its resources are adequate, determines whether the subtask's current resources are adequate, and schedules resources according to the comparison result. This reduces task completion time, improves resource scheduling efficiency, and addresses the technical problem of inefficient scheduling of existing cluster queue resources.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of a second embodiment of the method for scheduling cluster queue resources of this application.
Based on the embodiment shown in FIG. 2, in this embodiment, step S20 specifically includes:
Step S21: acquire the system resource parameters, queue-related parameters, and task-related parameters within a preset period, and use the linear regression model to calculate multiple estimated subtask times for the subtask to be processed within the preset period;
In this embodiment, to reduce estimation error, multiple estimated subtask times are calculated at the preset period to predict the overall execution time. If the estimated subtask times exceed the standard time several times in a row, queue resources should be added for the task. Specifically, for each task currently submitted to a queue of the cluster system, real-time information is collected at the preset period about the queue, the task, and the cluster system resources, i.e., the real-time system resource parameters, queue-related parameters, and task-related parameters. This information is input into the linear regression model, which yields one prediction of the remaining completion time of the subtask to be processed per sample, i.e., multiple estimated subtask times.
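Step S21's periodic sampling can be sketched as a small loop that takes one parameter snapshot per period and turns each into an estimate. Here `sample_features` and `model` are hypothetical callables supplied by the caller. The 30-second default interval mirrors the example period given in the training description. The `sleep` parameter is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable, List, Sequence


def collect_predictions(sample_features: Callable[[], Sequence[float]],
                        model: Callable[[Sequence[float]], float],
                        periods: int,
                        interval_s: float = 30.0,
                        sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Sample parameters once per period and return one remaining-time estimate per sample."""
    predictions = []
    for i in range(periods):
        features = sample_features()  # system, queue and task parameters at this instant
        predictions.append(model(features))
        if i < periods - 1:
            sleep(interval_s)  # wait for the next sampling period
    return predictions


samples = iter([[1.0], [2.0], [3.0]])
print(collect_predictions(lambda: next(samples), lambda f: 10 * f[0], periods=3,
                          sleep=lambda s: None))  # [10.0, 20.0, 30.0]
```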
Further, step S30 specifically includes:
Step S31: compare each of the multiple estimated subtask times with the standard time;
Step S32: if more than a preset number of the estimated subtask times are above the standard time, increase the queue resources and system resources of the subtask to be processed.
In this embodiment, each of the multiple estimated subtask times is compared with the standard time to determine whether the estimates for the subtask to be processed have exceeded the standard time several times in a row. The standard time is the average of multiple historical completion times of the subtask to be processed within a preset period. When the number of estimated subtask times above the standard time exceeds the preset number, the predicted overall execution time of the subtask is longer than a reasonable time, and the resources of the subtask to be processed should be increased.
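The S31/S32 check combines two simple computations: the standard time as the mean of historical completion times, and a count of estimates above it. This is a minimal sketch of that check. The function name and return shape are illustrative assumptions.

```python
from statistics import mean


def needs_more_resources(predictions, historical_times, preset_count):
    """Return (decision, standard_time, number_of_estimates_above_standard)."""
    standard_time = mean(historical_times)  # standard time = average historical completion time
    above = sum(1 for p in predictions if p > standard_time)
    return above > preset_count, standard_time, above


print(needs_more_resources([50, 70, 80, 90], [55, 65, 60, 60], preset_count=2))
# (True, 60, 3)
```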
Further, the step of increasing the queue resources and system resources of the subtask to be processed when more than a preset number of estimated subtask times are above the standard time specifically includes:
if more than a preset number of estimated subtask times are above the standard time, obtaining the average time difference between those estimated subtask times and the standard time;
determining, according to a preset resource scheduling table and the average time difference, the resources to be added for the subtask to be processed, and increasing the queue resources and system resources of the subtask accordingly.
In this embodiment, if more than a preset number of estimated subtask times are above the standard time, the subtask to be processed is running over its expected overall execution time, and resources need to be added for it. The average of the multiple estimated subtask times is obtained, and its difference from the standard time is used as the average time difference. To simplify resource scheduling, a resource scheduling table is set up in advance that maps the difference between a subtask's actual processing time and the standard time to a corresponding resource allocation. The table can be built automatically from big-data analysis or set manually according to actual needs. Once the resources to be added have been determined, the maximum number of resources of the queue to which the subtask belongs is checked first, to determine whether the resources to be added exceed that maximum. If they do not, it is determined whether the queue's remaining resources can satisfy the allocation. If they cannot, the resources are scheduled from the system resources of the cluster system.
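The allocation logic above (average time difference, table lookup, queue-maximum check, queue-remaining check, fallback to cluster resources) can be sketched as follows. The contents of `SCHEDULING_TABLE`, the use of vcores as the resource unit, and the `None` result when the request exceeds the queue maximum or the cluster cannot cover it are assumptions. The source does not say what happens in those cases.

```python
from bisect import bisect_right
from statistics import mean

# Hypothetical resource scheduling table: (minimum average overrun in seconds, extra vcores).
SCHEDULING_TABLE = [(0, 1), (60, 2), (300, 4), (900, 8)]


def extra_vcores_for(avg_overrun_s: float) -> int:
    """Look up how many vcores to add for a given average overrun."""
    thresholds = [t for t, _ in SCHEDULING_TABLE]
    idx = bisect_right(thresholds, avg_overrun_s) - 1
    return SCHEDULING_TABLE[max(idx, 0)][1]


def plan_allocation(predictions, standard_time, queue_max, queue_used, cluster_free):
    """Return (vcores_to_add, source) where source is 'queue', 'cluster' or None."""
    avg_diff = mean(predictions) - standard_time  # average time difference
    if avg_diff <= 0:
        return 0, None  # not running over time, nothing to add
    need = extra_vcores_for(avg_diff)
    if need > queue_max:
        return need, None  # exceeds the queue maximum; the source leaves this case open
    if queue_max - queue_used >= need:
        return need, "queue"  # the queue's remaining resources are sufficient
    if cluster_free >= need:
        return need, "cluster"  # fall back to the cluster's system resources
    return need, None


print(plan_allocation([400, 500, 100], standard_time=200,
                      queue_max=10, queue_used=9, cluster_free=5))  # (2, 'cluster')
```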
Referring to FIG. 4, FIG. 4 is a schematic flowchart of a third embodiment of the method for scheduling cluster queue resources of this application.
Based on the embodiment shown in FIG. 3, in this embodiment, after step S30, the method further includes:
Step S40: determine the current estimated subtask time of the subtask to be processed from the rescheduled resource parameters and the linear regression model, and start a timer to monitor the execution time of the rescheduled subtask;
Step S50: when the execution time reaches the current estimated subtask time, check whether the subtask to be processed has executed successfully;
Step S60: if the subtask to be processed has executed successfully, release the queue resources and system resources it occupies.
In this embodiment, to improve resource utilization, a timer is started after resources are added for the subtask to be processed. The timer monitors the subtask's execution, and the task's resources are released and reclaimed according to the monitoring result. That is, the real-time resource parameters after scheduling are obtained and input into the linear regression model to determine the current estimated subtask time. When the timer reaches the current estimated subtask time, the system checks whether the task has finished. If it has, the added queue resources are reclaimed; if not, they are kept.
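Steps S40 to S60 can be sketched with the standard library's `threading.Timer`. A one-shot timer fires at the re-estimated completion time, checks the task, and releases the added resources only if the task has finished. The `is_task_finished` and `release_resources` callables stand in for real cluster calls and are hypothetical. The source does not say when an unfinished task is checked again, so this sketch does not re-arm the timer.

```python
import threading
from typing import Callable


class ResourceReclaimer:
    """Fire once at the re-estimated completion time and reclaim added resources if the task is done."""

    def __init__(self, is_task_finished: Callable[[], bool],
                 release_resources: Callable[[], None]):
        self._is_task_finished = is_task_finished
        self._release_resources = release_resources
        self.released = False

    def start(self, estimated_seconds: float) -> threading.Timer:
        timer = threading.Timer(estimated_seconds, self._check)
        timer.daemon = True  # don't keep the process alive just for this check
        timer.start()
        return timer

    def _check(self) -> None:
        if self._is_task_finished():
            self._release_resources()  # return the added queue and system resources
            self.released = True
        # Otherwise keep the added resources; no re-check policy is specified.


reclaimer = ResourceReclaimer(lambda: True, lambda: print("resources released"))
reclaimer.start(0.01).join()  # prints "resources released"
```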
In addition, an embodiment of the present application further provides an apparatus for scheduling cluster queue resources.
Referring to FIG. 5, FIG. 5 is a schematic diagram of the functional modules of a first embodiment of the apparatus for scheduling cluster queue resources of this application.
In this embodiment, the apparatus for scheduling cluster queue resources is applied to a cluster system and includes:
a resource parameter acquisition module 10, configured to determine each queue of subtasks to be processed in the cluster system and each subtask to be processed in those queues, and to acquire the system resource parameters of the cluster system, the queue-related parameters of the subtask queue, and the task-related parameters of the subtask;
an estimated time calculation module 20, configured to input the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model and to obtain through it the estimated subtask time of the subtask to be processed;
a task resource scheduling module 30, configured to compare the estimated subtask time with a preset standard time and to schedule the queue resources and system resources of the subtask to be processed according to the comparison result.
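The three modules form a simple pipeline: acquire parameters, estimate time, schedule. One way to express that wiring is sketched below, with each module as a pluggable callable. `SchedulingApparatus`, its field names, and the string decisions are illustrative assumptions only, not a structure the source prescribes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class SchedulingApparatus:
    """Wires modules 10, 20 and 30 into one scheduling pass (illustrative structure only)."""

    acquire_parameters: Callable[[], Dict[str, Sequence[float]]]  # module 10: task id -> features
    estimate_time: Callable[[Sequence[float]], float]             # module 20: features -> estimate
    schedule: Callable[[str, float], str]                         # module 30: (task id, estimate) -> decision

    def run_once(self) -> Dict[str, str]:
        decisions = {}
        for task_id, features in self.acquire_parameters().items():
            estimate = self.estimate_time(features)
            decisions[task_id] = self.schedule(task_id, estimate)
        return decisions


standard = {"t1": 100.0, "t2": 100.0}
app = SchedulingApparatus(
    acquire_parameters=lambda: {"t1": [50.0], "t2": [200.0]},
    estimate_time=lambda f: 2 * f[0],
    schedule=lambda tid, est: "add resources" if est > standard[tid] else "no change",
)
print(app.run_once())  # {'t1': 'no change', 't2': 'add resources'}
```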
Further, the apparatus for scheduling cluster queue resources also includes a model training module, configured to:
determine, from preset model training data, the model to be trained and the model training data corresponding to the subtask to be processed;
acquire the system resource training parameters, queue-related training parameters, and task-related training parameters from the model training data as the independent variables of the model to be trained;
acquire the target estimated subtask time from the model training data as the dependent variable of the model to be trained;
train the model to be trained according to the linear regression formula, the independent variables, and the dependent variable to generate the linear regression model.
Further, the model training module is also configured to:
input the independent variables and the dependent variable into the linear regression formula to obtain initial regression parameters, where the linear regression formula is:
$y = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$, where $X_1, X_2, \dots, X_n$ are the independent variables, $y$ is the dependent variable, and $b_0, b_1, \dots, b_n$ are the initial regression parameters;
adjust the initial regression parameters according to the least-squares estimation algorithm to generate target regression parameters;
generate the linear regression model according to the target regression parameters and the model to be trained.
Further, the resource parameter acquisition module 10 is also configured to:
acquire the system resource parameters, queue-related parameters, and task-related parameters within a preset period, and use the linear regression model to calculate multiple estimated subtask times for the subtask to be processed within the preset period;
Further, the task resource scheduling module 30 is also configured to:
compare each of the multiple estimated subtask times with the standard time;
if more than a preset number of the estimated subtask times are above the standard time, increase the queue resources and system resources of the subtask to be processed.
Further, the task resource scheduling module 30 is also configured to:
if more than a preset number of estimated subtask times are above the standard time, obtain the average time difference between those estimated subtask times and the standard time;
determine, according to a preset resource scheduling table and the average time difference, the resources to be added for the subtask to be processed, and increase the queue resources and system resources of the subtask accordingly.
Further, the estimated time calculation module 20 is also configured to:
acquire multiple historical completion times of the subtask to be processed within a preset period, and use their average as the standard time.
Further, the apparatus for scheduling cluster queue resources also includes a resource reclamation module, configured to:
determine the current estimated subtask time of the subtask to be processed from the rescheduled resource parameters and the linear regression model, and start a timer to monitor the execution time of the rescheduled subtask;
when the execution time reaches the current estimated subtask time, check whether the subtask to be processed has executed successfully;
if the subtask to be processed has executed successfully, release the queue resources and system resources it occupies.
Each module of the above apparatus corresponds to a step in the above embodiments of the method for scheduling cluster queue resources; their functions and implementation are not repeated here.
In addition, an embodiment of the present application further provides a computer-readable storage medium.
The computer-readable storage medium of this application stores a scheduling program for cluster queue resources. When executed by a processor, the program implements the following steps:
determining each queue of subtasks to be processed in the cluster system and each subtask to be processed in those queues, and acquiring the system resource parameters of the cluster system, the queue-related parameters of the subtask queue, and the task-related parameters of the subtask;
inputting the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtaining through the linear regression model the estimated subtask time of the subtask to be processed;
comparing the estimated subtask time with a preset standard time, and scheduling the queue resources and system resources of the subtask to be processed according to the comparison result.
For the method implemented when the scheduling program for cluster queue resources is executed, refer to the embodiments of the method for scheduling cluster queue resources of this application; details are not repeated here.
It should be noted that, herein, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion. A process, method, article, or system that includes a series of elements therefore includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or system that includes it.
The serial numbers of the foregoing embodiments are for description only and do not indicate that any embodiment is better or worse than another.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied as a software product. The computer software product is stored in a computer-readable storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc), which may be non-volatile or volatile. It includes a number of instructions that cause a terminal device (a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the content of the description and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of this application.

Claims (20)

  1. A method for scheduling cluster queue resources, wherein the method is applied to a cluster system and comprises the following steps:
    determining each queue of subtasks to be processed in the cluster system and each subtask to be processed in those queues, and acquiring system resource parameters of the cluster system, queue-related parameters of the subtask queue, and task-related parameters of the subtask;
    inputting the system resource parameters, queue-related parameters, and task-related parameters into a preset linear regression model, and obtaining through the linear regression model an estimated subtask time of the subtask to be processed;
    comparing the estimated subtask time with a preset standard time, and scheduling queue resources and system resources of the subtask to be processed according to the comparison result.
  2. The method for scheduling cluster queue resources according to claim 1, wherein, before the step of inputting the system resource parameters, the queue-related parameters and the task-related parameters into the preset linear regression model and obtaining, through the linear regression model, the estimated subtask time corresponding to the pending subtask, the method further comprises:
    determining, from preset model training data, a model to be trained and model training data corresponding to the pending subtask;
    acquiring system resource training parameters, queue-related training parameters and task-related training parameters from the model training data as the independent-variable parameters of the model to be trained;
    acquiring a target estimated subtask time from the model training data as the dependent-variable parameter of the model to be trained; and
    training the model to be trained according to a linear regression formula, the independent-variable parameters and the dependent-variable parameter, to generate the linear regression model.
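The following is a sketch of claim 2's training step. It assumes each training record is a dict with hypothetical keys: `system`, `queue` and `task` hold the independent-variable parameters, and `time` holds the target estimated subtask time, which is the dependent variable. The coefficients are fitted by ordinary least squares, the estimation method named in claim 3. The sketch solves the normal equations with Gauss-Jordan elimination in pure Python to stay self-contained; a real deployment would more likely call a numerical library.

```python
from typing import List, Sequence, Tuple


def build_design(records: Sequence[dict]) -> Tuple[List[List[float]], List[float]]:
    """Split records (hypothetical layout) into independent variables X and dependent variable y."""
    X: List[List[float]] = []
    y: List[float] = []
    for r in records:
        X.append([*r["system"], *r["queue"], *r["task"]])
        y.append(float(r["time"]))
    return X, y


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[float, List[float]]:
    """Least-squares fit of y = b0 + b1*X1 + ... + bn*Xn; returns (b0, [b1..bn])."""
    rows = [[1.0, *x] for x in X]          # leading 1.0 column carries the intercept b0
    k = len(rows[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented k x (k+1) matrix
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(M[i][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear features or too few samples")
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(k):
            if i != col:
                factor = M[i][col] / M[col][col]
                for c in range(col, k + 1):
                    M[i][c] -= factor * M[col][c]
    b = [M[i][k] / M[i][i] for i in range(k)]
    return b[0], b[1:]


# Illustrative data generated from y = 1 + 2*X1 + 3*X2 (no task parameters here)
records = [
    {"system": [0.0], "queue": [0.0], "task": [], "time": 1.0},
    {"system": [1.0], "queue": [0.0], "task": [], "time": 3.0},
    {"system": [0.0], "queue": [1.0], "task": [], "time": 4.0},
    {"system": [1.0], "queue": [1.0], "task": [], "time": 6.0},
    {"system": [2.0], "queue": [1.0], "task": [], "time": 8.0},
]
b0, coefs = fit_ols(*build_design(records))
print(b0, coefs)
```

Because the illustrative data lie exactly on a plane, the fit recovers $b_0 = 1$, $b_1 = 2$, $b_2 = 3$ up to floating-point rounding.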
  3. The method for scheduling cluster queue resources according to claim 2, wherein the step of training the model to be trained according to the linear regression formula, the independent-variable parameters and the dependent-variable parameter to generate the linear regression model specifically comprises:
    inputting the independent-variable parameters and the dependent-variable parameter into the linear regression formula to obtain initial regression parameters after training, wherein the linear regression formula is:
    $y = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$, where $X_1, X_2, \dots, X_n$ are the independent-variable parameters, $y$ is the dependent-variable parameter, and $b_0, b_1, \dots, b_n$ are the initial regression parameters;
    adjusting the initial regression parameters according to a least-squares estimation algorithm to generate target regression parameters; and
    generating the linear regression model according to the target regression parameters and the model to be trained.
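Claim 3's "least-squares estimation algorithm" presumably refers to standard ordinary least squares. Under that reading, the target regression parameters minimise the sum of squared residuals over the $m$ training samples. Let $\mathbf{X}$ be the $m \times (n+1)$ design matrix whose first column is all ones, and let $\mathbf{y}$ be the vector of target times. Assuming $\mathbf{X}^\top\mathbf{X}$ is invertible:

$$
Q(b_0,\dots,b_n)=\sum_{i=1}^{m}\Bigl(y_i-b_0-\sum_{j=1}^{n}b_jX_{ij}\Bigr)^2,
\qquad
\frac{\partial Q}{\partial \mathbf b}=\mathbf 0
\;\Longrightarrow\;
\mathbf X^\top\mathbf X\,\hat{\mathbf b}=\mathbf X^\top\mathbf y,
\qquad
\hat{\mathbf b}=(\mathbf X^\top\mathbf X)^{-1}\mathbf X^\top\mathbf y .
$$

An iterative solver that starts from the initial parameters and repeatedly adjusts them would, under the usual convergence conditions, reach the same minimiser $\hat{\mathbf b}$.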
  4. The method for scheduling cluster queue resources according to claim 1, wherein the step of inputting the system resource parameters, the queue-related parameters and the task-related parameters into the preset linear regression model and obtaining, through the linear regression model, the estimated subtask time corresponding to the pending subtask specifically comprises:
    acquiring system resource parameters, queue-related parameters and task-related parameters within a preset period, and calculating, through the linear regression model, a plurality of estimated subtask times corresponding to the pending subtask within the preset period;
    and the step of comparing the estimated subtask time with the preset standard time and scheduling the queue resources and system resources of the pending subtask according to the result of the comparison specifically comprises:
    comparing each of the plurality of estimated subtask times with the standard time; and
    if more than a preset number of the estimated subtask times are higher than the standard time, increasing the queue resources and system resources of the pending subtask.
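The decision rule of claim 4 can be sketched as a simple count. The sketch assumes two things: "more than a preset number" is a strict comparison, and the estimates for the preset period have already been produced by the regression model. `needs_more_resources` and `preset_count` are hypothetical names.

```python
from typing import Iterable


def needs_more_resources(estimated_times: Iterable[float],
                         standard_time: float,
                         preset_count: int) -> bool:
    """True when more than `preset_count` estimates exceed the standard time."""
    over = sum(1 for t in estimated_times if t > standard_time)
    return over > preset_count


# Estimates produced over one preset period (illustrative values)
period_estimates = [5.0, 9.0, 10.0, 12.0]
print(needs_more_resources(period_estimates, standard_time=8.0, preset_count=2))  # → True
```

The rule counts exceedances across the whole period instead of reacting to a single estimate. Presumably this keeps one outlying prediction from triggering a resource increase.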
  5. The method for scheduling cluster queue resources according to claim 4, wherein the step of increasing the queue resources and system resources of the pending subtask if more than the preset number of the estimated subtask times are higher than the standard time specifically comprises:
    if more than the preset number of the estimated subtask times are higher than the standard time, obtaining an average time difference between the preset number of estimated subtask times and the standard time; and
    determining, according to a preset resource scheduling table and the average time difference, the resources to be added for the pending subtask, and increasing the queue resources and system resources of the pending subtask according to the resources to be added.
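The patent does not specify the layout of the preset resource scheduling table. The sketch below assumes a hypothetical table of rows sorted by an upper bound on the average overrun, where each row gives extra cores, memory and queue slots. It averages the differences over the estimates that actually exceed the standard time, which is one reading of claim 5. `ResourceStep`, `average_overrun` and `resources_to_add` are illustrative names.

```python
from bisect import bisect_left
from typing import NamedTuple, Sequence


class ResourceStep(NamedTuple):
    """One row of a hypothetical resource scheduling table."""
    max_avg_diff: float     # upper bound of the average overrun, same unit as the times
    extra_cores: int
    extra_memory_gb: int
    extra_queue_slots: int


def average_overrun(estimated_times: Sequence[float], standard_time: float) -> float:
    """Mean of (estimate - standard_time) over the estimates that exceed the standard."""
    diffs = [t - standard_time for t in estimated_times if t > standard_time]
    return sum(diffs) / len(diffs) if diffs else 0.0


def resources_to_add(avg_diff: float, table: Sequence[ResourceStep]) -> ResourceStep:
    """Pick the first row whose bound covers avg_diff; larger overruns use the last row."""
    bounds = [row.max_avg_diff for row in table]   # table must be sorted ascending
    idx = bisect_left(bounds, avg_diff)
    return table[min(idx, len(table) - 1)]


table = [ResourceStep(10, 1, 2, 1), ResourceStep(30, 2, 4, 2), ResourceStep(60, 4, 8, 4)]
diff = average_overrun([50.0, 70.0, 95.0, 40.0], standard_time=60.0)  # (10 + 35) / 2 = 22.5
print(diff, resources_to_add(diff, table))
```

Here an average overrun of 22.5 falls into the second row, so the sketch would add 2 cores, 4 GB of memory and 2 queue slots.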
  6. The method for scheduling cluster queue resources according to claim 4, wherein, before the step of comparing each of the plurality of estimated subtask times with the standard time, the method further comprises:
    acquiring a plurality of historical task completion times of the pending subtask within a preset period, and calculating the average of the plurality of historical task completion times as the standard time.
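The following computes the standard time of claim 6 as a plain mean. It assumes the history is available as hypothetical `(finish_timestamp, completion_time)` pairs and treats the preset period as the half-open window `[period_start, period_end)`. The function name `standard_time` is illustrative.

```python
from statistics import mean
from typing import Iterable, Tuple


def standard_time(history: Iterable[Tuple[float, float]],
                  period_start: float, period_end: float) -> float:
    """Average completion time of historical runs that finished within [start, end)."""
    durations = [d for finished_at, d in history if period_start <= finished_at < period_end]
    if not durations:
        raise ValueError("no historical completions in the preset period")
    return mean(durations)


# (finish timestamp, completion time) pairs; the last one lies outside the period
history = [(1, 10.0), (5, 20.0), (9, 30.0), (15, 100.0)]
print(standard_time(history, period_start=0, period_end=10))  # → 20.0
```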
  7. The method for scheduling cluster queue resources according to any one of claims 1 to 6, wherein, after the step of comparing the estimated subtask time with the preset standard time and scheduling the queue resources and system resources of the pending subtask according to the result of the comparison, the method further comprises:
    determining a current estimated subtask time of the pending subtask according to the post-scheduling resource parameters and the linear regression model, and starting a timer to monitor the execution time of the scheduled pending subtask;
    when it is detected that the execution time has reached the current estimated subtask time, detecting whether the pending subtask has been executed successfully; and
    if the pending subtask has been executed successfully, releasing the queue resources and system resources occupied by the pending subtask.
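The monitoring step of claim 7 can be sketched with the standard library's `threading.Timer`, which runs a callback once after a delay. `has_succeeded` and `release` are hypothetical callbacks standing in for the cluster's job-status query and resource-release call. The claim does not say what happens when the subtask has not succeeded at expiry, so `on_unfinished` is an optional, purely hypothetical hook.

```python
import threading
from typing import Callable, List, Optional


def monitor_and_release(estimated_seconds: float,
                        has_succeeded: Callable[[], bool],
                        release: Callable[[], None],
                        on_unfinished: Optional[Callable[[], None]] = None) -> threading.Timer:
    """Arm a timer for the current estimated subtask time; on expiry, release on success."""
    def check() -> None:
        if has_succeeded():
            release()             # free the queue and system resources of the subtask
        elif on_unfinished is not None:
            on_unfinished()       # hypothetical hook: the claim leaves this case open

    timer = threading.Timer(estimated_seconds, check)
    timer.start()
    return timer


events: List[str] = []
t = monitor_and_release(0.05, lambda: True, lambda: events.append("released"))
t.join()  # wait for the timer thread to finish (demo only)
print(events)  # → ['released']
```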
  8. An apparatus for scheduling cluster queue resources, wherein the apparatus is applied to a cluster system and comprises:
    a resource parameter acquisition module, configured to determine each pending subtask queue in the cluster system and each pending subtask in each pending subtask queue, and to acquire system resource parameters of the cluster system, queue-related parameters of the pending subtask queue, and task-related parameters of the pending subtask;
    an estimated time calculation module, configured to input the system resource parameters, the queue-related parameters and the task-related parameters into a preset linear regression model, and to obtain, through the linear regression model, an estimated subtask time corresponding to the pending subtask; and
    a task resource scheduling module, configured to compare the estimated subtask time with a preset standard time, and to schedule queue resources and system resources of the pending subtask according to the result of the comparison.
  9. The apparatus for scheduling cluster queue resources according to claim 8, wherein the apparatus further comprises a model training module configured to:
    determine, from preset model training data, a model to be trained and model training data corresponding to the pending subtask;
    acquire system resource training parameters, queue-related training parameters and task-related training parameters from the model training data as the independent-variable parameters of the model to be trained;
    acquire a target estimated subtask time from the model training data as the dependent-variable parameter of the model to be trained; and
    train the model to be trained according to a linear regression formula, the independent-variable parameters and the dependent-variable parameter, to generate the linear regression model.
  10. The apparatus for scheduling cluster queue resources according to claim 9, wherein the model training module is further configured to:
    input the independent-variable parameters and the dependent-variable parameter into the linear regression formula to obtain initial regression parameters after training, wherein the linear regression formula is:
    $y = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$, where $X_1, X_2, \dots, X_n$ are the independent-variable parameters, $y$ is the dependent-variable parameter, and $b_0, b_1, \dots, b_n$ are the initial regression parameters;
    adjust the initial regression parameters according to a least-squares estimation algorithm to generate target regression parameters; and
    generate the linear regression model according to the target regression parameters and the model to be trained.
  11. The apparatus for scheduling cluster queue resources according to claim 8, wherein the estimated time calculation module is specifically configured to:
    acquire system resource parameters, queue-related parameters and task-related parameters within a preset period, and calculate, through the linear regression model, a plurality of estimated subtask times corresponding to the pending subtask within the preset period;
    and the step of comparing the estimated subtask time with the preset standard time and scheduling the queue resources and system resources of the pending subtask according to the result of the comparison specifically comprises:
    comparing each of the plurality of estimated subtask times with the standard time; and
    if more than a preset number of the estimated subtask times are higher than the standard time, increasing the queue resources and system resources of the pending subtask.
  12. The apparatus for scheduling cluster queue resources according to claim 11, wherein, when implementing the function of increasing the queue resources and system resources of the pending subtask if more than the preset number of the estimated subtask times are higher than the standard time, the estimated time calculation module is specifically configured to:
    if more than the preset number of the estimated subtask times are higher than the standard time, obtain an average time difference between the preset number of estimated subtask times and the standard time; and
    determine, according to a preset resource scheduling table and the average time difference, the resources to be added for the pending subtask, and increase the queue resources and system resources of the pending subtask according to the resources to be added.
  13. The apparatus for scheduling cluster queue resources according to claim 11, wherein the estimated time calculation module is further configured to:
    acquire a plurality of historical task completion times of the pending subtask within a preset period, and calculate the average of the plurality of historical task completion times as the standard time.
  14. The apparatus for scheduling cluster queue resources according to any one of claims 8 to 13, wherein the apparatus further comprises a resource recovery module configured to:
    determine a current estimated subtask time of the pending subtask according to the post-scheduling resource parameters and the linear regression model, and start a timer to monitor the execution time of the scheduled pending subtask;
    when it is detected that the execution time has reached the current estimated subtask time, detect whether the pending subtask has been executed successfully; and
    if the pending subtask has been executed successfully, release the queue resources and system resources occupied by the pending subtask.
  15. A device for scheduling cluster queue resources, wherein the device comprises a processor, a memory, and a cluster queue resource scheduling program stored on the memory and executable by the processor, and the cluster queue resource scheduling program, when executed by the processor, implements the following steps:
    determining each pending subtask queue in the cluster system and each pending subtask in each pending subtask queue, and acquiring system resource parameters of the cluster system, queue-related parameters of the pending subtask queue, and task-related parameters of the pending subtask;
    inputting the system resource parameters, the queue-related parameters and the task-related parameters into a preset linear regression model, and obtaining, through the linear regression model, an estimated subtask time corresponding to the pending subtask; and
    comparing the estimated subtask time with a preset standard time, and scheduling queue resources and system resources of the pending subtask according to the result of the comparison.
  16. The device for scheduling cluster queue resources according to claim 15, wherein, before the step of inputting the system resource parameters, the queue-related parameters and the task-related parameters into the preset linear regression model and obtaining, through the linear regression model, the estimated subtask time corresponding to the pending subtask, the cluster queue resource scheduling program, when executed by the processor, further implements the following steps:
    determining, from preset model training data, a model to be trained and model training data corresponding to the pending subtask;
    acquiring system resource training parameters, queue-related training parameters and task-related training parameters from the model training data as the independent-variable parameters of the model to be trained;
    acquiring a target estimated subtask time from the model training data as the dependent-variable parameter of the model to be trained; and
    training the model to be trained according to a linear regression formula, the independent-variable parameters and the dependent-variable parameter, to generate the linear regression model.
  17. The device for scheduling cluster queue resources according to claim 16, wherein implementing the step of training the model to be trained according to the linear regression formula, the independent-variable parameters and the dependent-variable parameter to generate the linear regression model specifically comprises:
    inputting the independent-variable parameters and the dependent-variable parameter into the linear regression formula to obtain initial regression parameters after training, wherein the linear regression formula is:
    $y = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$, where $X_1, X_2, \dots, X_n$ are the independent-variable parameters, $y$ is the dependent-variable parameter, and $b_0, b_1, \dots, b_n$ are the initial regression parameters;
    adjusting the initial regression parameters according to a least-squares estimation algorithm to generate target regression parameters; and
    generating the linear regression model according to the target regression parameters and the model to be trained.
  18. A computer-readable storage medium, wherein a cluster queue resource scheduling program is stored on the computer-readable storage medium, and the cluster queue resource scheduling program, when executed by a processor, implements the following steps:
    determining each pending subtask queue in the cluster system and each pending subtask in each pending subtask queue, and acquiring system resource parameters of the cluster system, queue-related parameters of the pending subtask queue, and task-related parameters of the pending subtask;
    inputting the system resource parameters, the queue-related parameters and the task-related parameters into a preset linear regression model, and obtaining, through the linear regression model, an estimated subtask time corresponding to the pending subtask; and
    comparing the estimated subtask time with a preset standard time, and scheduling queue resources and system resources of the pending subtask according to the result of the comparison.
  19. The computer-readable storage medium according to claim 18, wherein, before the step of inputting the system resource parameters, the queue-related parameters and the task-related parameters into the preset linear regression model and obtaining, through the linear regression model, the estimated subtask time corresponding to the pending subtask, the cluster queue resource scheduling program, when executed by the processor, further implements the following steps:
    determining, from preset model training data, a model to be trained and model training data corresponding to the pending subtask;
    acquiring system resource training parameters, queue-related training parameters and task-related training parameters from the model training data as the independent-variable parameters of the model to be trained;
    acquiring a target estimated subtask time from the model training data as the dependent-variable parameter of the model to be trained; and
    training the model to be trained according to a linear regression formula, the independent-variable parameters and the dependent-variable parameter, to generate the linear regression model.
  20. The computer-readable storage medium according to claim 19, wherein implementing the step of training the model to be trained according to the linear regression formula, the independent-variable parameters and the dependent-variable parameter to generate the linear regression model specifically comprises:
    inputting the independent-variable parameters and the dependent-variable parameter into the linear regression formula to obtain initial regression parameters after training, wherein the linear regression formula is:
    $y = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$, where $X_1, X_2, \dots, X_n$ are the independent-variable parameters, $y$ is the dependent-variable parameter, and $b_0, b_1, \dots, b_n$ are the initial regression parameters;
    adjusting the initial regression parameters according to a least-squares estimation algorithm to generate target regression parameters; and
    generating the linear regression model according to the target regression parameters and the model to be trained.
PCT/CN2020/093185 2020-02-12 2020-05-29 Method, apparatus and device for scheduling cluster queue resources, and storage medium WO2021159638A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010089180.0A CN111338791A (en) 2020-02-12 2020-02-12 Method, device and equipment for scheduling cluster queue resources and storage medium
CN202010089180.0 2020-02-12

Publications (1)

Publication Number Publication Date
WO2021159638A1 (en) 2021-08-19

Family

ID=71181543

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093185 WO2021159638A1 (en) 2020-02-12 2020-05-29 Method, apparatus and device for scheduling cluster queue resources, and storage medium

Country Status (2)

Country Link
CN (1) CN111338791A (en)
WO (1) WO2021159638A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111970326B (en) * 2020-07-22 2023-06-09 深圳市欢太科技有限公司 Cluster flow balancing method and device, storage medium and terminal
CN111880922A (en) * 2020-08-07 2020-11-03 北京达佳互联信息技术有限公司 Processing method, device and equipment for concurrent tasks
CN114253695A (en) * 2020-09-21 2022-03-29 中国移动通信有限公司研究院 Method for updating resource information of computing node, node and storage medium
CN115061794A (en) * 2020-09-29 2022-09-16 展讯通信(上海)有限公司 Method, device, terminal and medium for scheduling task and training neural network model
CN112463341A (en) * 2020-12-11 2021-03-09 奇瑞汽车股份有限公司 CAE operation running time prediction method and device based on high performance computing cluster HPC
CN112948113A (en) * 2021-03-01 2021-06-11 上海微盟企业发展有限公司 Cluster resource management scheduling method, device, equipment and readable storage medium
CN113204692A (en) * 2021-05-27 2021-08-03 北京深演智能科技股份有限公司 Method and device for monitoring execution progress of data processing task
CN113190341A (en) * 2021-05-31 2021-07-30 内蒙古豆蔻网络科技有限公司 Server resource scheduling method and system
CN117076555B (en) * 2023-05-08 2024-03-22 深圳市优友网络科技有限公司 Distributed task management system and method based on calculation

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102521056A (en) * 2011-12-28 2012-06-27 用友软件股份有限公司 Task allocation device and task allocation method
CN102831012A (en) * 2011-06-16 2012-12-19 日立(中国)研究开发有限公司 Task scheduling device and task scheduling method in multimode distributive system
US20130212277A1 (en) * 2012-02-14 2013-08-15 Microsoft Corporation Computing cluster with latency control
CN103729246A (en) * 2013-12-31 2014-04-16 浪潮(北京)电子信息产业有限公司 Method and device for dispatching tasks

Also Published As

Publication number Publication date
CN111338791A (en) 2020-06-26

Similar Documents

Publication Publication Date Title
WO2021159638A1 (en) Method, apparatus and device for scheduling cluster queue resources, and storage medium
CN112162865B (en) Scheduling method and device of server and server
Kc et al. Scheduling hadoop jobs to meet deadlines
US8424007B1 (en) Prioritizing tasks from virtual machines
Xu et al. Adaptive task scheduling strategy based on dynamic workload adjustment for heterogeneous Hadoop clusters
CN107861796B (en) Virtual machine scheduling method supporting energy consumption optimization of cloud data center
CN109614227B (en) Task resource allocation method and device, electronic equipment and computer readable medium
CN111625331A (en) Task scheduling method, device, platform, server and storage medium
US8312466B2 (en) Restricting resources consumed by ghost agents
US11838384B2 (en) Intelligent scheduling apparatus and method
WO2024016596A1 (en) Container cluster scheduling method and apparatus, device, and storage medium
CN107515784A (en) A kind of method and apparatus of computing resource in a distributed system
CN115373835A (en) Task resource adjusting method and device for Flink cluster and electronic equipment
CN115543624A (en) Heterogeneous computing power arrangement scheduling method, system, equipment and storage medium
Iglesias et al. A methodology for online consolidation of tasks through more accurate resource estimations
CN110928659A (en) Numerical value pool system remote multi-platform access method with self-adaptive function
JP2016501392A (en) Resource management system, resource management method, and program
Sheetal et al. Secured Data Transmission with Integrated Fault Reduction Scheduling in Cloud Computing.
Ray et al. Is high performance computing (HPC) ready to handle big data?
CN114860449A (en) Data processing method, device, equipment and storage medium
CN115033377A (en) Service resource prediction method and device based on cluster server and electronic equipment
Poltavtseva et al. Planning of aggregation and normalization of data from the Internet of Things for processing on a multiprocessor cluster
CN110297693B (en) Distributed software task allocation method and system
De Mello et al. A new migration model based on the evaluation of processes load and lifetime on heterogeneous computing environments
Kalogeraki et al. Dynamic migration algorithms for distributed object systems

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20919214

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20919214

Country of ref document: EP

Kind code of ref document: A1