CN114281528A - Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster

Info

Publication number
CN114281528A
Authority
CN
China
Prior art keywords
energy consumption
time
task
energy
reward
Prior art date
Legal status
Pending
Application number
CN202111505917.3A
Other languages
Chinese (zh)
Inventor
李鸿健
罗浩
段小林
邹洋
熊安萍
徐瑄航
马建勇
王田田
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202111505917.3A priority Critical patent/CN114281528A/en
Publication of CN114281528A publication Critical patent/CN114281528A/en
Pending legal-status Critical Current


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the fields of reinforcement learning and big data processing, and particularly relates to an energy-saving scheduling method and system based on deep reinforcement learning and a heterogeneous Spark cluster. The method comprises the following steps: acquiring, in real time, online data information under real load on a Spark cluster; inputting the data information into a trained Q network, which performs energy consumption-time target prediction on the data; and, according to this prediction, the system selecting the scheme with the lowest energy consumption-time target for resource allocation. The method and the device take into account the resource priority allocation problem caused by the differing energy consumption of a heterogeneous cluster, find the lowest energy consumption-time target while ensuring that the user response-time requirements are met, perform resource scheduling according to that lowest target, optimize the energy consumption target or multiple SLA targets, and save energy and reduce emissions as far as possible. The method is of significance for balancing cloud service provider cost against user response time and has good economic benefit.

Description

Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster
Technical Field
The invention belongs to the field of reinforcement learning and big data processing, and particularly relates to an energy-saving scheduling method and system based on deep reinforcement learning and a heterogeneous Spark cluster.
Background
The distributed big data processing framework Spark is widely used in both research and industrial analytics. It stores intermediate results in memory to accelerate processing, scales better than comparable frameworks, and is suitable for running a variety of complex analysis tasks. Meanwhile, cloud computing provides cheaper and more manageable computing resources, so many enterprises are moving their big data computing clusters to the cloud. Using these clusters efficiently is important for such enterprises: at large scale, even minor improvements in utilization can avoid the waste of tens of millions in funds, and implementing a good cluster scheduler is key to avoiding such waste.
It is therefore necessary to optimize the energy consumption generated by the Spark job scheduling mechanism. After a Spark cluster is deployed, the task scheduling process of Spark can be abstracted simply as follows: the scheduler allocates a resource block, the executor, to each job, where the executor comprises physical resources such as cpu and memory. By default, Spark schedules with the simple heuristic policies FIFO and Fair and creates executors in a distributed manner, so that the cluster is used in a balanced way and generality of use is taken into account; however, the existing Spark does not consider the resource priority allocation problem that arises when cluster heterogeneity causes nodes to differ in energy consumption. Consequently, the default Spark scheduling policy cannot be optimized for a specific SLA objective.
In conclusion, finding a method that can improve resource utilization in a Spark cluster environment, optimize the energy consumption target or multiple SLA targets, and save energy and reduce emissions as far as possible while ensuring that the user response-time requirements are met is of great significance for balancing the cost of a cloud service provider (CSP) against the response time experienced by users.
Disclosure of Invention
In view of the deficiencies in the prior art, the invention provides an energy-saving scheduling method based on deep reinforcement learning and a heterogeneous Spark cluster, which comprises the following steps: acquiring online data information in real time; inputting the data information into a trained Q network, which performs energy consumption-time target prediction on the data; and, according to this prediction, the system selecting the scheme with the lowest energy consumption-time target for resource allocation;
the training process for the Q network is as follows:
s1: acquiring related configuration parameters and execution parameters of job operation; initializing DQN parameters; wherein DQN represents Deep Q-Network, namely Q Network;
s2: calculating a weight coefficient according to the acquired related configuration parameters;
s3: generating an epsilon-Greedy and Boltzmann combined strategy according to the DQN parameter;
s4: the task scheduler performs task scheduling on the working nodes according to the epsilon-Greedy and Boltzmann combined strategy;
s5: constructing an energy consumption-time model according to the weight coefficient and the execution parameter; constructing a reward model according to the task scheduling and the energy consumption-time model, and generating a reward value according to the reward model;
s6: updating the DQN parameters according to the reward values to obtain updated DQN parameters;
s7: steps S3-S6 are repeated, and when the energy consumption-time target converges, the training is completed.
Preferably, the relevant configuration parameters of the job run include: the number of executors, cpu resources and memory resources; the execution parameters of the job run include: job arrival time, job identification, job completion time, and job duration.
Preferably, the epsilon-Greedy and Boltzmann combined strategy is:
[Formula images not reproduced]
wherein s represents the resource state of the cluster; a is an action, namely selecting a specific physical machine on which to create an executor and allocate resources; a' represents the action with the maximum Q value; Q(s, a) represents the cumulative reward value obtainable when the state is s and the action is a; A represents a random action; ε represents the exploration probability; and step represents the time step of the task scheduler's exploration.
Preferably, the task scheduling for the work node includes:
generating an action according to the epsilon-Greedy and Boltzmann combined strategy, and dispatching the action to a working node as a resource-scheduling command;
if the task is only partially allocated or the allocation is inefficient, a large negative reward calculated under the energy consumption model is fed back to the task scheduler;
if the task is successfully allocated, a positive reward calculated under the energy consumption model is fed back to the task scheduler.
Preferably, the formula of the energy consumption-time model is as follows:
[Formula images not reproduced]
wherein C0, C1 and C2 respectively represent the weight coefficients; Ucpu^i represents the cpu utilization of the i-th node; Umem^i represents the memory utilization of the i-th node; t represents the working time of working node i; t' represents the end time of the work under the current cpu utilization and memory utilization; EAtotal represents the energy consumption generated by the cluster; AvgT represents the average running time of all jobs; Tj represents the running time of job j; M represents the number of jobs; n represents the number of nodes; phi represents all jobs; target represents the energy consumption-time target value; and the remaining symbol represents the weight applied to the target.
Further, the cpu utilization is calculated as:
[Formula image not reproduced]
wherein Ucpu^i represents the cpu utilization of the i-th node; n represents the number of nodes; i denotes a specific node; the remaining symbols represent, respectively, the amount of cpu in use on the i-th node and the total amount of cpu on the i-th node; and t represents the running time under the current cpu usage of the i-th node.
Further, the calculation formula of the memory utilization rate is as follows:
[Formula image not reproduced]
wherein Umem^i represents the memory utilization of the i-th node; the remaining symbols represent, respectively, the amount of memory in use on the i-th node and the total amount of memory on the i-th node; and t represents the running time under the current memory usage of the i-th node.
Preferably, the reward model is:
[Formula images not reproduced]
wherein EAtotal represents the energy consumption generated by one complete scheduling of the cluster; EAmax represents the energy consumption generated when all working nodes in the cluster run the job at full load; EAnormalized represents the normalized energy consumption; EAepi represents the part of the reward value related to energy consumption within the trajectory generated by one exploration of the task scheduler, i.e. within one episode; Tj represents the running time of job j; M represents the number of jobs; the remaining symbols represent, respectively, the minimum average completion time of all jobs, the normalized average job completion time, the weight applied to the target, and the part of the reward value related to the average job completion time within one episode; Rfixed represents a fixed reward value; and Repi represents the true reward value when a task is successfully assigned.
Preferably, the formula for generating the reward value is:
[Formula image not reproduced]
wherein Reward represents the generated reward value and Repi represents the true reward value when a task is successfully assigned.
An energy-saving scheduling system based on deep reinforcement learning and heterogeneous Spark clusters comprises: the system comprises a task scheduling module, an energy consumption calculation module, a reward generation module and a DQN parameter updating module;
the task scheduling module is used for exploring a cluster environment and scheduling the operation according to the DQN parameter;
the energy consumption calculation module is used for calculating the energy consumption of the system according to the operation scheduling result;
the reward generation module is used for calculating a reward value according to the system energy consumption;
the DQN parameter updating module is used for updating the network DQN parameters by using the reward value and feeding back the DQN parameters to the task scheduling module.
The invention has the beneficial effects that: the invention takes into account the resource priority allocation problem caused by the differing energy consumption of a heterogeneous cluster, calculates, based on deep reinforcement learning, the energy consumption-time target incurred when the system allocates resources in the heterogeneous Spark cluster environment, searches for the lowest energy consumption-time target while ensuring that the user response-time requirements are met, and performs resource scheduling according to that lowest energy consumption-time target.
Drawings
FIG. 1 is a diagram of a deep reinforcement learning-based system model according to the present invention;
FIG. 2 is a diagram of a resource architecture for Spark node scheduling;
FIG. 3 is a flowchart illustrating an energy-efficient Spark task scheduling based on deep reinforcement learning according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides an energy-saving scheduling method based on deep reinforcement learning and a heterogeneous Spark cluster, as shown in fig. 1, the method comprises the following steps:
acquiring, in real time, online data information under real load on the Spark cluster; inputting the data information into a trained Q network, which performs energy consumption-time target prediction on the data; and, according to this prediction, the system selecting the scheme with the lowest energy consumption-time target for resource allocation;
the training process for the Q network is as follows:
s1: acquiring related configuration parameters and execution parameters of job operation; initializing DQN parameters; wherein DQN represents Deep Q-Network, namely Q Network;
s2: calculating a weight coefficient according to the acquired related configuration parameters;
s3: generating an epsilon-Greedy and Boltzmann combined strategy according to the DQN parameter;
s4: the task scheduler performs task scheduling on the working nodes according to the epsilon-Greedy and Boltzmann combined strategy;
s5: constructing an energy consumption-time model according to the weight coefficient and the execution parameter; constructing a reward model according to the task scheduling and the energy consumption-time model, and generating a reward value according to the reward model;
s6: updating the DQN parameters according to the reward values to obtain updated DQN parameters;
s7: steps S3-S6 are repeated, and when the energy consumption-time target converges, the training is completed.
The task energy-efficiency scheduling environment comprises: a task scheduling module, an energy consumption calculation module, a reward generation module and a DQN parameter updating module. In this task energy-efficiency scheduling environment, the specific process of Q-network training is as follows:
In a real Spark cluster environment, the relevant configuration parameters and execution parameters of job runs are collected, using different application programs from the BigDataBench benchmark toolkit as the execution load. The relevant configuration parameters of a job run include: the number of executors (executors), cpu resources (cpu) and memory resources (mem); the execution parameters of a job run include: job arrival time (arrival_time), job identification (job_id), job completion time (finish) and job duration (duration).
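For illustration only, the collected parameters might be held in a record like the following minimal Python sketch; the field names simply mirror the parameters listed above (executors, cpu, mem, arrival_time, job_id, finish, duration) and are not the patent's actual data schema.

from dataclasses import dataclass

@dataclass
class JobRecord:
    # One observation collected while a BigDataBench workload runs on the cluster.
    job_id: str          # job identification (job_id)
    executors: int       # number of executors allocated to the job
    cpu: int             # cpu resources per executor
    mem: int             # memory resources per executor
    arrival_time: float  # job arrival time (arrival_time)
    finish: float        # job completion time (finish)

    @property
    def duration(self) -> float:
        # Job duration derived from the arrival and completion timestamps.
        return self.finish - self.arrival_time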
As shown in fig. 2, a cluster manager (master) in the task scheduling module issues allocation instructions to the cluster driver process, and the driver process controls a task scheduler (agent) that allocates resources to each node. The resource allocation situation includes: cpu utilization, memory utilization, task type and the average running time of the application programs. The system cluster environment and the DQN parameters are initialized, and the agent obtains a state S by observing the cluster. The agent explores the environment through an epsilon-Greedy strategy to generate an action, i.e. an executor is created on a working node for a task, and different delayed reward values are obtained depending on whether the task completes or fails. To increase the agent's exploration capability, a Boltzmann action-exploration strategy is adopted so that actions are sampled according to a probability distribution: first, given N working nodes, the value of each action in state S, i.e. its Q value, is calculated and limited to [-100, 100]; then Pn = e^Q(s,an) is computed for each working node n together with the normalizing sum over all N nodes; finally, an action is output by sampling the N working nodes non-uniformly with probability Pn divided by that sum. The task scheduler then performs task scheduling on the working nodes according to the epsilon-Greedy and Boltzmann combined strategy, where the combined strategy is as follows:
[Formula images not reproduced]
wherein s represents the resource state of the cluster; a is an action, namely selecting a specific physical machine on which to create an executor and allocate resources; a' represents the action with the maximum Q value; Q(s, a) represents the cumulative reward value, an expectation, obtainable when the state is s and the action is a; A represents a random action; a random action is selected with probability ε, and the action corresponding to the maximum Q value is selected with probability 1-ε. ε takes a value between 0 and 1 and is set relatively large at the beginning: with probability 1-ε the action a that maximizes the Q function is selected, and with probability ε an action a is chosen at random from the action space A, after which the system enters some state S' of the state space S. The Q function is in fact a neural network; the expression means the expected cumulative return obtainable by taking a certain action a in a certain state S, and the state space S consists of the idle physical resources of the working nodes and the state of the currently running jobs. step denotes the time step of the agent's exploration; every 2000 iterations of step, ε is reduced to 0.75 of its previous value, so that the probability of random exploration falls once the agent has explored enough episodes, which avoids unstable convergence. Here an episode refers to the trajectory generated by one exploration of the task scheduler.
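A minimal Python sketch of one plausible reading of this combined exploration strategy is given below: with probability 1-ε the maximum-Q action is exploited, and with probability ε an action is drawn by Boltzmann sampling over the clipped Q values rather than uniformly at random. The clipping range [-100, 100], the 0.75 decay factor and the 2000-step interval are taken from the description above; the function and parameter names are illustrative assumptions, not the patent's implementation.

import numpy as np

def select_action(q_values, step, epsilon0=1.0, decay=0.75, decay_every=2000,
                  rng=np.random.default_rng()):
    # q_values: vector of Q(s, a) over the N working nodes in the current state s.
    # step: current exploration time step of the task scheduler (agent).
    # epsilon shrinks to 0.75 of its value every 2000 steps, reducing random exploration.
    epsilon = epsilon0 * decay ** (step // decay_every)

    if rng.random() < epsilon:
        # Exploration branch: Boltzmann (softmax) sampling over clipped Q values,
        # i.e. Pn proportional to e^Q(s, an) with Q limited to [-100, 100].
        q = np.clip(np.asarray(q_values, dtype=float), -100.0, 100.0)
        q = q - q.max()                      # stabilize the softmax numerically
        probs = np.exp(q) / np.exp(q).sum()
        return int(rng.choice(len(q), p=probs))

    # Exploitation branch: the action with the maximum Q value.
    return int(np.argmax(q_values))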
The idle physical resources of a working node are represented as:
[Representation image not reproduced]
the state of the currently running job is represented as:
{id,cpu,mem,executor}
As shown in fig. 3, task scheduling of jobs onto the working nodes includes:
generating an action according to the epsilon-Greedy and Boltzmann combined strategy, and scheduling resources onto the working nodes according to the action, where the action can be understood as a resource-scheduling command;
if the task is only partially allocated or the allocation is inefficient, a large negative reward is fed back to the task scheduler;
if the task is successfully allocated, a positive reward calculated under the energy consumption model is fed back to the task scheduler.
The formula of the energy consumption-time model is:
[Formula images not reproduced]
wherein C0, C1 and C2 respectively represent the weight coefficients; Ucpu^i represents the cpu utilization of the i-th node; Umem^i represents the memory utilization of the i-th node; t represents the working time of working node i; t' represents the end time of the work under the current cpu utilization and memory utilization; EAtotal represents the energy consumption generated by the cluster; AvgT represents the average running time of all jobs; Tj represents the running time of job j; M represents the number of jobs; n represents the number of nodes; phi represents all jobs; target represents the energy consumption-time target value; and the remaining symbol represents the weight applied to the target.
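Because the model's formulas appear only as images in the published text, the following LaTeX sketch gives one plausible reconstruction that is consistent with the variable definitions above and with the linear examples below; the integral/summation structure and the use of ω for the target weight are assumptions, not the verbatim patent formulas.

% Assumed reconstruction of the energy consumption-time model (not the verbatim formulas).
\begin{align}
  EA_{total} &= \sum_{i=1}^{n} \int_{t}^{t'} \bigl( C_0 + C_1\,U^{i}_{cpu} + C_2\,U^{i}_{mem} \bigr)\,\mathrm{d}\tau
      && \text{energy of the cluster over each node's working interval} \\
  Avg_{T} &= \frac{1}{M} \sum_{j \in \phi} T_j
      && \text{average running time of the $M$ jobs} \\
  target &= \omega \, EA_{total} + (1-\omega)\, Avg_{T}
      && \text{weighted energy consumption-time target}
\end{align}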
For example, for an IO-intensive task:
EAtotal = 112 + 9.17 Ucpu - 19.46 Umem
and for a cpu-intensive task:
EAtotal = 103 + 1.97 Ucpu + 2.53 Umem
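Taking the two fitted equations above, the sketch below shows how per-node power and energy might be evaluated under such a linear model; treating the expression as an instantaneous power that is accumulated over the node's working time is an assumption made purely for illustration, and the function names are not from the patent.

def node_power(c0, c1, c2, u_cpu, u_mem):
    # Power drawn by one node under the fitted linear model C0 + C1*Ucpu + C2*Umem.
    return c0 + c1 * u_cpu + c2 * u_mem

def node_energy(c0, c1, c2, samples, dt=1.0):
    # Energy over a node's working interval, approximated by summing utilization
    # samples (u_cpu, u_mem) taken every dt seconds.
    return sum(node_power(c0, c1, c2, u_cpu, u_mem) * dt for u_cpu, u_mem in samples)

# Example coefficients quoted in the description:
IO_INTENSIVE = (112.0, 9.17, -19.46)    # EAtotal = 112 + 9.17 Ucpu - 19.46 Umem
CPU_INTENSIVE = (103.0, 1.97, 2.53)     # EAtotal = 103 + 1.97 Ucpu + 2.53 Umem

# e.g. a cpu-intensive node at 80% cpu and 50% memory utilization, 10 one-second samples
print(node_energy(*CPU_INTENSIVE, samples=[(0.8, 0.5)] * 10))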
the cpu utilization rate is calculated by the following formula:
[Formula image not reproduced]
wherein Ucpu^i represents the cpu utilization of the i-th node; n represents the number of nodes; i denotes a specific node; the remaining symbols represent, respectively, the amount of cpu in use on the i-th node and the total amount of cpu on the i-th node; and t represents the running time under the current cpu usage of the i-th node.
The calculation formula of the memory utilization rate is as follows:
[Formula image not reproduced]
wherein Umem^i represents the memory utilization of the i-th node; the remaining symbols represent, respectively, the amount of memory in use on the i-th node and the total amount of memory on the i-th node; and t represents the running time under the current memory usage of the i-th node.
The weight coefficients are calculated from the acquired relevant configuration parameters using a least squares method combined with stepwise regression; the specific process is as follows:
A multiple linear regression model is used:
EA = C0 + C1·Ucpu + C2·Umem + ξ
wherein C0, C1 and C2 are the regression coefficients, i.e. the weight coefficients that need to be determined; ξ is an unobservable random error; and EA is the observed energy consumption. Suppose n groups of observed energy consumption values (EAk, Ucpu,k, Umem,k), k = 1, ..., n, are provided; then:
EAk = C0 + C1·Ucpu,k + C2·Umem,k + ξk, k = 1, ..., n.
the stepwise regression method can solve the problem that the optimal solution of the least square method is difficult to find, and the idea is to introduce variables one by one, wherein the introduced condition is that the partial F test of the variables is obvious. Meanwhile, after each variable is introduced, the existing variable is checked. The invention adopts a least square method plus stepwise regression mode, system utilization rate data is substituted into a multiple linear regression model, and the target function is as follows:
Figure BDA00034031709200000910
estimating the regression coefficient by using a least square method to minimize the target function, thereby obtaining a specific regression equation and further obtaining a weight coefficient C0、C1、C2The value of (c).
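As a concrete illustration of the least-squares step, the sketch below fits C0, C1 and C2 with numpy's lstsq on n observed groups of (Ucpu, Umem, energy) values; the stepwise variable introduction and partial F-tests described above are omitted, and the data in the example are synthetic.

import numpy as np

def fit_weight_coefficients(u_cpu, u_mem, energy):
    # Ordinary least squares estimate of C0, C1, C2 in energy ~ C0 + C1*Ucpu + C2*Umem.
    u_cpu = np.asarray(u_cpu, dtype=float)
    u_mem = np.asarray(u_mem, dtype=float)
    energy = np.asarray(energy, dtype=float)
    X = np.column_stack([np.ones_like(u_cpu), u_cpu, u_mem])  # intercept column for C0
    coeffs, *_ = np.linalg.lstsq(X, energy, rcond=None)
    return tuple(coeffs)

# Synthetic observations roughly following 103 + 1.97*Ucpu + 2.53*Umem
rng = np.random.default_rng(0)
u_cpu = rng.uniform(0.0, 1.0, 50)
u_mem = rng.uniform(0.0, 1.0, 50)
energy = 103 + 1.97 * u_cpu + 2.53 * u_mem + rng.normal(0.0, 0.1, 50)
print(fit_weight_coefficients(u_cpu, u_mem, energy))   # approximately (103, 1.97, 2.53)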
The reward model is as follows:
[Formula images not reproduced]
wherein EAtotal represents the energy consumption generated by one complete scheduling of the cluster; EAmax represents the energy consumption generated when all working nodes in the cluster run the job at full load; EAnormalized represents the normalized energy consumption; EAepi represents the part of the reward value related to energy consumption within the trajectory generated by one exploration of the task scheduler, i.e. within one episode; Tj represents the running time of job j; M represents the number of jobs; the remaining symbols represent, respectively, the minimum average completion time of all jobs, the normalized average job completion time, the weight applied to the target, and the part of the reward value related to the average job completion time within one episode; Rfixed represents a fixed reward value; and Repi represents the true reward value when a task is successfully assigned. Since Repi ∈ (0, 1), Rfixed is designed as a large number, for example Rfixed = 10000, so that the agent obtains a better-quantized positive value after selecting a correct scheduling action a.
The formula for generating the reward value is as follows:
[Formula image not reproduced]
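Putting the normalized terms together, the following sketch shows one way the episode reward could be computed; the particular combination used here (a weighted sum of the normalized energy and normalized average completion time, scaled by Rfixed on success, and a large negative value on a failed or inefficient allocation) is an assumption for illustration, not the patent's exact Reward formula.

def episode_reward(ea_epi, ea_max, job_times, t_min, omega=0.5, r_fixed=10000.0,
                   success=True):
    # ea_epi:    energy consumption accumulated in this episode (one scheduler trajectory)
    # ea_max:    energy consumption if all working nodes ran the job at full load
    # job_times: running times Tj of the M jobs in the episode
    # t_min:     minimum average completion time of all jobs (used for normalization)
    # omega:     assumed weight between the energy term and the time term
    if not success:
        # Partial or inefficient allocation: a large negative reward is fed back.
        return -r_fixed

    ea_normalized = ea_epi / ea_max               # normalized energy consumption, in (0, 1]
    avg_t = sum(job_times) / len(job_times)       # average job completion time
    t_normalized = t_min / avg_t                  # normalized completion time, in (0, 1]

    # r_epi in (0, 1): closer to 1 means lower energy and shorter completion times.
    r_epi = omega * (1.0 - ea_normalized) + (1.0 - omega) * t_normalized
    return r_fixed * r_epi                        # scaled true reward value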
for one calculation of true reward value RepiAnother preferred embodiment of (a) is: true prize value RepiThe energy efficiency model is obtained by calculation, and the expression of the energy efficiency model is as follows:
Figure BDA00034031709200001013
wherein, C1,C2Are all indicative of the weight coefficient,
Figure BDA00034031709200001014
represents the average utilization of cpu, T represents the average running time of job,
Figure BDA0003403170920000111
indicating the average utilization of the memory.
The calculated reward value is fed back to the agent; the DQN parameters are updated according to the reward value to obtain updated DQN parameters, and the agent fits the value of the Q function using these parameters. The above process is iterated continuously until the current energy consumption-time target converges or the set total number of iterations is exceeded; the trained Q network is then obtained from the latest DQN parameters, the DQN stores the current network parameters, the iteration ends, and the agent selects the correct actions to execute task scheduling according to the values of the Q function.
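Tying the modules together, a high-level sketch of the training loop (steps S3 to S6 iterated until the energy consumption-time target converges) might look as follows; the env and q_network objects are placeholders for whatever the scheduling environment and Q network actually provide (their reset/step/predict/update interfaces are assumptions), and a full implementation would add experience replay and a target network.

import numpy as np

def train_q_network(env, q_network, episodes=500, gamma=0.99, lr=1e-3):
    for episode in range(episodes):
        state = env.reset()                 # observe the cluster resource state S
        done, step = False, 0
        while not done:
            q_values = q_network.predict(state)
            action = select_action(q_values, step)        # epsilon-Greedy + Boltzmann (see sketch above)
            next_state, reward, done = env.step(action)   # create an executor, obtain the delayed reward
            # One-step TD target built from the reward fed back by the reward generation module.
            target = reward + (0.0 if done else gamma * float(np.max(q_network.predict(next_state))))
            q_network.update(state, action, target, lr)   # DQN parameter updating module
            state, step = next_state, step + 1
    return q_network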
An energy-saving scheduling system based on deep reinforcement learning and heterogeneous Spark clusters comprises: the system comprises a task scheduling module, an energy consumption calculation module, a reward generation module and a DQN parameter updating module;
the task scheduling module is used for exploring a cluster environment and scheduling the operation according to the DQN parameter;
the energy consumption calculation module is used for calculating the energy consumption of the system according to the operation scheduling result;
the reward generation module is used for calculating a reward value according to the system energy consumption;
the DQN parameter updating module is used for updating the network DQN parameters by using the reward value and feeding back the DQN parameters to the task scheduling module.
In the invention, the resource priority allocation problem caused by the differing energy consumption of a heterogeneous cluster is considered: the energy consumption-time target incurred when the system allocates resources in the heterogeneous Spark cluster environment is calculated based on deep reinforcement learning, the lowest energy consumption-time target is searched for while ensuring that the user response-time requirements are met, and resource scheduling is performed according to that lowest energy consumption-time target.
It should be noted that each functional module in each embodiment of the present disclosure may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module can be realized in the form of hardware or in the form of a software functional module; if the integrated module is implemented in the form of a software functional module and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part thereof that contributes to the prior art, may be wholly or partly embodied in the form of a software product.
The above-mentioned embodiments, which further illustrate the objects, technical solutions and advantages of the present invention, should be understood that the above-mentioned embodiments are only preferred embodiments of the present invention, and should not be construed as limiting the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters is characterized by comprising the following steps: acquiring online data information in real time, inputting the data information into a trained Q network, performing energy consumption-time target prediction on the data information by the Q network, and selecting a scheme with the lowest energy consumption-time target by a system for resource allocation according to the energy consumption-time target prediction;
the training process for the Q network is as follows:
s1: acquiring related configuration parameters and execution parameters of job operation; initializing DQN parameters; wherein DQN represents Deep Q-Network, namely Q Network;
s2: calculating a weight coefficient according to the acquired related configuration parameters;
s3: generating an epsilon-Greedy and Boltzmann combined strategy according to the DQN parameter;
s4: the task scheduler performs task scheduling on the working nodes according to the epsilon-Greedy and Boltzmann combined strategy;
s5: constructing an energy consumption-time model according to the weight coefficient and the execution parameter; constructing a reward model according to the task scheduling and the energy consumption-time model, and generating a reward value according to the reward model;
s6: updating the DQN parameters according to the reward values to obtain updated DQN parameters;
s7: steps S3-S6 are repeated, and when the energy consumption-time target converges, the training is completed.
2. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 1, wherein the relevant configuration parameters of the job run include: the number of executors, cpu resources and memory resources; the execution parameters of the job run include: job arrival time, job identification, job completion time, and job duration.
3. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 1, wherein the epsilon-Greedy and Boltzmann combined strategy is:
[Formula images not reproduced]
wherein s represents the resource state of the cluster; a is an action, namely selecting a specific physical machine on which to create an executor and allocate resources; a' represents the action with the maximum Q value; Q(s, a) represents the cumulative reward value obtainable when the state is s and the action is a; A represents a random action; ε represents the exploration probability; and step represents the time step of the task scheduler's exploration.
4. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 1, wherein task scheduling for the working nodes comprises:
generating an action according to the epsilon-Greedy and Boltzmann combined strategy, and scheduling resources onto the working nodes according to the action;
if the task is only partially allocated or the allocation is inefficient, a large negative reward calculated under the energy consumption model is fed back to the task scheduler;
if the task is successfully allocated, a positive reward calculated under the energy consumption model is fed back to the task scheduler.
5. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 1, wherein the formula of the energy consumption-time model is as follows:
[Formula images not reproduced]
wherein C0, C1 and C2 respectively represent the weight coefficients; Ucpu^i represents the cpu utilization of the i-th node; Umem^i represents the memory utilization of the i-th node; t represents the working time of working node i; t' represents the end time of the work under the current cpu utilization and memory utilization; EAtotal represents the energy consumption generated by the cluster; AvgT represents the average running time of all jobs; Tj represents the running time of job j; M represents the number of jobs; n represents the number of nodes; phi represents all jobs; target represents the energy consumption-time target value; and the remaining symbol represents the weight applied to the target.
6. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 5, wherein a cpu utilization calculation formula is as follows:
[Formula image not reproduced]
wherein Ucpu^i represents the cpu utilization of the i-th node; n represents the number of nodes; i denotes a specific node; the remaining symbols represent, respectively, the amount of cpu in use on the i-th node and the total amount of cpu on the i-th node; and t represents the running time under the current cpu usage of the i-th node.
7. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 5, wherein the calculation formula of the memory utilization rate is as follows:
[Formula image not reproduced]
wherein Umem^i represents the memory utilization of the i-th node; the remaining symbols represent, respectively, the amount of memory in use on the i-th node and the total amount of memory on the i-th node; and t represents the running time under the current memory usage of the i-th node.
8. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 1, wherein the reward model is:
[Formula images not reproduced]
wherein EAtotal represents the energy consumption generated by one complete scheduling of the cluster; EAmax represents the energy consumption generated when all working nodes in the cluster run the job at full load; EAnormalized represents the normalized energy consumption; EAepi represents the part of the reward value related to energy consumption within the trajectory generated by one exploration of the task scheduler, i.e. within one episode; Tj represents the running time of job j; M represents the number of jobs; the remaining symbols represent, respectively, the minimum average completion time of all jobs, the normalized average job completion time, the weight applied to the target, and the part of the reward value related to the average job completion time within one episode; Rfixed represents a fixed reward value; and Repi represents the true reward value when a task is successfully assigned.
9. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 1, wherein the formula for generating the reward value is as follows:
[Formula image not reproduced]
wherein Reward represents the generated reward value and Repi represents the true reward value when a task is successfully assigned.
10. An energy-saving scheduling system based on deep reinforcement learning and heterogeneous Spark clusters, comprising: the system comprises a task scheduling module, an energy consumption calculation module, a reward generation module and a DQN parameter updating module;
the task scheduling module is used for exploring a cluster environment and scheduling the operation according to the DQN parameter;
the energy consumption calculation module is used for calculating the energy consumption of the system according to the operation scheduling result;
the reward generation module is used for calculating a reward value according to the system energy consumption;
the DQN parameter updating module is used for updating the network DQN parameters by using the reward value and feeding back the DQN parameters to the task scheduling module.
CN202111505917.3A 2021-12-10 2021-12-10 Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster Pending CN114281528A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111505917.3A CN114281528A (en) 2021-12-10 2021-12-10 Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111505917.3A CN114281528A (en) 2021-12-10 2021-12-10 Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster

Publications (1)

Publication Number Publication Date
CN114281528A true CN114281528A (en) 2022-04-05

Family

ID=80871626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111505917.3A Pending CN114281528A (en) 2021-12-10 2021-12-10 Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster

Country Status (1)

Country Link
CN (1) CN114281528A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114860435A (en) * 2022-04-24 2022-08-05 浙江大学台州研究院 Big data job scheduling method based on task selection process reinforcement learning
CN115408163A (en) * 2022-10-31 2022-11-29 广东电网有限责任公司佛山供电局 Model inference scheduling method and system based on batch processing dynamic adjustment
CN116578403A (en) * 2023-07-10 2023-08-11 安徽思高智能科技有限公司 RPA flow scheduling method and system based on deep reinforcement learning

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101630125B1 (en) * 2015-04-27 2016-06-13 수원대학교산학협력단 Method for resource provisioning in cloud computing resource management system
US20180260700A1 (en) * 2017-03-09 2018-09-13 Alphaics Corporation Method and system for implementing reinforcement learning agent using reinforcement learning processor
CN109117255A (en) * 2018-07-02 2019-01-01 武汉理工大学 Heterogeneous polynuclear embedded system energy optimization dispatching method based on intensified learning
CN110737529A (en) * 2019-09-05 2020-01-31 北京理工大学 cluster scheduling adaptive configuration method for short-time multiple variable-size data jobs
CN111414252A (en) * 2020-03-18 2020-07-14 重庆邮电大学 Task unloading method based on deep reinforcement learning
CN112035251A (en) * 2020-07-14 2020-12-04 中科院计算所西部高等技术研究院 Deep learning training system and method based on reinforcement learning operation layout
CN112966431A (en) * 2021-02-04 2021-06-15 西安交通大学 Data center energy consumption joint optimization method, system, medium and equipment
CN112965813A (en) * 2021-02-10 2021-06-15 山东英信计算机技术有限公司 AI platform resource regulation and control method, system and medium
CN113094159A (en) * 2021-03-22 2021-07-09 西安交通大学 Data center job scheduling method, system, storage medium and computing equipment

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101630125B1 (en) * 2015-04-27 2016-06-13 수원대학교산학협력단 Method for resource provisioning in cloud computing resource management system
US20180260700A1 (en) * 2017-03-09 2018-09-13 Alphaics Corporation Method and system for implementing reinforcement learning agent using reinforcement learning processor
CN109117255A (en) * 2018-07-02 2019-01-01 武汉理工大学 Heterogeneous polynuclear embedded system energy optimization dispatching method based on intensified learning
CN110737529A (en) * 2019-09-05 2020-01-31 北京理工大学 cluster scheduling adaptive configuration method for short-time multiple variable-size data jobs
CN111414252A (en) * 2020-03-18 2020-07-14 重庆邮电大学 Task unloading method based on deep reinforcement learning
CN112035251A (en) * 2020-07-14 2020-12-04 中科院计算所西部高等技术研究院 Deep learning training system and method based on reinforcement learning operation layout
CN112966431A (en) * 2021-02-04 2021-06-15 西安交通大学 Data center energy consumption joint optimization method, system, medium and equipment
CN112965813A (en) * 2021-02-10 2021-06-15 山东英信计算机技术有限公司 AI platform resource regulation and control method, system and medium
CN113094159A (en) * 2021-03-22 2021-07-09 西安交通大学 Data center job scheduling method, system, storage medium and computing equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JILING YAN: "Dueling-DDQN Based Virtual Machine Placement Algorithm for Cloud Computing Systems", 2021 IEEE/CIC International Conference on Communications in China (ICCC), 8 November 2021 (2021-11-08), pages 294-299 *
张可新: "Research on Traffic Signal Timing Optimization Technology Based on Deep Reinforcement Learning", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 2020, 15 March 2020 (2020-03-15), pages 034-728 *
黎明程序员: "Reinforcement Learning Principles and Source Code Explained 002: DQN", retrieved from the Internet: <URL: https://www.cnblogs.com/itmorn/p/13754579.html> *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114860435A (en) * 2022-04-24 2022-08-05 浙江大学台州研究院 Big data job scheduling method based on task selection process reinforcement learning
CN114860435B (en) * 2022-04-24 2024-04-05 浙江大学台州研究院 Big data job scheduling method based on task selection process reinforcement learning
CN115408163A (en) * 2022-10-31 2022-11-29 广东电网有限责任公司佛山供电局 Model inference scheduling method and system based on batch processing dynamic adjustment
CN116578403A (en) * 2023-07-10 2023-08-11 安徽思高智能科技有限公司 RPA flow scheduling method and system based on deep reinforcement learning

Similar Documents

Publication Publication Date Title
CN114281528A (en) Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster
CN110737529B (en) Short-time multi-variable-size data job cluster scheduling adaptive configuration method
CN110489223B (en) Task scheduling method and device in heterogeneous cluster and electronic equipment
CN107704069B (en) Spark energy-saving scheduling method based on energy consumption perception
Rodrigues et al. Helping HPC users specify job memory requirements via machine learning
CN109617939B (en) WebIDE cloud server resource allocation method based on task pre-scheduling
Kamthe et al. A stochastic approach to estimating earliest start times of nodes for scheduling DAGs on heterogeneous distributed computing systems
Muhuri et al. On arrival scheduling of real-time precedence constrained tasks on multi-processor systems using genetic algorithm
CN110086855A (en) Spark task Intellisense dispatching method based on ant group algorithm
CN115168027A (en) Calculation power resource measurement method based on deep reinforcement learning
CN116932201A (en) Multi-resource sharing scheduling method for deep learning training task
Babu et al. Energy efficient scheduling algorithm for cloud computing systems based on prediction model
Ghazali et al. A classification of Hadoop job schedulers based on performance optimization approaches
Davami et al. Distributed scheduling method for multiple workflows with parallelism prediction and DAG prioritizing for time constrained cloud applications
Arif et al. Infrastructure-aware tensorflow for heterogeneous datacenters
CN111782466A (en) Big data task resource utilization detection method and device
CN115794405A (en) Dynamic resource allocation method of big data processing framework based on SSA-XGboost algorithm
CN106874215B (en) Serialized storage optimization method based on Spark operator
Al Maruf et al. Optimizing DNNs Model Partitioning for Enhanced Performance on Edge Devices.
Ghose et al. Orchestration of perception systems for reliable performance in heterogeneous platforms
CN111290855B (en) GPU card management method, system and storage medium for multiple GPU servers in distributed environment
Moussa et al. Comprehensive study on machine learning-based container scheduling in cloud
Fan et al. An efficient scheduling algorithm for interdependent tasks in heterogeneous multi-core systems
Chhabra et al. Qualitative Parametric Comparison of Load Balancing Algorithms in Distributed Computing Environment
Qasim et al. Dynamic mapping of application workflows in heterogeneous computing environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination