CN114281528A - Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster

Info

Publication number
CN114281528A
Authority
CN
China
Prior art keywords
energy consumption
time
task
energy
reward
Prior art date
Legal status
Pending
Application number
CN202111505917.3A
Other languages
Chinese (zh)
Inventor
李鸿健
罗浩
段小林
邹洋
熊安萍
徐瑄航
马建勇
王田田
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202111505917.3A priority Critical patent/CN114281528A/en
Publication of CN114281528A publication Critical patent/CN114281528A/en
Pending legal-status Critical Current


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the fields of reinforcement learning and big data processing, and particularly relates to an energy-saving scheduling method and system based on deep reinforcement learning and a heterogeneous Spark cluster. The method comprises the following steps: acquiring, in real time, online data information under real load on a Spark cluster; inputting the data information into a trained Q network, which performs energy consumption-time target prediction on the data; and, according to this prediction, the system selecting the scheme with the lowest energy consumption-time target for resource allocation. The method and the device take into account the resource priority allocation problem caused by the differing energy consumption of a heterogeneous cluster, find the lowest energy consumption-time target while ensuring that the user response-time requirements are met, perform resource scheduling according to that lowest target, optimize the energy consumption target or multiple SLA targets, and save energy and reduce emissions as far as possible. The method is of significance for balancing cloud service provider cost against user response time and has good economic benefit.

Description

Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster
Technical Field
The invention belongs to the field of reinforcement learning and big data processing, and particularly relates to an energy-saving scheduling method and system based on deep reinforcement learning and a heterogeneous Spark cluster.
Background
The distributed big data processing framework Spark is widely used in both research and industrial analytics. It stores intermediate results in memory to accelerate processing, scales better than comparable frameworks, and is suitable for running a variety of complex analysis tasks. Meanwhile, cloud computing provides cheaper and more manageable computing resources, so many enterprises are moving their big data computing clusters to the cloud. Using these clusters efficiently is important for such enterprises: at large scale, even minor improvements in utilization can avoid the waste of tens of millions in funds, and implementing a good cluster scheduler is key to avoiding such waste.
It is therefore necessary to optimize the energy consumption generated by the Spark job scheduling mechanism. After a Spark cluster is deployed, the task scheduling process of Spark can be abstracted simply as follows: the scheduler allocates a resource block, the executor, to each job, where the executor comprises physical resources such as cpu and memory. By default, Spark schedules with the simple heuristic policies FIFO and Fair and creates executors in a distributed manner, so that the cluster is used in a balanced way and generality of use is taken into account; however, the existing Spark does not consider the resource priority allocation problem that arises when cluster heterogeneity causes nodes to differ in energy consumption. Consequently, the default Spark scheduling policy cannot be optimized for a specific SLA objective.
In conclusion, finding a method that can improve resource utilization in a Spark cluster environment, optimize the energy consumption target or multiple SLA targets, and save energy and reduce emissions as far as possible while ensuring that the user response-time requirements are met is of great significance for balancing the cost of a cloud service provider (CSP) against the response time experienced by users.
Disclosure of Invention
In view of the deficiencies in the prior art, the invention provides an energy-saving scheduling method based on deep reinforcement learning and a heterogeneous Spark cluster, which comprises the following steps: acquiring online data information in real time; inputting the data information into a trained Q network, which performs energy consumption-time target prediction on the data; and, according to this prediction, the system selecting the scheme with the lowest energy consumption-time target for resource allocation;
the training process for the Q network is as follows:
s1: acquiring related configuration parameters and execution parameters of job operation; initializing DQN parameters; wherein DQN represents Deep Q-Network, namely Q Network;
s2: calculating a weight coefficient according to the acquired related configuration parameters;
s3: generating an epsilon-Greedy and Boltzmann combined strategy according to the DQN parameter;
s4: the task scheduler performs task scheduling on the working nodes according to the epsilon-Greedy and Boltzmann combined strategy;
s5: constructing an energy consumption-time model according to the weight coefficient and the execution parameter; constructing a reward model according to the task scheduling and the energy consumption-time model, and generating a reward value according to the reward model;
s6: updating the DQN parameters according to the reward values to obtain updated DQN parameters;
s7: steps S3-S6 are repeated, and when the energy consumption-time target converges, the training is completed.
Preferably, the relevant configuration parameters of the job run include: the number of executors, cpu resources and memory resources; the execution parameters of the job run include: job arrival time, job identification, job completion time, and job duration.
Preferably, the epsilon-Greedy and Boltzmann combined strategy is:
[Formula images not reproduced]
wherein s represents the resource state of the cluster; a is an action, namely selecting a specific physical machine on which to create an executor and allocate resources; a' represents the action with the maximum Q value; Q(s, a) represents the cumulative reward value obtainable when the state is s and the action is a; A represents a random action; ε represents the exploration probability; and step represents the time step of the task scheduler's exploration.
Preferably, the task scheduling for the work node includes:
generating an action according to the epsilon-Greedy and Boltzmann combined strategy, and dispatching the action to a working node as a resource-scheduling command;
if the task is only partially allocated or the allocation is inefficient, a large negative reward calculated under the energy consumption model is fed back to the task scheduler;
if the task is successfully allocated, a positive reward calculated under the energy consumption model is fed back to the task scheduler.
Preferably, the formula of the energy consumption-time model is as follows:
[Formula images not reproduced]
wherein C0, C1 and C2 respectively represent the weight coefficients; Ucpu^i represents the cpu utilization of the i-th node; Umem^i represents the memory utilization of the i-th node; t represents the working time of working node i; t' represents the end time of the work under the current cpu utilization and memory utilization; EAtotal represents the energy consumption generated by the cluster; AvgT represents the average running time of all jobs; Tj represents the running time of job j; M represents the number of jobs; n represents the number of nodes; phi represents all jobs; target represents the energy consumption-time target value; and the remaining symbol represents the weight applied to the target.
Further, the cpu utilization is calculated as:
[Formula image not reproduced]
wherein Ucpu^i represents the cpu utilization of the i-th node; n represents the number of nodes; i denotes a specific node; the remaining symbols represent, respectively, the amount of cpu in use on the i-th node and the total amount of cpu on the i-th node; and t represents the running time under the current cpu usage of the i-th node.
Further, the calculation formula of the memory utilization rate is as follows:
[Formula image not reproduced]
wherein Umem^i represents the memory utilization of the i-th node; the remaining symbols represent, respectively, the amount of memory in use on the i-th node and the total amount of memory on the i-th node; and t represents the running time under the current memory usage of the i-th node.
Preferably, the reward model is:
[Formula images not reproduced]
wherein EAtotal represents the energy consumption generated by one complete scheduling of the cluster; EAmax represents the energy consumption generated when all working nodes in the cluster run the job at full load; EAnormalized represents the normalized energy consumption; EAepi represents the part of the reward value related to energy consumption within the trajectory generated by one exploration of the task scheduler, i.e. within one episode; Tj represents the running time of job j; M represents the number of jobs; the remaining symbols represent, respectively, the minimum average completion time of all jobs, the normalized average job completion time, the weight applied to the target, and the part of the reward value related to the average job completion time within one episode; Rfixed represents a fixed reward value; and Repi represents the true reward value when a task is successfully assigned.
Preferably, the formula for generating the reward value is:
[Formula image not reproduced]
wherein Reward represents the generated reward value and Repi represents the true reward value when a task is successfully assigned.
An energy-saving scheduling system based on deep reinforcement learning and heterogeneous Spark clusters comprises: the system comprises a task scheduling module, an energy consumption calculation module, a reward generation module and a DQN parameter updating module;
the task scheduling module is used for exploring a cluster environment and scheduling the operation according to the DQN parameter;
the energy consumption calculation module is used for calculating the energy consumption of the system according to the operation scheduling result;
the reward generation module is used for calculating a reward value according to the system energy consumption;
the DQN parameter updating module is used for updating the network DQN parameters by using the reward value and feeding back the DQN parameters to the task scheduling module.
The invention has the beneficial effects that: the invention takes into account the resource priority allocation problem caused by the differing energy consumption of a heterogeneous cluster, calculates, based on deep reinforcement learning, the energy consumption-time target incurred when the system allocates resources in the heterogeneous Spark cluster environment, searches for the lowest energy consumption-time target while ensuring that the user response-time requirements are met, and performs resource scheduling according to that lowest energy consumption-time target.
Drawings
FIG. 1 is a diagram of a deep reinforcement learning-based system model according to the present invention;
FIG. 2 is a diagram of a resource architecture for Spark node scheduling;
FIG. 3 is a flowchart illustrating an energy-efficient Spark task scheduling based on deep reinforcement learning according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides an energy-saving scheduling method based on deep reinforcement learning and a heterogeneous Spark cluster, as shown in fig. 1, the method comprises the following steps:
acquiring, in real time, online data information under real load on the Spark cluster; inputting the data information into a trained Q network, which performs energy consumption-time target prediction on the data; and, according to this prediction, the system selecting the scheme with the lowest energy consumption-time target for resource allocation;
the training process for the Q network is as follows:
s1: acquiring related configuration parameters and execution parameters of job operation; initializing DQN parameters; wherein DQN represents Deep Q-Network, namely Q Network;
s2: calculating a weight coefficient according to the acquired related configuration parameters;
s3: generating an epsilon-Greedy and Boltzmann combined strategy according to the DQN parameter;
s4: the task scheduler performs task scheduling on the working nodes according to the epsilon-Greedy and Boltzmann combined strategy;
s5: constructing an energy consumption-time model according to the weight coefficient and the execution parameter; constructing a reward model according to the task scheduling and the energy consumption-time model, and generating a reward value according to the reward model;
s6: updating the DQN parameters according to the reward values to obtain updated DQN parameters;
s7: steps S3-S6 are repeated, and when the energy consumption-time target converges, the training is completed.
The task energy-efficiency scheduling environment comprises: a task scheduling module, an energy consumption calculation module, a reward generation module and a DQN parameter updating module. In this task energy-efficiency scheduling environment, the specific process of Q-network training is as follows:
In a real Spark cluster environment, the relevant configuration parameters and execution parameters of job runs are collected, using different application programs from the BigDataBench benchmark toolkit as the execution load. The relevant configuration parameters of a job run include: the number of executors (executors), cpu resources (cpu) and memory resources (mem); the execution parameters of a job run include: job arrival time (arrival_time), job identification (job_id), job completion time (finish) and job duration (duration).
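For illustration only, the collected parameters might be held in a record like the following minimal Python sketch; the field names simply mirror the parameters listed above (executors, cpu, mem, arrival_time, job_id, finish, duration) and are not the patent's actual data schema.

from dataclasses import dataclass

@dataclass
class JobRecord:
    # One observation collected while a BigDataBench workload runs on the cluster.
    job_id: str          # job identification (job_id)
    executors: int       # number of executors allocated to the job
    cpu: int             # cpu resources per executor
    mem: int             # memory resources per executor
    arrival_time: float  # job arrival time (arrival_time)
    finish: float        # job completion time (finish)

    @property
    def duration(self) -> float:
        # Job duration derived from the arrival and completion timestamps.
        return self.finish - self.arrival_time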
As shown in fig. 2, a cluster manager (master) in the task scheduling module issues allocation instructions to the cluster driver process, and the driver process controls a task scheduler (agent) that allocates resources to each node. The resource allocation situation includes: cpu utilization, memory utilization, task type and the average running time of the application programs. The system cluster environment and the DQN parameters are initialized, and the agent obtains a state S by observing the cluster. The agent explores the environment through an epsilon-Greedy strategy to generate an action, i.e. an executor is created on a working node for a task, and different delayed reward values are obtained depending on whether the task completes or fails. To increase the agent's exploration capability, a Boltzmann action-exploration strategy is adopted so that actions are sampled according to a probability distribution: first, given N working nodes, the value of each action in state S, i.e. its Q value, is calculated and limited to [-100, 100]; then Pn = e^Q(s,an) is computed for each working node n together with the normalizing sum over all N nodes; finally, an action is output by sampling the N working nodes non-uniformly with probability Pn divided by that sum. The task scheduler then performs task scheduling on the working nodes according to the epsilon-Greedy and Boltzmann combined strategy, where the combined strategy is as follows:
[Formula images not reproduced]
wherein s represents the resource state of the cluster; a is an action, namely selecting a specific physical machine on which to create an executor and allocate resources; a' represents the action with the maximum Q value; Q(s, a) represents the cumulative reward value, an expectation, obtainable when the state is s and the action is a; A represents a random action; a random action is selected with probability ε, and the action corresponding to the maximum Q value is selected with probability 1-ε. ε takes a value between 0 and 1 and is set relatively large at the beginning: with probability 1-ε the action a that maximizes the Q function is selected, and with probability ε an action a is chosen at random from the action space A, after which the system enters some state S' of the state space S. The Q function is in fact a neural network; the expression means the expected cumulative return obtainable by taking a certain action a in a certain state S, and the state space S consists of the idle physical resources of the working nodes and the state of the currently running jobs. step denotes the time step of the agent's exploration; every 2000 iterations of step, ε is reduced to 0.75 of its previous value, so that the probability of random exploration falls once the agent has explored enough episodes, which avoids unstable convergence. Here an episode refers to the trajectory generated by one exploration of the task scheduler.
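A minimal Python sketch of one plausible reading of this combined exploration strategy is given below: with probability 1-ε the maximum-Q action is exploited, and with probability ε an action is drawn by Boltzmann sampling over the clipped Q values rather than uniformly at random. The clipping range [-100, 100], the 0.75 decay factor and the 2000-step interval are taken from the description above; the function and parameter names are illustrative assumptions, not the patent's implementation.

import numpy as np

def select_action(q_values, step, epsilon0=1.0, decay=0.75, decay_every=2000,
                  rng=np.random.default_rng()):
    # q_values: vector of Q(s, a) over the N working nodes in the current state s.
    # step: current exploration time step of the task scheduler (agent).
    # epsilon shrinks to 0.75 of its value every 2000 steps, reducing random exploration.
    epsilon = epsilon0 * decay ** (step // decay_every)

    if rng.random() < epsilon:
        # Exploration branch: Boltzmann (softmax) sampling over clipped Q values,
        # i.e. Pn proportional to e^Q(s, an) with Q limited to [-100, 100].
        q = np.clip(np.asarray(q_values, dtype=float), -100.0, 100.0)
        q = q - q.max()                      # stabilize the softmax numerically
        probs = np.exp(q) / np.exp(q).sum()
        return int(rng.choice(len(q), p=probs))

    # Exploitation branch: the action with the maximum Q value.
    return int(np.argmax(q_values))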
The idle physical resources of a working node are represented as:
[Representation image not reproduced]
the state of the currently running job is represented as:
{id,cpu,mem,executor}
As shown in fig. 3, task scheduling of jobs onto the working nodes includes:
generating an action according to the epsilon-Greedy and Boltzmann combined strategy, and scheduling resources onto the working nodes according to the action, where the action can be understood as a resource-scheduling command;
if the task is only partially allocated or the allocation is inefficient, a large negative reward is fed back to the task scheduler;
if the task is successfully allocated, a positive reward calculated under the energy consumption model is fed back to the task scheduler.
The formula of the energy consumption-time model is:
[Formula images not reproduced]
wherein C0, C1 and C2 respectively represent the weight coefficients; Ucpu^i represents the cpu utilization of the i-th node; Umem^i represents the memory utilization of the i-th node; t represents the working time of working node i; t' represents the end time of the work under the current cpu utilization and memory utilization; EAtotal represents the energy consumption generated by the cluster; AvgT represents the average running time of all jobs; Tj represents the running time of job j; M represents the number of jobs; n represents the number of nodes; phi represents all jobs; target represents the energy consumption-time target value; and the remaining symbol represents the weight applied to the target.
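Because the model's formulas appear only as images in the published text, the following LaTeX sketch gives one plausible reconstruction that is consistent with the variable definitions above and with the linear examples below; the integral/summation structure and the use of ω for the target weight are assumptions, not the verbatim patent formulas.

% Assumed reconstruction of the energy consumption-time model (not the verbatim formulas).
\begin{align}
  EA_{total} &= \sum_{i=1}^{n} \int_{t}^{t'} \bigl( C_0 + C_1\,U^{i}_{cpu} + C_2\,U^{i}_{mem} \bigr)\,\mathrm{d}\tau
      && \text{energy of the cluster over each node's working interval} \\
  Avg_{T} &= \frac{1}{M} \sum_{j \in \phi} T_j
      && \text{average running time of the $M$ jobs} \\
  target &= \omega \, EA_{total} + (1-\omega)\, Avg_{T}
      && \text{weighted energy consumption-time target}
\end{align}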
For example, for an IO-intensive task:
EAtotal = 112 + 9.17 Ucpu - 19.46 Umem
and for a cpu-intensive task:
EAtotal = 103 + 1.97 Ucpu + 2.53 Umem
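Taking the two fitted equations above, the sketch below shows how per-node power and energy might be evaluated under such a linear model; treating the expression as an instantaneous power that is accumulated over the node's working time is an assumption made purely for illustration, and the function names are not from the patent.

def node_power(c0, c1, c2, u_cpu, u_mem):
    # Power drawn by one node under the fitted linear model C0 + C1*Ucpu + C2*Umem.
    return c0 + c1 * u_cpu + c2 * u_mem

def node_energy(c0, c1, c2, samples, dt=1.0):
    # Energy over a node's working interval, approximated by summing utilization
    # samples (u_cpu, u_mem) taken every dt seconds.
    return sum(node_power(c0, c1, c2, u_cpu, u_mem) * dt for u_cpu, u_mem in samples)

# Example coefficients quoted in the description:
IO_INTENSIVE = (112.0, 9.17, -19.46)    # EAtotal = 112 + 9.17 Ucpu - 19.46 Umem
CPU_INTENSIVE = (103.0, 1.97, 2.53)     # EAtotal = 103 + 1.97 Ucpu + 2.53 Umem

# e.g. a cpu-intensive node at 80% cpu and 50% memory utilization, 10 one-second samples
print(node_energy(*CPU_INTENSIVE, samples=[(0.8, 0.5)] * 10))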
the cpu utilization rate is calculated by the following formula:
[Formula image not reproduced]
wherein Ucpu^i represents the cpu utilization of the i-th node; n represents the number of nodes; i denotes a specific node; the remaining symbols represent, respectively, the amount of cpu in use on the i-th node and the total amount of cpu on the i-th node; and t represents the running time under the current cpu usage of the i-th node.
The calculation formula of the memory utilization rate is as follows:
[Formula image not reproduced]
wherein Umem^i represents the memory utilization of the i-th node; the remaining symbols represent, respectively, the amount of memory in use on the i-th node and the total amount of memory on the i-th node; and t represents the running time under the current memory usage of the i-th node.
The weight coefficients are calculated from the acquired relevant configuration parameters using a least squares method combined with stepwise regression; the specific process is as follows:
A multiple linear regression model is used:
EA = C0 + C1·Ucpu + C2·Umem + ξ
wherein C0, C1 and C2 are the regression coefficients, i.e. the weight coefficients that need to be determined; ξ is an unobservable random error; and EA is the observed energy consumption. Suppose n groups of observed energy consumption values (EAk, Ucpu,k, Umem,k), k = 1, ..., n, are provided; then:
EAk = C0 + C1·Ucpu,k + C2·Umem,k + ξk, k = 1, ..., n.
the stepwise regression method can solve the problem that the optimal solution of the least square method is difficult to find, and the idea is to introduce variables one by one, wherein the introduced condition is that the partial F test of the variables is obvious. Meanwhile, after each variable is introduced, the existing variable is checked. The invention adopts a least square method plus stepwise regression mode, system utilization rate data is substituted into a multiple linear regression model, and the target function is as follows:
Figure BDA00034031709200000910
estimating the regression coefficient by using a least square method to minimize the target function, thereby obtaining a specific regression equation and further obtaining a weight coefficient C0、C1、C2The value of (c).
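As a concrete illustration of the least-squares step, the sketch below fits C0, C1 and C2 with numpy's lstsq on n observed groups of (Ucpu, Umem, energy) values; the stepwise variable introduction and partial F-tests described above are omitted, and the data in the example are synthetic.

import numpy as np

def fit_weight_coefficients(u_cpu, u_mem, energy):
    # Ordinary least squares estimate of C0, C1, C2 in energy ~ C0 + C1*Ucpu + C2*Umem.
    u_cpu = np.asarray(u_cpu, dtype=float)
    u_mem = np.asarray(u_mem, dtype=float)
    energy = np.asarray(energy, dtype=float)
    X = np.column_stack([np.ones_like(u_cpu), u_cpu, u_mem])  # intercept column for C0
    coeffs, *_ = np.linalg.lstsq(X, energy, rcond=None)
    return tuple(coeffs)

# Synthetic observations roughly following 103 + 1.97*Ucpu + 2.53*Umem
rng = np.random.default_rng(0)
u_cpu = rng.uniform(0.0, 1.0, 50)
u_mem = rng.uniform(0.0, 1.0, 50)
energy = 103 + 1.97 * u_cpu + 2.53 * u_mem + rng.normal(0.0, 0.1, 50)
print(fit_weight_coefficients(u_cpu, u_mem, energy))   # approximately (103, 1.97, 2.53)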
The reward model is as follows:
[Formula images not reproduced]
wherein EAtotal represents the energy consumption generated by one complete scheduling of the cluster; EAmax represents the energy consumption generated when all working nodes in the cluster run the job at full load; EAnormalized represents the normalized energy consumption; EAepi represents the part of the reward value related to energy consumption within the trajectory generated by one exploration of the task scheduler, i.e. within one episode; Tj represents the running time of job j; M represents the number of jobs; the remaining symbols represent, respectively, the minimum average completion time of all jobs, the normalized average job completion time, the weight applied to the target, and the part of the reward value related to the average job completion time within one episode; Rfixed represents a fixed reward value; and Repi represents the true reward value when a task is successfully assigned. Since Repi ∈ (0, 1), Rfixed is designed as a large number, for example Rfixed = 10000, so that the agent obtains a better-quantized positive value after selecting a correct scheduling action a.
The formula for generating the reward value is as follows:
[Formula image not reproduced]
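Putting the normalized terms together, the following sketch shows one way the episode reward could be computed; the particular combination used here (a weighted sum of the normalized energy and normalized average completion time, scaled by Rfixed on success, and a large negative value on a failed or inefficient allocation) is an assumption for illustration, not the patent's exact Reward formula.

def episode_reward(ea_epi, ea_max, job_times, t_min, omega=0.5, r_fixed=10000.0,
                   success=True):
    # ea_epi:    energy consumption accumulated in this episode (one scheduler trajectory)
    # ea_max:    energy consumption if all working nodes ran the job at full load
    # job_times: running times Tj of the M jobs in the episode
    # t_min:     minimum average completion time of all jobs (used for normalization)
    # omega:     assumed weight between the energy term and the time term
    if not success:
        # Partial or inefficient allocation: a large negative reward is fed back.
        return -r_fixed

    ea_normalized = ea_epi / ea_max               # normalized energy consumption, in (0, 1]
    avg_t = sum(job_times) / len(job_times)       # average job completion time
    t_normalized = t_min / avg_t                  # normalized completion time, in (0, 1]

    # r_epi in (0, 1): closer to 1 means lower energy and shorter completion times.
    r_epi = omega * (1.0 - ea_normalized) + (1.0 - omega) * t_normalized
    return r_fixed * r_epi                        # scaled true reward value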
for one calculation of true reward value RepiAnother preferred embodiment of (a) is: true prize value RepiThe energy efficiency model is obtained by calculation, and the expression of the energy efficiency model is as follows:
Figure BDA00034031709200001013
wherein, C1,C2Are all indicative of the weight coefficient,
Figure BDA00034031709200001014
represents the average utilization of cpu, T represents the average running time of job,
Figure BDA0003403170920000111
indicating the average utilization of the memory.
The calculated reward value is fed back to the agent; the DQN parameters are updated according to the reward value to obtain updated DQN parameters, and the agent fits the value of the Q function using these parameters. The above process is iterated continuously until the current energy consumption-time target converges or the set total number of iterations is exceeded; the trained Q network is then obtained from the latest DQN parameters, the DQN stores the current network parameters, the iteration ends, and the agent selects the correct actions to execute task scheduling according to the values of the Q function.
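Tying the modules together, a high-level sketch of the training loop (steps S3 to S6 iterated until the energy consumption-time target converges) might look as follows; the env and q_network objects are placeholders for whatever the scheduling environment and Q network actually provide (their reset/step/predict/update interfaces are assumptions), and a full implementation would add experience replay and a target network.

import numpy as np

def train_q_network(env, q_network, episodes=500, gamma=0.99, lr=1e-3):
    for episode in range(episodes):
        state = env.reset()                 # observe the cluster resource state S
        done, step = False, 0
        while not done:
            q_values = q_network.predict(state)
            action = select_action(q_values, step)        # epsilon-Greedy + Boltzmann (see sketch above)
            next_state, reward, done = env.step(action)   # create an executor, obtain the delayed reward
            # One-step TD target built from the reward fed back by the reward generation module.
            target = reward + (0.0 if done else gamma * float(np.max(q_network.predict(next_state))))
            q_network.update(state, action, target, lr)   # DQN parameter updating module
            state, step = next_state, step + 1
    return q_network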
An energy-saving scheduling system based on deep reinforcement learning and heterogeneous Spark clusters comprises: the system comprises a task scheduling module, an energy consumption calculation module, a reward generation module and a DQN parameter updating module;
the task scheduling module is used for exploring a cluster environment and scheduling the operation according to the DQN parameter;
the energy consumption calculation module is used for calculating the energy consumption of the system according to the operation scheduling result;
the reward generation module is used for calculating a reward value according to the system energy consumption;
the DQN parameter updating module is used for updating the network DQN parameters by using the reward value and feeding back the DQN parameters to the task scheduling module.
In the invention, the resource priority allocation problem caused by the differing energy consumption of a heterogeneous cluster is considered: the energy consumption-time target incurred when the system allocates resources in the heterogeneous Spark cluster environment is calculated based on deep reinforcement learning, the lowest energy consumption-time target is searched for while ensuring that the user response-time requirements are met, and resource scheduling is performed according to that lowest energy consumption-time target.
It should be noted that each functional module in each embodiment of the present disclosure may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module can be realized in the form of hardware or in the form of a software functional module; if the integrated module is implemented in the form of a software functional module and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part thereof that contributes to the prior art, may be wholly or partly embodied in the form of a software product.
The above-mentioned embodiments, which further illustrate the objects, technical solutions and advantages of the present invention, should be understood that the above-mentioned embodiments are only preferred embodiments of the present invention, and should not be construed as limiting the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters is characterized by comprising the following steps: acquiring online data information in real time, inputting the data information into a trained Q network, performing energy consumption-time target prediction on the data information by the Q network, and selecting a scheme with the lowest energy consumption-time target by a system for resource allocation according to the energy consumption-time target prediction;
the training process for the Q network is as follows:
s1: acquiring related configuration parameters and execution parameters of job operation; initializing DQN parameters; wherein DQN represents Deep Q-Network, namely Q Network;
s2: calculating a weight coefficient according to the acquired related configuration parameters;
s3: generating an epsilon-Greedy and Boltzmann combined strategy according to the DQN parameter;
s4: the task scheduler performs task scheduling on the working nodes according to the epsilon-Greedy and Boltzmann combined strategy;
s5: constructing an energy consumption-time model according to the weight coefficient and the execution parameter; constructing a reward model according to the task scheduling and the energy consumption-time model, and generating a reward value according to the reward model;
s6: updating the DQN parameters according to the reward values to obtain updated DQN parameters;
s7: steps S3-S6 are repeated, and when the energy consumption-time target converges, the training is completed.
2. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 1, wherein the relevant configuration parameters of the job run include: the number of executors, cpu resources and memory resources; the execution parameters of the job run include: job arrival time, job identification, job completion time, and job duration.
3. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 1, wherein the epsilon-Greedy and Boltzmann combined strategy is:
[Formula images not reproduced]
wherein s represents the resource state of the cluster; a is an action, namely selecting a specific physical machine on which to create an executor and allocate resources; a' represents the action with the maximum Q value; Q(s, a) represents the cumulative reward value obtainable when the state is s and the action is a; A represents a random action; ε represents the exploration probability; and step represents the time step of the task scheduler's exploration.
4. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 1, wherein task scheduling for the working nodes comprises:
generating an action according to the epsilon-Greedy and Boltzmann combined strategy, and scheduling resources onto the working nodes according to the action;
if the task is only partially allocated or the allocation is inefficient, a large negative reward calculated under the energy consumption model is fed back to the task scheduler;
if the task is successfully allocated, a positive reward calculated under the energy consumption model is fed back to the task scheduler.
5. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 1, wherein the formula of the energy consumption-time model is as follows:
[Formula images not reproduced]
wherein C0, C1 and C2 respectively represent the weight coefficients; Ucpu^i represents the cpu utilization of the i-th node; Umem^i represents the memory utilization of the i-th node; t represents the working time of working node i; t' represents the end time of the work under the current cpu utilization and memory utilization; EAtotal represents the energy consumption generated by the cluster; AvgT represents the average running time of all jobs; Tj represents the running time of job j; M represents the number of jobs; n represents the number of nodes; phi represents all jobs; target represents the energy consumption-time target value; and the remaining symbol represents the weight applied to the target.
6. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 5, wherein a cpu utilization calculation formula is as follows:
[Formula image not reproduced]
wherein Ucpu^i represents the cpu utilization of the i-th node; n represents the number of nodes; i denotes a specific node; the remaining symbols represent, respectively, the amount of cpu in use on the i-th node and the total amount of cpu on the i-th node; and t represents the running time under the current cpu usage of the i-th node.
7. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 5, wherein the calculation formula of the memory utilization rate is as follows:
[Formula image not reproduced]
wherein Umem^i represents the memory utilization of the i-th node; the remaining symbols represent, respectively, the amount of memory in use on the i-th node and the total amount of memory on the i-th node; and t represents the running time under the current memory usage of the i-th node.
8. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 1, wherein the reward model is:
[Formula images not reproduced]
wherein EAtotal represents the energy consumption generated by one complete scheduling of the cluster; EAmax represents the energy consumption generated when all working nodes in the cluster run the job at full load; EAnormalized represents the normalized energy consumption; EAepi represents the part of the reward value related to energy consumption within the trajectory generated by one exploration of the task scheduler, i.e. within one episode; Tj represents the running time of job j; M represents the number of jobs; the remaining symbols represent, respectively, the minimum average completion time of all jobs, the normalized average job completion time, the weight applied to the target, and the part of the reward value related to the average job completion time within one episode; Rfixed represents a fixed reward value; and Repi represents the true reward value when a task is successfully assigned.
9. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 1, wherein the formula for generating the reward value is as follows:
[Formula image not reproduced]
wherein Reward represents the generated reward value and Repi represents the true reward value when a task is successfully assigned.
10. An energy-saving scheduling system based on deep reinforcement learning and heterogeneous Spark clusters, comprising: the system comprises a task scheduling module, an energy consumption calculation module, a reward generation module and a DQN parameter updating module;
the task scheduling module is used for exploring a cluster environment and scheduling the operation according to the DQN parameter;
the energy consumption calculation module is used for calculating the energy consumption of the system according to the operation scheduling result;
the reward generation module is used for calculating a reward value according to the system energy consumption;
the DQN parameter updating module is used for updating the network DQN parameters by using the reward value and feeding back the DQN parameters to the task scheduling module.
CN202111505917.3A 2021-12-10 2021-12-10 Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster Pending CN114281528A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111505917.3A CN114281528A (en) 2021-12-10 2021-12-10 Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111505917.3A CN114281528A (en) 2021-12-10 2021-12-10 Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster

Publications (1)

Publication Number Publication Date
CN114281528A true CN114281528A (en) 2022-04-05

Family

ID=80871626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111505917.3A Pending CN114281528A (en) 2021-12-10 2021-12-10 Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster

Country Status (1)

Country Link
CN (1) CN114281528A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114860435A (en) * 2022-04-24 2022-08-05 浙江大学台州研究院 Big data job scheduling method based on task selection process reinforcement learning
CN115408163A (en) * 2022-10-31 2022-11-29 广东电网有限责任公司佛山供电局 Model inference scheduling method and system based on batch processing dynamic adjustment
CN116578403A (en) * 2023-07-10 2023-08-11 安徽思高智能科技有限公司 RPA flow scheduling method and system based on deep reinforcement learning

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101630125B1 (en) * 2015-04-27 2016-06-13 수원대학교산학협력단 Method for resource provisioning in cloud computing resource management system
US20180260700A1 (en) * 2017-03-09 2018-09-13 Alphaics Corporation Method and system for implementing reinforcement learning agent using reinforcement learning processor
CN109117255A (en) * 2018-07-02 2019-01-01 武汉理工大学 Heterogeneous polynuclear embedded system energy optimization dispatching method based on intensified learning
CN110737529A (en) * 2019-09-05 2020-01-31 北京理工大学 cluster scheduling adaptive configuration method for short-time multiple variable-size data jobs
CN111414252A (en) * 2020-03-18 2020-07-14 重庆邮电大学 Task unloading method based on deep reinforcement learning
CN112035251A (en) * 2020-07-14 2020-12-04 中科院计算所西部高等技术研究院 Deep learning training system and method based on reinforcement learning operation layout
CN112966431A (en) * 2021-02-04 2021-06-15 西安交通大学 Data center energy consumption joint optimization method, system, medium and equipment
CN112965813A (en) * 2021-02-10 2021-06-15 山东英信计算机技术有限公司 AI platform resource regulation and control method, system and medium
CN113094159A (en) * 2021-03-22 2021-07-09 西安交通大学 Data center job scheduling method, system, storage medium and computing equipment

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101630125B1 (en) * 2015-04-27 2016-06-13 수원대학교산학협력단 Method for resource provisioning in cloud computing resource management system
US20180260700A1 (en) * 2017-03-09 2018-09-13 Alphaics Corporation Method and system for implementing reinforcement learning agent using reinforcement learning processor
CN109117255A (en) * 2018-07-02 2019-01-01 武汉理工大学 Heterogeneous polynuclear embedded system energy optimization dispatching method based on intensified learning
CN110737529A (en) * 2019-09-05 2020-01-31 北京理工大学 cluster scheduling adaptive configuration method for short-time multiple variable-size data jobs
CN111414252A (en) * 2020-03-18 2020-07-14 重庆邮电大学 Task unloading method based on deep reinforcement learning
CN112035251A (en) * 2020-07-14 2020-12-04 中科院计算所西部高等技术研究院 Deep learning training system and method based on reinforcement learning operation layout
CN112966431A (en) * 2021-02-04 2021-06-15 西安交通大学 Data center energy consumption joint optimization method, system, medium and equipment
CN112965813A (en) * 2021-02-10 2021-06-15 山东英信计算机技术有限公司 AI platform resource regulation and control method, system and medium
CN113094159A (en) * 2021-03-22 2021-07-09 西安交通大学 Data center job scheduling method, system, storage medium and computing equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JILING YAN: "Dueling-DDQN Based Virtual Machine Placement Algorithm for Cloud Computing Systems", 2021 IEEE/CIC International Conference on Communications in China (ICCC), 8 November 2021 (2021-11-08), pages 294-299 *
张可新: "Research on Traffic Signal Timing Optimization Technology Based on Deep Reinforcement Learning", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 2020, 15 March 2020 (2020-03-15), pages 034-728 *
黎明程序员: "Reinforcement Learning Principles and Source Code Explained 002: DQN", retrieved from the Internet: <URL: https://www.cnblogs.com/itmorn/p/13754579.html> *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114860435A (en) * 2022-04-24 2022-08-05 浙江大学台州研究院 Big data job scheduling method based on task selection process reinforcement learning
CN114860435B (en) * 2022-04-24 2024-04-05 浙江大学台州研究院 Big data job scheduling method based on task selection process reinforcement learning
CN115408163A (en) * 2022-10-31 2022-11-29 广东电网有限责任公司佛山供电局 Model inference scheduling method and system based on batch processing dynamic adjustment
CN116578403A (en) * 2023-07-10 2023-08-11 安徽思高智能科技有限公司 RPA flow scheduling method and system based on deep reinforcement learning

Similar Documents

Publication Publication Date Title
CN114281528A (en) Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster
CN110737529B (en) Short-time multi-variable-size data job cluster scheduling adaptive configuration method
CN110489223B (en) Task scheduling method and device in heterogeneous cluster and electronic equipment
CN107704069B (en) Spark energy-saving scheduling method based on energy consumption perception
Rodrigues et al. Helping HPC users specify job memory requirements via machine learning
CN109617939B (en) WebIDE cloud server resource allocation method based on task pre-scheduling
Kamthe et al. A stochastic approach to estimating earliest start times of nodes for scheduling DAGs on heterogeneous distributed computing systems
Muhuri et al. On arrival scheduling of real-time precedence constrained tasks on multi-processor systems using genetic algorithm
CN110086855A (en) Spark task Intellisense dispatching method based on ant group algorithm
CN115168027A (en) Calculation power resource measurement method based on deep reinforcement learning
CN116932201A (en) Multi-resource sharing scheduling method for deep learning training task
Babu et al. Energy efficient scheduling algorithm for cloud computing systems based on prediction model
Ghazali et al. A classification of Hadoop job schedulers based on performance optimization approaches
Davami et al. Distributed scheduling method for multiple workflows with parallelism prediction and DAG prioritizing for time constrained cloud applications
Arif et al. Infrastructure-aware tensorflow for heterogeneous datacenters
CN111782466A (en) Big data task resource utilization detection method and device
CN115794405A (en) Dynamic resource allocation method of big data processing framework based on SSA-XGboost algorithm
CN106874215B (en) Serialized storage optimization method based on Spark operator
Al Maruf et al. Optimizing DNNs Model Partitioning for Enhanced Performance on Edge Devices.
Ghose et al. Orchestration of perception systems for reliable performance in heterogeneous platforms
CN111290855B (en) GPU card management method, system and storage medium for multiple GPU servers in distributed environment
Moussa et al. Comprehensive study on machine learning-based container scheduling in cloud
Fan et al. An efficient scheduling algorithm for interdependent tasks in heterogeneous multi-core systems
Chhabra et al. Qualitative Parametric Comparison of Load Balancing Algorithms in Distributed Computing Environment
Qasim et al. Dynamic mapping of application workflows in heterogeneous computing environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination