CN114281528A - Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster - Google Patents
Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster Download PDFInfo
- Publication number
- CN114281528A CN114281528A CN202111505917.3A CN202111505917A CN114281528A CN 114281528 A CN114281528 A CN 114281528A CN 202111505917 A CN202111505917 A CN 202111505917A CN 114281528 A CN114281528 A CN 114281528A
- Authority
- CN
- China
- Prior art keywords
- energy consumption
- time
- task
- energy
- reward
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 38
- 230000002787 reinforcement Effects 0.000 title claims abstract description 24
- 238000005265 energy consumption Methods 0.000 claims abstract description 41
- 238000013468 resource allocation Methods 0.000 claims abstract description 5
- 230000009471 action Effects 0.000 claims description 32
- 238000004364 calculation method Methods 0.000 claims description 13
- 230000008569 process Effects 0.000 claims description 10
- 238000012549 training Methods 0.000 claims description 8
- 239000000126 substance Substances 0.000 claims description 6
- 230000004044 response Effects 0.000 abstract description 6
- 238000012545 processing Methods 0.000 abstract description 5
- 230000006870 function Effects 0.000 description 6
- 238000010586 diagram Methods 0.000 description 2
- 238000012417 linear regression Methods 0.000 description 2
- 239000002699 waste material Substances 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000004886 process control Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
Images
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention belongs to the field of reinforcement learning and big data processing, and particularly relates to an energy-saving scheduling method and system based on deep reinforcement learning and a heterogeneous Spark cluster; the method comprises the following steps: acquiring online data information under real load on a Spark cluster in real time, inputting the data information into a trained Q network, carrying out energy consumption-time target prediction on the data information by the Q network, and selecting a scheme with the lowest energy consumption-time target by a system for resource allocation according to the energy consumption-time target prediction; the method and the device consider the problem of resource priority allocation caused by different energy consumption due to cluster isomerism, find the lowest energy consumption-time target under the condition of ensuring that the response time of the user is met, perform resource scheduling according to the lowest energy consumption-time target, optimize the energy consumption target or multiple SLA targets, save energy and reduce emission as far as possible, have important significance for balancing the cloud service provider cost and the response time between users through the method, and have good economic benefit.
Description
Technical Field
The invention belongs to the field of reinforcement learning and big data processing, and particularly relates to an energy-saving scheduling method and system based on deep reinforcement learning and a heterogeneous Spark cluster.
Background
The distributed big data processing framework Spark is widely applied to the analysis work of research and industry, stores the intermediate result in the memory to accelerate the processing speed, has higher expansibility than other frameworks, and is suitable for running various complex analysis tasks; and cloud computing provides cheaper and more manageable computing resources, many enterprises are turning to the cloud to deploy large data computing clusters, it is important for the enterprises to efficiently utilize computing clusters, waste of tens of millions of funds can be avoided on a large scale even with minor improvements in utilization, and the implementation of a good cluster scheduler is key to avoiding such waste.
Therefore, it is necessary to optimize the energy consumption generated in the spare job operation scheduling mechanism. After the Spark cluster is deployed, the task scheduler scheduling process of Spark can be simply abstracted in that the scheduler allocates a resource block executor for the job, where the resource block executor includes physical resources such as cpu, memory, and the like. The existing Spark defaults to adopt a simple heuristic algorithm FIFO and Fair to schedule, and an executive is created in a distributed mode, so that clusters are used in a balanced mode, and the universality of the use is considered, but the problem of resource priority allocation caused by different energy consumption due to cluster heterogeneity is not considered in the existing Spark. Therefore, the Spark default scheduling policy cannot be optimized for scheduling for a specific SLA objective.
In conclusion, a method capable of improving the resource utilization rate in the spark cluster environment, optimizing the energy consumption target or multiple SLA targets and saving energy and reducing emission as far as possible under the condition that the response time of the user is ensured to be met is found, and the method has important significance for balancing the cost of a Cloud Service Provider (CSP) and the response time between users.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an energy-saving scheduling method based on deep reinforcement learning and a heterogeneous Spark cluster, which comprises the following steps: acquiring online data information in real time, inputting the data information into a trained Q network, performing energy consumption-time target prediction on the data information by the Q network, and selecting a scheme with the lowest energy consumption-time target by a system for resource allocation according to the energy consumption-time target prediction;
the training process for the Q network is as follows:
s1: acquiring related configuration parameters and execution parameters of job operation; initializing DQN parameters; wherein DQN represents Deep Q-Network, namely Q Network;
s2: calculating a weight coefficient according to the acquired related configuration parameters;
s3: generating an epsilon-Greedy and Boltzmann combined strategy according to the DQN parameter;
s4: the task scheduler performs task scheduling on the working nodes according to the epsilon-Greedy and Boltzmann combined strategy;
s5: constructing an energy consumption-time model according to the weight coefficient and the execution parameter; constructing a reward model according to the task scheduling and the energy consumption-time model, and generating a reward value according to the reward model;
s6: updating the DQN parameters according to the reward values to obtain updated DQN parameters;
s7: steps S3-S6 are repeated, and when the energy consumption-time target converges, the training is completed.
Preferably, the relevant configuration parameters of the job operation include: the number of actuators, cpu resources and memory resources; the execution parameters of the job run include: job arrival time, job identification, job completion time, and job duration.
Preferably, the epsilon-Greedy and Boltzmann combined strategy is:
wherein s represents the resource state of the cluster; a is an action, which represents selecting a specific physical machine to create an executor and allocating resources; a' represents action of the maximum Q value; q (s, a) represents a jackpot value that would be available if state s and action were a; a represents a random action; ε represents the probability and step represents the time step explored by the task scheduler.
Preferably, the task scheduling for the work node includes:
generating an action according to an epsilon-Greedy and Boltzmann combined strategy, and sending the action to a working node according to a scheduling resource;
if the task is partially distributed or the task is inefficiently distributed, feeding back a large amount of negative rewards calculated under the energy consumption model to the task scheduler;
if the task is successfully distributed, the positive reward calculated under the energy consumption model is fed back to the task scheduler.
Preferably, the formula of the energy consumption-time model is as follows:
wherein, C0、C1、C2The weight coefficients are respectively represented by the weight coefficients,indicates the cpu utilization of the ith node,the memory utilization rate of the ith node is represented, t represents the working time of the working node i, t' represents the working end time under the current cpu utilization rate and the memory utilization rate, and EAtotalRepresenting the energy consumption of the cluster generation, AvgTIndicating the average time that all jobs are running,representing the run time of job j, M representing the number of jobs, n representing the number of nodes, phi representing all jobs, target representing the energy consumption-time target value,representing the weight to the target.
Further, the cpu utilization is calculated as:
wherein the content of the first and second substances,indicates the cpu utilization of the ith node, n indicates the number of nodes, i indicates a specific node,indicates the cpu usage on the ith node,represents the total amount of cpus on the ith node, and t represents the running time in the case where the current cpu usage of the ith node.
Further, the calculation formula of the memory utilization rate is as follows:
wherein the content of the first and second substances,the memory utilization rate of the ith node is shown,indicating the amount of memory usage on the ith node,represents the total amount of memory on the ith node, and t represents the running time in the case of the current memory usage of the ith node.
Preferably, the reward model is:
wherein EAtotalRepresenting the energy consumption, EA, generated by a complete dispatch clustermaxIndicating that all the working nodes in the cluster are the energy consumption, EA, generated by running the job under the full load conditionnormalizedRepresenting normalized energy consumption, EAepiRepresenting the part of the reward value related to energy consumption in the track generated by the exploration of the task scheduler for one time, namely in the epamode for one time;representing the run time of job j, M representing the number of jobs,represents the minimum average completion time for all jobs,represents the normalized average job completion time,the weight to the object is represented by,represents the portion of the reward value that is related to the average completion time of the job in an epsilon, RfixedIndicating a fixed prize value, RepiA true prize value indicating successful assignment of the task.
Preferably, the formula for generating the prize value is:
wherein Reward represents the generated prize value, RepiA true prize value indicating successful assignment of the task.
An energy-saving scheduling system based on deep reinforcement learning and heterogeneous Spark clusters comprises: the system comprises a task scheduling module, an energy consumption calculation module, a reward generation module and a DQN parameter updating module;
the task scheduling module is used for exploring a cluster environment and scheduling the operation according to the DQN parameter;
the energy consumption calculation module is used for calculating the energy consumption of the system according to the operation scheduling result;
the reward generation module is used for calculating a reward value according to the system energy consumption;
the DQN parameter updating module is used for updating the network DQN parameters by using the reward value and feeding back the DQN parameters to the task scheduling module.
The invention has the beneficial effects that: in the invention, the problem of resource priority allocation caused by different energy consumption due to cluster isomerism is considered, an energy consumption-time target in the process of allocating resources to a system in a heterogeneous spark cluster environment is calculated based on deep reinforcement learning, the lowest energy consumption-time target is searched under the condition of ensuring that the response time of a user is met, and resource scheduling is carried out according to the lowest energy consumption-time target.
Drawings
FIG. 1 is a diagram of a deep reinforcement learning-based system model according to the present invention;
FIG. 2 is a diagram of a resource architecture for Spark node scheduling;
FIG. 3 is a flowchart illustrating an energy-efficient Spark task scheduling based on deep reinforcement learning according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides an energy-saving scheduling method based on deep reinforcement learning and a heterogeneous Spark cluster, as shown in fig. 1, the method comprises the following steps:
acquiring online data information under real load on a Spark cluster in real time, inputting the data information into a trained Q network, carrying out energy consumption-time target prediction on the data information by the Q network, and selecting a scheme with the lowest energy consumption-time target by a system for resource allocation according to the energy consumption-time target prediction;
the training process for the Q network is as follows:
s1: acquiring related configuration parameters and execution parameters of job operation; initializing DQN parameters; wherein DQN represents Deep Q-Network, namely Q Network;
s2: calculating a weight coefficient according to the acquired related configuration parameters;
s3: generating an epsilon-Greedy and Boltzmann combined strategy according to the DQN parameter;
s4: the task scheduler performs task scheduling on the working nodes according to the epsilon-Greedy and Boltzmann combined strategy;
s5: constructing an energy consumption-time model according to the weight coefficient and the execution parameter; constructing a reward model according to the task scheduling and the energy consumption-time model, and generating a reward value according to the reward model;
s6: updating the DQN parameters according to the reward values to obtain updated DQN parameters;
s7: steps S3-S6 are repeated, and when the energy consumption-time target converges, the training is completed.
The task energy efficiency scheduling environment comprises the following steps: the Q network training method comprises a task scheduling module, an energy consumption calculation module, a reward generation module and a DQN parameter updating module, wherein in a task energy efficiency scheduling environment, the specific process of Q network training comprises the following steps:
under a real Spark cluster environment, collecting relevant configuration parameters and execution parameters of operation by taking different application programs in a BigDataBench benchmark toolkit as execution loads; relevant configuration parameters for job execution include: the number of executors (executors), cpu resources (cpu) and memory resources (men); the execution parameters of the job run include: job arrival time (arrival _ time), job identification (job _ id), job completion time (finish), and job duration (duration).
As shown in fig. 2, a cluster manager (master) in the task scheduling module issues an allocation instruction for a cluster driver process, and the driver process controls a task scheduler (agent) to allocate resources to each node; the resource allocation case includes: CPU utilization, memory utilization, task type and average running time of application programs; initializing a system cluster environment, DQN parameters, and agent obtaining a state S by observing the cluster; agent explores the environment through an epsilon-Greedy strategy to generate an action, namely an executer executor is created on a work node for a task, and different delay reward values can be obtained when the task is executed completely or fails to be executed; in order to increase the exploration capacity of the agent, the actions are sampled according to probability distribution by adopting a Boltzmann action exploration strategy; firstly, judging that the number of working nodes is N, calculating the value of each action when the state is S, namely the Q value of each action, and limiting the Q value to be [ -100,100]Then calculate Pn=eQ(s,a)Andfinally according toPerforming non-uniform sampling output action on the N working nodes; the task scheduler performs task scheduling arrangement on the working nodes according to an epsilon-Greedy and Boltzmann combined strategy, wherein the epsilon-Greedy and Boltzmann combined strategy is as follows:
wherein s represents the resource state of the cluster; a is an action, which indicates that a specific physical machine is selected to create an executor and allocate resources; a' represents action of the maximum Q value; q (s, a) represents a jackpot value that may be obtained with state s and action a, and is a desire; a represents a random action; epsilon represents that a random action is selected by the probability epsilon, and the action corresponding to the maximum Q value is selected by the probability 1-epsilon; epsilon is a value between 0 and 1, epsilon is set to be relatively larger at the beginning, action a generated by selecting the maximum value in the Q function with the probability of 1-epsilon exists, and action a is randomly selected in the action space A with the probability of epsilon and enters S' in a certain state space S; the Q function is actually a neural network, the expression means that a certain action a is taken in a certain state S, an expected value of accumulated return is possibly obtained, and the state space S is the idle physical resource of the working node and the state of the current operation; step represents the time step of agent exploration, and epsilon is reduced to 0.75 of the original value along with the iteration of step every 2000 times, so that the random exploration probability is reduced to avoid the situation of unstable convergence when the agent explores enough epsilon, wherein epsilon refers to a track generated by one-time task scheduler exploration.
The physical resource that the working node is free is represented as:
the state of the currently running job is represented as:
{id,cpu,mem,executor}
as shown in fig. 3, task scheduling of a job by a work node includes:
generating an action according to the epsilon-Greedy and Boltzmann combined strategy, and scheduling resources to the working nodes according to the action; wherein, the action can be understood as a resource scheduling command;
if the task is partially distributed or the task is inefficiently distributed, feeding back a large negative reward to the task scheduler;
if the task is successfully distributed, the positive reward calculated under the energy consumption model is fed back to the task scheduler.
The formula of the energy consumption-time model is:
wherein, C0、C1、C2The weight coefficients are respectively represented by the weight coefficients,indicates the cpu utilization of the ith node,the memory utilization rate of the ith node is represented, t represents the working time of the working node i, t' represents the working end time under the current cpu utilization rate and the memory utilization rate, and EAtotalRepresenting the energy consumption of the cluster generation, AvgTIndicating the average time that all jobs are running,representing the run time of job j, M representing the number of jobs, n representing the number of nodes, phi representing all jobs, target representing the energy consumption-time target value,representing the weight to the target.
For example, for some IO intensive task:
EAtotal=112+9.17Ucpu-19.46Umem
for some cpu intensive tasks:
EAtotal=103+1.97Ucpu+2.53Umem
the cpu utilization rate is calculated by the following formula:
wherein the content of the first and second substances,indicates the cpu utilization of the ith node, n indicates the number of nodes, i indicates a specific node,indicates the cpu usage on the ith node,represents the total amount of cpus on the ith node, and t represents the running time in the case where the current cpu usage of the ith node.
The calculation formula of the memory utilization rate is as follows:
wherein the content of the first and second substances,the memory utilization rate of the ith node is shown,indicating the amount of memory usage on the ith node,represents the total amount of memory on the ith node, and t represents the running time in the case of the current memory usage of the ith node.
Calculating a weight coefficient according to the obtained related configuration parameters by adopting a least square method and a stepwise regression method; the specific process is as follows:
using a multiple linear regression model:
wherein, C0、C1、C2Is a regression coefficient, i.e. a weight coefficient that needs to be determined; ξ is an unobservable random error; supposing that n groups of observed values of energy consumption are providedThen there are:
the stepwise regression method can solve the problem that the optimal solution of the least square method is difficult to find, and the idea is to introduce variables one by one, wherein the introduced condition is that the partial F test of the variables is obvious. Meanwhile, after each variable is introduced, the existing variable is checked. The invention adopts a least square method plus stepwise regression mode, system utilization rate data is substituted into a multiple linear regression model, and the target function is as follows:
estimating the regression coefficient by using a least square method to minimize the target function, thereby obtaining a specific regression equation and further obtaining a weight coefficient C0、C1、C2The value of (c).
The reward model is as follows:
wherein EAtotalRepresenting the energy consumption, EA, generated by a complete dispatch clustermaxIndicating that all the working nodes in the cluster are the energy consumption, EA, generated by running the job under the full load conditionnormalizedRepresenting normalized energy consumption, EAepiRepresenting the track generated by the exploration of the task scheduler for one time, namely the part of the reward value related to energy consumption in the epamode for one time;representing the run time of job j, M representing the number of jobs,represents the minimum average completion time for all jobs,represents the normalized average job completion time,the weight to the object is represented by,represents the portion of the reward value that is related to the average completion time of the job in an epsilon, RfixedIndicating a fixed prize value, RepiA true prize value indicating successful assignment of the task; due to RepiE (0,1), so that agent can have one after selecting the correct schedule aA better quantized forward value of RfixedDesigned as a large number, e.g. Rfixed10000 was taken.
The value formula for generating the reward is as follows:
for one calculation of true reward value RepiAnother preferred embodiment of (a) is: true prize value RepiThe energy efficiency model is obtained by calculation, and the expression of the energy efficiency model is as follows:
wherein, C1,C2Are all indicative of the weight coefficient,represents the average utilization of cpu, T represents the average running time of job,indicating the average utilization of the memory.
Feeding back the calculated reward value to the agent; updating the DQN parameters according to the reward values to obtain updated DQN parameters; fitting the value of the Q function by the agent according to the DQN parameter; and continuously iterating the process until the current energy consumption-time target is converged or exceeds the set total times, obtaining the trained Q network according to the latest DQN parameter, storing the current network parameter by the DQN, finishing iteration, and selecting a correct action by the agent to execute task scheduling according to the value of the Q function.
An energy-saving scheduling system based on deep reinforcement learning and heterogeneous Spark clusters comprises: the system comprises a task scheduling module, an energy consumption calculation module, a reward generation module and a DQN parameter updating module;
the task scheduling module is used for exploring a cluster environment and scheduling the operation according to the DQN parameter;
the energy consumption calculation module is used for calculating the energy consumption of the system according to the operation scheduling result;
the reward generation module is used for calculating a reward value according to the system energy consumption;
the DQN parameter updating module is used for updating the network DQN parameters by using the reward value and feeding back the DQN parameters to the task scheduling module.
In the invention, the problem of resource priority allocation caused by different energy consumption due to cluster isomerism is considered, an energy consumption-time target in the process of allocating resources to a system in a heterogeneous spark cluster environment is calculated based on deep reinforcement learning, the lowest energy consumption-time target is searched under the condition of ensuring that the response time of a user is met, and resource scheduling is carried out according to the lowest energy consumption-time target.
It should be noted that each functional module in each embodiment of the present disclosure may be integrated into one processing module, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode or a software functional module mode; the integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be substantially or partially embodied in the form of a software product, or all or part of the technical solution that contributes to the prior art.
The above-mentioned embodiments, which further illustrate the objects, technical solutions and advantages of the present invention, should be understood that the above-mentioned embodiments are only preferred embodiments of the present invention, and should not be construed as limiting the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. An energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters is characterized by comprising the following steps: acquiring online data information in real time, inputting the data information into a trained Q network, performing energy consumption-time target prediction on the data information by the Q network, and selecting a scheme with the lowest energy consumption-time target by a system for resource allocation according to the energy consumption-time target prediction;
the training process for the Q network is as follows:
s1: acquiring related configuration parameters and execution parameters of job operation; initializing DQN parameters; wherein DQN represents Deep Q-Network, namely Q Network;
s2: calculating a weight coefficient according to the acquired related configuration parameters;
s3: generating an epsilon-Greedy and Boltzmann combined strategy according to the DQN parameter;
s4: the task scheduler performs task scheduling on the working nodes according to the epsilon-Greedy and Boltzmann combined strategy;
s5: constructing an energy consumption-time model according to the weight coefficient and the execution parameter; constructing a reward model according to the task scheduling and the energy consumption-time model, and generating a reward value according to the reward model;
s6: updating the DQN parameters according to the reward values to obtain updated DQN parameters;
s7: steps S3-S6 are repeated, and when the energy consumption-time target converges, the training is completed.
2. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 1, wherein relevant configuration parameters of job running include: the number of actuators, cpu resources and memory resources; the execution parameters of the job run include: job arrival time, job identification, job completion time, and job duration.
3. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 1, wherein the epsilon-Greedy and Boltzmann combined strategy is:
wherein s represents the resource state of the cluster; a is an action, which represents selecting a specific physical machine to create an executor and allocating resources; a' represents action of the maximum Q value; q (s, a) represents a jackpot value that would be available if state s and action were a; a represents a random action; ε represents the probability and step represents the time step explored by the task scheduler.
4. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 1, wherein task scheduling for the working nodes comprises:
generating an action according to the epsilon-Greedy and Boltzmann combined strategy, and scheduling resources to the working nodes according to the action;
if the task is partially distributed or the task is inefficiently distributed, feeding back a large amount of negative rewards calculated under the energy consumption model to the task scheduler;
if the task is successfully distributed, the positive reward calculated under the energy consumption model is fed back to the task scheduler.
5. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 1, wherein the formula of the energy consumption-time model is as follows:
wherein, C0、C1、C2The weight coefficients are respectively represented by the weight coefficients,indicates the cpu utilization of the ith node,the memory utilization rate of the ith node is represented, t represents the working time of the working node i, t' represents the working end time under the current cpu utilization rate and the memory utilization rate, and EAtotalRepresenting the energy consumption of the cluster generation, AvgTIndicating the average time that all jobs are running,representing the run time of job j, M representing the number of jobs, n representing the number of nodes, phi representing all jobs, target representing the energy consumption-time target value,representing the weight to the target.
6. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 5, wherein a cpu utilization calculation formula is as follows:
wherein the content of the first and second substances,indicates the cpu utilization of the ith node, n indicates the number of nodes, i indicates a specific node,indicates the cpu usage on the ith node,represents the total amount of cpus on the ith node, and t represents the running time in the case where the current cpu usage of the ith node.
7. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 5, wherein the calculation formula of the memory utilization rate is as follows:
wherein the content of the first and second substances,the memory utilization rate of the ith node is shown,indicating the amount of memory usage on the ith node,represents the total amount of memory on the ith node, and t represents the running time in the case of the current memory usage of the ith node.
8. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 1, wherein the reward model is:
wherein EAtotalRepresenting the energy consumption, EA, generated by a complete dispatch clustermaxIndicating that all the working nodes in the cluster are the energy consumption, EA, generated by running the job under the full load conditionnormalizedRepresenting normalized energy consumption, EAepiRepresenting the part of the reward value related to energy consumption in the track generated by the exploration of the task scheduler for one time, namely in the epamode for one time;representing the run time of job j, M representing the number of jobs,represents the minimum average completion time for all jobs,represents the normalized average job completion time,the weight to the object is represented by,represents the portion of the reward value that is related to the average completion time of the job in an epsilon, RfixedIndicating a fixed prize value, RepiA true prize value indicating successful assignment of the task.
9. The energy-saving scheduling method based on deep reinforcement learning and heterogeneous Spark clusters according to claim 1, wherein the formula for generating the reward value is as follows:
wherein Reward represents the generated prize value, RepiA true prize value indicating successful assignment of the task.
10. An energy-saving scheduling system based on deep reinforcement learning and heterogeneous Spark clusters, comprising: the system comprises a task scheduling module, an energy consumption calculation module, a reward generation module and a DQN parameter updating module;
the task scheduling module is used for exploring a cluster environment and scheduling the operation according to the DQN parameter;
the energy consumption calculation module is used for calculating the energy consumption of the system according to the operation scheduling result;
the reward generation module is used for calculating a reward value according to the system energy consumption;
the DQN parameter updating module is used for updating the network DQN parameters by using the reward value and feeding back the DQN parameters to the task scheduling module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111505917.3A CN114281528A (en) | 2021-12-10 | 2021-12-10 | Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111505917.3A CN114281528A (en) | 2021-12-10 | 2021-12-10 | Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114281528A true CN114281528A (en) | 2022-04-05 |
Family
ID=80871626
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111505917.3A Pending CN114281528A (en) | 2021-12-10 | 2021-12-10 | Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114281528A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114860435A (en) * | 2022-04-24 | 2022-08-05 | 浙江大学台州研究院 | Big data job scheduling method based on task selection process reinforcement learning |
CN115408163A (en) * | 2022-10-31 | 2022-11-29 | 广东电网有限责任公司佛山供电局 | Model inference scheduling method and system based on batch processing dynamic adjustment |
CN116578403A (en) * | 2023-07-10 | 2023-08-11 | 安徽思高智能科技有限公司 | RPA flow scheduling method and system based on deep reinforcement learning |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101630125B1 (en) * | 2015-04-27 | 2016-06-13 | 수원대학교산학협력단 | Method for resource provisioning in cloud computing resource management system |
US20180260700A1 (en) * | 2017-03-09 | 2018-09-13 | Alphaics Corporation | Method and system for implementing reinforcement learning agent using reinforcement learning processor |
CN109117255A (en) * | 2018-07-02 | 2019-01-01 | 武汉理工大学 | Heterogeneous polynuclear embedded system energy optimization dispatching method based on intensified learning |
CN110737529A (en) * | 2019-09-05 | 2020-01-31 | 北京理工大学 | cluster scheduling adaptive configuration method for short-time multiple variable-size data jobs |
CN111414252A (en) * | 2020-03-18 | 2020-07-14 | 重庆邮电大学 | Task unloading method based on deep reinforcement learning |
CN112035251A (en) * | 2020-07-14 | 2020-12-04 | 中科院计算所西部高等技术研究院 | Deep learning training system and method based on reinforcement learning operation layout |
CN112966431A (en) * | 2021-02-04 | 2021-06-15 | 西安交通大学 | Data center energy consumption joint optimization method, system, medium and equipment |
CN112965813A (en) * | 2021-02-10 | 2021-06-15 | 山东英信计算机技术有限公司 | AI platform resource regulation and control method, system and medium |
CN113094159A (en) * | 2021-03-22 | 2021-07-09 | 西安交通大学 | Data center job scheduling method, system, storage medium and computing equipment |
-
2021
- 2021-12-10 CN CN202111505917.3A patent/CN114281528A/en active Pending
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101630125B1 (en) * | 2015-04-27 | 2016-06-13 | 수원대학교산학협력단 | Method for resource provisioning in cloud computing resource management system |
US20180260700A1 (en) * | 2017-03-09 | 2018-09-13 | Alphaics Corporation | Method and system for implementing reinforcement learning agent using reinforcement learning processor |
CN109117255A (en) * | 2018-07-02 | 2019-01-01 | 武汉理工大学 | Heterogeneous polynuclear embedded system energy optimization dispatching method based on intensified learning |
CN110737529A (en) * | 2019-09-05 | 2020-01-31 | 北京理工大学 | cluster scheduling adaptive configuration method for short-time multiple variable-size data jobs |
CN111414252A (en) * | 2020-03-18 | 2020-07-14 | 重庆邮电大学 | Task unloading method based on deep reinforcement learning |
CN112035251A (en) * | 2020-07-14 | 2020-12-04 | 中科院计算所西部高等技术研究院 | Deep learning training system and method based on reinforcement learning operation layout |
CN112966431A (en) * | 2021-02-04 | 2021-06-15 | 西安交通大学 | Data center energy consumption joint optimization method, system, medium and equipment |
CN112965813A (en) * | 2021-02-10 | 2021-06-15 | 山东英信计算机技术有限公司 | AI platform resource regulation and control method, system and medium |
CN113094159A (en) * | 2021-03-22 | 2021-07-09 | 西安交通大学 | Data center job scheduling method, system, storage medium and computing equipment |
Non-Patent Citations (3)
Title |
---|
JILING YAN: ""Dueling-DDQN Based Virtual Machine Placement Algorithm for Cloud Computing Systems"", 《2021 IEEE/CIC INTERNATIONAL CONFERENCE ON COMMUNICATIONS IN CHINA(ICCC)》, 8 November 2021 (2021-11-08), pages 294 - 299 * |
张可新: ""基于深度强化学习的交通配时优化技术研究"", 《中国优秀硕士学位论文全文数据库 工程科技II辑》, no. 2020, 15 March 2020 (2020-03-15), pages 034 - 728 * |
黎明程序员: ""强化学习原理源码解读002:DQN"", Retrieved from the Internet <URL:《https://www.cnblogs.com/itmorn/p/13754579.html》> * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114860435A (en) * | 2022-04-24 | 2022-08-05 | 浙江大学台州研究院 | Big data job scheduling method based on task selection process reinforcement learning |
CN114860435B (en) * | 2022-04-24 | 2024-04-05 | 浙江大学台州研究院 | Big data job scheduling method based on task selection process reinforcement learning |
CN115408163A (en) * | 2022-10-31 | 2022-11-29 | 广东电网有限责任公司佛山供电局 | Model inference scheduling method and system based on batch processing dynamic adjustment |
CN116578403A (en) * | 2023-07-10 | 2023-08-11 | 安徽思高智能科技有限公司 | RPA flow scheduling method and system based on deep reinforcement learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114281528A (en) | Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster | |
CN110737529B (en) | Short-time multi-variable-size data job cluster scheduling adaptive configuration method | |
CN110489223B (en) | Task scheduling method and device in heterogeneous cluster and electronic equipment | |
CN107704069B (en) | Spark energy-saving scheduling method based on energy consumption perception | |
Rodrigues et al. | Helping HPC users specify job memory requirements via machine learning | |
CN109617939B (en) | WebIDE cloud server resource allocation method based on task pre-scheduling | |
Kamthe et al. | A stochastic approach to estimating earliest start times of nodes for scheduling DAGs on heterogeneous distributed computing systems | |
Muhuri et al. | On arrival scheduling of real-time precedence constrained tasks on multi-processor systems using genetic algorithm | |
CN110086855A (en) | Spark task Intellisense dispatching method based on ant group algorithm | |
CN115168027A (en) | Calculation power resource measurement method based on deep reinforcement learning | |
CN116932201A (en) | Multi-resource sharing scheduling method for deep learning training task | |
Babu et al. | Energy efficient scheduling algorithm for cloud computing systems based on prediction model | |
Ghazali et al. | A classification of Hadoop job schedulers based on performance optimization approaches | |
Davami et al. | Distributed scheduling method for multiple workflows with parallelism prediction and DAG prioritizing for time constrained cloud applications | |
Arif et al. | Infrastructure-aware tensorflow for heterogeneous datacenters | |
CN111782466A (en) | Big data task resource utilization detection method and device | |
CN115794405A (en) | Dynamic resource allocation method of big data processing framework based on SSA-XGboost algorithm | |
CN106874215B (en) | Serialized storage optimization method based on Spark operator | |
Al Maruf et al. | Optimizing DNNs Model Partitioning for Enhanced Performance on Edge Devices. | |
Ghose et al. | Orchestration of perception systems for reliable performance in heterogeneous platforms | |
CN111290855B (en) | GPU card management method, system and storage medium for multiple GPU servers in distributed environment | |
Moussa et al. | Comprehensive study on machine learning-based container scheduling in cloud | |
Fan et al. | An efficient scheduling algorithm for interdependent tasks in heterogeneous multi-core systems | |
Chhabra et al. | Qualitative Parametric Comparison of Load Balancing Algorithms in Distributed Computing Environment | |
Qasim et al. | Dynamic mapping of application workflows in heterogeneous computing environments |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |