WO2023184939A1 - Adaptive and efficient resource allocation method for cloud data centers based on deep reinforcement learning - Google Patents

Adaptive and efficient resource allocation method for cloud data centers based on deep reinforcement learning

Info

Publication number
WO2023184939A1
WO2023184939A1 PCT/CN2022/126468 CN2022126468W WO2023184939A1 WO 2023184939 A1 WO2023184939 A1 WO 2023184939A1 CN 2022126468 W CN2022126468 W CN 2022126468W WO 2023184939 A1 WO2023184939 A1 WO 2023184939A1
Authority
WO
WIPO (PCT)
Prior art keywords
job
drl
resource allocation
jobs
cloud data
Prior art date
Application number
PCT/CN2022/126468
Other languages
English (en)
French (fr)
Inventor
陈哲毅
熊兵
陈礼贤
Original Assignee
福州大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 福州大学 filed Critical 福州大学
Publication of WO2023184939A1 publication Critical patent/WO2023184939A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5094Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the invention relates to an adaptive and efficient resource allocation method for cloud data centers based on deep reinforcement learning.
  • Cloud computing has quickly grown into one of the most popular computing models.
  • resource allocation refers to the process of allocating computing, storage, and network resources to meet the needs of users and cloud service providers.
  • problems have arisen in cloud resource allocation, such as unreasonable resource allocation and slow response to changes. These issues not only reduce service quality but also result in higher energy consumption and maintenance overhead. Therefore, designing an adaptive and efficient cloud data center resource allocation solution has become a top priority.
  • this is a very challenging task due to the dynamic nature of system states and the diversity of user requirements in cloud computing.
  • Reinforcement learning (RL) is a resource allocation approach with high adaptability and low complexity.
  • However, traditional RL-based methods suffer from the high-dimensional state space problem when dealing with complex cloud environments.
  • To address this, deep reinforcement learning (DRL), which uses deep neural networks to extract low-dimensional representations from the high-dimensional state space, has been proposed.
  • Value-based DRL learns deterministic policies by calculating the probability of each action.
  • In a cloud data center, jobs may keep arriving, so the action space may be quite large in order to continuously satisfy the requirements of scheduled jobs; it is therefore difficult for value-based DRL to converge quickly to the optimal policy.
  • Policy-based DRL, such as policy gradient (PG), learns stochastic policies and can better handle the larger action space in cloud data centers by directly outputting actions as probability distributions, but the high variance when estimating policy gradients may reduce training efficiency.
  • As a synergy of value-based and policy-based DRL, Advantage Actor-Critic (A2C) aims to solve the above problems.
  • In A2C, the Actor selects actions based on the score evaluated by the Critic, the variance of the policy gradient is reduced, and an advantage function is used.
  • A2C uses a single-thread training method, which results in insufficient utilization of computing resources.
  • strong data correlation may occur when using A2C, because when only one DRL agent interacts with the environment, similar training samples will be generated, resulting in unsatisfactory training results.
  • To solve these problems of A2C, the asynchronous advantage Actor-Critic (A3C) algorithm, with low variance and high efficiency, was proposed; A3C uses multiple DRL agents to interact with the environment simultaneously, making full use of computing resources and improving learning speed, and since the data collected by different DRL agents are mutually independent, A3C also breaks the data correlation.
  • the purpose of the present invention is to provide an adaptive and efficient resource allocation method for cloud data centers based on deep reinforcement learning, which has higher service quality in terms of delay and job discard rate, and is more energy efficient.
  • the technical solution of the present invention is: an adaptive and efficient resource allocation method for cloud data centers based on deep reinforcement learning, in which a unified resource allocation model is designed.
  • the resource allocation model takes job delay, dismissal rate, and energy efficiency as optimization goals; based on the resource allocation model, the state space, action space, and reward function of cloud resource allocation are defined as a Markov decision process and used in the DRL (Deep Reinforcement Learning)-based cloud resource allocation method;
  • a resource allocation method based on Actor-Critic DRL is proposed to solve the optimal strategy problem of cloud data center job scheduling; in addition, the resource allocation method based on Actor-Critic DRL uses asynchronous updates of policy parameters among multiple DRL agents.
  • the DRL-based cloud resource allocation method is specifically implemented as follows:
  • Step S1: the resource allocation system RAS (Resource Allocation System) generates a job scheduling policy based on the resource requests of different user jobs and the current status information of the cloud data center;
  • the resource allocation system RAS includes a DRL-based resource controller, a job scheduler, an information collector, and an energy agent;
  • Step S2: the job scheduler allocates jobs from the job sequence to servers in the cloud data center according to the policy issued by the DRL-based resource controller;
  • Step S3: during resource allocation, the information collector records the usage of different resources in the cloud data center and the current energy consumption measured by the energy agent, and the DRL-based resource controller generates the corresponding job scheduling policy.
  • the state space, action space and reward function in the DRL are defined as follows:
  • State space: the state s_t ∈ S consists of the resource usage of all servers and the resource requests of all jobs that have arrived by time step t; on the one hand, U_t = {u_1,1, ..., u_m,n} represents the resource usage of all servers, where u_m,n is the usage of the n-th resource type on server virtual machine v_m; on the other hand, O_t = {o_1,1, ..., o_z,n} represents the occupation requests of all arrived jobs for different types of resources, where o_j,n is the occupation request of the most recently arrived job j for the n-th resource type, and D_t = {d_1, ..., d_z} represents the durations of all jobs arriving by time step t, where d_j is the duration of job j; therefore, the state of the cloud data center at time step t is defined as s_t = [U_t, [O_t, D_t]] (Equation 1), where U_t, O_t and D_t describe the states of all servers V = {v_1, v_2, ..., v_m} and of the job sequence J.
  • When a job arrives or is completed, the state space changes, and its dimension depends on the servers and arrived jobs; it is calculated as (mn + z(n+1)), where m, n, and z represent the number of servers, resource types, and arrived jobs, respectively.
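As an illustration of the state definition above, the following minimal sketch assembles the flat state vector s_t and checks the dimension mn + z(n+1); the array names server_usage, job_requests, and job_durations are illustrative assumptions, not notation from the patent.

```python
import numpy as np

def build_state(server_usage, job_requests, job_durations):
    """Assemble the flat state s_t = [U_t, [O_t, D_t]] described above.

    server_usage:  (m, n) array, usage of resource type n on server m
    job_requests:  (z, n) array, request of arrived job j for resource type n
    job_durations: (z,)  array, duration of arrived job j
    """
    U = np.asarray(server_usage, dtype=np.float32).ravel()    # m * n entries
    O = np.asarray(job_requests, dtype=np.float32).ravel()    # z * n entries
    D = np.asarray(job_durations, dtype=np.float32).ravel()   # z entries
    return np.concatenate([U, O, D])                          # mn + z(n+1) entries

# Example: m=2 servers, n=2 resource types (CPU, memory), z=1 arrived job
s = build_state([[0.5, 0.2], [0.1, 0.0]], [[0.3, 0.1]], [4.0])
assert s.shape[0] == 2 * 2 + 1 * (2 + 1)   # mn + z(n+1) = 7
```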
  • Action space: at time step t, the action taken by the job scheduler is to select and execute jobs from the job sequence according to the job scheduling policy issued by the DRL-based resource controller; the policy is generated based on the current state of the resource allocation system, and the job scheduler assigns the job to the corresponding server for execution; once a job is scheduled to the corresponding server, the server automatically allocates the corresponding resources according to the job's resource request; therefore, the action space A only indicates whether a job will be processed by a server, defined as A = {a_t | a_t ∈ {0, 1, 2, ..., m}} (Equation 2), where a_t ∈ A; when a_t = 0, the job scheduler does not dispatch a job at time step t and the job waits in the job sequence; otherwise, the job is processed by the corresponding server.
  • Reward function: the reward function guides the DRL agent to learn a better job scheduling policy with a higher discounted long-term reward, improving the system performance of cloud resource allocation; therefore, at time step t, the total reward R_t consists of a QoS reward, denoted R_t^QoS, and an energy-efficiency reward, denoted R_t^E, and is defined as R_t = R_t^QoS + R_t^E (Equation 3).
  • Specifically, R_t^QoS reflects the penalties for the different types of delay at time step t, namely the normalized average job delay L_normal and the job dismissal rate disRate, weighted by w_1 and w_2, which strengthen the penalties; since R_t^QoS is a negative value, jobs with a longer duration tend to wait for a shorter time.
  • In addition, R_t^E reflects the penalty for the energy consumed by executing jobs at time step t, with w_3 as the weight of the penalty.
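The exact expressions for the two reward terms appear as equation images in the original filing; the sketch below therefore assumes a simple weighted-penalty form consistent with the prose (negative QoS and energy penalties with weights w1, w2, w3), so the function and argument names are assumptions.

```python
def total_reward(l_normal, dis_rate, energy_t, w1=1.0, w2=1.0, w3=1.0):
    """Sketch of R_t = R_t^QoS + R_t^E as described above (assumed form).

    l_normal: normalized average job delay (>= 1)
    dis_rate: job dismissal rate in [0, 1]
    energy_t: energy consumed by job execution at time step t
    """
    r_qos = -(w1 * l_normal + w2 * dis_rate)   # delay/dismissal penalty (negative)
    r_energy = -(w3 * energy_t)                # energy-consumption penalty
    return r_qos + r_energy

print(total_reward(l_normal=1.2, dis_rate=0.05, energy_t=0.8))
```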
  • the resource allocation method based on Actor-Critic DRL adopts an Actor-Critic DRL framework and uses asynchronous updates, i.e., A3C (Asynchronous Advantage Actor-Critic), to accelerate the training process; specifically,
  • the resource allocation method based on Actor-Critic DRL combines value-based and policy-based DRL algorithms; on the one hand, value-based DRL utilizes a function approximator to determine the value function and uses ε-greedy to balance exploration and exploitation; on the other hand, policy-based DRL parameterizes the job scheduling policy and directly outputs actions as probability distributions during the learning process, without storing their Q-values.
  • In each DRL agent, the critic network Q_w(s, a) estimates the state-action value function and updates the parameters w; in addition, the actor network π_θ(s, a) guides the update of the job scheduling policy parameters according to the evaluation of the critic network; the corresponding policy gradient is defined as:
  • J(θ) = Σ_s d^π_θ(s) Σ_a π_θ(s, a)·R_t,   ∇_θ J(θ) = E_π_θ[∇_θ log π_θ(s, a)·Q_w(s, a)]
  • where d^π_θ(s) is the stationary distribution of the MDP modeling cloud resource allocation under the current job scheduling policy π_θ, and R_t is the immediate reward.
  • Multiple DRL agents work simultaneously and update their respective job scheduling policy parameters asynchronously; specifically, a predetermined number of DRL agents are initialized with the same local neural network parameters, i.e., the same scheduling policy, and interact with their corresponding cloud data center environments; each DRL agent periodically accumulates gradients in its actor and critic networks and asynchronously updates the parameters of the global network using gradient ascent with the RMSProp optimizer; next, each DRL agent pulls the latest parameters of the actor and critic networks from the global network and replaces its local parameters with them; each DRL agent then continues to interact with its environment based on the updated local parameters and independently optimizes the local parameters of its scheduling policy; there is no coordination among these DRL agents during local training; the Actor-Critic DRL-based resource allocation method continues training through the asynchronous update mechanism among multiple DRL agents until the results converge.
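The asynchronous update scheme described above can be outlined as follows; this is a framework-agnostic sketch in which the shared-parameter object, the per-agent gradient callback, and the RMSProp step are simplified stand-ins rather than the patent's actual implementation.

```python
import threading
import numpy as np

class GlobalNet:
    """Global actor/critic parameters shared by all DRL agents (simplified)."""
    def __init__(self, dim):
        self.params = np.zeros(dim)
        self.square_avg = np.zeros(dim)       # RMSProp accumulator
        self.lock = threading.Lock()

    def apply_gradients(self, grads, lr=0.01, decay=0.9, eps=1e-8):
        # Asynchronous RMSProp-style gradient ascent on the shared parameters.
        with self.lock:
            self.square_avg = decay * self.square_avg + (1 - decay) * grads ** 2
            self.params += lr * grads / (np.sqrt(self.square_avg) + eps)
            return self.params.copy()

def worker(global_net, env_interact, n_updates=100):
    local_params = global_net.params.copy()               # same initial policy
    for _ in range(n_updates):
        grads = env_interact(local_params)                 # accumulate actor/critic grads locally
        local_params = global_net.apply_gradients(grads)   # push grads, pull latest parameters

# e.g. 10 agents, each with its own simulated cloud environment (here: a dummy gradient)
g = GlobalNet(dim=8)
threads = [threading.Thread(target=worker, args=(g, lambda p: np.random.randn(8) * 0.1))
           for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
```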
  • the present invention has the following beneficial effects:
  • the A3C-based resource allocation method proposed by the present invention effectively schedules jobs to improve the QoS and energy efficiency of the cloud data center.
  • Extensive simulation experiments were conducted using real tracking data from Google Cloud data centers to verify the effectiveness of this method in achieving adaptive and efficient resource allocation.
  • this method is superior to classic resource allocation methods such as LJF, Tetris, SJF, RR, PG, and DQL in terms of QoS (average job delay and job dismissal rate) and energy efficiency (average energy consumption per job).
  • as the average system load increases, this method trains more effectively than the other methods and has higher training efficiency (faster convergence) than the two advanced DRL-based methods (PG and DQL). Simulation results show that this method is of great significance for improving resource allocation in cloud data centers.
  • Figure 1 is a cloud data center resource allocation model of the present invention
  • Figure 2 is an example of MDP process modeling cloud resource allocation according to the present invention
  • Figure 3 is the framework of the cloud resource allocation method based on A3C according to the present invention.
  • Figure 4 shows the total returns of different resource allocation methods under different system loads under single-objective optimization of the present invention.
  • the present invention is an adaptive and efficient resource allocation method for cloud data centers based on deep reinforcement learning, and designs a unified resource allocation model.
  • the resource allocation model takes job delay, dismissal rate and energy efficiency as optimization goals; based on the resource allocation model
  • the state space, action space, and reward function of cloud resource allocation are defined as a Markov decision process and used in the DRL-based cloud resource allocation method; a resource allocation method based on Actor-Critic DRL is proposed to solve the optimal policy problem of job scheduling in cloud data centers;
  • in addition, the resource allocation method based on Actor-Critic DRL uses asynchronous updates of policy parameters among multiple DRL agents.
  • the present invention formulates the resource allocation problem of the cloud data center as a model-free DRL problem with dynamic system states and diverse user requirements, targeting cloud data centers with heterogeneous resources, diverse user needs, high energy consumption, and dynamic environments; in view of the advantages of the A3C algorithm, the present invention proposes a resource allocation scheme based on A3C.
  • a unified resource allocation model is designed for cloud data centers with dynamic system status and heterogeneous user needs.
  • the model takes job delay, dismissal rate and energy efficiency (average energy consumption of jobs) as optimization objectives.
  • the state space, behavior space and reward function of cloud resource allocation are defined as Markov decision process (MDP) and used in the DRL-based cloud resource allocation scheme.
  • a resource allocation method based on Actor-Critic DRL (A3C) is proposed, which effectively solves the optimal strategy problem of cloud data center job scheduling.
  • DNN is used to deal with high-dimensional state space problems in cloud data centers.
  • this method greatly improves training efficiency through asynchronous updating of policy parameters among multiple DRL agents.
  • FIG. 1 shows the cloud data center allocation model.
  • a DRL-based resource controller is embedded in the Resource Allocation System (RAS).
  • RAS Resource Allocation System
  • the job scheduler allocates jobs to servers from the job sequence according to the policy issued by the DRL-based resource controller.
  • these jobs are generalized as data processing jobs, such as training jobs of deep learning (DL) models for image processing and speech recognition.
  • Different jobs present different resource requests depending on their purpose; therefore, each job consists of a specific job duration (e.g., minutes, hours, or days) and requests for different types of resources (e.g., CPU and memory).
  • the information collector records the usage of different resources in the cloud data center and the current energy consumption (measured by the energy agent). Based on the above information, the DRL-based resource controller will generate the corresponding job scheduling policy.
  • the present invention proposes a cloud data center resource allocation method based on asynchronous advantage Actor-Critic (A3C), which adopts an Actor-Critic DRL framework and uses asynchronous updates (A3C) to accelerate the training process.
  • The A3C-based approach combines value-based and policy-based DRL algorithms.
  • value-based DRL utilizes a function approximator to determine the value function and adopts ε-greedy to balance exploration and exploitation; therefore, the DRL agent uses existing experience to select good job scheduling actions while exploring new ones.
  • policy-based DRL parameterizes the job scheduling policy and directly outputs actions as probability distributions during the learning process without storing their Q-values.
  • the present invention proposes a DRL-based resource allocation method, which includes the following steps:
  • Step S1 RAS generates a job scheduling policy based on the resource requests of different user jobs and the current status information of the cloud data center (such as the number of servers, resource usage, energy consumption, etc.).
  • Step S2 The job scheduler allocates jobs to servers from the job sequence according to the policy issued by the DRL-based resource controller.
  • Step S3 the information collector records the usage of different resources in the cloud data center and the current energy consumption (measured by the energy agent). Based on the above information, the DRL-based resource controller will generate the corresponding job scheduling policy.
  • the state space, action space and reward function in DRL are defined as follows:
  • the state s t ⁇ S consists of the resource usage of all servers represented by time step t and the resource requests of all arriving jobs.
  • u_m,n is the usage of the n-th resource type on server virtual machine v_m.
  • the state space changes.
  • the dimension of the state space depends on the servers and the arrived jobs, and is calculated as (mn + z(n+1)), where m, n, and z represent the number of servers, resource types, and arrived jobs, respectively.
  • Action space: at time step t, the action adopted by the job scheduler is to select and execute jobs from the job sequence according to the job scheduling policy issued by the DRL-based resource controller. Policies are generated based on the current system state, and the job scheduler assigns jobs to specific servers for execution. Once a job is scheduled to the appropriate server, the server will automatically allocate the corresponding resources based on the job's resource request. Therefore, the action space only indicates whether a job will be processed by a server, defined as A = {a_t | a_t ∈ {0, 1, 2, ..., m}} (Equation 2); when a_t = 0, the job scheduler does not dispatch a job at time step t and the job waits in the job sequence; otherwise, the job is processed by the corresponding server.
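A small sketch of how one action a_t ∈ {0, 1, ..., m} sampled from the policy's probability distribution is interpreted by the job scheduler; the probs vector and the free-capacity bookkeeping are illustrative assumptions.

```python
import numpy as np

def schedule_step(probs, job_request, server_free):
    """Interpret one action from the action space A = {0, 1, ..., m}.

    probs:       length-(m+1) probability vector output by the policy network
    job_request: per-resource request of the job at the head of the job sequence
    server_free: (m, n) array of free capacity per server and resource type
    """
    a_t = np.random.choice(len(probs), p=probs)   # sample a_t from the policy
    if a_t == 0:
        return None                               # job keeps waiting in the job sequence
    server = a_t - 1
    # the chosen server allocates resources automatically according to the request
    server_free[server] -= np.asarray(job_request)
    return server

free = np.array([[0.6, 0.5], [0.2, 0.3]])
print(schedule_step([0.2, 0.5, 0.3], [0.3, 0.1], free))
```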
  • the state transition probability matrix is denoted IP(s_{t+1} | s_t, a_t) and represents the probability of transitioning to the next state s_{t+1} when action a_t is taken in the current state s_t.
  • the value of the transition probability is obtained by running the DRL algorithm, which outputs the probability of taking different actions in a certain state.
  • the reward function guides the DRL agent (RAS) to learn better job scheduling policies with higher discounted long-term rewards, thereby improving the system performance of cloud resource allocation. Therefore, at time step t, the total reward R_t consists of a QoS reward (denoted R_t^QoS) and an energy-efficiency reward (denoted R_t^E), defined as R_t = R_t^QoS + R_t^E (Equation 3).
  • w_1 and w_2 are used to strengthen the penalties. Because R_t^QoS is a negative value, jobs with longer durations tend to wait for shorter times. This is sensible for cloud systems aiming at profit maximization, as the longer the job duration, the higher the profit. In addition, R_t^E reflects the penalty for the energy consumed at time step t, with w_3 as the weight of the penalty.
  • J_seq = {j_1, j_2, ..., j_q}, where q represents the number of jobs waiting in the job sequence, and q ≤ p.
  • L_normal is defined as the normalized average job delay, which normalizes the job delays of all successfully completed jobs and then takes their average, where L_normal ≥ 1 and d_j is the duration of the job.
  • disRate is defined as the job dismissal rate, which is the rate of jobs dismissed when the job sequence is full, where 0 ≤ disRate ≤ 1.
  • the total energy consumption of the cloud data center is E_total, where P_max is the maximum energy consumption when a server is fully utilized, k is the fraction of the maximum energy consumed by an idle server, the resource usage of every server at each time step t is taken into account, and T is the time step at which the last job is completed.
  • E_job is defined as the energy efficiency of the job scheduling process (measured by the average energy consumption of all successfully completed jobs).
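The formulas behind L_normal, disRate, E_total, and E_job appear as equation images in the original filing; the sketch below assumes common forms consistent with the prose (delay normalized by job duration and a linear idle/active server power model with idle fraction k and peak power P_max), so the exact expressions are assumptions.

```python
def normalized_delay(finish_times, arrive_times, durations):
    # Assumed form: delay of each completed job normalized by its duration, then averaged (>= 1).
    ratios = [(f - a) / d for f, a, d in zip(finish_times, arrive_times, durations)]
    return sum(ratios) / len(ratios)

def dismissal_rate(num_dismissed, num_arrived):
    # Fraction of jobs dropped because the job sequence was full (0 <= disRate <= 1).
    return num_dismissed / max(num_arrived, 1)

def server_power(utilization, p_max=250.0, k=0.7):
    # Assumed linear power model: 175 W idle .. 250 W fully utilized (k=70%, P_max=250 W).
    return k * p_max + (1.0 - k) * p_max * utilization

def average_job_energy(total_energy, num_completed):
    # E_job: total energy divided by the number of successfully completed jobs.
    return total_energy / max(num_completed, 1)

print(normalized_delay([10, 12], [0, 2], [8, 5]))   # 1.25 and 2.0 -> 1.625
print(server_power(0.5))                            # 212.5 W at 50% utilization
```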
  • the DRL agent first selects an action a t (scheduling job) under the current system state s t (resource usage and resource request) of the environment (cloud data center). Next, the DRL agent receives the reward R t (QoS and energy efficiency) and enters the next state s t+1 . This process is illustrated using MDP, as shown in Figure 2.
  • the present invention proposes a cloud data center resource allocation method based on asynchronous advantage actor-critic (A3C), which can achieve excellent QoS and energy efficiency in the cloud data center.
  • This method adopts an Actor-Critic DRL framework and uses asynchronous updates (A3C) to accelerate the training process.
  • A3C-based approach combines value-based and policy-based DRL algorithms.
  • value-based DRL utilizes a function approximator to determine the value function and adopts ⁇ -greedy to balance exploration and exploitation; therefore, the DRL agent uses existing experience to select good job scheduling operations while exploring new operations.
  • policy-based DRL parameterizes the job scheduling policy and directly outputs actions as probability distributions during the learning process without storing their Q-values.
  • First, the weights and biases of the actor network π_θ(s, a) and the critic network Q_w(s, a) are initialized. Then the learning rates γ_a and γ_c of the actor and critic are initialized, together with the TD error discount factor β.
  • The optimization objective is to obtain the maximum return; therefore, the immediate reward R_t is accumulated as J(θ) = Σ_s d^π_θ(s) Σ_a π_θ(s, a)·R_t, where d^π_θ(s) is the stationary distribution of the MDPs modeling cloud resource allocation under the current job scheduling policy π_θ.
  • the training process of optimizing cloud resource allocation begins.
  • the job scheduling policy parameters are constantly updated.
  • In one-step policy planning, the policy gradient of the objective function is defined as ∇_θ J(θ) = E_π_θ[∇_θ log π_θ(s, a)·R_t].
  • For a multi-step MDP, the immediate reward R_t is replaced by the long-term value Q^π_θ(s, a).
  • The policy gradient theorem states that, for any differentiable policy π_θ(s, a) and any policy objective function, the corresponding gradient is ∇_θ J(θ) = E_π_θ[∇_θ log π_θ(s, a)·Q^π_θ(s, a)].
  • On this basis, temporal-difference (TD) learning is used to accurately estimate state values and guide the update of the policy parameters.
  • Figure 3 shows the framework of the proposed A3C-based cloud resource allocation method. This method utilizes policy-based and value-based DRL, is able to handle larger action spaces, and reduces variance when estimating gradients.
  • In each DRL agent, the critic network estimates the state-action value function Q_w(s, a) and updates the parameter w.
  • The actor network guides the update of job scheduling policy parameters based on the evaluation value of the critic network.
  • The corresponding policy gradient is defined as ∇_θ J(θ) = E_π_θ[∇_θ log π_θ(s, a)·Q_w(s, a)]; a state-value function V(s) is then used as a baseline to reduce the variance when estimating the gradient, so the policy gradient is redefined as ∇_θ J(θ) = E_π_θ[∇_θ log π_θ(s, a)·A(s, a)], where A(s, a) = Q_w(s, a) − V(s) is the advantage function, and V(s) is updated through TD learning with the TD error δ = R_t + β·V(s_{t+1}) − V(s_t).
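A compact numeric sketch of the actor and critic updates described above, using the TD error as a sample estimate of the advantage; the tabular parameters and step sizes are simplifications, not the patent's neural-network implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Tabular stand-ins for the actor (policy logits) and critic (state values).
theta = np.zeros((4, 3))   # 4 states x 3 actions
V = np.zeros(4)            # state-value baseline
gamma_a, gamma_c, beta = 0.01, 0.01, 0.9

def update(s, a, r, s_next):
    td_error = r + beta * V[s_next] - V[s]        # delta = R_t + beta*V(s_{t+1}) - V(s_t)
    V[s] += gamma_c * td_error                    # critic update from the TD error
    probs = softmax(theta[s])
    grad_log = -probs                             # d log pi(a|s) / d logits
    grad_log[a] += 1.0
    theta[s] += gamma_a * td_error * grad_log     # actor: gradient ascent weighted by the advantage estimate
    return td_error

print(update(s=0, a=1, r=-0.5, s_next=2))
```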
  • multiple DRL agents work simultaneously and update their respective job scheduling policy parameters asynchronously, as shown in Algorithm 2.
  • a certain number of DRL agents are initialized using the same neural network local parameters (i.e., scheduling strategy) and interact with the corresponding cloud data center environment.
  • gradients are periodically accumulated in the actor and critic networks, and gradient ascent is used by the RMSProp optimizer to asynchronously update parameters in the global network.
  • each DRL agent extracts the latest parameters of the actor and critic networks from the global network and replaces the local parameters with them.
  • Each DRL agent will continue to interact with the corresponding environment based on the updated local parameters and independently optimize the local parameters of the scheduling strategy. Note that there is no coordination between these DRL agents during local training.
  • the A3C-based method will continue training through an asynchronous update mechanism between multiple DRL agents until the results converge.
  • the cloud resource allocation model proposed by this invention is implemented based on TensorFlow 1.4.0. Taking 50 heterogeneous servers as an example, a cloud data center is simulated, the idle-server energy fraction k is set to 70%, and the maximum energy consumption P_max of a server is set to 250 W. Therefore, as resource utilization increases from 0% to 100%, the energy consumption of a server ranges between 175 W and 250 W. Furthermore, real-world trace data from Google cloud data centers are used as input to the proposed model. The dataset contains resource usage data for different jobs on more than 125,000 servers in Google cloud data centers in May 2011. More specifically, we first randomly extract 50 servers from the 29-day Google dataset, where each server consists of approximately 100,000 job traces. Next, several basic metrics are extracted from each job trace, including machine ID, job ID, start time, end time, and the corresponding resource usage. In addition, the job sequence length is set to 1000.
  • During training, 10 DRL agents are used to implement asynchronous updates of the policy parameters.
  • job tracking data is provided to the proposed model in batches, where the batch size is set to 64.
  • For the DNN design, we constructed two fully connected hidden layers with 200 and 100 neurons, respectively.
  • In addition, we set the maximum number of epochs to 1000, the reward decay rate λ to 0.9, and the critic's learning rate γ_c to 0.01.
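A minimal TensorFlow 1.x-style sketch of the reported network shape and hyperparameters (two fully connected hidden layers with 200 and 100 neurons, RMSProp, critic learning rate 0.01); the tensor names and the assumed state and action dimensions are placeholders for illustration.

```python
import tensorflow as tf  # TensorFlow 1.x API, matching the reported TensorFlow 1.4.0 setup

state_dim, num_actions = 128, 51        # assumed sizes: mn + z(n+1) features, m + 1 actions
state = tf.placeholder(tf.float32, [None, state_dim], name="state")

# Two fully connected hidden layers with 200 and 100 neurons, as described above.
h1 = tf.layers.dense(state, 200, activation=tf.nn.relu)
h2 = tf.layers.dense(h1, 100, activation=tf.nn.relu)

policy_logits = tf.layers.dense(h2, num_actions)   # actor head: action probability distribution
policy = tf.nn.softmax(policy_logits)
value = tf.layers.dense(h2, 1)                     # critic head: state value

critic_lr = 0.01                                   # gamma_c from the description
optimizer = tf.train.RMSPropOptimizer(critic_lr)
```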
  • Random scheduling algorithm (Random): Jobs are executed in random order based on job duration.
  • Longest job first scheduling algorithm (Longest job first, LJF): Jobs are executed in decreasing order of job duration.
  • Shortest job first scheduling algorithm (Shortest job first, SJF): jobs are executed in increasing order of job duration.
  • Round-robin scheduling algorithm (Round-robin, RR): Jobs are executed fairly in round-robin order, and time slices are used and allocated to each job in equal proportions.
  • Tetris scheduling algorithm: jobs are executed based on their resource requirements and the availability of system resources at the time of arrival (simple ordering sketches of these baselines follow below).
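For reference, the classic baselines listed above reduce to simple job orderings; the sketch below illustrates them (Tetris is omitted because it additionally checks resource availability at arrival time).

```python
import random

# Job: (job_id, duration); simple ordering rules for the classic baselines listed above.
jobs = [("j1", 30), ("j2", 5), ("j3", 120)]

def order_random(jobs):  return random.sample(jobs, k=len(jobs))    # Random
def order_ljf(jobs):     return sorted(jobs, key=lambda j: -j[1])   # Longest job first
def order_sjf(jobs):     return sorted(jobs, key=lambda j: j[1])    # Shortest job first

def order_rr(jobs, slice_len=10):
    # Round-robin: each job repeatedly receives an equal time slice until it finishes.
    remaining, schedule = {j[0]: j[1] for j in jobs}, []
    while remaining:
        for job_id in list(remaining):
            schedule.append((job_id, min(slice_len, remaining[job_id])))
            remaining[job_id] -= slice_len
            if remaining[job_id] <= 0:
                del remaining[job_id]
    return schedule

print(order_sjf(jobs))
```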
  • the total reward (representing QoS) generally decreases as the average load of the system increases.
  • the proposed method can always achieve a higher total return than other methods.
  • other classical methods have comparable performance only when the average system load is less than 1.2.
  • in particular, when the average system load is greater than 2.0, these classical methods perform only slightly better than the random scheme.
  • when the average system load exceeds 2.4, the LJF method performs even worse than the random scheme. This is because the average system load is high when a large number of jobs are waiting to be processed, but the LJF method always schedules the jobs with the longest durations first, causing many jobs to wait excessively and seriously reducing scheduling performance. In comparison, this method always maintains good performance.
  • the experimental results verify the superiority of the method of the present invention in scheduling jobs in complex environments with high system load.
  • RAS generates job scheduling policies based on the resource requests of different user jobs and the current status information of the cloud data center (such as the number of servers, resource usage, energy consumption, etc.).
  • the job scheduler allocates jobs to servers from the job sequence according to the policy issued by the DRL-based resource controller.
  • the information collector records the usage of different resources and the current energy consumption (measured by the energy agent) in the cloud data center. Based on the above information, the DRL-based resource controller will generate the corresponding job scheduling policy.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Operations Research (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to an adaptive and efficient resource allocation method for cloud data centers based on deep reinforcement learning. First, an actor parameterizes the policy (allocating resources), and actions (scheduling jobs) are selected according to the scores evaluated by a critic (evaluating the actions). Gradient ascent is then used to update the resource allocation policy, and an advantage function is used to reduce the variance of the policy gradient, improving training efficiency. Extensive simulation experiments were conducted using real data from Google cloud data centers. Compared with two state-of-the-art DRL-based cloud resource allocation methods and five classic cloud resource allocation methods, the proposed method achieves higher quality of service (QoS) in terms of delay and job dismissal rate, as well as higher energy efficiency.

Description

Adaptive and efficient resource allocation method for cloud data centers based on deep reinforcement learning
Technical Field
The present invention relates to an adaptive and efficient resource allocation method for cloud data centers based on deep reinforcement learning.
Background Art
Cloud computing has rapidly grown into one of the most popular computing paradigms. In cloud computing, resource allocation refers to the process of allocating computing, storage, and network resources to meet the needs of users and cloud service providers. With the continuous expansion and dynamic change of cloud data centers, many problems have arisen in cloud resource allocation, such as unreasonable resource allocation and slow response to changes. These problems not only reduce service quality but also lead to high energy consumption and maintenance overhead. Designing an adaptive and efficient resource allocation solution for cloud data centers has therefore become a top priority. However, due to the dynamic system states and diverse user requirements in cloud computing, this is a very challenging task, as discussed below:
Complexity of cloud data centers: a cloud data center contains a large number of different types of servers providing various computing and storage resources, including CPUs, memory, and storage units. How to effectively manage and coordinate heterogeneous resources in cloud computing is therefore a challenge.
Diversity of user requirements: jobs from different users require heterogeneous resources (e.g., CPU, memory, and storage units) and different durations (e.g., minutes, hours, and days). The diversity of user requirements increases the difficulty of resource allocation in cloud data centers.
Excessive energy consumption: massive energy consumption not only causes huge operating costs but also leads to large carbon emissions. In Google cloud data centers, the average CPU utilization of servers is only about 20%. Such energy waste results from unreasonable resource allocation schemes. However, it is difficult to satisfy diverse user requirements while maintaining an energy-efficient cloud data center.
Dynamics of cloud systems: in cloud data centers, system states such as resource usage and requests change frequently. In such a dynamic cloud environment, effective resource allocation is expected to continuously satisfy the requirements of user jobs. However, it is difficult to build an accurate resource allocation model in a dynamic cloud environment. These dynamics therefore pose great challenges to adaptive resource allocation in cloud data centers.
Many classic solutions to cloud resource allocation are based on rules, heuristics, and control theory. Although these solutions can solve the cloud resource allocation problem to some extent, they usually rely on prior knowledge of the cloud system (e.g., state transitions, demand changes, and energy consumption) to formulate the corresponding resource allocation strategies. Therefore, these solutions may work well in specific application scenarios, but they cannot fully adapt to cloud environments with dynamic system states and user requirements. For example, job scheduling can easily be performed with rule-based policies to satisfy immediate user requirements, but such policies only consider current job characteristics (e.g., resource requirements and job duration) to obtain short-term benefits. As a result, they cannot adaptively satisfy the dynamic requirements of user jobs from a long-term perspective, and unreasonable resource allocation may lead to excessive job delays and serious resource waste. In addition, these solutions may require many iterations to find a feasible resource allocation scheme, resulting in high computational complexity and resource overhead. They therefore cannot effectively solve the complex resource allocation problem in dynamic cloud environments.
Reinforcement learning (RL) is a resource allocation approach with high adaptability and low complexity. However, traditional RL-based methods suffer from the high-dimensional state space problem when dealing with complex cloud environments. To solve this problem, deep reinforcement learning (DRL), which uses deep neural networks to extract low-dimensional representations from the high-dimensional state space, has been proposed. Although some existing DRL-based methods focus on the cloud resource allocation problem, most of them use value-based DRL, which may lead to low training efficiency when dealing with large action spaces. This is because value-based DRL learns a deterministic policy by calculating the probability of each action. In a cloud data center, however, jobs may keep arriving, so the action space may be quite large in order to continuously satisfy the requirements of scheduled jobs. It is therefore difficult for value-based DRL to converge quickly to the optimal policy. In contrast, policy-based DRL (e.g., policy gradient (PG)) learns stochastic policies and can better handle the large action space in cloud data centers by directly outputting actions as probability distributions, but the high variance in estimating the policy gradient may reduce training efficiency.
Technical Problem
As a synergy of value-based and policy-based DRL algorithms, Advantage Actor-Critic (A2C) aims to solve the above problems. In the A2C model, the Actor selects actions according to the scores evaluated by the Critic, the variance of the policy gradient is reduced, and an advantage function is used. However, A2C adopts a single-threaded training mode, which underutilizes computing resources. Meanwhile, strong data correlation may occur when using A2C, because similar training samples are generated when only one DRL agent interacts with the environment, leading to unsatisfactory training results. To solve these problems of A2C, a low-variance and highly efficient asynchronous advantage Actor-Critic (A3C) algorithm was proposed. A3C uses multiple DRL agents to interact with the environment simultaneously, making full use of computing resources and improving learning speed. Meanwhile, the data collected by different DRL agents are mutually independent, so A3C breaks the data correlation.
Technical Solution
The purpose of the present invention is to provide an adaptive and efficient resource allocation method for cloud data centers based on deep reinforcement learning, which achieves higher quality of service in terms of delay and job dismissal rate, as well as higher energy efficiency.
To achieve the above purpose, the technical solution of the present invention is an adaptive and efficient resource allocation method for cloud data centers based on deep reinforcement learning. A unified resource allocation model is designed, which takes job delay, dismissal rate, and energy efficiency as optimization goals. Based on the resource allocation model, the state space, action space, and reward function of cloud resource allocation are defined as a Markov decision process and used in a DRL (Deep Reinforcement Learning)-based cloud resource allocation method. A resource allocation method based on Actor-Critic DRL is proposed to solve the optimal policy problem of job scheduling in cloud data centers. In addition, the Actor-Critic DRL-based resource allocation method uses asynchronous updates of policy parameters among multiple DRL agents.
In an embodiment of the present invention, the DRL-based cloud resource allocation method is implemented as follows:
Step S1: the resource allocation system RAS (Resource Allocation System) generates a job scheduling policy according to the resource requests of different user jobs and the current state information of the cloud data center; the RAS includes a DRL-based resource controller, a job scheduler, an information collector, and an energy agent;
Step S2: the job scheduler allocates jobs from the job sequence to the servers of the cloud data center according to the policy issued by the DRL-based resource controller;
Step S3: during resource allocation, the information collector records the usage of different resources in the cloud data center and the current energy consumption measured by the energy agent, and the DRL-based resource controller generates the corresponding job scheduling policy.
In an embodiment of the present invention, the state space, action space, and reward function in the DRL are defined as follows:
State space: in the state space S, the state s_t ∈ S consists of the resource usage of all servers and the resource requests of all jobs that have arrived by time step t. On the one hand, U_t = {u_1,1, ..., u_m,n} represents the resource usage of all servers, where u_m,n is the usage of the n-th resource type on server virtual machine v_m; on the other hand, O_t = {o_1,1, ..., o_z,n} represents the occupation requests of all arrived jobs for different types of resources, where o_j,n is the occupation request of the most recently arrived job j for the n-th resource type, and D_t = {d_1, ..., d_z} represents the durations of all jobs arriving by time step t, where d_j represents the duration of job j. Therefore, the state of the cloud data center at time step t is defined as:
s_t = [U_t, [O_t, D_t]]    Equation (1)
where U_t, O_t and D_t describe the states of all servers V = {v_1, v_2, ..., v_m} and of the job sequence J of arrived jobs. When a job arrives or is completed, the state space changes, and the dimension of the state space depends on the servers and the arrived jobs; it is calculated as (mn + z(n+1)), where m, n, and z represent the number of servers, resource types, and arrived jobs, respectively;
Action space: at time step t, the action taken by the job scheduler is to select and execute jobs from the job sequence according to the job scheduling policy issued by the DRL-based resource controller. The policy is generated according to the current state of the resource allocation system, and the job scheduler assigns jobs to the corresponding servers for execution. Once a job is scheduled onto the corresponding server, the server automatically allocates the corresponding resources according to the job's resource request. Therefore, the action space A only indicates whether a job will be processed by a server, and is defined as:
A = {a_t | a_t ∈ {0, 1, 2, ..., m}}    Equation (2)
where a_t ∈ A; when a_t = 0, the job scheduler does not dispatch a job at time step t, and the job waits in the job sequence; otherwise, the job is processed by the corresponding server;
State transition probability matrix: this matrix represents the probability of transitioning between two states. At time step t_0, there is no pending job and the initial state is s_0 = [0, [[0], [0]]], where the three "0" entries represent the CPU (Central Processing Unit) usage of the server, the occupation request of the job, and the job duration, respectively. At t_1, job j_1 is scheduled immediately because the available resources are sufficient; after this action, the state evolves to s_1 = [50, [[50], [d_1]]], where the first "50" entry represents the CPU usage of the server, the second "50" entry represents the CPU occupation request of j_1, and d_1 represents the duration of j_1. Similarly, after scheduling j_2 at t_2, the state evolves to s_2 = [80, [[50, 30], [d_1, d_2]]]. The state transition probability matrix is denoted IP(s_{t+1} | s_t, a_t) and represents the probability of transitioning to the next state s_{t+1} when action a_t is taken in the current state s_t. The values of the transition probabilities are obtained by running the DRL algorithm, which outputs the probabilities of taking different actions in a given state;
Reward function: the reward function guides the DRL agent to learn a better job scheduling policy with a higher discounted long-term reward, improving the system performance of cloud resource allocation. Therefore, at time step t, the total reward R_t consists of a QoS reward, denoted R_t^QoS, and an energy-efficiency reward, denoted R_t^E, and is defined as
R_t = R_t^QoS + R_t^E    Equation (3)
Specifically, R_t^QoS reflects the penalties for the different types of delay at time step t, namely the normalized average job delay L_normal and the job dismissal rate disRate, weighted by w_1 and w_2, which strengthen the penalties. Since R_t^QoS is a negative value, jobs with a longer duration tend to wait for a shorter time. In addition, R_t^E reflects the penalty for the energy consumed by executing a job at time step t, with w_3 as the weight of the penalty.
In an embodiment of the present invention, the Actor-Critic DRL-based resource allocation method adopts an Actor-Critic DRL framework and uses asynchronous updates, i.e., A3C (Asynchronous Advantage Actor-Critic), to accelerate the training process. Specifically, the Actor-Critic DRL-based resource allocation method combines value-based and policy-based DRL algorithms: on the one hand, value-based DRL uses a function approximator to determine the value function and adopts ε-greedy to balance exploration and exploitation; on the other hand, policy-based DRL parameterizes the job scheduling policy and directly outputs actions as probability distributions during learning, without storing their Q-values.
In an embodiment of the present invention, in each DRL agent the critic network Q_w(s, a) estimates the state-action value function and updates the parameters w; in addition, the actor network π_θ(s, a) guides the update of the job scheduling policy parameters according to the evaluation of the critic network. The corresponding policy gradient is defined as:
J(θ) = Σ_s d^π_θ(s) Σ_a π_θ(s, a)·R_t
∇_θ J(θ) = E_π_θ[∇_θ log π_θ(s, a)·Q_w(s, a)]
where d^π_θ(s) is the stationary distribution of the MDP modeling cloud resource allocation under the current job scheduling policy π_θ, and R_t is the immediate reward.
Next, a state-value function V(s) is used to reduce the variance when estimating the gradient; the policy gradient is therefore redefined as:
∇_θ J(θ) = E_π_θ[∇_θ log π_θ(s, a)·A(s, a)]
where A(s, a) = Q_w(s, a) − V(s) is the advantage function. In addition, V(s) is updated through TD learning, where the TD error is defined as:
δ = R_t + β·V(s_{t+1}) − V(s_t)
Multiple DRL agents work simultaneously and update their respective job scheduling policy parameters asynchronously. Specifically, a predetermined number of DRL agents are initialized with the same local neural network parameters, i.e., the same scheduling policy, and interact with their corresponding cloud data center environments. Each DRL agent periodically accumulates gradients in its actor and critic networks and asynchronously updates the parameters of the global network using gradient ascent with the RMSProp optimizer. Next, each DRL agent pulls the latest parameters of the actor and critic networks from the global network and replaces its local parameters with them. Each DRL agent then continues to interact with its environment based on the updated local parameters and independently optimizes the local parameters of its scheduling policy. During local training there is no coordination among these DRL agents. The Actor-Critic DRL-based resource allocation method continues training through the asynchronous update mechanism among multiple DRL agents until the results converge.
Beneficial Effects
Compared with the prior art, the present invention has the following beneficial effects:
The A3C-based resource allocation method proposed by the present invention effectively schedules jobs to improve the QoS and energy efficiency of cloud data centers. Extensive simulation experiments using real trace data from Google cloud data centers verify the effectiveness of the method in achieving adaptive and efficient resource allocation. Specifically, the method outperforms classic resource allocation methods such as LJF, Tetris, SJF, RR, PG, and DQL in terms of QoS (average job delay and job dismissal rate) and energy efficiency (average energy consumption per job). In addition, as the average system load increases, the method trains more effectively than the other methods and achieves higher training efficiency (faster convergence) than the two advanced DRL-based methods (PG and DQL). Simulation results show that the method is of great significance for improving resource allocation in cloud data centers.
Brief Description of the Drawings
Figure 1 is the cloud data center resource allocation model of the present invention;
Figure 2 is an example of modeling cloud resource allocation as an MDP according to the present invention;
Figure 3 is the framework of the A3C-based cloud resource allocation method of the present invention;
Figure 4 shows the total rewards of different resource allocation methods under different system loads under single-objective optimization according to the present invention.
Embodiments of the Invention
The technical solution of the present invention is described in detail below with reference to the accompanying drawings.
The present invention provides an adaptive and efficient resource allocation method for cloud data centers based on deep reinforcement learning. A unified resource allocation model is designed that takes job delay, dismissal rate, and energy efficiency as optimization goals. Based on the resource allocation model, the state space, action space, and reward function of cloud resource allocation are defined as a Markov decision process and used in the DRL-based cloud resource allocation method. A resource allocation method based on Actor-Critic DRL is proposed to solve the optimal policy problem of job scheduling in cloud data centers. In addition, the Actor-Critic DRL-based resource allocation method uses asynchronous updates of policy parameters among multiple DRL agents.
The specific implementation process of the present invention is as follows.
The present invention formulates the resource allocation problem of a cloud data center as a model-free DRL problem with dynamic system states and diverse user requirements, targeting cloud data centers with heterogeneous resources, diverse user needs, high energy consumption, and dynamic environments. Given the advantages of the A3C algorithm, the present invention proposes an A3C-based resource allocation scheme.
A unified resource allocation model is designed for cloud data centers with dynamic system states and heterogeneous user requirements. The model takes job delay, dismissal rate, and energy efficiency (average energy consumption per job) as optimization objectives. On this basis, the state space, action space, and reward function of cloud resource allocation are defined as a Markov decision process (MDP) and used in the DRL-based cloud resource allocation scheme. A resource allocation method based on Actor-Critic DRL (A3C) is proposed, which effectively solves the optimal policy problem of job scheduling in cloud data centers, where a DNN is used to handle the high-dimensional state space of the cloud data center. In addition, the method greatly improves training efficiency through asynchronous updates of policy parameters among multiple DRL agents.
Figure 1 shows the cloud data center allocation model. A DRL-based resource controller is embedded in the resource allocation system (RAS). The RAS generates job scheduling policies according to the resource requests of different user jobs and the current state information of the cloud data center (e.g., number of servers, resource usage, and energy consumption). The job scheduler allocates jobs from the job sequence to servers according to the policy issued by the DRL-based resource controller. Specifically, these jobs are generalized as data processing jobs, such as training jobs of deep learning (DL) models for image processing and speech recognition. Different jobs present different resource requests depending on their purpose; therefore, each job is characterized by a specific job duration (e.g., minutes, hours, or days) and requests for different types of resources (e.g., CPU and memory). During resource allocation, the information collector records the usage of different resources in the cloud data center and the current energy consumption (measured by the energy agent). Based on the above information, the DRL-based resource controller generates the corresponding job scheduling policy.
To achieve better service quality and higher energy efficiency in cloud data centers, the present invention proposes a cloud data center resource allocation method based on asynchronous advantage Actor-Critic (A3C), which adopts an Actor-Critic DRL framework and uses asynchronous updates (A3C) to accelerate the training process. Specifically, the A3C-based method combines value-based and policy-based DRL algorithms: on the one hand, value-based DRL uses a function approximator to determine the value function and adopts ε-greedy to balance exploration and exploitation, so the DRL agent exploits existing experience to select good job scheduling actions while exploring new ones; on the other hand, policy-based DRL parameterizes the job scheduling policy and directly outputs actions as probability distributions during learning, without storing their Q-values.
1. Resource allocation model
To improve service quality and energy efficiency, the present invention proposes a DRL-based resource allocation method comprising the following steps:
Step S1: the RAS generates a job scheduling policy according to the resource requests of different user jobs and the current state information of the cloud data center (e.g., number of servers, resource usage, and energy consumption).
Step S2: the job scheduler allocates jobs from the job sequence to servers according to the policy issued by the DRL-based resource controller.
Step S3: during resource allocation, the information collector records the usage of different resources in the cloud data center and the current energy consumption (measured by the energy agent); based on the above information, the DRL-based resource controller generates the corresponding job scheduling policy.
The state space, action space, and reward function in the DRL are defined as follows:
State space: in the state space S, the state s_t ∈ S consists of the resource usage of all servers and the resource requests of all jobs that have arrived by time step t. On the one hand, U_t = {u_1,1, ..., u_m,n} represents the resource usage of all servers, where u_m,n is the usage of the n-th resource type on server virtual machine v_m. On the other hand, O_t = {o_1,1, ..., o_z,n} represents the occupation requests of all arrived jobs for different types of resources at time step t, where o_j,n is the occupation request of the most recently arrived job j for the n-th resource type, and D_t = {d_1, ..., d_z} represents the durations of all jobs arriving by time step t. Therefore, the state of the cloud data center at time step t is defined as:
s_t = [U_t, [O_t, D_t]]    Equation (1)
where U_t, O_t and D_t describe the states of all servers and arrived jobs to ensure a clear representation. When a job arrives or is completed, the state space changes, and the dimension of the state space depends on the servers and the arrived jobs; it is calculated as (mn + z(n+1)), where m, n, and z represent the number of servers, resource types, and arrived jobs, respectively.
Action space: at time step t, the action taken by the job scheduler is to select and execute jobs from the job sequence according to the job scheduling policy issued by the DRL-based resource controller. The policy is generated according to the current system state, and the job scheduler assigns jobs to specific servers for execution. Once a job is scheduled onto a suitable server, the server automatically allocates the corresponding resources according to the job's resource request. Therefore, the action space only indicates whether a job will be processed by a server, defined as:
A = {a_t | a_t ∈ {0, 1, 2, ..., m}}    Equation (2)
where a_t ∈ A. When a_t = 0, the job scheduler does not dispatch a job at time step t, and the job waits in the job sequence; otherwise, the job is processed by the corresponding server.
State transition probability matrix: this matrix represents the probability of transitioning between two states. At time step t_0, there is no pending job and the initial state is s_0 = [0, [[0], [0]]], where the three "0" entries represent the CPU usage of the server, the occupation request of the job, and the job duration, respectively. At t_1, job j_1 is scheduled immediately because the available resources are sufficient; after this action, the state evolves to s_1 = [50, [[50], [d_1]]], where the first "50" entry represents the CPU usage of the server, the second "50" entry represents the CPU occupation request of j_1, and d_1 represents the duration of j_1. Similarly, after scheduling j_2 at t_2, the state evolves to s_2 = [80, [[50, 30], [d_1, d_2]]]. The state transition probability matrix is denoted IP(s_{t+1} | s_t, a_t) and represents the probability of transitioning to the next state s_{t+1} when action a_t is taken in the current state s_t. The values of the transition probabilities are obtained by running the DRL algorithm, which outputs the probabilities of taking different actions in a given state.
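The state evolution s_0 → s_1 → s_2 in the example above can be reproduced with a short sketch; the nested-list layout mirrors the states written in the text, and the helper function name is an illustrative assumption.

```python
def apply_schedule(state, cpu_request, duration):
    """Schedule one job: add its CPU request to the server usage and record it (sketch)."""
    cpu_usage, (requests, durations) = state
    return [cpu_usage + cpu_request, [requests + [cpu_request], durations + [duration]]]

d1, d2 = "d1", "d2"                 # symbolic durations, as in the example
s0 = [0, [[], []]]                  # no pending job (written as [0, [[0], [0]]] in the text)
s1 = apply_schedule(s0, 50, d1)     # -> [50, [[50], ['d1']]]
s2 = apply_schedule(s1, 30, d2)     # -> [80, [[50, 30], ['d1', 'd2']]]
print(s1, s2)
```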
Reward function: the reward function guides the DRL agent (RAS) to learn a better job scheduling policy with a higher discounted long-term reward, improving the system performance of cloud resource allocation. Therefore, at time step t, the total reward R_t consists of a QoS reward (denoted R_t^QoS) and an energy-efficiency reward (denoted R_t^E), and is defined as
R_t = R_t^QoS + R_t^E    Equation (3)
Specifically, R_t^QoS reflects the penalties (and is therefore negative) for the different types of delay at time step t, namely the normalized average job delay L_normal and the job dismissal rate disRate (as listed in Table 1), weighted by w_1 and w_2, which strengthen the penalties. Since R_t^QoS is a negative value, jobs with a longer duration tend to wait for a shorter time; this is sensible for a cloud system aiming at profit maximization, since the longer the job duration, the higher the profit. In addition, R_t^E reflects the penalty for the energy consumed by executing a job at time step t, with w_3 as the weight of the penalty.
In this embodiment, the notation is defined as follows:
Definition 1: consider a cloud data center scenario with a set of servers, denoted V = {v_1, v_2, ..., v_m}, where m is the number of servers.
Definition 2: each server provides multiple types of resources (e.g., CPU, memory, and storage units), denoted Res = {r_1, r_2, ..., r_n}, where n is the number of resource types.
Definition 3: consider the set of all jobs expected to be processed, denoted J_total = {j_1, j_2, ..., j_p}, where p is the total number of jobs.
Definition 4: a set of jobs waits in the job sequence, denoted J_seq = {j_1, j_2, ..., j_q}, where q is the number of jobs waiting in the job sequence and q ≤ p. When a job from J_total arrives, it first enters J_seq; if the available resources are sufficient, the job can be processed immediately, otherwise it waits in the job sequence to be scheduled (this admission behavior is sketched after Definition 8 below).
Definition 5: since numerical differences among job delay values lead to long computation times during gradient descent, a normalization algorithm is adopted to improve training speed and convergence. L_normal is therefore defined as the normalized average job delay, which normalizes the delay of every successfully completed job and then takes the average, where L_normal ≥ 1 and d_j is the duration of the job.
Definition 6: disRate is defined as the job dismissal rate, which measures the rate of jobs dismissed when the job sequence is full, where 0 ≤ disRate ≤ 1.
Definition 7: the total energy consumption of the cloud data center is E_total, where P_max is the maximum energy consumption of a fully utilized server, k is the fraction of the maximum energy consumed by an idle server, the resource usage of every server at each time step t is taken into account, and T is the time step at which the last job is completed.
Definition 8: E_job is defined as the energy efficiency of the job scheduling process, measured by the average energy consumption of all successfully completed jobs.
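A small sketch of the admission behavior in Definition 4 above: an arriving job is processed immediately if resources suffice, waits in the job sequence otherwise, and is dismissed when the sequence is full; the capacity-check callback and the class name are illustrative assumptions.

```python
from collections import deque

class JobSequence:
    def __init__(self, max_len=1000):           # job sequence length 1000, as in the evaluation
        self.queue = deque()
        self.max_len = max_len
        self.dismissed = 0

    def on_arrival(self, job, resources_available):
        if resources_available(job):             # sufficient resources: process immediately
            return "scheduled"
        if len(self.queue) >= self.max_len:      # sequence full: the job is dismissed
            self.dismissed += 1
            return "dismissed"
        self.queue.append(job)                   # otherwise the job waits to be scheduled
        return "waiting"

seq = JobSequence(max_len=2)
print([seq.on_arrival(j, lambda job: False) for j in ("j1", "j2", "j3")])
# -> ['waiting', 'waiting', 'dismissed']
```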
During the optimization of cloud resource allocation, the DRL agent first selects an action a_t (scheduling a job) under the current system state s_t of the environment (the cloud data center), i.e., the resource usage and resource requests. Next, the DRL agent receives the reward R_t (QoS and energy efficiency) and enters the next state s_{t+1}. This process is illustrated as an MDP, as shown in Figure 2.
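The agent-environment interaction described above (state, action, reward, next state) can be written as a generic loop; the DummyCloudEnv class below is a stand-in interface for illustration, not the patent's simulator.

```python
import random

class DummyCloudEnv:
    """Stand-in environment exposing the MDP interface used below (not the patent's simulator)."""
    def reset(self):
        return 0
    def step(self, action):
        next_state = random.randint(0, 3)
        reward = -random.random()          # penalties are negative, as in the reward design
        done = random.random() < 0.05
        return next_state, reward, done

def run_episode(env, policy, max_steps=1000):
    """Generic MDP loop: s_t -> a_t -> (R_t, s_{t+1}), as illustrated in Figure 2."""
    s_t = env.reset()
    trajectory = []
    for _ in range(max_steps):
        a_t = policy(s_t)                  # scheduling decision from the current policy
        s_next, r_t, done = env.step(a_t)  # reward reflects QoS and energy efficiency
        trajectory.append((s_t, a_t, r_t))
        s_t = s_next
        if done:
            break
    return trajectory

print(len(run_episode(DummyCloudEnv(), policy=lambda s: 0, max_steps=50)))
```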
The present invention proposes a cloud data center resource allocation method based on asynchronous advantage Actor-Critic (A3C), which achieves excellent QoS and energy efficiency in the cloud data center. The method adopts an Actor-Critic DRL framework and uses asynchronous updates (A3C) to accelerate the training process. Specifically, the A3C-based method combines value-based and policy-based DRL algorithms: on the one hand, value-based DRL uses a function approximator to determine the value function and adopts ε-greedy to balance exploration and exploitation, so the DRL agent exploits existing experience to select good job scheduling actions while exploring new ones; on the other hand, policy-based DRL parameterizes the job scheduling policy and directly outputs actions as probability distributions during learning, without storing their Q-values.
The key steps of the proposed A3C-based cloud resource allocation method are shown in Algorithm 1.
Based on the definitions of the state space in Equation (1), the action space in Equation (2), and the reward function in Equation (3), the weights and biases of the actor network π_θ(s, a) and the critic network Q_w(s, a) are first initialized. Then the learning rates γ_a and γ_c of the actor and critic are initialized, together with the TD error discount factor β.
The optimization objective of the proposed A3C-based resource allocation method is to obtain the maximum return. Therefore, the immediate reward R_t is accumulated over the probability distribution:
J(θ) = Σ_s d^π_θ(s) Σ_a π_θ(s, a)·R_t
where d^π_θ(s) is the stationary distribution of the MDPs modeling cloud resource allocation under the current job scheduling policy π_θ.
After initialization, the training process for optimizing cloud resource allocation begins. To improve the optimization objective, the job scheduling policy parameters are continuously updated.
In one-step policy planning, the policy gradient of the objective function is defined as:
∇_θ J(θ) = E_π_θ[∇_θ log π_θ(s, a)·R_t]
For a multi-step MDP, the immediate reward R_t is replaced by the long-term value Q^π_θ(s, a), and the policy gradient theorem is defined as:
Theorem 1 (policy gradient theorem [13]): for any differentiable policy π_θ(s, a) and any policy objective function, the corresponding gradient is defined as
∇_θ J(θ) = E_π_θ[∇_θ log π_θ(s, a)·Q^π_θ(s, a)]
On this basis, temporal-difference (TD) learning is used to accurately estimate state values and to guide the update of the policy parameters.
Figure 3 shows the framework of the proposed A3C-based cloud resource allocation method. The method exploits both policy-based and value-based DRL, can handle a large action space, and reduces the variance when estimating gradients.
In each DRL agent, the critic network estimates the state-action value function Q_w(s, a) and updates the parameters w. In addition, the actor network guides the update of the job scheduling policy parameters according to the evaluation of the critic network. The corresponding policy gradient is defined as
∇_θ J(θ) = E_π_θ[∇_θ log π_θ(s, a)·Q_w(s, a)]
Next, a state-value function V(s) is used to reduce the variance when estimating the gradient; it depends only on the state and does not change the gradient. The policy gradient is therefore redefined as
∇_θ J(θ) = E_π_θ[∇_θ log π_θ(s, a)·A(s, a)]
where A(s, a) = Q_w(s, a) − V(s) is the advantage function. In addition, V(s) is updated through TD learning, where the TD error is defined as:
δ = R_t + β·V(s_{t+1}) − V(s_t)
To improve training efficiency, multiple DRL agents work simultaneously and asynchronously update their respective job scheduling policy parameters, as shown in Algorithm 2. Specifically, a certain number of DRL agents are initialized with the same local neural network parameters (i.e., the same scheduling policy) and interact with their corresponding cloud data center environments. Each DRL agent periodically accumulates gradients in its actor and critic networks and asynchronously updates the parameters of the global network using gradient ascent with the RMSProp optimizer. Next, each DRL agent pulls the latest parameters of the actor and critic networks from the global network and replaces its local parameters with them. Each DRL agent then continues to interact with its environment based on the updated local parameters and independently optimizes the local parameters of its scheduling policy. Note that there is no coordination among these DRL agents during local training. The A3C-based method continues training through the asynchronous update mechanism among multiple DRL agents until the results converge.
2. Method evaluation
The cloud resource allocation model proposed by the present invention is implemented based on TensorFlow 1.4.0. A cloud data center with 50 heterogeneous servers is simulated as an example, the idle-server energy fraction k is set to 70%, and the maximum server energy consumption P_max is set to 250 W. Therefore, as resource utilization increases from 0% to 100%, server energy consumption ranges between 175 W and 250 W. In addition, real-world trace data from Google cloud data centers are used as the input of the proposed model. The dataset contains resource usage data of different jobs on more than 125,000 servers in Google cloud data centers in May 2011. More specifically, 50 servers are first randomly extracted from the 29-day Google dataset, where each server contains approximately 100,000 job traces. Next, several basic metrics are extracted from each job trace, including machine ID, job ID, start time, end time, and the corresponding resource usage. In addition, the length of the job sequence is set to 1000.
During training, 10 DRL agents are used to implement asynchronous updates of the policy parameters. In each DRL agent, the job trace data are fed to the proposed model in batches, with the batch size set to 64. For the DNN design, two fully connected hidden layers with 200 and 100 neurons, respectively, are constructed. In addition, the maximum number of epochs is set to 1000, the reward decay rate λ to 0.9, and the critic's learning rate γ_c to 0.01.
Based on the above settings, extensive simulation experiments are conducted to evaluate the performance of the proposed A3C-based cloud resource allocation method.
To analyze the effectiveness and advantages of the proposed cloud resource allocation method, extensive comparison experiments are conducted, in which the following five classic algorithms are also evaluated.
Random scheduling algorithm (Random): jobs are executed in random order of job duration.
Longest job first (LJF): jobs are executed in decreasing order of job duration.
Shortest job first (SJF): jobs are executed in increasing order of job duration.
Round-robin (RR): jobs are executed fairly in round-robin order, using time slices allocated to each job in equal proportions.
Tetris scheduling algorithm: jobs are executed according to their resource requirements and the availability of system resources at arrival time.
As shown in Figure 4, the total reward (representing QoS) generally decreases as the average system load increases. Even as the average system load becomes larger, the proposed method always achieves a higher total reward than the other methods. In contrast, the other classic methods have comparable performance only when the average system load is less than 1.2. In particular, when the average system load is greater than 2.0, these classic methods perform only slightly better than the random scheme. When the average system load exceeds 2.4, the LJF method performs even worse than the random scheme. This is because the average system load is high when a large number of jobs are waiting to be processed, but LJF always schedules the jobs with the longest durations first, causing many jobs to wait excessively and seriously degrading scheduling performance. In comparison, the proposed method always maintains good performance. The experimental results verify the superiority of the method of the present invention in scheduling jobs in complex environments with high system load.
3. Usage process of the product of the present invention
(1) The RAS generates a job scheduling policy according to the resource requests of different user jobs and the current state information of the cloud data center (e.g., number of servers, resource usage, and energy consumption).
(2) The job scheduler allocates jobs from the job sequence to servers according to the policy issued by the DRL-based resource controller.
(3) During resource allocation, the information collector records the usage of different resources in the cloud data center and the current energy consumption (measured by the energy agent); based on the above information, the DRL-based resource controller generates the corresponding job scheduling policy.
The above are preferred embodiments of the present invention; any changes made according to the technical solution of the present invention, whose functions and effects do not exceed the scope of the technical solution of the present invention, fall within the protection scope of the present invention.

Claims (5)

  1. An adaptive and efficient resource allocation method for cloud data centers based on deep reinforcement learning, characterized in that a unified resource allocation model is designed, the resource allocation model taking job delay, dismissal rate, and energy efficiency as optimization goals; based on the resource allocation model, the state space, action space, and reward function of cloud resource allocation are defined as a Markov decision process and used in a DRL-based cloud resource allocation method; a resource allocation method based on Actor-Critic DRL is proposed to solve the optimal policy problem of job scheduling in cloud data centers; in addition, the Actor-Critic DRL-based resource allocation method uses asynchronous updates of policy parameters among multiple DRL agents.
  2. The adaptive and efficient resource allocation method for cloud data centers based on deep reinforcement learning according to claim 1, characterized in that the DRL-based cloud resource allocation method is implemented as follows:
    Step S1: the resource allocation system RAS generates a job scheduling policy according to the resource requests of different user jobs and the current state information of the cloud data center; the RAS includes a DRL-based resource controller, a job scheduler, an information collector, and an energy agent;
    Step S2: the job scheduler allocates jobs from the job sequence to the servers of the cloud data center according to the policy issued by the DRL-based resource controller;
    Step S3: during resource allocation, the information collector records the usage of different resources in the cloud data center and the current energy consumption measured by the energy agent, and the DRL-based resource controller generates the corresponding job scheduling policy.
  3. The adaptive and efficient resource allocation method for cloud data centers based on deep reinforcement learning according to claim 2, characterized in that the state space, action space, and reward function in the DRL are defined as follows:
    State space: in the state space S, the state s_t ∈ S consists of the resource usage of all servers and the resource requests of all jobs that have arrived by time step t; on the one hand, U_t = {u_1,1, ..., u_m,n} represents the resource usage of all servers, where u_m,n is the usage of the n-th resource type on server virtual machine v_m; on the other hand, O_t = {o_1,1, ..., o_z,n} represents the occupation requests of all arrived jobs for different types of resources, where o_j,n is the occupation request of the most recently arrived job j for the n-th resource type, and D_t = {d_1, ..., d_z} represents the durations of all jobs arriving by time step t, where d_j represents the duration of job j; therefore, the state of the cloud data center at time step t is defined as:
    s_t = [U_t, [O_t, D_t]]    Equation (1)
    where U_t, O_t and D_t describe the states of all servers V = {v_1, v_2, ..., v_m} and of the job sequence J; when a job arrives or is completed, the state space changes, and the dimension of the state space depends on the servers and the arrived jobs, calculated as (mn + z(n+1)), where m, n, and z represent the number of servers, resource types, and arrived jobs, respectively;
    Action space: at time step t, the action taken by the job scheduler is to select and execute jobs from the job sequence according to the job scheduling policy issued by the DRL-based resource controller; the policy is generated according to the current state of the resource allocation system, and the job scheduler assigns jobs to the corresponding servers for execution; once a job is scheduled onto the corresponding server, the server automatically allocates the corresponding resources according to the job's resource request; therefore, the action space A only indicates whether a job will be processed by a server, defined as:
    A = {a_t | a_t ∈ {0, 1, 2, ..., m}}    Equation (2)
    where a_t ∈ A; when a_t = 0, the job scheduler does not dispatch a job at time step t, and the job waits in the job sequence; otherwise, the job is processed by the corresponding server;
    State transition probability matrix: this matrix represents the probability of transitioning between two states; at time step t_0 there is no pending job and the initial state is s_0 = [0, [[0], [0]]], where the three "0" entries represent the CPU usage of the server, the occupation request of the job, and the job duration, respectively; at t_1, job j_1 is scheduled immediately because the available resources are sufficient, and after this action the state evolves to s_1 = [50, [[50], [d_1]]], where the first "50" entry represents the CPU usage of the server, the second "50" entry represents the CPU occupation request of j_1, and d_1 represents the duration of j_1; similarly, after scheduling j_2 at t_2, the state evolves to s_2 = [80, [[50, 30], [d_1, d_2]]]; the state transition probability matrix is denoted IP(s_{t+1} | s_t, a_t) and represents the probability of transitioning to the next state s_{t+1} when action a_t is taken in the current state s_t; the values of the transition probabilities are obtained by running the DRL algorithm, which outputs the probabilities of taking different actions in a given state;
    Reward function: the reward function guides the DRL agent to learn a better job scheduling policy with a higher discounted long-term reward, improving the system performance of cloud resource allocation; therefore, at time step t, the total reward R_t consists of a QoS reward, denoted R_t^QoS, and an energy-efficiency reward, denoted R_t^E, and is defined as
    R_t = R_t^QoS + R_t^E    Equation (3)
    specifically, R_t^QoS reflects the penalties for the different types of delay at time step t, namely the normalized average job delay L_normal and the job dismissal rate disRate, weighted by w_1 and w_2, which strengthen the penalties; since R_t^QoS is a negative value, jobs with a longer duration tend to wait for a shorter time; in addition, R_t^E reflects the penalty for the energy consumed by executing a job at time step t, with w_3 as the weight of the penalty.
  4. The adaptive and efficient resource allocation method for cloud data centers based on deep reinforcement learning according to claim 1, characterized in that the Actor-Critic DRL-based resource allocation method adopts an Actor-Critic DRL framework and uses asynchronous updates (A3C) to accelerate the training process; specifically, the Actor-Critic DRL-based resource allocation method combines value-based and policy-based DRL algorithms; on the one hand, value-based DRL uses a function approximator to determine the value function and adopts ε-greedy to balance exploration and exploitation; on the other hand, policy-based DRL parameterizes the job scheduling policy and directly outputs actions as probability distributions during learning, without storing their Q-values.
  5. The adaptive and efficient resource allocation method for cloud data centers based on deep reinforcement learning according to claim 3, characterized in that in each DRL agent, the critic network Q_w(s, a) estimates the state-action value function and updates the parameters w; in addition, the actor network π_θ(s, a) guides the update of the job scheduling policy parameters according to the evaluation of the critic network; the corresponding policy gradient is defined as:
    J(θ) = Σ_s d^π_θ(s) Σ_a π_θ(s, a)·R_t
    ∇_θ J(θ) = E_π_θ[∇_θ log π_θ(s, a)·Q_w(s, a)]
    where d^π_θ(s) is the stationary distribution of the MDP (Markov Decision Process) modeling cloud resource allocation under the current job scheduling policy π_θ, and R_t is the immediate reward;
    next, a state-value function V(s) is used to reduce the variance when estimating the gradient; the policy gradient is therefore redefined as:
    ∇_θ J(θ) = E_π_θ[∇_θ log π_θ(s, a)·A(s, a)]
    where A(s, a) = Q_w(s, a) − V(s) is the advantage function; in addition, V(s) is updated through TD learning, where the TD error is defined as:
    δ = R_t + β·V(s_{t+1}) − V(s_t)
    multiple DRL agents work simultaneously and update their respective job scheduling policy parameters asynchronously; specifically, a predetermined number of DRL agents are initialized with the same local neural network parameters, i.e., the same scheduling policy, and interact with their corresponding cloud data center environments; each DRL agent periodically accumulates gradients in its actor and critic networks and asynchronously updates the parameters of the global network using gradient ascent with the RMSProp optimizer; next, each DRL agent pulls the latest parameters of the actor and critic networks from the global network and replaces its local parameters with them; each DRL agent then continues to interact with its environment based on the updated local parameters and independently optimizes the local parameters of its scheduling policy; during local training there is no coordination among these DRL agents; the Actor-Critic DRL-based resource allocation method continues training through the asynchronous update mechanism among multiple DRL agents until the results converge.
PCT/CN2022/126468 2022-03-28 2022-10-20 基于深度强化学习的云数据中心自适应高效资源分配方法 WO2023184939A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210309973.8A CN114691363A (zh) 2022-03-28 2022-03-28 基于深度强化学习的云数据中心自适应高效资源分配方法
CN202210309973.8 2022-03-28

Publications (1)

Publication Number Publication Date
WO2023184939A1 true WO2023184939A1 (zh) 2023-10-05

Family

ID=82141026

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/126468 WO2023184939A1 (zh) 2022-03-28 2022-10-20 基于深度强化学习的云数据中心自适应高效资源分配方法

Country Status (2)

Country Link
CN (1) CN114691363A (zh)
WO (1) WO2023184939A1 (zh)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117194057A (zh) * 2023-11-08 2023-12-08 贵州大学 一种基于强化学习优化边缘能耗与负载的资源调度方法
CN117314370A (zh) * 2023-11-30 2023-12-29 嘉兴市信达电子科技有限公司 一种基于智慧能源的数据驾驶舱系统及实现方法
CN117453398A (zh) * 2023-10-27 2024-01-26 国网江苏省电力有限公司南通供电分公司 一种提高供电可靠性的算力调度的智能优化方法及系统
CN117472587A (zh) * 2023-12-26 2024-01-30 广东奥飞数据科技股份有限公司 一种ai智算中心的资源调度系统
CN117539648A (zh) * 2024-01-09 2024-02-09 天津市大数据管理中心 一种电子政务云平台的服务质量管理方法及装置
CN117634859A (zh) * 2024-01-26 2024-03-01 清云小筑(北京)创新技术有限公司 基于深度强化学习的资源均衡施工排程方法、装置及设备
CN117667360A (zh) * 2024-01-31 2024-03-08 湘江实验室 面向大模型任务的计算与通信融合的智能算网调度方法
CN117726143A (zh) * 2024-02-07 2024-03-19 山东大学 基于深度强化学习的环境友好型微网优化调度方法及系统
CN117953351A (zh) * 2024-03-27 2024-04-30 之江实验室 一种基于模型强化学习的决策方法
CN117971475A (zh) * 2024-01-31 2024-05-03 酷标物联科技江苏有限公司 一种gpu算力池智能管理方法及系统

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114691363A (zh) * 2022-03-28 2022-07-01 福州大学 基于深度强化学习的云数据中心自适应高效资源分配方法
CN115421930B (zh) * 2022-11-07 2023-03-24 山东海量信息技术研究院 任务处理方法、系统、装置、设备及计算机可读存储介质
CN115878295B (zh) * 2023-03-02 2023-05-30 国网江西省电力有限公司信息通信分公司 基于深度强化学习的软件定义安全中台调度方法
CN116069512B (zh) * 2023-03-23 2023-08-04 之江实验室 一种基于强化学习的Serverless高效资源分配方法及系统
CN117709683A (zh) * 2024-02-02 2024-03-15 合肥喆塔科技有限公司 基于实时制造数据的半导体晶圆动态调度方法及设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111026549A (zh) * 2019-11-28 2020-04-17 国网甘肃省电力公司电力科学研究院 一种电力信息通信设备自动化测试资源调度方法
CN113692021A (zh) * 2021-08-16 2021-11-23 北京理工大学 一种基于亲密度的5g网络切片智能资源分配方法
WO2022006830A1 (zh) * 2020-07-10 2022-01-13 广东石油化工学院 一种多队列多集群的任务调度方法及系统
CN114691363A (zh) * 2022-03-28 2022-07-01 福州大学 基于深度强化学习的云数据中心自适应高效资源分配方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111026549A (zh) * 2019-11-28 2020-04-17 国网甘肃省电力公司电力科学研究院 一种电力信息通信设备自动化测试资源调度方法
WO2022006830A1 (zh) * 2020-07-10 2022-01-13 广东石油化工学院 一种多队列多集群的任务调度方法及系统
CN113692021A (zh) * 2021-08-16 2021-11-23 北京理工大学 一种基于亲密度的5g网络切片智能资源分配方法
CN114691363A (zh) * 2022-03-28 2022-07-01 福州大学 基于深度强化学习的云数据中心自适应高效资源分配方法

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEN ZHEYI; HU JIA; MIN GEYONG: "Learning-Based Resource Allocation in Cloud Data Center using Advantage Actor-Critic", ICC 2019 - 2019 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), IEEE, 20 May 2019 (2019-05-20), pages 1 - 6, XP033581833, DOI: 10.1109/ICC.2019.8761309 *
GAO ZHENFENG; LIU WEI; SUO LONG; LI JIANDONG; LU YIJUN: "Deep Reinforcement Learning based Compute-Intensive Workload Allocation in Data Centers with High Energy Efficiency", 2021 IEEE/CIC INTERNATIONAL CONFERENCE ON COMMUNICATIONS IN CHINA (ICCC), IEEE, 28 July 2021 (2021-07-28), pages 334 - 339, XP034012443, DOI: 10.1109/ICCC52777.2021.9580316 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117453398A (zh) * 2023-10-27 2024-01-26 国网江苏省电力有限公司南通供电分公司 一种提高供电可靠性的算力调度的智能优化方法及系统
CN117194057B (zh) * 2023-11-08 2024-01-23 贵州大学 一种基于强化学习优化边缘能耗与负载的资源调度方法
CN117194057A (zh) * 2023-11-08 2023-12-08 贵州大学 一种基于强化学习优化边缘能耗与负载的资源调度方法
CN117314370B (zh) * 2023-11-30 2024-03-01 嘉兴市信达电子科技有限公司 一种基于智慧能源的数据驾驶舱系统及实现方法
CN117314370A (zh) * 2023-11-30 2023-12-29 嘉兴市信达电子科技有限公司 一种基于智慧能源的数据驾驶舱系统及实现方法
CN117472587B (zh) * 2023-12-26 2024-03-01 广东奥飞数据科技股份有限公司 一种ai智算中心的资源调度系统
CN117472587A (zh) * 2023-12-26 2024-01-30 广东奥飞数据科技股份有限公司 一种ai智算中心的资源调度系统
CN117539648A (zh) * 2024-01-09 2024-02-09 天津市大数据管理中心 一种电子政务云平台的服务质量管理方法及装置
CN117634859A (zh) * 2024-01-26 2024-03-01 清云小筑(北京)创新技术有限公司 基于深度强化学习的资源均衡施工排程方法、装置及设备
CN117634859B (zh) * 2024-01-26 2024-04-12 清云小筑(北京)创新技术有限公司 基于深度强化学习的资源均衡施工排程方法、装置及设备
CN117667360A (zh) * 2024-01-31 2024-03-08 湘江实验室 面向大模型任务的计算与通信融合的智能算网调度方法
CN117667360B (zh) * 2024-01-31 2024-04-16 湘江实验室 面向大模型任务的计算与通信融合的智能算网调度方法
CN117971475A (zh) * 2024-01-31 2024-05-03 酷标物联科技江苏有限公司 一种gpu算力池智能管理方法及系统
CN117726143A (zh) * 2024-02-07 2024-03-19 山东大学 基于深度强化学习的环境友好型微网优化调度方法及系统
CN117726143B (zh) * 2024-02-07 2024-05-17 山东大学 基于深度强化学习的环境友好型微网优化调度方法及系统
CN117953351A (zh) * 2024-03-27 2024-04-30 之江实验室 一种基于模型强化学习的决策方法

Also Published As

Publication number Publication date
CN114691363A (zh) 2022-07-01

Similar Documents

Publication Publication Date Title
WO2023184939A1 (zh) 基于深度强化学习的云数据中心自适应高效资源分配方法
CN110096349B (zh) 一种基于集群节点负载状态预测的作业调度方法
CN108182115B (zh) 一种云环境下的虚拟机负载均衡方法
CN104168318B (zh) 一种资源服务系统及其资源分配方法
CN109324875B (zh) 一种基于强化学习的数据中心服务器功耗管理与优化方法
CN110737529A (zh) 一种面向短时多变大数据作业集群调度自适应性配置方法
CN113515351B (zh) 一种基于能耗与QoS协同优化的资源调度实现方法
CN101216710A (zh) 一种由计算机实现的自适应选择动态生产调度控制系统
CN110262897B (zh) 一种基于负载预测的Hadoop计算任务初始分配方法
CN111611062B (zh) 云边协同分层计算方法及云边协同分层计算系统
CN107341041B (zh) 基于优先队列的云任务多维约束回填调度方法
Tong et al. DDQN-TS: A novel bi-objective intelligent scheduling algorithm in the cloud environment
CN111813506A (zh) 一种基于粒子群算法资源感知计算迁移方法、装置及介质
CN114638167B (zh) 基于多智能体强化学习的高性能集群资源公平分配方法
CN110221909A (zh) 一种基于负载预测的Hadoop计算任务推测执行方法
CN114675975B (zh) 一种基于强化学习的作业调度方法、装置及设备
Bian et al. Neural task scheduling with reinforcement learning for fog computing systems
Wang et al. Dynamic multiworkflow deadline and budget constrained scheduling in heterogeneous distributed systems
CN108958919B (zh) 一种云计算中有期限约束的多dag任务调度费用公平性评估方法
CN113535387A (zh) 一种异构感知的gpu资源分配与调度方法及系统
CN117555683A (zh) 基于深度强化学习的云集群资源调度方法
Singh et al. Market-inspired dynamic resource allocation in many-core high performance computing systems
CN116932198A (zh) 资源调度方法、装置、电子设备及可读存储介质
CN115145383A (zh) 一种面向cpu/gpu服务器的自适应节能选择方法
Fan Intelligent Job Scheduling on High Performance Computing Systems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22934757

Country of ref document: EP

Kind code of ref document: A1