CN114443249A - Container cluster resource scheduling method and system based on deep reinforcement learning - Google Patents

Container cluster resource scheduling method and system based on deep reinforcement learning Download PDF

Info

Publication number
CN114443249A
CN114443249A (application CN202210051579.9A)
Authority
CN
China
Prior art keywords
task
container cluster
reinforcement learning
scheduling
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210051579.9A
Other languages
Chinese (zh)
Inventor
吴迪
刘可
胡淼
肖子立
肖霖畅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202210051579.9A priority Critical patent/CN114443249A/en
Publication of CN114443249A publication Critical patent/CN114443249A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5021Priority

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Factory Administration (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a container cluster resource scheduling method and system based on deep reinforcement learning, comprising the following steps. S1: establish a deep reinforcement learning agent. S2: when the allocable resources of a container cluster node satisfy the resource request of a task to be scheduled in the task queue, input the resource usage state of the container cluster nodes and the characteristic values of the task to be scheduled into the deep reinforcement learning agent to obtain the action probability distribution for task scheduling. S3: schedule the task to be scheduled onto a container cluster node for execution according to the action probability distribution, calculate the reward, and update the network parameters of the agent according to the reward. S4: repeat S2 to S3 to train the deep reinforcement learning agent so that it continuously learns and adjusts. By establishing the deep reinforcement learning agent and continuously training it, the agent can automatically generate a corresponding scheduling strategy and schedule tasks onto the corresponding container cluster nodes.

Description

Container cluster resource scheduling method and system based on deep reinforcement learning
Technical Field
The invention relates to the field of cluster scheduling, in particular to a container cluster resource scheduling method and system based on deep reinforcement learning.
Background
A container is a form of operating-system-level virtualization; in practice, a single container can run anything from a small microservice or software process to a large application. Different container tasks in a container cluster have different properties. In order to better address the characteristics of container tasks, shorten the average task completion time and achieve load balance among node resources, a scheduling strategy needs to be designed that considers not only the current node resource occupation but also the influence of future node resource occupation on resource contention.
An existing scheduling method for a container cluster obtains the information and load data of all terminals in the container cluster at a preset time interval, where the load data include the CPU utilization rate and memory occupancy rate of each terminal and the transmission rate occupied by a container; the comprehensive load rate of each terminal is calculated from the terminal identification and the load data, and when the comprehensive load rate of a terminal exceeds 50%, the container cluster is rescheduled. However, this scheduling method relies on rules hand-crafted by experts: when the cluster structure or task state changes, the scheduling algorithm has to be manually re-tuned, which wastes a large amount of manpower and material resources and reduces the efficiency of cluster resource scheduling.
Disclosure of Invention
The invention provides a container cluster resource scheduling method and system based on deep reinforcement learning, aiming to overcome the defect of the prior art that the scheduling algorithm must be manually readjusted whenever the cluster structure or task state changes, and to model the multi-dimensional characteristics of container tasks.
In order to solve the technical problems, the technical scheme of the invention is as follows:
in a first aspect, the invention provides a container cluster resource scheduling method based on deep reinforcement learning, which includes the following steps:
s1: establishing a deep reinforcement learning intelligent agent;
s2: when the allocable resources of a certain container cluster node meet the resource request of a certain task to be scheduled in a task queue, inputting the resource use state of the container cluster node and the characteristic value of the task to be scheduled into a deep reinforcement learning agent to obtain action probability distribution of task scheduling;
s3: the container cluster scheduler schedules the tasks to be scheduled to container cluster nodes for execution according to the action probability distribution of the task scheduling, calculates rewards, and updates the network parameters of the deep reinforcement learning agent according to the rewards;
s4: and repeating S2 to S3 to train the deep reinforcement learning agent, so that the deep reinforcement learning agent continuously learns and adjusts.
In this technical scheme, a deep reinforcement learning agent is established. The resource usage state of the container cluster nodes and the characteristic values of the task to be scheduled are input into the deep reinforcement learning agent to obtain the action probability distribution for task scheduling; the container cluster scheduler schedules the task to be scheduled onto the corresponding container cluster node according to this distribution, the reward is calculated, and the network parameters of the deep reinforcement learning agent are updated according to the reward so that learning and training continue. The deep reinforcement learning agent can thus adjust automatically to changes in the container cluster nodes or in the task state information, and finally it can generate a corresponding scheduling strategy and instruct the container cluster scheduler to schedule the task to be scheduled onto the corresponding container cluster node.
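For illustration only, the following Python-style sketch shows one way the S1-S4 loop could be organized; the names PolicyValueAgent, cluster_env and compute_reward are hypothetical placeholders for the agent, the container cluster environment and the reward calculation described above, and are not part of the claimed method.

```python
# Minimal sketch of the S1-S4 loop, assuming a hypothetical cluster environment
# and agent interface; the identifiers are illustrative placeholders.
agent = PolicyValueAgent(state_dim, num_nodes * queue_len)   # S1: build the DRL agent

state = cluster_env.observe()                                 # node usage + task characteristics
for step in range(num_training_steps):
    # S2: act only when some node can satisfy some queued task's resource request
    if not cluster_env.any_schedulable_task():
        cluster_env.wait_for_event()
        continue
    action_probs = agent.policy(state)                        # action probability distribution
    task_idx, node_idx = agent.sample_action(action_probs)    # pick (task, node)

    # S3: the scheduler places the task, the reward is computed, parameters are updated
    cluster_env.schedule(task_idx, node_idx)
    next_state = cluster_env.observe()
    reward = compute_reward(cluster_env, task_idx, node_idx)
    agent.update(state, (task_idx, node_idx), reward, next_state)

    state = next_state                                        # S4: repeat S2-S3
```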
Preferably, in S3, the calculation formula of the reward is as follows:
r_{j,i}(c) = γ_p · priority_j(c) − γ_{i1} · imbalance_1(c) − γ_{i2} · imbalance_2(c)

where priority_j(c) represents the task priority reward obtained when the deep reinforcement learning agent schedules task T_j onto an appropriate node in the state of the c-th scheduling, imbalance_1(c) represents the degree of usage imbalance between resources within a container cluster node, imbalance_2(c) represents the degree of resource usage imbalance between container cluster nodes, γ_p is the weight coefficient of the task priority, γ_{i1} is the weight coefficient of the intra-node resource imbalance penalty term, and γ_{i2} is the weight coefficient of the inter-node resource imbalance penalty term.
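Expressed as code, the reward of one scheduling decision is the weighted priority term minus the two weighted imbalance penalties; the minimal sketch below assumes the three components have already been computed.

```python
def scheduling_reward(priority_j, imbalance_1, imbalance_2,
                      gamma_p, gamma_i1, gamma_i2):
    """Reward r_{j,i}(c): task-priority gain minus the intra-node and
    inter-node resource-imbalance penalties, as in the formula above."""
    return gamma_p * priority_j - gamma_i1 * imbalance_1 - gamma_i2 * imbalance_2
```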
Preferably, the task priority reward priority_j(c) is given by:

priority_j(c) = γ_1 · priority_j^W(c) + γ_2 · priority_j^R(c) + γ_3 · priority_j^L(c) when some node has allocable resources that satisfy the resource request of task T_j, and priority_j(c) = −∞ otherwise,

where γ_1 is the weight reward coefficient of the task waiting time, γ_2 is the weight reward coefficient of the task resource request amount, and γ_3 is the weight reward coefficient of the task delay sensitivity.

priority_j^R(c) represents the task priority reward obtained by scheduling the task from the perspective of the task resource request amount. It is computed from the task's resource requests relative to the node capacities, where n is the number of nodes, r_CPU is the weighting reward coefficient of the CPU resource, r_Mem is the weighting reward coefficient of the memory resource, r_GPU is the weighting reward coefficient of the GPU resource, RC_i^CPU, RC_i^Mem and RC_i^GPU represent the maximum available amounts of CPU, memory and GPU resources of node N_i, and RT_j^CPU, RT_j^Mem and RT_j^GPU represent the CPU, memory and GPU resource request amounts of the task T_j to be scheduled, so that tasks with smaller resource requests receive a higher reward.

priority_j^W(c) represents the task priority reward obtained by scheduling the task from the perspective of the task waiting time. It increases with the waiting time w_j(c) of task T_j relative to the waiting times w_k(c) of the other tasks T_k in the task queue Q, so that tasks that have waited longer receive a higher reward.

priority_j^L(c) represents the task priority reward obtained by scheduling the task from the perspective of task delay sensitivity. The tasks to be scheduled comprise inference tasks and training tasks, and priority_j^L(c) takes a larger value for delay-sensitive inference tasks than for training tasks.
preferably, the task priority reward function priority when considering dependencies between tasks to be scheduledj(c) Is shown as:
Figure BDA0003474460870000039
Wherein the priorityz(c) Representing tasks T dependent on task TjzChild (j) indicates a dependency on task TzIs executed.
Preferably, in S2, when the task priority reward of a certain task is greater than 0, it indicates that there is an allocable resource of a certain container cluster node to satisfy the resource request of the task to be scheduled.
Preferably, the degree of usage imbalance between resources within a container cluster node, imbalance_1(c), and the degree of resource usage imbalance between container cluster nodes, imbalance_2(c), are computed from the per-node resource usage rates: imbalance_1(c) measures how unevenly the CPU, memory and GPU resources are used within each node, and imbalance_2(c) measures how unevenly resources are used across the nodes.

Here U_i(c) denotes the resource usage of node N_i in the state of the c-th scheduling by the deep reinforcement learning agent, consisting of the CPU resource usage rate u_i^CPU(c), the memory resource usage rate u_i^Mem(c) and the GPU resource usage rate u_i^GPU(c):

u_i^CPU(c) = (RC_i^CPU − RA_i^CPU(c)) / RC_i^CPU
u_i^Mem(c) = (RC_i^Mem − RA_i^Mem(c)) / RC_i^Mem
u_i^GPU(c) = (RC_i^GPU − RA_i^GPU(c)) / RC_i^GPU

where RA_i^CPU(c), RA_i^Mem(c) and RA_i^GPU(c) denote the allocable CPU, memory and GPU resources of node N_i in the state of the c-th scheduling, and RC_i^CPU, RC_i^Mem and RC_i^GPU denote the corresponding maximum available amounts.
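The text above characterizes imbalance_1(c) and imbalance_2(c) only as degrees of usage imbalance. As a hedged illustration, the sketch below uses standard deviations of the usage rates as one plausible concrete choice; the exact formulas of the invention may differ.

```python
import numpy as np

def usage_rates(rc, ra):
    """Per-node usage rates u_i = (RC_i - RA_i(c)) / RC_i for CPU, memory, GPU.
    rc, ra: arrays of shape (n_nodes, 3) holding capacities and allocable amounts."""
    return (rc - ra) / rc

def imbalance_terms(u):
    """u: usage-rate matrix of shape (n_nodes, 3) -> (imbalance_1, imbalance_2).
    Assumed definitions: intra-node = mean std over each node's three resources,
    inter-node = std over nodes of each node's mean usage."""
    intra = np.std(u, axis=1).mean()   # spread across CPU/Mem/GPU within a node
    inter = np.std(u.mean(axis=1))     # spread of overall usage across nodes
    return intra, inter
```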
Preferably, in S1, the initial characteristic values include the resource request amount, waiting time, depended-on task index and task delay sensitivity of the task to be scheduled, and the maximum available resource amount, currently allocable resource amount and current resource utilization rate of the container cluster node.
Preferably, the deep reinforcement learning agent comprises a policy network, the state information of the tasks to be scheduled and the container cluster nodes is input into the policy network, and the policy network outputs the action probability distribution of task scheduling.
Preferably, the deep reinforcement learning agent further comprises a value network, wherein the value network is used for scoring the strategy network;
and inputting the initial characteristic value into the value network to obtain the score of the strategy network, and updating the network parameters of the strategy network and the value network according to the reward and the score of the strategy network.
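As a minimal sketch of such a policy network and value network, the two small feed-forward networks below (PyTorch-style, with assumed hidden sizes) map the state characteristic vector to an action distribution over n*m actions and to a scalar score, respectively; the architecture details are assumptions, not specified by the invention.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """pi(a|s; theta_1): maps state characteristics to a distribution over n*m actions."""
    def __init__(self, state_dim, num_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )
    def forward(self, s):
        return torch.softmax(self.net(s), dim=-1)   # action probability distribution

class ValueNet(nn.Module):
    """v(s; theta_2): scores a state so the policy network can be improved."""
    def __init__(self, state_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, s):
        return self.net(s).squeeze(-1)
```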
In a second aspect, the present invention further provides a container cluster resource scheduling system based on deep reinforcement learning, including:
the deep reinforcement learning agent is used for obtaining action probability distribution of task scheduling according to the resource use state of a container cluster node and the characteristic value of a task to be scheduled when allocable resources of the container cluster node meet the resource request of the task to be scheduled in a task queue;
the container cluster scheduler is used for scheduling the tasks to be scheduled to the container cluster nodes for execution according to the action probability distribution output by the deep reinforcement learning agent;
and the optimization module is used for calculating rewards, updating the network parameters of the deep reinforcement learning agent according to the rewards and continuously optimizing the deep reinforcement learning agent.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects. A deep reinforcement learning agent is established; the resource usage state of the container cluster nodes and the characteristic values of the task to be scheduled are input into the agent to obtain the action probability distribution for task scheduling; the container cluster scheduler schedules the task to be scheduled onto the corresponding container cluster node according to this distribution, the reward is calculated, and the network parameters of the agent are updated according to the reward so that learning and training continue. The deep reinforcement learning agent can therefore adjust automatically to changes in the container cluster nodes or in the task state information, and finally it can generate a corresponding scheduling strategy and instruct the container cluster scheduler to schedule the task to be scheduled onto the corresponding container cluster node.
Drawings
Fig. 1 is a flowchart of a container cluster resource scheduling method based on deep reinforcement learning.
Fig. 2 is a schematic diagram of a container cluster resource scheduling method based on deep reinforcement learning in embodiment 1.
Fig. 3 is a flowchart of a task scheduling algorithm in embodiment 3.
Fig. 4 is an architecture diagram of a container cluster resource scheduling system based on deep reinforcement learning.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
the technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
Referring to fig. 1-2, the present embodiment provides a container cluster resource scheduling method based on deep reinforcement learning, including the following steps:
s1: establishing a deep reinforcement learning intelligent agent;
s2: when the allocable resources of a certain container cluster node meet the resource request of a certain task to be scheduled in a task queue, inputting the resource use state of the container cluster node and the characteristic value of the task to be scheduled into a deep reinforcement learning agent to obtain action probability distribution of task scheduling;
s3: the container cluster scheduler schedules the tasks to be scheduled to container cluster nodes for execution according to the action probability distribution of the task scheduling, calculates rewards, and updates the network parameters of the deep reinforcement learning agent according to the rewards;
s4: and repeating S2 to S3 to train the deep reinforcement learning agent, so that the deep reinforcement learning agent continuously learns and adjusts.
As shown in fig. 2, fig. 2 is a system architecture diagram of the container cluster resource scheduling method based on deep reinforcement learning according to this embodiment. Tasks T_j are assumed to arrive discretely, in the form of containers, at a Kubernetes cluster and enter a task queue of fixed length m maintained by the container cluster. Each task requests certain system resources, including CPU, memory and GPU. According to each task's resource request, its time waiting for scheduling, its depended-on index and its response delay sensitivity, the scheduler selects one task from the task queue and schedules it onto a suitable node following the optimal scheduling strategy, so that the shortest average scheduling completion time and resource load balance among the nodes are achieved.
The method establishes a deep reinforcement learning agent; the resource usage states of the container cluster nodes and the characteristic values of the tasks to be scheduled are input into the agent to obtain the action probability distribution for task scheduling; the container cluster scheduler schedules the task to be scheduled onto the corresponding cluster node according to this distribution; the reward is calculated and the network parameters of the agent are updated according to the reward so that it continuously learns and trains. In this way the agent adapts automatically, and finally it can generate the optimal scheduling strategy and instruct the container cluster scheduler to schedule the tasks to be scheduled onto the corresponding cluster nodes.
Example 2
The embodiment provides a container cluster resource scheduling method based on deep reinforcement learning, which includes:
s1: and establishing a deep reinforcement learning intelligent agent.
In this embodiment, the deep reinforcement learning agent interacts with the system environment: it takes actions, i.e. it selects a task from the task queue and instructs the container cluster scheduler to schedule it onto the corresponding container cluster node for execution, and it maximizes the long-term reward by observing the environment and its internal state. More specifically, assume the initial state maintained by the deep reinforcement learning agent is s_0. Whenever the allocable resources of some node satisfy the resource request of some task to be scheduled in the task queue, the deep reinforcement learning agent interacts with the scheduling environment. At the c-th scheduling, the agent maintains a state s_c that describes the current information of the system. On that basis, the agent instructs the container cluster scheduler to select an action a_c, i.e. a certain task is selected from the task queue and scheduled onto the corresponding container cluster node for execution. The agent then receives the reward r_c generated by the environment after the action, and a new state s_{c+1} is produced. The sequence of operations of the deep reinforcement learning agent is therefore:
s_0, a_0, r_0, s_1, a_1, r_1, …, s_{c+1}
s2: when the allocable resources of a certain container cluster node meet the resource request of a certain task to be scheduled in the task queue, inputting the resource use state of the container cluster node and the characteristic value of the task to be scheduled into a deep reinforcement learning agent to obtain the action probability distribution of task scheduling.
In this embodiment, the resource usage state of the container cluster nodes and the characteristic values of the task to be scheduled include the resource request amount, waiting time, depended-on task index and task delay sensitivity of the task to be scheduled, and the maximum available resource amount, currently allocable resource amount and current resource utilization rate of the container cluster nodes. The Kubernetes container cluster is denoted N = {N_1, N_2, …, N_n}, with n nodes in total. The computing resources each node can provide differ, and they are assumed to be CPU, memory and GPU.
In this embodiment, the deep reinforcement learning agent includes a policy network. In the specific implementation, based on the A2C algorithm, the state information of the task to be scheduled and of the container cluster nodes is input into the deep reinforcement learning policy network π(a|s; θ_1), and the policy network outputs the action probability distribution π(a|s) for task scheduling.
As the decision unit, the deep reinforcement learning agent is essentially a neural network that takes the state information of the tasks to be scheduled and of the container cluster nodes as input and outputs an action probability distribution π(a|s), where a denotes an action and s denotes the state characteristics. An action of the agent is defined as selecting a certain task from the task queue and scheduling it onto the corresponding node for execution. Since the Kubernetes container cluster has n nodes and maintains a task queue of length m, the agent has n·m possible actions, and the network output is the probability distribution over these n·m actions.
In this embodiment, the state space of the deep reinforcement learning agent is formed according to the resource usage state of each container cluster node of the current container cluster system and the initial characteristic value set by the task to be scheduled in the task queue.
TABLE 1 State space of the deep reinforcement learning agent

Input | Size
Maximum available amount of node resources | n
Currently allocable amount of node resources | n
Current utilization rate of node resources | n
Resource request amount of each task | 3m
Task waiting time | m
Depended-on task index | m*m
Task delay sensitivity | m

As shown in Table 1, the state space of the deep reinforcement learning agent consists of the maximum available resource amount RC of the nodes, the currently allocable resource amount RA(c) of the nodes, the current resource utilization rate U(c) of the nodes, the resource request amount RT of each task to be scheduled in the task queue, the waiting time W, the depended-on task index I and the task delay sensitivity L, all expressed in the same unit proportions.
The action space is expressed as a_c = (j, i), indicating that the deep reinforcement learning agent selects task T_j from the task queue and schedules task T_j onto node N_i for execution.
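To make the state space of Table 1 concrete, the state characteristic vector can be assembled by concatenating the node-side and task-side quantities; the layout in the sketch below, giving a vector of length 3n + 5m + m*m, follows Table 1 but is otherwise an illustrative assumption.

```python
import numpy as np

def build_state(rc, ra, u, rt, w, dep, lat):
    """Concatenate the Table 1 quantities into one state vector.
    rc, ra, u : node max capacity, allocable amount, utilization, each length n
    rt        : per-task resource requests, shape (m, 3) -> flattened to 3m
    w         : task waiting times, length m
    dep       : depended-on task index matrix, shape (m, m) -> flattened to m*m
    lat       : task delay sensitivities, length m
    All quantities are assumed to be normalized to the same unit proportion."""
    return np.concatenate([
        np.asarray(rc), np.asarray(ra), np.asarray(u),
        np.asarray(rt).reshape(-1),
        np.asarray(w),
        np.asarray(dep).reshape(-1),
        np.asarray(lat),
    ]).astype(np.float32)
```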
S3: and the container cluster scheduler schedules the tasks to be scheduled to container cluster nodes for execution according to the action probability distribution of the task scheduling, calculates rewards and updates the network parameters of the deep reinforcement learning agent according to the rewards.
In order to effectively train a deep reinforcement learning agent, appropriate rewards need to be set.
In order to avoid the situation in which a long-running task with low response sensitivity occupies a large amount of resources and blocks the execution of tasks with small resource occupation, thereby increasing the average job completion time of the system, a task queue is maintained and the tasks in it are dynamically ranked by priority. The scheduler receives a higher reward for picking a task with high priority. The priority ranking is mainly based on the task resource request amount, the task waiting time, the task dependency relationships and the task delay sensitivity, the intention being that tasks with a higher task priority reward are scheduled first. The reward model of the deep reinforcement learning agent is constructed from the task priority reward, the degree of usage imbalance between resources within a container cluster node and the degree of resource usage imbalance between container cluster nodes:
in order to avoid the influence of the tasks with large resource occupation on the running of a large number of tasks with small resource occupation, the task priority rewarding obtained by scheduling the tasks from the perspective of the task resource request amount
Figure BDA0003474460870000081
The calculation formula of (a) is as follows:
Figure BDA0003474460870000082
wherein n is the number of nodes, rCPUFor weighting the reward factor, r, of CPU resourcesMemFor the weighted reward factor, r, of the memory resourceGPUWeighting and rewarding coefficients of the resources corresponding to the GPU;
Figure BDA0003474460870000083
representing node NiThe maximum available amount of CPU resources of (c),
Figure BDA0003474460870000084
representing a node NiThe maximum available amount of memory resources of (a),
Figure BDA0003474460870000085
representing a node NiMaximum available amount of GPU resources;
Figure BDA0003474460870000086
representing a task T to be scheduledjThe amount of CPU resource requests of (a),
Figure BDA0003474460870000087
indicating a task T to be scheduledjThe amount of memory resource requests of (a),
Figure BDA0003474460870000088
indicating a task T to be scheduledjThe amount of GPU resource requests.
To prevent earlier tasks from being starved for a long time as new tasks keep arriving, tasks that have waited longer receive a higher priority reward. The task priority reward priority_j^W(c) obtained by scheduling the task from the perspective of the task waiting time increases with the waiting time w_j(c) of task T_j relative to the waiting times w_k(c) of the other tasks T_k in the task queue Q.
in a cluster, different types of tasks have different delay sensitivities. For example, in a machine learning task cluster, inference tasks often require faster response than training tasks, and thus, for these two task types, task priority rewards are obtained by scheduling tasks from the perspective of task delay sensitivity
Figure BDA0003474460870000091
The formula of (a) is as follows:
Figure BDA0003474460870000092
in addition, under the condition that cluster resources are in shortage, for the condition that no node allocable resources meet the resource requirement of the task to be scheduled, the priority reward of the scheduling task is directly set to be minus infinity, and therefore the priority reward function priority of the taskj(c) The specific formula is as follows:
Figure BDA0003474460870000093
in the formula, gamma1Rewarding coefficient, gamma, for the weight of task latency2Priority rewards, gamma, for task resource request volumes3Priority rewards for task latency sensitivity. priorityj(c) Task T is selected by deep reinforcement learning agent under the time state of scheduling the c-th taskjScheduling to the appropriate node to perform the resulting priority reward.
When the task priority reward of a task is minus infinity, no container cluster node has allocable resources that satisfy the resource requirement of that task to be scheduled, and the task is not scheduled. Only when the allocable resources of some container cluster node can satisfy the resource requirement of the task to be scheduled, i.e. the priority of the task is greater than 0, can the task be scheduled onto that container cluster node.
In an actual cluster environment, dependency relationships often exist between tasks. In that case, the task that is depended on the most should be executed preferentially, so that the completion times of the tasks depending on it are shortened. Considering the dependencies between tasks, the priority reward priority_j(c) is therefore augmented by the term Σ_{z ∈ Child(j)} priority_z(c), where priority_z(c) is the priority reward of a task T_z that depends on task T_j, and Child(j) denotes the set of tasks that depend on task T_j.
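Putting the priority terms together, the sketch below computes priority_j(c) including the minus-infinity case and the dependency boost over Child(j); the concrete sub-reward formulas (relative waiting time, request size relative to aggregate capacity, a binary inference/training indicator) are illustrative assumptions consistent with the descriptions above, not necessarily the exact formulas of the invention.

```python
import numpy as np

def task_priority(j, tasks, nodes, gamma1, gamma2, gamma3):
    """priority_j(c) = gamma1*W + gamma2*R + gamma3*L, or -inf when no node fits.
    tasks[j] is assumed to be a dict with 'request' (CPU/Mem/GPU array), 'wait'
    and 'type'; nodes are dicts with 'allocable' and 'capacity' arrays."""
    task = tasks[j]
    if not any(np.all(node["allocable"] >= task["request"]) for node in nodes):
        return float("-inf")                        # no node can host this task right now
    total_wait = sum(t["wait"] for t in tasks) or 1.0
    w_term = task["wait"] / total_wait              # longer-waiting tasks score higher
    total_capacity = np.sum([node["capacity"] for node in nodes])
    r_term = 1.0 - float(np.sum(task["request"]) / total_capacity)  # small requests score higher
    l_term = 1.0 if task["type"] == "inference" else 0.0            # delay-sensitive tasks first
    return gamma1 * w_term + gamma2 * r_term + gamma3 * l_term

def priority_with_dependencies(j, tasks, nodes, children, g1, g2, g3):
    """Augment priority_j(c) by the priorities of the tasks in Child(j)."""
    total = task_priority(j, tasks, nodes, g1, g2, g3)
    for z in children.get(j, []):
        p_z = task_priority(z, tasks, nodes, g1, g2, g3)
        if np.isfinite(p_z):                        # ignore children that cannot be placed yet
            total += p_z
    return total
```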
In addition, the calculation of the scheduling reward obtained by the deep reinforcement learning agent also considers the degree of usage imbalance between resources within a container cluster node, imbalance_1(c), and the degree of resource usage imbalance between container cluster nodes, imbalance_2(c). imbalance_1(c) measures how unevenly the CPU, memory and GPU resources are used within each node, and imbalance_2(c) measures how unevenly resources are used across the nodes; both are computed from the node resource usage rates.

U_i(c) denotes the resource usage of node N_i in the state of the c-th scheduling by the deep reinforcement learning agent, consisting of the CPU resource usage rate u_i^CPU(c), the memory resource usage rate u_i^Mem(c) and the GPU resource usage rate u_i^GPU(c):

u_i^CPU(c) = (RC_i^CPU − RA_i^CPU(c)) / RC_i^CPU
u_i^Mem(c) = (RC_i^Mem − RA_i^Mem(c)) / RC_i^Mem
u_i^GPU(c) = (RC_i^GPU − RA_i^GPU(c)) / RC_i^GPU

where RA_i^CPU(c), RA_i^Mem(c) and RA_i^GPU(c) denote the allocable CPU, memory and GPU resources of node N_i in the state of the c-th scheduling, and RC_i^CPU, RC_i^Mem and RC_i^GPU denote the corresponding maximum available amounts.
The reward function r_{j,i}(c) of the deep reinforcement learning agent is therefore expressed as:

r_{j,i}(c) = γ_p · priority_j(c) − γ_{i1} · imbalance_1(c) − γ_{i2} · imbalance_2(c)

where γ_p is the weight coefficient of the task priority, γ_{i1} is the weight coefficient of the intra-node resource imbalance penalty term and γ_{i2} is the weight coefficient of the inter-node resource imbalance penalty term.
When the task priority reward of a task is greater than 0, some container cluster node has allocable resources that satisfy the resource request of that task to be scheduled.
In this embodiment, the deep reinforcement learning agent further includes a value network v(s; θ_2). The value network v(s; θ_2) scores the policy network π(a|s; θ_1) and thereby helps the policy network improve. In the specific implementation, the state characteristics s_c and s_{c+1} are input into the value network v(s; θ_2) to obtain the corresponding scores v(s_c; θ_2) and v(s_{c+1}; θ_2). Using these scores and the reward r_c, the deep reinforcement learning agent is trained with the temporal-difference (TD) method and the network parameters of the policy network and the value network are updated:

ŷ_c = r_c + γ · v(s_{c+1}; θ_2)
δ_c = v(s_c; θ_2) − ŷ_c
θ'_1 = θ_1 − α · δ_c · ∇_{θ_1} log π(a_c | s_c; θ_1)
θ'_2 = θ_2 − β · δ_c · ∇_{θ_2} v(s_c; θ_2)

where γ is the future reward weight factor, ŷ_c is the TD target, δ_c is the TD error, α and β are the learning rates of the policy network and the value network, θ'_1 denotes the updated policy network parameters and θ'_2 denotes the updated value network parameters.
S4: and repeating S2 to S3 to train the deep reinforcement learning agent, so that the deep reinforcement learning agent continuously learns and adjusts.
The reward function r_{j,i}(c) of the deep reinforcement learning agent is optimized with the goal of maximizing the total reward. The deep reinforcement learning agent then generates the optimal scheduling strategy and schedules the tasks to be scheduled onto the corresponding container cluster nodes according to it, so as to shorten the average task scheduling completion time and achieve load balance among node resources.
The pseudo code of the container cluster resource scheduling method based on deep reinforcement learning is as follows:
Input:
Resource request amount RP of the tasks to be scheduled
Waiting time W of the tasks to be scheduled
Depended-on task index I of the tasks to be scheduled
Task delay sensitivity L of the tasks to be scheduled
Maximum available resource amount RC of the container cluster nodes
Current allocable resource amount RA(c) of the container cluster nodes
Current resource utilization rate U(c) of the container cluster nodes
Reward function weights ω
Future reward weight factor γ
Output:
The task T_j to be scheduled and the node N_i onto which it is scheduled
The first step: calculate the priority reward of each task to be scheduled.
The second step: judge whether the priority reward of some task is greater than 0; if so, execute the next step, otherwise wait until the condition is met.
The third step: set the characteristic values of each piece of state information and input them into the deep reinforcement learning agent.
The RC, RA(c) and U(c) of the container cluster nodes in the c-th scheduling state and the RP, W, I and L state information of the tasks to be scheduled form the state characteristics s_c, which are input into the policy network π(a|s; θ_1).
The fourth step: make the scheduling decision according to the action probability distribution π(a|s) output by the policy network, and schedule the specified task onto the specified node for execution.
Select the task T_j to be scheduled according to the action probability distribution output by the policy network.
Obtain the selected scheduling node N_i according to the action probability distribution output by the policy network.
Schedule task T_j onto node N_i.
The fifth step: calculate the reward.
Calculate the reward r_{j,i}(c) according to the reward formula.
The sixth step: update the network parameters.
Input the state characteristics s_c and s_{c+1} into the value network to obtain v(s_c; θ_2) and v(s_{c+1}; θ_2).
Calculate the TD target ŷ_c and the TD error δ_c.
Update the value network parameters θ_2.
Update the policy network parameters θ_1.
Example 4
Referring to fig. 4, the present embodiment provides a container cluster resource scheduling system based on deep reinforcement learning, which includes a deep reinforcement learning agent, a container cluster scheduler, and an optimization module.
In the specific implementation, when the allocable resources of some container cluster node satisfy the resource request of some task to be scheduled in the task queue, the resource usage state of the container cluster nodes and the characteristic values of the task to be scheduled are input into the deep reinforcement learning agent, and the policy network of the agent outputs the action probability distribution for task scheduling. The container cluster scheduler schedules the task to be scheduled onto the corresponding container cluster node according to this distribution. The optimization module calculates the reward, updates the network parameters of the deep reinforcement learning agent according to the reward and continuously optimizes the agent, so that the agent adjusts automatically whenever the container cluster nodes or the task state information change. Finally, the deep reinforcement learning agent can generate the corresponding scheduling strategy and instruct the container cluster scheduler to schedule the tasks to be scheduled onto the corresponding container cluster nodes.
Terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. A container cluster resource scheduling method based on deep reinforcement learning is characterized by comprising the following steps:
s1: establishing a deep reinforcement learning intelligent agent;
s2: when the allocable resources of a certain container cluster node meet the resource request of a certain task to be scheduled in a task queue, inputting the resource use state of the container cluster node and the characteristic value of the task to be scheduled into a deep reinforcement learning agent to obtain action probability distribution of task scheduling;
s3: the container cluster scheduler schedules the tasks to be scheduled to container cluster nodes for execution according to the action probability distribution of the task scheduling, calculates rewards, and updates the network parameters of the deep reinforcement learning agent according to the rewards;
s4: and repeating S2 to S3 to train the deep reinforcement learning agent, so that the deep reinforcement learning agent continuously learns and adjusts.
2. The deep reinforcement learning container cluster resource scheduling method according to claim 1, wherein in S3, the calculation formula of the reward is as follows:
r_{j,i}(c) = γ_p · priority_j(c) − γ_{i1} · imbalance_1(c) − γ_{i2} · imbalance_2(c)

wherein priority_j(c) represents the task priority reward obtained when the deep reinforcement learning agent schedules task T_j onto an appropriate node in the state of the c-th scheduling, imbalance_1(c) represents the degree of usage imbalance between resources within a container cluster node, imbalance_2(c) represents the degree of resource usage imbalance between container cluster nodes, γ_p is the weight coefficient of the task priority, γ_{i1} is the weight coefficient of the intra-node resource imbalance penalty term, and γ_{i2} is the weight coefficient of the inter-node resource imbalance penalty term.
3. The container cluster resource scheduling method based on deep reinforcement learning according to claim 2, wherein the task priority reward priority_j(c) is:

priority_j(c) = γ_1 · priority_j^W(c) + γ_2 · priority_j^R(c) + γ_3 · priority_j^L(c) when some node has allocable resources that satisfy the resource request of task T_j, and priority_j(c) = −∞ otherwise,

wherein γ_1 is the weight reward coefficient of the task waiting time, γ_2 is the weight reward coefficient of the task resource request amount and γ_3 is the weight reward coefficient of the task delay sensitivity;

priority_j^R(c) represents the task priority reward obtained by scheduling the task from the perspective of the task resource request amount; it is computed from the task's resource requests relative to the node capacities, wherein n is the number of nodes, r_CPU is the weighting reward coefficient of the CPU resource, r_Mem is the weighting reward coefficient of the memory resource, r_GPU is the weighting reward coefficient of the GPU resource, RC_i^CPU, RC_i^Mem and RC_i^GPU represent the maximum available amounts of CPU, memory and GPU resources of node N_i, and RT_j^CPU, RT_j^Mem and RT_j^GPU represent the CPU, memory and GPU resource request amounts of the task T_j to be scheduled;

priority_j^W(c) represents the task priority reward obtained by scheduling the task from the perspective of the task waiting time; it increases with the waiting time w_j(c) of task T_j relative to the waiting times w_k(c) of the other tasks T_k in the task queue Q;

priority_j^L(c) represents the task priority reward obtained by scheduling the task from the perspective of task delay sensitivity; the tasks to be scheduled comprise inference tasks and training tasks, and priority_j^L(c) takes a larger value for inference tasks than for training tasks.
4. The container cluster resource scheduling method based on deep reinforcement learning according to claim 3, wherein when the dependency relationships between the tasks to be scheduled are considered, the task priority reward function priority_j(c) is augmented by the term Σ_{z ∈ Child(j)} priority_z(c), wherein priority_z(c) represents the priority reward of a task T_z that depends on task T_j, and Child(j) denotes the set of tasks that depend on task T_j.
5. The method according to claim 4, wherein in step S2, when the task priority reward of a certain task is greater than 0, it indicates that there is an allocable resource of a certain container cluster node that meets the resource request of the task to be scheduled.
6. The container cluster resource scheduling method based on deep reinforcement learning according to claim 2, wherein the degree of usage imbalance between resources within a container cluster node, imbalance_1(c), and the degree of resource usage imbalance between container cluster nodes, imbalance_2(c), are computed from the node resource usage rates: imbalance_1(c) measures how unevenly the CPU, memory and GPU resources are used within each node, and imbalance_2(c) measures how unevenly resources are used across the nodes;

wherein U_i(c) represents the resource usage of node N_i in the state of the c-th scheduling by the deep reinforcement learning agent, comprising the CPU resource usage rate u_i^CPU(c), the memory resource usage rate u_i^Mem(c) and the GPU resource usage rate u_i^GPU(c):

u_i^CPU(c) = (RC_i^CPU − RA_i^CPU(c)) / RC_i^CPU
u_i^Mem(c) = (RC_i^Mem − RA_i^Mem(c)) / RC_i^Mem
u_i^GPU(c) = (RC_i^GPU − RA_i^GPU(c)) / RC_i^GPU

wherein RA_i^CPU(c), RA_i^Mem(c) and RA_i^GPU(c) represent the allocable CPU, memory and GPU resources of node N_i in the state of the c-th scheduling, and RC_i^CPU, RC_i^Mem and RC_i^GPU represent the corresponding maximum available amounts.
7. The deep reinforcement learning container cluster resource scheduling method according to any one of claims 1 to 6, wherein in S1, the initial characteristic values include resource request amount, waiting time, task index depended on and task delay sensitivity of a task to be scheduled, and resource maximum usable amount, current resource allocable amount and current resource utilization rate of a container cluster node.
8. The deep-reinforcement-learning container cluster resource scheduling method of claim 7, wherein the deep-reinforcement-learning agent comprises a policy network, and wherein initial characteristic values are input into the policy network, and wherein the policy network outputs an action probability distribution for task scheduling.
9. The deep-reinforcement learning container cluster resource scheduling method of claim 8, wherein the deep-reinforcement learning agent further comprises a value network for scoring a policy network;
and inputting the initial characteristic value into the value network to obtain the score of the strategy network, and updating the network parameters of the strategy network and the value network according to the reward and the score of the strategy network.
10. A container cluster resource scheduling system based on deep reinforcement learning, comprising:
the deep reinforcement learning agent is used for obtaining action probability distribution of task scheduling according to the resource use state of a container cluster node and the characteristic value of a task to be scheduled when allocable resources of the container cluster node meet the resource request of the task to be scheduled in a task queue;
the container cluster scheduler is used for scheduling the tasks to be scheduled to the container cluster nodes for execution according to the action probability distribution output by the deep reinforcement learning agent;
and the optimization module is used for calculating rewards, updating the network parameters of the deep reinforcement learning agent according to the rewards and continuously optimizing the deep reinforcement learning agent.
CN202210051579.9A 2022-01-17 2022-01-17 Container cluster resource scheduling method and system based on deep reinforcement learning Pending CN114443249A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210051579.9A CN114443249A (en) 2022-01-17 2022-01-17 Container cluster resource scheduling method and system based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210051579.9A CN114443249A (en) 2022-01-17 2022-01-17 Container cluster resource scheduling method and system based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN114443249A true CN114443249A (en) 2022-05-06

Family

ID=81368646

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210051579.9A Pending CN114443249A (en) 2022-01-17 2022-01-17 Container cluster resource scheduling method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114443249A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114610474A (en) * 2022-05-12 2022-06-10 之江实验室 Multi-strategy job scheduling method and system in heterogeneous supercomputing environment
CN115293451A (en) * 2022-08-24 2022-11-04 中国西安卫星测控中心 Resource dynamic scheduling method based on deep reinforcement learning
CN115293451B (en) * 2022-08-24 2023-06-16 中国西安卫星测控中心 Resource dynamic scheduling method based on deep reinforcement learning
CN115237581A (en) * 2022-09-21 2022-10-25 之江实验室 Heterogeneous computing power-oriented multi-strategy intelligent scheduling method and device
CN115237581B (en) * 2022-09-21 2022-12-27 之江实验室 Heterogeneous computing power-oriented multi-strategy intelligent scheduling method and device
CN115361301A (en) * 2022-10-09 2022-11-18 之江实验室 Distributed computing network cooperative traffic scheduling system and method based on DQN
US12021751B2 (en) 2022-10-09 2024-06-25 Zhejiang Lab DQN-based distributed computing network coordinate flow scheduling system and method
CN116320843A (en) * 2023-04-24 2023-06-23 华南师范大学 Queue request mobilization method and device for elastic optical network
CN116320843B (en) * 2023-04-24 2023-07-25 华南师范大学 Queue request mobilization method and device for elastic optical network
CN117971502A (en) * 2024-03-29 2024-05-03 南京认知物联网研究院有限公司 Method and device for carrying out online optimization scheduling on AI reasoning cluster

Similar Documents

Publication Publication Date Title
CN114443249A (en) Container cluster resource scheduling method and system based on deep reinforcement learning
CN111756812B (en) Energy consumption perception edge cloud cooperation dynamic unloading scheduling method
CN108958916B (en) Workflow unloading optimization method under mobile edge environment
CN109561148A (en) Distributed task dispatching method in edge calculations network based on directed acyclic graph
CN109960585A (en) A kind of resource regulating method based on kubernetes
CN109788046B (en) Multi-strategy edge computing resource scheduling method based on improved bee colony algorithm
CN113225377A (en) Internet of things edge task unloading method and device
CN112817728B (en) Task scheduling method, network device and storage medium
CN108055292B (en) Optimization method for mapping from virtual machine to physical machine
CN109656713B (en) Container scheduling method based on edge computing framework
CN113127169A (en) Efficient link scheduling method for dynamic workflow in data center network
CN115033357A (en) Micro-service workflow scheduling method and device based on dynamic resource selection strategy
CN114675975B (en) Job scheduling method, device and equipment based on reinforcement learning
CN116302389A (en) Task scheduling method based on improved ant colony algorithm
CN110958192B (en) Virtual data center resource allocation system and method based on virtual switch
CN109298932B (en) OpenFlow-based resource scheduling method, scheduler and system
CN116069473A (en) Deep reinforcement learning-based Yarn cluster workflow scheduling method
CN116932198A (en) Resource scheduling method, device, electronic equipment and readable storage medium
CN113179175B (en) Real-time bandwidth prediction method and device for power communication network service
Li et al. SLA-based task offloading for energy consumption constrained workflows in fog computing
CN112698911B (en) Cloud job scheduling method based on deep reinforcement learning
CN114090239A (en) Model-based reinforcement learning edge resource scheduling method and device
CN113256128A (en) Task scheduling method for balancing resource usage by reinforcement learning in power internet of things
Zhang et al. Online joint scheduling of delay-sensitive and computation-oriented tasks in edge computing
CN116909717B (en) Task scheduling method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination