CN114443249A - Container cluster resource scheduling method and system based on deep reinforcement learning - Google Patents
Container cluster resource scheduling method and system based on deep reinforcement learning Download PDFInfo
- Publication number
- CN114443249A CN114443249A CN202210051579.9A CN202210051579A CN114443249A CN 114443249 A CN114443249 A CN 114443249A CN 202210051579 A CN202210051579 A CN 202210051579A CN 114443249 A CN114443249 A CN 114443249A
- Authority
- CN
- China
- Prior art keywords
- task
- container cluster
- reinforcement learning
- scheduling
- resource
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 230000002787 reinforcement Effects 0.000 title claims abstract description 109
- 238000000034 method Methods 0.000 title claims abstract description 33
- 230000009471 action Effects 0.000 claims abstract description 35
- 230000035945 sensitivity Effects 0.000 claims description 15
- 230000006870 function Effects 0.000 claims description 6
- 238000004364 calculation method Methods 0.000 claims description 5
- 230000001419 dependent effect Effects 0.000 claims description 4
- 238000005457 optimization Methods 0.000 claims description 4
- 239000000126 substance Substances 0.000 claims description 2
- 230000008569 process Effects 0.000 description 4
- 238000010586 diagram Methods 0.000 description 3
- 230000004044 response Effects 0.000 description 3
- 230000008859 change Effects 0.000 description 2
- 230000006872 improvement Effects 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 238000012423 maintenance Methods 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 238000012163 sequencing technique Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5021—Priority
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- General Factory Administration (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a container cluster resource scheduling method and system based on deep reinforcement learning, which comprises the following steps: s1: and establishing a deep reinforcement learning intelligent agent. S2: when the allocable resources of a certain container cluster node meet the resource request of a certain task to be scheduled in the task queue, inputting the resource use state of the container cluster node and the characteristic value of the task to be scheduled into a deep reinforcement learning agent to obtain the action probability distribution of task scheduling. S3: and scheduling the tasks to be scheduled to container cluster nodes for execution according to the action probability distribution of the task scheduling, calculating rewards, and updating network parameters of the intelligent agent according to the rewards. S4: and repeating the steps from S2 to S3, training the deep reinforcement learning agent, and enabling the agent to continuously learn and adjust. By establishing the deep reinforcement learning agent and continuously learning and training the agent, the agent can automatically generate a corresponding scheduling strategy and schedule tasks to corresponding container cluster nodes.
Description
Technical Field
The invention relates to the field of cluster scheduling, in particular to a container cluster resource scheduling method and system based on deep reinforcement learning.
Background
A container is a form of operating system virtualization, and in practice, one container can be used to run all the contents of a small microservice or software process large application. Different container tasks in a container cluster have different properties. In order to better address the characteristics of container tasks, shorten the average task completion time and realize load balancing among node resources, a scheduling strategy method which considers the current node resource occupation situation and also considers the influence of the future node resource occupation situation on resource contention needs to be designed.
The existing scheduling method of a container cluster comprises the steps of obtaining information and load data of all terminals in the container cluster at a preset time interval, wherein the load data comprises the CPU utilization rate and the memory occupancy rate of the terminals and the transmission rate occupied by a container; calculating the comprehensive load rate of the terminal according to the terminal identification and the load data; and when the comprehensive load rate of the terminal exceeds 50%, scheduling the container cluster. However, the scheduling method is a method for compiling rules by experts, and when the cluster structure or task state changes, the scheduling algorithm needs to be manually adjusted again, so that a large amount of manpower and material resources are wasted, and the scheduling efficiency of cluster resources is reduced.
Disclosure of Invention
The invention provides a container cluster resource scheduling method and system based on deep reinforcement learning, aiming at overcoming the defect that in the prior art, when the cluster structure or task state changes, the manual scheduling algorithm needs to be readjusted, and modeling is carried out on the multi-dimensional characteristics of a container task.
In order to solve the technical problems, the technical scheme of the invention is as follows:
in a first aspect, the invention provides a container cluster resource scheduling method based on deep reinforcement learning, which includes the following steps:
s1: establishing a deep reinforcement learning intelligent agent;
s2: when the allocable resources of a certain container cluster node meet the resource request of a certain task to be scheduled in a task queue, inputting the resource use state of the container cluster node and the characteristic value of the task to be scheduled into a deep reinforcement learning agent to obtain action probability distribution of task scheduling;
s3: the container cluster scheduler schedules the tasks to be scheduled to container cluster nodes for execution according to the action probability distribution of the task scheduling, calculates rewards, and updates the network parameters of the deep reinforcement learning agent according to the rewards;
s4: and repeating S2 to S3 to train the deep reinforcement learning agent, so that the deep reinforcement learning agent continuously learns and adjusts.
According to the technical scheme, a deep reinforcement learning intelligent agent is established, the resource use state of a container cluster node and the characteristic value of a task to be scheduled are input into the deep reinforcement learning intelligent agent to obtain action probability distribution of task scheduling, the container cluster scheduler schedules the task to be scheduled to a corresponding container cluster node according to the action probability distribution, rewards are calculated, network parameters of the deep reinforcement learning intelligent agent are updated according to the rewards to continuously perform learning and training, the deep reinforcement learning intelligent agent can automatically adjust according to the change of the container cluster node or task state information, and finally the deep reinforcement learning intelligent agent can generate a corresponding scheduling strategy and instruct the container cluster scheduler to schedule the task to be scheduled to the corresponding container cluster node.
Preferably, in S3, the calculation formula of the reward is as follows:
rj,i(c)=γp·priorityj(c)-γi1·imblance1(c)-γi2·imblance2(c)
wherein the priorityj(c) Representing that the deep reinforcement learning agent schedules the task T in the time state of the c-th taskjScheduling to appropriate nodes to execute the acquired task priority reward, imblamece1(c) Indicating the degree of imbalance in use between resources within a container cluster node, interference2(c) Representing container cluster sectionsDegree of resource usage imbalance between points, gammapWeight coefficient, gamma, for task priorityi1Using a weight coefficient, γ, of an unbalanced penalty term for resources within a container cluster nodei2An unbalanced weight factor is used for the resource between container cluster nodes.
Preferably, the task priority reward priorityj(c) The specific formula of (a) is as follows:
in the formula, gamma1Rewarding coefficient, gamma, for the weight of task latency2Priority rewards, gamma, for task resource request volumes3Awarding priority for task latency sensitivity;the task priority reward obtained by scheduling the task from the perspective of the task resource request amount is represented by the following formula:
wherein n is the number of nodes, rCPUFor weighting the reward factor, r, of CPU resourcesMemFor the weighted reward factor, r, of the memory resourceGPUWeighting and rewarding coefficients of the resources corresponding to the GPU;representing node NiThe maximum available amount of CPU resources of (c),representing a node NiThe maximum available amount of memory resources of (a),representing a node NiMaximum usable amount of GPU resources;indicating a task T to be scheduledjThe amount of CPU resource requests of (a),indicating a task T to be scheduledjThe amount of memory resource requests of (a),indicating a task T to be scheduledjThe GPU resource request amount;
the task priority reward obtained by scheduling the task from the viewpoint of the task waiting time is represented by the following formula:
where Q denotes a task queue, wj(c) Representing a task TjWaiting time of, wk(c) Representing a task TkThe waiting time of (c);
representing the task priority reward obtained by scheduling the tasks from the perspective of task delay sensitivity, wherein the tasks to be scheduled comprise reasoning tasks and training tasks,is shown below:
preferably, the task priority reward function priority when considering dependencies between tasks to be scheduledj(c) Is shown as:
Wherein the priorityz(c) Representing tasks T dependent on task TjzChild (j) indicates a dependency on task TzIs executed.
Preferably, in S2, when the task priority reward of a certain task is greater than 0, it indicates that there is an allocable resource of a certain container cluster node to satisfy the resource request of the task to be scheduled.
Preferably, the degree of imbalance in usage interference between resources within the container cluster nodes is determined by the container cluster node1(c) And resource usage imbalance degree interference between container cluster nodes2(c) The expression of (a) is as follows:
wherein, Ui(c) Node N in time state representing c scheduling of deep reinforcement learning agentiResource usage of, including CPU resource usageMemory resource usage rateGPU resource usageThe expression is as follows:
wherein the content of the first and second substances,indicating node N in the state of scheduling for the c-th timeiThe available resources of the CPU are used,indicating node N in the state of scheduling for the c-th timeiThe available memory resources are then available to the user,indicating node N in the state of scheduling for the c-th timeiAvailable GPU resources.
Preferably, in S1, the initial characteristic values include a resource request amount, a waiting time, a depended task index and a task delay sensitivity of the task to be scheduled, and a resource maximum usable amount, a current resource allocable amount and a current resource utilization rate of the container cluster node.
Preferably, the deep reinforcement learning agent comprises a policy network, the state information of the tasks to be scheduled and the container cluster nodes is input into the policy network, and the policy network outputs the action probability distribution of task scheduling.
Preferably, the deep reinforcement learning agent further comprises a value network, wherein the value network is used for scoring the strategy network;
and inputting the initial characteristic value into the value network to obtain the score of the strategy network, and updating the network parameters of the strategy network and the value network according to the reward and the score of the strategy network.
In a second aspect, the present invention further provides a container cluster resource scheduling system based on deep reinforcement learning, including:
the deep reinforcement learning agent is used for obtaining action probability distribution of task scheduling according to the resource use state of a container cluster node and the characteristic value of a task to be scheduled when allocable resources of the container cluster node meet the resource request of the task to be scheduled in a task queue;
the container cluster scheduler is used for scheduling the tasks to be scheduled to the container cluster nodes for execution according to the action probability distribution output by the deep reinforcement learning agent;
and the optimization module is used for calculating rewards, updating the network parameters of the deep reinforcement learning agent according to the rewards and continuously optimizing the deep reinforcement learning agent.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that: according to the method, a deep reinforcement learning intelligent agent is established, the resource use state of a container cluster node and the characteristic value of a task to be scheduled are input into the deep reinforcement learning intelligent agent to obtain the action probability distribution of task scheduling, the container cluster scheduler schedules the task to be scheduled into the corresponding container cluster node according to the action probability distribution, calculates rewards, updates the network parameters of the deep reinforcement learning intelligent agent according to the rewards to continuously learn and train, so that the deep reinforcement learning intelligent agent can automatically adjust according to the change of the container cluster node or the task state information, finally the deep reinforcement learning intelligent agent can generate a corresponding scheduling strategy, and instructs the container cluster scheduler to schedule the task to be scheduled onto the corresponding container cluster node.
Drawings
Fig. 1 is a flowchart of a container cluster resource scheduling method based on deep reinforcement learning.
Fig. 2 is a schematic diagram of a container cluster resource scheduling method based on deep reinforcement learning in embodiment 1.
Fig. 3 is a flowchart of a task scheduling algorithm in embodiment 3.
Fig. 4 is an architecture diagram of a container cluster resource scheduling system based on deep reinforcement learning.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
the technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
Referring to fig. 1-2, the present embodiment provides a container cluster resource scheduling method based on deep reinforcement learning, including the following steps:
s1: establishing a deep reinforcement learning intelligent agent;
s2: when the allocable resources of a certain container cluster node meet the resource request of a certain task to be scheduled in a task queue, inputting the resource use state of the container cluster node and the characteristic value of the task to be scheduled into a deep reinforcement learning agent to obtain action probability distribution of task scheduling;
s3: the container cluster scheduler schedules the tasks to be scheduled to container cluster nodes for execution according to the action probability distribution of the task scheduling, calculates rewards, and updates the network parameters of the deep reinforcement learning agent according to the rewards;
s4: and repeating S2 to S3 to train the deep reinforcement learning agent, so that the deep reinforcement learning agent continuously learns and adjusts.
As shown in fig. 2, fig. 2 is a system architecture diagram of a container cluster resource scheduling method based on deep reinforcement learning according to this embodiment, and it is assumed that a task TjArriving discretely in the form of containers in a Kubernetes cluster, into a task queue of fixed length m maintained by the container cluster. The task requests certain system resources, including a CPU, a memory and a GPU, and the scheduler selects one task from the task queue to be scheduled to a proper node according to the resource request of the task, the time waiting for scheduling, the depended index and the response delay sensitivity and the optimal scheduling strategy, so that the shortest average scheduling completion time and the resource load balance among the nodes are realized.
The method comprises the steps of establishing a deep reinforcement learning intelligent agent, inputting resource use states of container cluster nodes and characteristic values of tasks to be scheduled into the deep reinforcement learning intelligent agent to obtain action probability distribution of task scheduling, scheduling the tasks to be scheduled into corresponding cluster nodes by a container cluster scheduler according to the action probability distribution, calculating rewards, updating network parameters of the deep reinforcement learning intelligent agent according to the rewards to continuously learn and train, enabling the deep reinforcement learning intelligent agent to be capable of adjusting, updating the network parameters of the deep reinforcement learning intelligent agent to continuously learn and train, and finally automatically generating an optimal scheduling strategy by the deep reinforcement learning intelligent agent and instructing the container cluster scheduler to schedule the tasks to be scheduled onto the corresponding cluster nodes.
Example 2
The embodiment provides a container cluster resource scheduling method based on deep reinforcement learning, which includes:
s1: and establishing a deep reinforcement learning intelligent agent.
In this embodiment, the deep reinforcement learning agent interacts with the system environment, and needs to take action, that is, select a task from the task queue, instruct the container cluster scheduler to schedule the task to the corresponding container cluster node for execution, and maximize the long-term reward by observing the environment and the internal state. More specifically, assume that the initial maintenance state of the deep reinforcement learning agent is s0When some node allocable resources meet the resource requirement of some task to be scheduled in the task queue, the deep reinforcement learning agent interacts with the scheduling environment. When scheduling for the c time, the deep reinforcement learning agent maintains the state of scThe current information of the system is described. On that basis, the deep reinforcement learning agent instructs the container cluster scheduler to select action acNamely, a certain task is selected from the task queue and is scheduled to the corresponding container cluster node for execution. The deep reinforcement learning agent then receives a reward r generated by the post-action environmentcGenerating a new state sc+1. Therefore, the sequence of a series of operations of the deep reinforcement learning agent is as follows:
s0,a0,r0,s1,a1,r1,…,sc+1。
s2: when the allocable resources of a certain container cluster node meet the resource request of a certain task to be scheduled in the task queue, inputting the resource use state of the container cluster node and the characteristic value of the task to be scheduled into a deep reinforcement learning agent to obtain the action probability distribution of task scheduling.
In this embodiment, the resource usage state of the container cluster node and the feature value of the task to be scheduled include a resource request amount, a waiting time, a task index that is relied on, and a task delay sensitivity of the task to be scheduled, and a maximum resource available amount, a current resource allocable amount, and a current resource utilization rate of the container cluster node are set as N { N ═ N { of the kubernets container cluster1,N2,…,NnIn this, there are n nodes in total. The computing resources which can be provided by each node are different, and the computing resources are assumed to be a CPU, a memory and a GPU.
In this embodiment, the deep reinforcement learning agent includes a policy network. In the specific implementation process, based on an A2C algorithm, the state information of the task to be scheduled and the container cluster node is input into a deep reinforcement learning strategy network pi (a | s; theta)1) The policy network outputs an action probability distribution pi (a | s) for task scheduling.
As a decision unit, the deep reinforcement learning agent is essentially a neural network taking state information of a task to be scheduled and a container cluster node as input, and outputs an action probability distribution pi (a | s), wherein a represents an action, and s represents a state characteristic. The action of the deep reinforcement learning agent is defined as that the deep reinforcement learning agent selects a certain task from the task queue to be scheduled to the corresponding node for execution. The Kubernetes container cluster has n nodes, a task queue with the length of m is maintained, the deep reinforcement learning intelligent agent has n m possible actions, and the network output is the probability distribution of the n m actions.
In this embodiment, the state space of the deep reinforcement learning agent is formed according to the resource usage state of each container cluster node of the current container cluster system and the initial characteristic value set by the task to be scheduled in the task queue.
TABLE 1 State space for deep reinforcement learning Agents
Input device | Size and breadth |
Maximum available amount of node resources | n |
Currently allocable amount of node resource | n |
Current utilization of node resources | n |
Per resource request amount of task | 3m |
Task latency | m |
Task dependent indexing | m*m |
Task latency sensitivity | m |
As shown in table 1, table 1 shows a state space of the deep reinforcement learning agent, where table 1 includes a maximum resource available amount RC of a node, a current node resource allocable amount ra (c), a current node resource utilization rate u (c), a resource request amount RT of each task to be scheduled in a task queue, a waiting time W, a task index I that is depended on, and a task delay sensitivity L, all of which are expressed by the same unit proportion.
The expression of the motion space is ac(j, i) indicates that the deep reinforcement learning agent selects the task T from the task queuejTo task TjScheduling to node NiIs executed.
S3: and the container cluster scheduler schedules the tasks to be scheduled to container cluster nodes for execution according to the action probability distribution of the task scheduling, calculates rewards and updates the network parameters of the deep reinforcement learning agent according to the rewards.
In order to effectively train a deep reinforcement learning agent, appropriate rewards need to be set.
In order to avoid the situation that a task running for a long time and having low response sensitivity occupies a large amount of resources and blocks the execution of the task with low resource occupation amount, the average job completion time of a system is increased, a task queue is maintained, and the tasks in the task queue are subjected to dynamic priority sequencing. The higher the reward the scheduler picks for a task with a high priority. The priority ranking is mainly based on the task resource request amount, the task waiting time, the task dependency relationship and the task delay sensitivity, and aims to hope that tasks with higher task priority rewards are scheduled first. Constructing a reward model of the deep reinforcement learning agent according to the task priority reward, the degree of imbalance in resource usage among the container cluster nodes and the degree of imbalance in resource usage among the container cluster nodes:
in order to avoid the influence of the tasks with large resource occupation on the running of a large number of tasks with small resource occupation, the task priority rewarding obtained by scheduling the tasks from the perspective of the task resource request amountThe calculation formula of (a) is as follows:
wherein n is the number of nodes, rCPUFor weighting the reward factor, r, of CPU resourcesMemFor the weighted reward factor, r, of the memory resourceGPUWeighting and rewarding coefficients of the resources corresponding to the GPU;representing node NiThe maximum available amount of CPU resources of (c),representing a node NiThe maximum available amount of memory resources of (a),representing a node NiMaximum available amount of GPU resources;representing a task T to be scheduledjThe amount of CPU resource requests of (a),indicating a task T to be scheduledjThe amount of memory resource requests of (a),indicating a task T to be scheduledjThe amount of GPU resource requests.
The method avoids new tasks from continuously entering, and leads the tasks to be in a hungry state for a long time. Thus, the longer waiting tasks receive a higher priority reward, and the task priority reward received from scheduling tasks in terms of task latencyThe calculation formula of (a) is as follows:
where Q denotes a task queue, wj(c) Representing a task TjWaiting time of, wk(c) Representing a task TkThe waiting time of (c);
in a cluster, different types of tasks have different delay sensitivities. For example, in a machine learning task cluster, inference tasks often require faster response than training tasks, and thus, for these two task types, task priority rewards are obtained by scheduling tasks from the perspective of task delay sensitivityThe formula of (a) is as follows:
in addition, under the condition that cluster resources are in shortage, for the condition that no node allocable resources meet the resource requirement of the task to be scheduled, the priority reward of the scheduling task is directly set to be minus infinity, and therefore the priority reward function priority of the taskj(c) The specific formula is as follows:
in the formula, gamma1Rewarding coefficient, gamma, for the weight of task latency2Priority rewards, gamma, for task resource request volumes3Priority rewards for task latency sensitivity. priorityj(c) Task T is selected by deep reinforcement learning agent under the time state of scheduling the c-th taskjScheduling to the appropriate node to perform the resulting priority reward.
When the task priority reward of a certain task is negative infinity, no container cluster node can allocate resources to meet the resource requirement of a certain task to be scheduled, and the task is not scheduled until the allocable resources of the container cluster node can meet the resource requirement of the task to be scheduled, namely the priority of the task is greater than 0, and the task to be scheduled can be scheduled to the container cluster node.
In an actual cluster environment, dependency relationships often exist between tasks, and in such a case, the task which is most depended on should be executed preferentially, so that the completion time of the task which depends on the task is shortened. Thus, considering the dependencies between tasks, the priority reward may be expressed as:
wherein the priorityz(c) Representing tasks T dependent on task TjzChild (j) indicates a dependency on task TzIs executed.
In addition, the calculation of the scheduling reward obtained by the deep reinforcement learning agent also needs to consider the utilization imbalance degree interference among the resources in the container cluster nodes1(c) And resource usage imbalance degree interference between container cluster nodes2(c) The expression is as follows:
wherein, Ui(c) Node N in time state representing c scheduling of deep reinforcement learning agentiResource usage of, including CPU resource usageMemory resource usage rateGPU resource usageThe expression is as follows:
wherein, the first and the second end of the pipe are connected with each other,indicating node N in the state of scheduling for the c-th timeiThe available resources of the CPU are used,indicating node N in the state of scheduling for the c-th timeiThe available memory resources are then available to the user,indicating node N in the state of scheduling for the c-th timeiAvailable GPU resources.
Thus deep reinforcement learning agent's reward function rj,i(c) The expression of (a) is as follows:
rj,i(c)=γp·priorityj(c)-γi1·imblance1(c)-γi2·imblance2(c)
wherein, γpWeight coefficient, gamma, for task priorityi1Using weight coefficients, gamma, of unequal penalty terms for resources within a nodei2An unbalanced weight coefficient is used for the inter-node resources.
When the task priority reward of a certain task is greater than 0, the fact that allocable resources of a certain container cluster node meet the resource request of the task to be scheduled is indicated.
In this embodiment, the intelligent deep reinforcement learning agent further includes a value network v (s; θ)2) The value network v (s; theta2) For providing a policy network pi (a | s; theta1) Help the policy network pi (a | s; theta1) And (5) carrying out improvement. In the concrete implementation process, the state information scAnd sc+1Respectively inputting value networks v (s; theta)2) Respectively obtain corresponding scoresAndaccording to the scoreScoringAnd a prize rcTraining a deep reinforcement learning agent by using a time difference method (TD method), and updating network parameters of a strategy network and a value network, wherein the specific expression formula is as follows:
wherein gamma is a future reward weight factor,for TD object, δcIs TD error, θ'1Represents the updated policy network parameter, θ'2Representing the updated value network parameters.
S4: and repeating S2 to S3 to train the deep reinforcement learning agent, so that the deep reinforcement learning agent continuously learns and adjusts.
Optimizing reward function r for deep reinforcement learning agent with the goal of maximizing total reward modelj,i(c) And the deep reinforcement learning agent generates an optimal scheduling strategy and schedules the tasks to be scheduled to the corresponding container cluster nodes according to the optimal scheduling strategy so as to shorten the average completion time of task scheduling and realize load balance among node resources.
The pseudo code of the container cluster resource scheduling method based on deep reinforcement learning is as follows:
inputting:
resource request RP of task to be scheduled
Waiting time W of task to be scheduled
Depended task index I of task to be scheduled
Task delay sensitivity L of task to be scheduled
Maximum resource availability RC for container cluster node
Current resource allowability RA (c) of Container Cluster node
Current resource utilization of Container Cluster node U (c)
Reward function weight ω
Future reward weight factor gamma
And (3) outputting:
task T to be scheduledjNode N of task schedulingi
The first step is as follows: calculating priority rewards for tasks to be scheduled
The second step is that: and judging whether the reward of a certain task priority is greater than 0, if so, executing the next step, and otherwise, waiting until the condition is met.
The third step: and setting the characteristic value of each state information and inputting the deep reinforcement learning agent.
The RC, RA (c), U (c) and C of the container cluster node in the c-th scheduling state and the RP, W, I, L and other state information of the task to be scheduled form a state characteristic sCInput strategy network pi (a | s; theta)1)
The fourth step: and carrying out scheduling decision according to the action probability distribution pi (a | s) output by the policy network, and scheduling the specified task to the specified node for execution.
Selecting task T to be scheduled according to action probability distribution output by policy networkj
Obtaining selected scheduling node N according to action probability distribution output by policy networki
At node NiUp scheduling task Tj
The fifth step: calculating rewards
Calculating the reward r according to a formulaj,i(c)
And a sixth step: updating network parameters
Characterizing the status information scAnd sc+1Respectively inputting the value network to obtain:and
Updating value network parameter θ2
Updating policy network parameter θ1。
Example 4
Referring to fig. 4, the present embodiment provides a container cluster resource scheduling system based on deep reinforcement learning, which includes a deep reinforcement learning agent, a container cluster scheduler, and an optimization module.
In a specific implementation process, when allocable resources of a certain container cluster node meet a resource request of a certain task to be scheduled in a task queue, the resource use state of the container cluster node and the characteristic value of the task to be scheduled are input into a deep reinforcement learning agent, and a policy network of the deep reinforcement learning agent outputs action probability distribution of task scheduling. And the container cluster scheduler schedules the tasks to be scheduled to corresponding container cluster nodes according to the action probability distribution, the optimization module calculates rewards, updates network parameters of the deep reinforcement learning intelligent agent according to the rewards and continuously optimizes the deep reinforcement learning intelligent agent, so that the deep reinforcement learning intelligent agent can automatically adjust according to the container cluster nodes or the task state information when the state information changes, and finally the deep reinforcement learning intelligent agent can generate corresponding scheduling strategies and instruct the container cluster scheduler to schedule the tasks to be scheduled to the corresponding container cluster nodes.
Terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.
Claims (10)
1. A container cluster resource scheduling method based on deep reinforcement learning is characterized by comprising the following steps:
s1: establishing a deep reinforcement learning intelligent agent;
s2: when the allocable resources of a certain container cluster node meet the resource request of a certain task to be scheduled in a task queue, inputting the resource use state of the container cluster node and the characteristic value of the task to be scheduled into a deep reinforcement learning agent to obtain action probability distribution of task scheduling;
s3: the container cluster scheduler schedules the tasks to be scheduled to container cluster nodes for execution according to the action probability distribution of the task scheduling, calculates rewards, and updates the network parameters of the deep reinforcement learning agent according to the rewards;
s4: and repeating S2 to S3 to train the deep reinforcement learning agent, so that the deep reinforcement learning agent continuously learns and adjusts.
2. The deep reinforcement learning container cluster resource scheduling method according to claim 1, wherein in S3, the calculation formula of the reward is as follows:
rj,i(c)=γp·priorityj(c)-γi1·imblance1(c)-γi2·imblance2(c)
wherein the priorityj(c) Representing that the deep reinforcement learning agent schedules the task T in the time state of the c-th taskjScheduling to appropriate nodes to execute the acquired task priority reward, opportunity1(c) Indicating the degree of imbalance in use between resources within a container cluster node, interference2(c) Indicates the degree of resource usage imbalance, gamma, between container cluster nodespWeight coefficient, gamma, for task priorityi1Using a weight coefficient, γ, of an unbalanced penalty term for resources within a container cluster nodei2Is a container setThe resources between the cluster nodes use an unbalanced weight factor.
3. The method of claim 2, wherein the task priority reward priority is priorityj(c) The specific formula of (a) is as follows:
in the formula, gamma1Rewarding coefficient, gamma, for the weight of task latency2Priority rewards, gamma, for task resource request volumes3Rewarding the task delay sensitivity priority;the task priority reward obtained by scheduling the task from the perspective of the task resource request amount is represented by the following formula:
wherein n is the number of nodes, rCPUFor weighting the reward factor, r, of CPU resourcesMemFor the weighted reward factor of the memory, rGPUWeighting and rewarding coefficients of the resources corresponding to the GPU;representing a node NiThe maximum available amount of CPU resources of (c),representing node NiThe maximum available amount of memory resources of (a),representing a node NiMaximum usable amount of GPU resources;indicating a task T to be scheduledjThe amount of CPU resource requests of (a),indicating a task T to be scheduledjThe amount of memory requests of (a) to be requested,indicating a task T to be scheduledjThe GPU resource request amount;
represents the task priority reward obtained by scheduling the task from the viewpoint of task waiting time, and the formula is as follows:
where Q denotes a task queue, wj(c) Representing a task TjWaiting time of, wk(c) Representing a task TkThe waiting time of (c);
represents the task priority reward obtained by scheduling the tasks from the perspective of task delay sensitivity, the tasks to be scheduled comprise an inference task and a training task,the formula of (a) is as follows:
4. the method for scheduling container cluster resources based on deep reinforcement learning of claim 3, wherein the task priority reward function priority is used when considering the dependency relationship between the tasks to be scheduledj(c) Expressed as:
wherein the priorityz(c) Representing tasks T dependent on task TjzChild (j) indicates a dependency on task TjIs executed.
5. The method according to claim 4, wherein in step S2, when the task priority reward of a certain task is greater than 0, it indicates that there is an allocable resource of a certain container cluster node that meets the resource request of the task to be scheduled.
6. The method as claimed in claim 2, wherein the degree of imbalance used between resources in the container cluster nodes is equal to or less than zero1(c) And resource usage imbalance degree interference between container cluster nodes2(c) The expression of (a) is as follows:
wherein, Ui(c) Node N in time state representing c scheduling of deep reinforcement learning agentiResource usage of, including CPU resource usageMemory resource usage rateGPU resource usageThe expression is as follows:
wherein the content of the first and second substances,indicating node N in the state of scheduling for the c-th timeiThe available resources of the CPU are used,indicating node N in the state of scheduling for the c-th timeiThe available memory resources are then available to the user,indicating node N in the state of scheduling for the c-th timeiAvailable GPU resources.
7. The deep reinforcement learning container cluster resource scheduling method according to any one of claims 1 to 6, wherein in S1, the initial characteristic values include resource request amount, waiting time, task index depended on and task delay sensitivity of a task to be scheduled, and resource maximum usable amount, current resource allocable amount and current resource utilization rate of a container cluster node.
8. The deep-reinforcement-learning container cluster resource scheduling method of claim 7, wherein the deep-reinforcement-learning agent comprises a policy network, and wherein initial characteristic values are input into the policy network, and wherein the policy network outputs an action probability distribution for task scheduling.
9. The deep-reinforcement learning container cluster resource scheduling method of claim 8, wherein the deep-reinforcement learning agent further comprises a value network for scoring a policy network;
and inputting the initial characteristic value into the value network to obtain the score of the strategy network, and updating the network parameters of the strategy network and the value network according to the reward and the score of the strategy network.
10. A container cluster resource scheduling system based on deep reinforcement learning, comprising:
the deep reinforcement learning agent is used for obtaining action probability distribution of task scheduling according to the resource use state of a container cluster node and the characteristic value of a task to be scheduled when allocable resources of the container cluster node meet the resource request of the task to be scheduled in a task queue;
the container cluster scheduler is used for scheduling the tasks to be scheduled to the container cluster nodes for execution according to the action probability distribution output by the deep reinforcement learning agent;
and the optimization module is used for calculating rewards, updating the network parameters of the deep reinforcement learning agent according to the rewards and continuously optimizing the deep reinforcement learning agent.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210051579.9A CN114443249A (en) | 2022-01-17 | 2022-01-17 | Container cluster resource scheduling method and system based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210051579.9A CN114443249A (en) | 2022-01-17 | 2022-01-17 | Container cluster resource scheduling method and system based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114443249A true CN114443249A (en) | 2022-05-06 |
Family
ID=81368646
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210051579.9A Pending CN114443249A (en) | 2022-01-17 | 2022-01-17 | Container cluster resource scheduling method and system based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114443249A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114610474A (en) * | 2022-05-12 | 2022-06-10 | 之江实验室 | Multi-strategy job scheduling method and system in heterogeneous supercomputing environment |
CN115237581A (en) * | 2022-09-21 | 2022-10-25 | 之江实验室 | Heterogeneous computing power-oriented multi-strategy intelligent scheduling method and device |
CN115293451A (en) * | 2022-08-24 | 2022-11-04 | 中国西安卫星测控中心 | Resource dynamic scheduling method based on deep reinforcement learning |
CN115361301A (en) * | 2022-10-09 | 2022-11-18 | 之江实验室 | Distributed computing network cooperative traffic scheduling system and method based on DQN |
CN116320843A (en) * | 2023-04-24 | 2023-06-23 | 华南师范大学 | Queue request mobilization method and device for elastic optical network |
CN117971502A (en) * | 2024-03-29 | 2024-05-03 | 南京认知物联网研究院有限公司 | Method and device for carrying out online optimization scheduling on AI reasoning cluster |
-
2022
- 2022-01-17 CN CN202210051579.9A patent/CN114443249A/en active Pending
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114610474A (en) * | 2022-05-12 | 2022-06-10 | 之江实验室 | Multi-strategy job scheduling method and system in heterogeneous supercomputing environment |
CN115293451A (en) * | 2022-08-24 | 2022-11-04 | 中国西安卫星测控中心 | Resource dynamic scheduling method based on deep reinforcement learning |
CN115293451B (en) * | 2022-08-24 | 2023-06-16 | 中国西安卫星测控中心 | Resource dynamic scheduling method based on deep reinforcement learning |
CN115237581A (en) * | 2022-09-21 | 2022-10-25 | 之江实验室 | Heterogeneous computing power-oriented multi-strategy intelligent scheduling method and device |
CN115237581B (en) * | 2022-09-21 | 2022-12-27 | 之江实验室 | Heterogeneous computing power-oriented multi-strategy intelligent scheduling method and device |
CN115361301A (en) * | 2022-10-09 | 2022-11-18 | 之江实验室 | Distributed computing network cooperative traffic scheduling system and method based on DQN |
US12021751B2 (en) | 2022-10-09 | 2024-06-25 | Zhejiang Lab | DQN-based distributed computing network coordinate flow scheduling system and method |
CN116320843A (en) * | 2023-04-24 | 2023-06-23 | 华南师范大学 | Queue request mobilization method and device for elastic optical network |
CN116320843B (en) * | 2023-04-24 | 2023-07-25 | 华南师范大学 | Queue request mobilization method and device for elastic optical network |
CN117971502A (en) * | 2024-03-29 | 2024-05-03 | 南京认知物联网研究院有限公司 | Method and device for carrying out online optimization scheduling on AI reasoning cluster |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114443249A (en) | Container cluster resource scheduling method and system based on deep reinforcement learning | |
CN111756812B (en) | Energy consumption perception edge cloud cooperation dynamic unloading scheduling method | |
CN108958916B (en) | Workflow unloading optimization method under mobile edge environment | |
CN109561148A (en) | Distributed task dispatching method in edge calculations network based on directed acyclic graph | |
CN109960585A (en) | A kind of resource regulating method based on kubernetes | |
CN109788046B (en) | Multi-strategy edge computing resource scheduling method based on improved bee colony algorithm | |
CN113225377A (en) | Internet of things edge task unloading method and device | |
CN112817728B (en) | Task scheduling method, network device and storage medium | |
CN108055292B (en) | Optimization method for mapping from virtual machine to physical machine | |
CN109656713B (en) | Container scheduling method based on edge computing framework | |
CN113127169A (en) | Efficient link scheduling method for dynamic workflow in data center network | |
CN115033357A (en) | Micro-service workflow scheduling method and device based on dynamic resource selection strategy | |
CN114675975B (en) | Job scheduling method, device and equipment based on reinforcement learning | |
CN116302389A (en) | Task scheduling method based on improved ant colony algorithm | |
CN110958192B (en) | Virtual data center resource allocation system and method based on virtual switch | |
CN109298932B (en) | OpenFlow-based resource scheduling method, scheduler and system | |
CN116069473A (en) | Deep reinforcement learning-based Yarn cluster workflow scheduling method | |
CN116932198A (en) | Resource scheduling method, device, electronic equipment and readable storage medium | |
CN113179175B (en) | Real-time bandwidth prediction method and device for power communication network service | |
Li et al. | SLA-based task offloading for energy consumption constrained workflows in fog computing | |
CN112698911B (en) | Cloud job scheduling method based on deep reinforcement learning | |
CN114090239A (en) | Model-based reinforcement learning edge resource scheduling method and device | |
CN113256128A (en) | Task scheduling method for balancing resource usage by reinforcement learning in power internet of things | |
Zhang et al. | Online joint scheduling of delay-sensitive and computation-oriented tasks in edge computing | |
CN116909717B (en) | Task scheduling method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |