CN114443249A - Container cluster resource scheduling method and system based on deep reinforcement learning - Google Patents

Container cluster resource scheduling method and system based on deep reinforcement learning Download PDF

Info

Publication number
CN114443249A
CN114443249A (application CN202210051579.9A)
Authority
CN
China
Prior art keywords
task
container cluster
reinforcement learning
scheduling
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210051579.9A
Other languages
Chinese (zh)
Inventor
吴迪
刘可
胡淼
肖子立
肖霖畅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202210051579.9A priority Critical patent/CN114443249A/en
Publication of CN114443249A publication Critical patent/CN114443249A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5021Priority

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Factory Administration (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a container cluster resource scheduling method and system based on deep reinforcement learning, comprising the following steps. S1: establish a deep reinforcement learning agent. S2: when the allocable resources of a container cluster node satisfy the resource request of a task to be scheduled in the task queue, input the resource usage state of the container cluster nodes and the characteristic values of the task to be scheduled into the deep reinforcement learning agent to obtain the action probability distribution for task scheduling. S3: schedule the task to be scheduled onto a container cluster node for execution according to the action probability distribution, calculate the reward, and update the network parameters of the agent according to the reward. S4: repeat S2 to S3 to train the deep reinforcement learning agent so that it continuously learns and adjusts. By establishing the deep reinforcement learning agent and continuously training it, the agent can automatically generate a corresponding scheduling strategy and schedule tasks onto the corresponding container cluster nodes.

Description

Container cluster resource scheduling method and system based on deep reinforcement learning
Technical Field
The invention relates to the field of cluster scheduling, in particular to a container cluster resource scheduling method and system based on deep reinforcement learning.
Background
A container is a form of operating-system-level virtualization; in practice, a single container can run anything from a small microservice or software process to a large application. Different container tasks in a container cluster have different properties. In order to better address the characteristics of container tasks, shorten the average task completion time and achieve load balance among node resources, a scheduling strategy needs to be designed that considers not only the current node resource occupation but also the influence of future node resource occupation on resource contention.
An existing scheduling method for a container cluster obtains the information and load data of all terminals in the container cluster at a preset time interval, where the load data include the CPU utilization rate and memory occupancy rate of each terminal and the transmission rate occupied by a container; the comprehensive load rate of each terminal is calculated from the terminal identification and the load data, and when the comprehensive load rate of a terminal exceeds 50%, the container cluster is rescheduled. However, this scheduling method relies on rules hand-crafted by experts: when the cluster structure or task state changes, the scheduling algorithm has to be manually re-tuned, which wastes a large amount of manpower and material resources and reduces the efficiency of cluster resource scheduling.
Disclosure of Invention
The invention provides a container cluster resource scheduling method and system based on deep reinforcement learning, aiming to overcome the defect of the prior art that the scheduling algorithm must be manually readjusted whenever the cluster structure or task state changes, and to model the multi-dimensional characteristics of container tasks.
In order to solve the technical problems, the technical scheme of the invention is as follows:
in a first aspect, the invention provides a container cluster resource scheduling method based on deep reinforcement learning, which includes the following steps:
s1: establishing a deep reinforcement learning intelligent agent;
s2: when the allocable resources of a certain container cluster node meet the resource request of a certain task to be scheduled in a task queue, inputting the resource use state of the container cluster node and the characteristic value of the task to be scheduled into a deep reinforcement learning agent to obtain action probability distribution of task scheduling;
s3: the container cluster scheduler schedules the tasks to be scheduled to container cluster nodes for execution according to the action probability distribution of the task scheduling, calculates rewards, and updates the network parameters of the deep reinforcement learning agent according to the rewards;
s4: and repeating S2 to S3 to train the deep reinforcement learning agent, so that the deep reinforcement learning agent continuously learns and adjusts.
In this technical scheme, a deep reinforcement learning agent is established. The resource usage state of the container cluster nodes and the characteristic values of the task to be scheduled are input into the deep reinforcement learning agent to obtain the action probability distribution for task scheduling; the container cluster scheduler schedules the task to be scheduled onto the corresponding container cluster node according to this distribution, the reward is calculated, and the network parameters of the deep reinforcement learning agent are updated according to the reward so that learning and training continue. The deep reinforcement learning agent can thus adjust automatically to changes in the container cluster nodes or in the task state information, and finally it can generate a corresponding scheduling strategy and instruct the container cluster scheduler to schedule the task to be scheduled onto the corresponding container cluster node.
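For illustration only, the following Python-style sketch shows one way the S1-S4 loop could be organized; the names PolicyValueAgent, cluster_env and compute_reward are hypothetical placeholders for the agent, the container cluster environment and the reward calculation described above, and are not part of the claimed method.

```python
# Minimal sketch of the S1-S4 loop, assuming a hypothetical cluster environment
# and agent interface; the identifiers are illustrative placeholders.
agent = PolicyValueAgent(state_dim, num_nodes * queue_len)   # S1: build the DRL agent

state = cluster_env.observe()                                 # node usage + task characteristics
for step in range(num_training_steps):
    # S2: act only when some node can satisfy some queued task's resource request
    if not cluster_env.any_schedulable_task():
        cluster_env.wait_for_event()
        continue
    action_probs = agent.policy(state)                        # action probability distribution
    task_idx, node_idx = agent.sample_action(action_probs)    # pick (task, node)

    # S3: the scheduler places the task, the reward is computed, parameters are updated
    cluster_env.schedule(task_idx, node_idx)
    next_state = cluster_env.observe()
    reward = compute_reward(cluster_env, task_idx, node_idx)
    agent.update(state, (task_idx, node_idx), reward, next_state)

    state = next_state                                        # S4: repeat S2-S3
```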
Preferably, in S3, the calculation formula of the reward is as follows:
r_{j,i}(c) = γ_p · priority_j(c) − γ_{i1} · imbalance_1(c) − γ_{i2} · imbalance_2(c)

where priority_j(c) represents the task priority reward obtained when the deep reinforcement learning agent schedules task T_j onto an appropriate node in the state of the c-th scheduling, imbalance_1(c) represents the degree of usage imbalance between resources within a container cluster node, imbalance_2(c) represents the degree of resource usage imbalance between container cluster nodes, γ_p is the weight coefficient of the task priority, γ_{i1} is the weight coefficient of the intra-node resource imbalance penalty term, and γ_{i2} is the weight coefficient of the inter-node resource imbalance penalty term.
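Expressed as code, the reward of one scheduling decision is the weighted priority term minus the two weighted imbalance penalties; the minimal sketch below assumes the three components have already been computed.

```python
def scheduling_reward(priority_j, imbalance_1, imbalance_2,
                      gamma_p, gamma_i1, gamma_i2):
    """Reward r_{j,i}(c): task-priority gain minus the intra-node and
    inter-node resource-imbalance penalties, as in the formula above."""
    return gamma_p * priority_j - gamma_i1 * imbalance_1 - gamma_i2 * imbalance_2
```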
Preferably, the task priority reward priority_j(c) is given by:

priority_j(c) = γ_1 · priority_j^W(c) + γ_2 · priority_j^R(c) + γ_3 · priority_j^L(c) when some node has allocable resources that satisfy the resource request of task T_j, and priority_j(c) = −∞ otherwise,

where γ_1 is the weight reward coefficient of the task waiting time, γ_2 is the weight reward coefficient of the task resource request amount, and γ_3 is the weight reward coefficient of the task delay sensitivity.

priority_j^R(c) represents the task priority reward obtained by scheduling the task from the perspective of the task resource request amount. It is computed from the task's resource requests relative to the node capacities, where n is the number of nodes, r_CPU is the weighting reward coefficient of the CPU resource, r_Mem is the weighting reward coefficient of the memory resource, r_GPU is the weighting reward coefficient of the GPU resource, RC_i^CPU, RC_i^Mem and RC_i^GPU represent the maximum available amounts of CPU, memory and GPU resources of node N_i, and RT_j^CPU, RT_j^Mem and RT_j^GPU represent the CPU, memory and GPU resource request amounts of the task T_j to be scheduled, so that tasks with smaller resource requests receive a higher reward.

priority_j^W(c) represents the task priority reward obtained by scheduling the task from the perspective of the task waiting time. It increases with the waiting time w_j(c) of task T_j relative to the waiting times w_k(c) of the other tasks T_k in the task queue Q, so that tasks that have waited longer receive a higher reward.

priority_j^L(c) represents the task priority reward obtained by scheduling the task from the perspective of task delay sensitivity. The tasks to be scheduled comprise inference tasks and training tasks, and priority_j^L(c) takes a larger value for delay-sensitive inference tasks than for training tasks.
preferably, the task priority reward function priority when considering dependencies between tasks to be scheduledj(c) Is shown as:
Figure BDA0003474460870000039
Wherein the priorityz(c) Representing tasks T dependent on task TjzChild (j) indicates a dependency on task TzIs executed.
Preferably, in S2, when the task priority reward of a certain task is greater than 0, it indicates that there is an allocable resource of a certain container cluster node to satisfy the resource request of the task to be scheduled.
Preferably, the degree of usage imbalance between resources within a container cluster node, imbalance_1(c), and the degree of resource usage imbalance between container cluster nodes, imbalance_2(c), are computed from the per-node resource usage rates: imbalance_1(c) measures how unevenly the CPU, memory and GPU resources are used within each node, and imbalance_2(c) measures how unevenly resources are used across the nodes.

Here U_i(c) denotes the resource usage of node N_i in the state of the c-th scheduling by the deep reinforcement learning agent, consisting of the CPU resource usage rate u_i^CPU(c), the memory resource usage rate u_i^Mem(c) and the GPU resource usage rate u_i^GPU(c):

u_i^CPU(c) = (RC_i^CPU − RA_i^CPU(c)) / RC_i^CPU
u_i^Mem(c) = (RC_i^Mem − RA_i^Mem(c)) / RC_i^Mem
u_i^GPU(c) = (RC_i^GPU − RA_i^GPU(c)) / RC_i^GPU

where RA_i^CPU(c), RA_i^Mem(c) and RA_i^GPU(c) denote the allocable CPU, memory and GPU resources of node N_i in the state of the c-th scheduling, and RC_i^CPU, RC_i^Mem and RC_i^GPU denote the corresponding maximum available amounts.
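The text above characterizes imbalance_1(c) and imbalance_2(c) only as degrees of usage imbalance. As a hedged illustration, the sketch below uses standard deviations of the usage rates as one plausible concrete choice; the exact formulas of the invention may differ.

```python
import numpy as np

def usage_rates(rc, ra):
    """Per-node usage rates u_i = (RC_i - RA_i(c)) / RC_i for CPU, memory, GPU.
    rc, ra: arrays of shape (n_nodes, 3) holding capacities and allocable amounts."""
    return (rc - ra) / rc

def imbalance_terms(u):
    """u: usage-rate matrix of shape (n_nodes, 3) -> (imbalance_1, imbalance_2).
    Assumed definitions: intra-node = mean std over each node's three resources,
    inter-node = std over nodes of each node's mean usage."""
    intra = np.std(u, axis=1).mean()   # spread across CPU/Mem/GPU within a node
    inter = np.std(u.mean(axis=1))     # spread of overall usage across nodes
    return intra, inter
```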
Preferably, in S1, the initial characteristic values include the resource request amount, waiting time, depended-on task index and task delay sensitivity of the task to be scheduled, and the maximum available resource amount, currently allocable resource amount and current resource utilization rate of the container cluster node.
Preferably, the deep reinforcement learning agent comprises a policy network, the state information of the tasks to be scheduled and the container cluster nodes is input into the policy network, and the policy network outputs the action probability distribution of task scheduling.
Preferably, the deep reinforcement learning agent further comprises a value network, wherein the value network is used for scoring the strategy network;
and inputting the initial characteristic value into the value network to obtain the score of the strategy network, and updating the network parameters of the strategy network and the value network according to the reward and the score of the strategy network.
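As a minimal sketch of such a policy network and value network, the two small feed-forward networks below (PyTorch-style, with assumed hidden sizes) map the state characteristic vector to an action distribution over n*m actions and to a scalar score, respectively; the architecture details are assumptions, not specified by the invention.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """pi(a|s; theta_1): maps state characteristics to a distribution over n*m actions."""
    def __init__(self, state_dim, num_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )
    def forward(self, s):
        return torch.softmax(self.net(s), dim=-1)   # action probability distribution

class ValueNet(nn.Module):
    """v(s; theta_2): scores a state so the policy network can be improved."""
    def __init__(self, state_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, s):
        return self.net(s).squeeze(-1)
```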
In a second aspect, the present invention further provides a container cluster resource scheduling system based on deep reinforcement learning, including:
the deep reinforcement learning agent is used for obtaining action probability distribution of task scheduling according to the resource use state of a container cluster node and the characteristic value of a task to be scheduled when allocable resources of the container cluster node meet the resource request of the task to be scheduled in a task queue;
the container cluster scheduler is used for scheduling the tasks to be scheduled to the container cluster nodes for execution according to the action probability distribution output by the deep reinforcement learning agent;
and the optimization module is used for calculating rewards, updating the network parameters of the deep reinforcement learning agent according to the rewards and continuously optimizing the deep reinforcement learning agent.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects. A deep reinforcement learning agent is established; the resource usage state of the container cluster nodes and the characteristic values of the task to be scheduled are input into the agent to obtain the action probability distribution for task scheduling; the container cluster scheduler schedules the task to be scheduled onto the corresponding container cluster node according to this distribution, the reward is calculated, and the network parameters of the agent are updated according to the reward so that learning and training continue. The deep reinforcement learning agent can therefore adjust automatically to changes in the container cluster nodes or in the task state information, and finally it can generate a corresponding scheduling strategy and instruct the container cluster scheduler to schedule the task to be scheduled onto the corresponding container cluster node.
Drawings
Fig. 1 is a flowchart of a container cluster resource scheduling method based on deep reinforcement learning.
Fig. 2 is a schematic diagram of a container cluster resource scheduling method based on deep reinforcement learning in embodiment 1.
Fig. 3 is a flowchart of a task scheduling algorithm in embodiment 3.
Fig. 4 is an architecture diagram of a container cluster resource scheduling system based on deep reinforcement learning.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
the technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
Referring to fig. 1-2, the present embodiment provides a container cluster resource scheduling method based on deep reinforcement learning, including the following steps:
s1: establishing a deep reinforcement learning intelligent agent;
s2: when the allocable resources of a certain container cluster node meet the resource request of a certain task to be scheduled in a task queue, inputting the resource use state of the container cluster node and the characteristic value of the task to be scheduled into a deep reinforcement learning agent to obtain action probability distribution of task scheduling;
s3: the container cluster scheduler schedules the tasks to be scheduled to container cluster nodes for execution according to the action probability distribution of the task scheduling, calculates rewards, and updates the network parameters of the deep reinforcement learning agent according to the rewards;
s4: and repeating S2 to S3 to train the deep reinforcement learning agent, so that the deep reinforcement learning agent continuously learns and adjusts.
As shown in fig. 2, fig. 2 is a system architecture diagram of the container cluster resource scheduling method based on deep reinforcement learning according to this embodiment. Tasks T_j are assumed to arrive discretely, in the form of containers, at a Kubernetes cluster and enter a task queue of fixed length m maintained by the container cluster. Each task requests certain system resources, including CPU, memory and GPU. According to each task's resource request, its time waiting for scheduling, its depended-on index and its response delay sensitivity, the scheduler selects one task from the task queue and schedules it onto a suitable node following the optimal scheduling strategy, so that the shortest average scheduling completion time and resource load balance among the nodes are achieved.
The method establishes a deep reinforcement learning agent; the resource usage states of the container cluster nodes and the characteristic values of the tasks to be scheduled are input into the agent to obtain the action probability distribution for task scheduling; the container cluster scheduler schedules the task to be scheduled onto the corresponding cluster node according to this distribution; the reward is calculated and the network parameters of the agent are updated according to the reward so that it continuously learns and trains. In this way the agent adapts automatically, and finally it can generate the optimal scheduling strategy and instruct the container cluster scheduler to schedule the tasks to be scheduled onto the corresponding cluster nodes.
Example 2
The embodiment provides a container cluster resource scheduling method based on deep reinforcement learning, which includes:
s1: and establishing a deep reinforcement learning intelligent agent.
In this embodiment, the deep reinforcement learning agent interacts with the system environment: it takes actions, i.e. it selects a task from the task queue and instructs the container cluster scheduler to schedule it onto the corresponding container cluster node for execution, and it maximizes the long-term reward by observing the environment and its internal state. More specifically, assume the initial state maintained by the deep reinforcement learning agent is s_0. Whenever the allocable resources of some node satisfy the resource request of some task to be scheduled in the task queue, the deep reinforcement learning agent interacts with the scheduling environment. At the c-th scheduling, the agent maintains a state s_c that describes the current information of the system. On that basis, the agent instructs the container cluster scheduler to select an action a_c, i.e. a certain task is selected from the task queue and scheduled onto the corresponding container cluster node for execution. The agent then receives the reward r_c generated by the environment after the action, and a new state s_{c+1} is produced. The sequence of operations of the deep reinforcement learning agent is therefore:
s_0, a_0, r_0, s_1, a_1, r_1, …, s_{c+1}
s2: when the allocable resources of a certain container cluster node meet the resource request of a certain task to be scheduled in the task queue, inputting the resource use state of the container cluster node and the characteristic value of the task to be scheduled into a deep reinforcement learning agent to obtain the action probability distribution of task scheduling.
In this embodiment, the resource usage state of the container cluster nodes and the characteristic values of the task to be scheduled include the resource request amount, waiting time, depended-on task index and task delay sensitivity of the task to be scheduled, and the maximum available resource amount, currently allocable resource amount and current resource utilization rate of the container cluster nodes. The Kubernetes container cluster is denoted N = {N_1, N_2, …, N_n}, with n nodes in total. The computing resources each node can provide differ, and they are assumed to be CPU, memory and GPU.
In this embodiment, the deep reinforcement learning agent includes a policy network. In the specific implementation, based on the A2C algorithm, the state information of the task to be scheduled and of the container cluster nodes is input into the deep reinforcement learning policy network π(a|s; θ_1), and the policy network outputs the action probability distribution π(a|s) for task scheduling.
As the decision unit, the deep reinforcement learning agent is essentially a neural network that takes the state information of the tasks to be scheduled and of the container cluster nodes as input and outputs an action probability distribution π(a|s), where a denotes an action and s denotes the state characteristics. An action of the agent is defined as selecting a certain task from the task queue and scheduling it onto the corresponding node for execution. Since the Kubernetes container cluster has n nodes and maintains a task queue of length m, the agent has n·m possible actions, and the network output is the probability distribution over these n·m actions.
In this embodiment, the state space of the deep reinforcement learning agent is formed according to the resource usage state of each container cluster node of the current container cluster system and the initial characteristic value set by the task to be scheduled in the task queue.
TABLE 1 State space of the deep reinforcement learning agent

Input | Size
Maximum available amount of node resources | n
Currently allocable amount of node resources | n
Current utilization rate of node resources | n
Resource request amount of each task | 3m
Task waiting time | m
Depended-on task index | m*m
Task delay sensitivity | m

As shown in Table 1, the state space of the deep reinforcement learning agent consists of the maximum available resource amount RC of the nodes, the currently allocable resource amount RA(c) of the nodes, the current resource utilization rate U(c) of the nodes, the resource request amount RT of each task to be scheduled in the task queue, the waiting time W, the depended-on task index I and the task delay sensitivity L, all expressed in the same unit proportions.
The action space is expressed as a_c = (j, i), indicating that the deep reinforcement learning agent selects task T_j from the task queue and schedules task T_j onto node N_i for execution.
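To make the state space of Table 1 concrete, the state characteristic vector can be assembled by concatenating the node-side and task-side quantities; the layout in the sketch below, giving a vector of length 3n + 5m + m*m, follows Table 1 but is otherwise an illustrative assumption.

```python
import numpy as np

def build_state(rc, ra, u, rt, w, dep, lat):
    """Concatenate the Table 1 quantities into one state vector.
    rc, ra, u : node max capacity, allocable amount, utilization, each length n
    rt        : per-task resource requests, shape (m, 3) -> flattened to 3m
    w         : task waiting times, length m
    dep       : depended-on task index matrix, shape (m, m) -> flattened to m*m
    lat       : task delay sensitivities, length m
    All quantities are assumed to be normalized to the same unit proportion."""
    return np.concatenate([
        np.asarray(rc), np.asarray(ra), np.asarray(u),
        np.asarray(rt).reshape(-1),
        np.asarray(w),
        np.asarray(dep).reshape(-1),
        np.asarray(lat),
    ]).astype(np.float32)
```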
S3: and the container cluster scheduler schedules the tasks to be scheduled to container cluster nodes for execution according to the action probability distribution of the task scheduling, calculates rewards and updates the network parameters of the deep reinforcement learning agent according to the rewards.
In order to effectively train a deep reinforcement learning agent, appropriate rewards need to be set.
In order to avoid the situation in which a long-running task with low response sensitivity occupies a large amount of resources and blocks the execution of tasks with small resource occupation, thereby increasing the average job completion time of the system, a task queue is maintained and the tasks in it are dynamically ranked by priority. The scheduler receives a higher reward for picking a task with high priority. The priority ranking is mainly based on the task resource request amount, the task waiting time, the task dependency relationships and the task delay sensitivity, the intention being that tasks with a higher task priority reward are scheduled first. The reward model of the deep reinforcement learning agent is constructed from the task priority reward, the degree of usage imbalance between resources within a container cluster node and the degree of resource usage imbalance between container cluster nodes:
in order to avoid the influence of the tasks with large resource occupation on the running of a large number of tasks with small resource occupation, the task priority rewarding obtained by scheduling the tasks from the perspective of the task resource request amount
Figure BDA0003474460870000081
The calculation formula of (a) is as follows:
Figure BDA0003474460870000082
wherein n is the number of nodes, rCPUFor weighting the reward factor, r, of CPU resourcesMemFor the weighted reward factor, r, of the memory resourceGPUWeighting and rewarding coefficients of the resources corresponding to the GPU;
Figure BDA0003474460870000083
representing node NiThe maximum available amount of CPU resources of (c),
Figure BDA0003474460870000084
representing a node NiThe maximum available amount of memory resources of (a),
Figure BDA0003474460870000085
representing a node NiMaximum available amount of GPU resources;
Figure BDA0003474460870000086
representing a task T to be scheduledjThe amount of CPU resource requests of (a),
Figure BDA0003474460870000087
indicating a task T to be scheduledjThe amount of memory resource requests of (a),
Figure BDA0003474460870000088
indicating a task T to be scheduledjThe amount of GPU resource requests.
To prevent earlier tasks from being starved for a long time as new tasks keep arriving, tasks that have waited longer receive a higher priority reward. The task priority reward priority_j^W(c) obtained by scheduling the task from the perspective of the task waiting time increases with the waiting time w_j(c) of task T_j relative to the waiting times w_k(c) of the other tasks T_k in the task queue Q.
in a cluster, different types of tasks have different delay sensitivities. For example, in a machine learning task cluster, inference tasks often require faster response than training tasks, and thus, for these two task types, task priority rewards are obtained by scheduling tasks from the perspective of task delay sensitivity
Figure BDA0003474460870000091
The formula of (a) is as follows:
Figure BDA0003474460870000092
in addition, under the condition that cluster resources are in shortage, for the condition that no node allocable resources meet the resource requirement of the task to be scheduled, the priority reward of the scheduling task is directly set to be minus infinity, and therefore the priority reward function priority of the taskj(c) The specific formula is as follows:
Figure BDA0003474460870000093
in the formula, gamma1Rewarding coefficient, gamma, for the weight of task latency2Priority rewards, gamma, for task resource request volumes3Priority rewards for task latency sensitivity. priorityj(c) Task T is selected by deep reinforcement learning agent under the time state of scheduling the c-th taskjScheduling to the appropriate node to perform the resulting priority reward.
When the task priority reward of a task is minus infinity, no container cluster node has allocable resources that satisfy the resource requirement of that task to be scheduled, and the task is not scheduled. Only when the allocable resources of some container cluster node can satisfy the resource requirement of the task to be scheduled, i.e. the priority of the task is greater than 0, can the task be scheduled onto that container cluster node.
In an actual cluster environment, dependency relationships often exist between tasks. In that case, the task that is depended on the most should be executed preferentially, so that the completion times of the tasks depending on it are shortened. Considering the dependencies between tasks, the priority reward priority_j(c) is therefore augmented by the term Σ_{z ∈ Child(j)} priority_z(c), where priority_z(c) is the priority reward of a task T_z that depends on task T_j, and Child(j) denotes the set of tasks that depend on task T_j.
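Putting the priority terms together, the sketch below computes priority_j(c) including the minus-infinity case and the dependency boost over Child(j); the concrete sub-reward formulas (relative waiting time, request size relative to aggregate capacity, a binary inference/training indicator) are illustrative assumptions consistent with the descriptions above, not necessarily the exact formulas of the invention.

```python
import numpy as np

def task_priority(j, tasks, nodes, gamma1, gamma2, gamma3):
    """priority_j(c) = gamma1*W + gamma2*R + gamma3*L, or -inf when no node fits.
    tasks[j] is assumed to be a dict with 'request' (CPU/Mem/GPU array), 'wait'
    and 'type'; nodes are dicts with 'allocable' and 'capacity' arrays."""
    task = tasks[j]
    if not any(np.all(node["allocable"] >= task["request"]) for node in nodes):
        return float("-inf")                        # no node can host this task right now
    total_wait = sum(t["wait"] for t in tasks) or 1.0
    w_term = task["wait"] / total_wait              # longer-waiting tasks score higher
    total_capacity = np.sum([node["capacity"] for node in nodes])
    r_term = 1.0 - float(np.sum(task["request"]) / total_capacity)  # small requests score higher
    l_term = 1.0 if task["type"] == "inference" else 0.0            # delay-sensitive tasks first
    return gamma1 * w_term + gamma2 * r_term + gamma3 * l_term

def priority_with_dependencies(j, tasks, nodes, children, g1, g2, g3):
    """Augment priority_j(c) by the priorities of the tasks in Child(j)."""
    total = task_priority(j, tasks, nodes, g1, g2, g3)
    for z in children.get(j, []):
        p_z = task_priority(z, tasks, nodes, g1, g2, g3)
        if np.isfinite(p_z):                        # ignore children that cannot be placed yet
            total += p_z
    return total
```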
In addition, the calculation of the scheduling reward obtained by the deep reinforcement learning agent also considers the degree of usage imbalance between resources within a container cluster node, imbalance_1(c), and the degree of resource usage imbalance between container cluster nodes, imbalance_2(c). imbalance_1(c) measures how unevenly the CPU, memory and GPU resources are used within each node, and imbalance_2(c) measures how unevenly resources are used across the nodes; both are computed from the node resource usage rates.

U_i(c) denotes the resource usage of node N_i in the state of the c-th scheduling by the deep reinforcement learning agent, consisting of the CPU resource usage rate u_i^CPU(c), the memory resource usage rate u_i^Mem(c) and the GPU resource usage rate u_i^GPU(c):

u_i^CPU(c) = (RC_i^CPU − RA_i^CPU(c)) / RC_i^CPU
u_i^Mem(c) = (RC_i^Mem − RA_i^Mem(c)) / RC_i^Mem
u_i^GPU(c) = (RC_i^GPU − RA_i^GPU(c)) / RC_i^GPU

where RA_i^CPU(c), RA_i^Mem(c) and RA_i^GPU(c) denote the allocable CPU, memory and GPU resources of node N_i in the state of the c-th scheduling, and RC_i^CPU, RC_i^Mem and RC_i^GPU denote the corresponding maximum available amounts.
The reward function r_{j,i}(c) of the deep reinforcement learning agent is therefore expressed as:

r_{j,i}(c) = γ_p · priority_j(c) − γ_{i1} · imbalance_1(c) − γ_{i2} · imbalance_2(c)

where γ_p is the weight coefficient of the task priority, γ_{i1} is the weight coefficient of the intra-node resource imbalance penalty term and γ_{i2} is the weight coefficient of the inter-node resource imbalance penalty term.
When the task priority reward of a task is greater than 0, some container cluster node has allocable resources that satisfy the resource request of that task to be scheduled.
In this embodiment, the deep reinforcement learning agent further includes a value network v(s; θ_2). The value network v(s; θ_2) scores the policy network π(a|s; θ_1) and thereby helps the policy network improve. In the specific implementation, the state characteristics s_c and s_{c+1} are input into the value network v(s; θ_2) to obtain the corresponding scores v(s_c; θ_2) and v(s_{c+1}; θ_2). Using these scores and the reward r_c, the deep reinforcement learning agent is trained with the temporal-difference (TD) method and the network parameters of the policy network and the value network are updated:

ŷ_c = r_c + γ · v(s_{c+1}; θ_2)
δ_c = v(s_c; θ_2) − ŷ_c
θ'_1 = θ_1 − α · δ_c · ∇_{θ_1} log π(a_c | s_c; θ_1)
θ'_2 = θ_2 − β · δ_c · ∇_{θ_2} v(s_c; θ_2)

where γ is the future reward weight factor, ŷ_c is the TD target, δ_c is the TD error, α and β are the learning rates of the policy network and the value network, θ'_1 denotes the updated policy network parameters and θ'_2 denotes the updated value network parameters.
S4: and repeating S2 to S3 to train the deep reinforcement learning agent, so that the deep reinforcement learning agent continuously learns and adjusts.
The reward function r_{j,i}(c) of the deep reinforcement learning agent is optimized with the goal of maximizing the total reward. The deep reinforcement learning agent then generates the optimal scheduling strategy and schedules the tasks to be scheduled onto the corresponding container cluster nodes according to it, so as to shorten the average task scheduling completion time and achieve load balance among node resources.
The pseudo code of the container cluster resource scheduling method based on deep reinforcement learning is as follows:
Input:
Resource request amount RP of the tasks to be scheduled
Waiting time W of the tasks to be scheduled
Depended-on task index I of the tasks to be scheduled
Task delay sensitivity L of the tasks to be scheduled
Maximum available resource amount RC of the container cluster nodes
Current allocable resource amount RA(c) of the container cluster nodes
Current resource utilization rate U(c) of the container cluster nodes
Reward function weights ω
Future reward weight factor γ
Output:
The task T_j to be scheduled and the node N_i onto which it is scheduled
The first step: calculate the priority reward of each task to be scheduled.
The second step: judge whether the priority reward of some task is greater than 0; if so, execute the next step, otherwise wait until the condition is met.
The third step: set the characteristic values of each piece of state information and input them into the deep reinforcement learning agent.
The RC, RA(c) and U(c) of the container cluster nodes in the c-th scheduling state and the RP, W, I and L state information of the tasks to be scheduled form the state characteristics s_c, which are input into the policy network π(a|s; θ_1).
The fourth step: make the scheduling decision according to the action probability distribution π(a|s) output by the policy network, and schedule the specified task onto the specified node for execution.
Select the task T_j to be scheduled according to the action probability distribution output by the policy network.
Obtain the selected scheduling node N_i according to the action probability distribution output by the policy network.
Schedule task T_j onto node N_i.
The fifth step: calculate the reward.
Calculate the reward r_{j,i}(c) according to the reward formula.
The sixth step: update the network parameters.
Input the state characteristics s_c and s_{c+1} into the value network to obtain v(s_c; θ_2) and v(s_{c+1}; θ_2).
Calculate the TD target ŷ_c and the TD error δ_c.
Update the value network parameters θ_2.
Update the policy network parameters θ_1.
Example 4
Referring to fig. 4, the present embodiment provides a container cluster resource scheduling system based on deep reinforcement learning, which includes a deep reinforcement learning agent, a container cluster scheduler, and an optimization module.
In the specific implementation, when the allocable resources of some container cluster node satisfy the resource request of some task to be scheduled in the task queue, the resource usage state of the container cluster nodes and the characteristic values of the task to be scheduled are input into the deep reinforcement learning agent, and the policy network of the agent outputs the action probability distribution for task scheduling. The container cluster scheduler schedules the task to be scheduled onto the corresponding container cluster node according to this distribution. The optimization module calculates the reward, updates the network parameters of the deep reinforcement learning agent according to the reward and continuously optimizes the agent, so that the agent adjusts automatically whenever the container cluster nodes or the task state information change. Finally, the deep reinforcement learning agent can generate the corresponding scheduling strategy and instruct the container cluster scheduler to schedule the tasks to be scheduled onto the corresponding container cluster nodes.
Terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. A container cluster resource scheduling method based on deep reinforcement learning is characterized by comprising the following steps:
s1: establishing a deep reinforcement learning intelligent agent;
s2: when the allocable resources of a certain container cluster node meet the resource request of a certain task to be scheduled in a task queue, inputting the resource use state of the container cluster node and the characteristic value of the task to be scheduled into a deep reinforcement learning agent to obtain action probability distribution of task scheduling;
s3: the container cluster scheduler schedules the tasks to be scheduled to container cluster nodes for execution according to the action probability distribution of the task scheduling, calculates rewards, and updates the network parameters of the deep reinforcement learning agent according to the rewards;
s4: and repeating S2 to S3 to train the deep reinforcement learning agent, so that the deep reinforcement learning agent continuously learns and adjusts.
2. The deep reinforcement learning container cluster resource scheduling method according to claim 1, wherein in S3, the calculation formula of the reward is as follows:
r_{j,i}(c) = γ_p · priority_j(c) − γ_{i1} · imbalance_1(c) − γ_{i2} · imbalance_2(c)

wherein priority_j(c) represents the task priority reward obtained when the deep reinforcement learning agent schedules task T_j onto an appropriate node in the state of the c-th scheduling, imbalance_1(c) represents the degree of usage imbalance between resources within a container cluster node, imbalance_2(c) represents the degree of resource usage imbalance between container cluster nodes, γ_p is the weight coefficient of the task priority, γ_{i1} is the weight coefficient of the intra-node resource imbalance penalty term, and γ_{i2} is the weight coefficient of the inter-node resource imbalance penalty term.
3. The container cluster resource scheduling method based on deep reinforcement learning according to claim 2, wherein the task priority reward priority_j(c) is:

priority_j(c) = γ_1 · priority_j^W(c) + γ_2 · priority_j^R(c) + γ_3 · priority_j^L(c) when some node has allocable resources that satisfy the resource request of task T_j, and priority_j(c) = −∞ otherwise,

wherein γ_1 is the weight reward coefficient of the task waiting time, γ_2 is the weight reward coefficient of the task resource request amount and γ_3 is the weight reward coefficient of the task delay sensitivity;

priority_j^R(c) represents the task priority reward obtained by scheduling the task from the perspective of the task resource request amount; it is computed from the task's resource requests relative to the node capacities, wherein n is the number of nodes, r_CPU is the weighting reward coefficient of the CPU resource, r_Mem is the weighting reward coefficient of the memory resource, r_GPU is the weighting reward coefficient of the GPU resource, RC_i^CPU, RC_i^Mem and RC_i^GPU represent the maximum available amounts of CPU, memory and GPU resources of node N_i, and RT_j^CPU, RT_j^Mem and RT_j^GPU represent the CPU, memory and GPU resource request amounts of the task T_j to be scheduled;

priority_j^W(c) represents the task priority reward obtained by scheduling the task from the perspective of the task waiting time; it increases with the waiting time w_j(c) of task T_j relative to the waiting times w_k(c) of the other tasks T_k in the task queue Q;

priority_j^L(c) represents the task priority reward obtained by scheduling the task from the perspective of task delay sensitivity; the tasks to be scheduled comprise inference tasks and training tasks, and priority_j^L(c) takes a larger value for inference tasks than for training tasks.
4. The container cluster resource scheduling method based on deep reinforcement learning according to claim 3, wherein when the dependency relationships between the tasks to be scheduled are considered, the task priority reward function priority_j(c) is augmented by the term Σ_{z ∈ Child(j)} priority_z(c), wherein priority_z(c) represents the priority reward of a task T_z that depends on task T_j, and Child(j) denotes the set of tasks that depend on task T_j.
5. The method according to claim 4, wherein in step S2, when the task priority reward of a certain task is greater than 0, it indicates that there is an allocable resource of a certain container cluster node that meets the resource request of the task to be scheduled.
6. The container cluster resource scheduling method based on deep reinforcement learning according to claim 2, wherein the degree of usage imbalance between resources within a container cluster node, imbalance_1(c), and the degree of resource usage imbalance between container cluster nodes, imbalance_2(c), are computed from the node resource usage rates: imbalance_1(c) measures how unevenly the CPU, memory and GPU resources are used within each node, and imbalance_2(c) measures how unevenly resources are used across the nodes;

wherein U_i(c) represents the resource usage of node N_i in the state of the c-th scheduling by the deep reinforcement learning agent, comprising the CPU resource usage rate u_i^CPU(c), the memory resource usage rate u_i^Mem(c) and the GPU resource usage rate u_i^GPU(c):

u_i^CPU(c) = (RC_i^CPU − RA_i^CPU(c)) / RC_i^CPU
u_i^Mem(c) = (RC_i^Mem − RA_i^Mem(c)) / RC_i^Mem
u_i^GPU(c) = (RC_i^GPU − RA_i^GPU(c)) / RC_i^GPU

wherein RA_i^CPU(c), RA_i^Mem(c) and RA_i^GPU(c) represent the allocable CPU, memory and GPU resources of node N_i in the state of the c-th scheduling, and RC_i^CPU, RC_i^Mem and RC_i^GPU represent the corresponding maximum available amounts.
7. The deep reinforcement learning container cluster resource scheduling method according to any one of claims 1 to 6, wherein in S1, the initial characteristic values include resource request amount, waiting time, task index depended on and task delay sensitivity of a task to be scheduled, and resource maximum usable amount, current resource allocable amount and current resource utilization rate of a container cluster node.
8. The deep-reinforcement-learning container cluster resource scheduling method of claim 7, wherein the deep-reinforcement-learning agent comprises a policy network, and wherein initial characteristic values are input into the policy network, and wherein the policy network outputs an action probability distribution for task scheduling.
9. The deep-reinforcement learning container cluster resource scheduling method of claim 8, wherein the deep-reinforcement learning agent further comprises a value network for scoring a policy network;
and inputting the initial characteristic value into the value network to obtain the score of the strategy network, and updating the network parameters of the strategy network and the value network according to the reward and the score of the strategy network.
10. A container cluster resource scheduling system based on deep reinforcement learning, comprising:
the deep reinforcement learning agent is used for obtaining action probability distribution of task scheduling according to the resource use state of a container cluster node and the characteristic value of a task to be scheduled when allocable resources of the container cluster node meet the resource request of the task to be scheduled in a task queue;
the container cluster scheduler is used for scheduling the tasks to be scheduled to the container cluster nodes for execution according to the action probability distribution output by the deep reinforcement learning agent;
and the optimization module is used for calculating rewards, updating the network parameters of the deep reinforcement learning agent according to the rewards and continuously optimizing the deep reinforcement learning agent.
CN202210051579.9A 2022-01-17 2022-01-17 Container cluster resource scheduling method and system based on deep reinforcement learning Pending CN114443249A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210051579.9A CN114443249A (en) 2022-01-17 2022-01-17 Container cluster resource scheduling method and system based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210051579.9A CN114443249A (en) 2022-01-17 2022-01-17 Container cluster resource scheduling method and system based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN114443249A true CN114443249A (en) 2022-05-06

Family

ID=81368646

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210051579.9A Pending CN114443249A (en) 2022-01-17 2022-01-17 Container cluster resource scheduling method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114443249A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114610474A (en) * 2022-05-12 2022-06-10 之江实验室 Multi-strategy job scheduling method and system in heterogeneous supercomputing environment
CN115293451A (en) * 2022-08-24 2022-11-04 中国西安卫星测控中心 Resource dynamic scheduling method based on deep reinforcement learning
CN115293451B (en) * 2022-08-24 2023-06-16 中国西安卫星测控中心 Resource dynamic scheduling method based on deep reinforcement learning
CN115237581A (en) * 2022-09-21 2022-10-25 之江实验室 Heterogeneous computing power-oriented multi-strategy intelligent scheduling method and device
CN115237581B (en) * 2022-09-21 2022-12-27 之江实验室 Heterogeneous computing power-oriented multi-strategy intelligent scheduling method and device
CN115361301A (en) * 2022-10-09 2022-11-18 之江实验室 Distributed computing network cooperative traffic scheduling system and method based on DQN
US12021751B2 (en) 2022-10-09 2024-06-25 Zhejiang Lab DQN-based distributed computing network coordinate flow scheduling system and method
CN116320843A (en) * 2023-04-24 2023-06-23 华南师范大学 Queue request mobilization method and device for elastic optical network
CN116320843B (en) * 2023-04-24 2023-07-25 华南师范大学 Queue request mobilization method and device for elastic optical network
CN117971502A (en) * 2024-03-29 2024-05-03 南京认知物联网研究院有限公司 Method and device for carrying out online optimization scheduling on AI reasoning cluster

Similar Documents

Publication Publication Date Title
CN114443249A (en) Container cluster resource scheduling method and system based on deep reinforcement learning
CN111756812B (en) Energy consumption perception edge cloud cooperation dynamic unloading scheduling method
CN108958916B (en) Workflow unloading optimization method under mobile edge environment
CN109561148A (en) Distributed task dispatching method in edge calculations network based on directed acyclic graph
CN109960585A (en) A kind of resource regulating method based on kubernetes
CN109788046B (en) Multi-strategy edge computing resource scheduling method based on improved bee colony algorithm
CN113225377A (en) Internet of things edge task unloading method and device
CN112817728B (en) Task scheduling method, network device and storage medium
CN108055292B (en) Optimization method for mapping from virtual machine to physical machine
CN109656713B (en) Container scheduling method based on edge computing framework
CN113127169A (en) Efficient link scheduling method for dynamic workflow in data center network
CN115033357A (en) Micro-service workflow scheduling method and device based on dynamic resource selection strategy
CN114675975B (en) Job scheduling method, device and equipment based on reinforcement learning
CN116302389A (en) Task scheduling method based on improved ant colony algorithm
CN110958192B (en) Virtual data center resource allocation system and method based on virtual switch
CN109298932B (en) OpenFlow-based resource scheduling method, scheduler and system
CN116069473A (en) Deep reinforcement learning-based Yarn cluster workflow scheduling method
CN116932198A (en) Resource scheduling method, device, electronic equipment and readable storage medium
CN113179175B (en) Real-time bandwidth prediction method and device for power communication network service
Li et al. SLA-based task offloading for energy consumption constrained workflows in fog computing
CN112698911B (en) Cloud job scheduling method based on deep reinforcement learning
CN114090239A (en) Model-based reinforcement learning edge resource scheduling method and device
CN113256128A (en) Task scheduling method for balancing resource usage by reinforcement learning in power internet of things
Zhang et al. Online joint scheduling of delay-sensitive and computation-oriented tasks in edge computing
CN116909717B (en) Task scheduling method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination