CN112416578B - Container cloud cluster resource utilization optimization method based on deep reinforcement learning - Google Patents

Container cloud cluster resource utilization optimization method based on deep reinforcement learning

Info

Publication number
CN112416578B
Authority
CN
China
Prior art keywords
node
network model
load
action
depth
Prior art date
Legal status
Active
Application number
CN202011225270.4A
Other languages
Chinese (zh)
Other versions
CN112416578A (en)
Inventor
吴迪
吴灿豪
胡淼
Current Assignee
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202011225270.4A priority Critical patent/CN112416578B/en
Publication of CN112416578A publication Critical patent/CN112416578A/en
Application granted granted Critical
Publication of CN112416578B publication Critical patent/CN112416578B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5077Logical partitioning of resources; Management or configuration of virtualized resources
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a container cloud cluster resource utilization optimization method based on deep reinforcement learning, which comprises the following steps: preprocessing the original load data and assembling it into an input state s; constructing a deep Q network model, inputting the input state s into the deep Q network model, which with a certain probability randomly selects an action a, or otherwise selects the action a judged optimal by the deep Q network model, and executes one oversell-ratio prediction; evaluating the selected action a through a reward function to obtain a reward r and enter the next state s'; forming the input state s, the action a, the reward r and the next state s' into a quadruple and placing it into a cache as a training sample; when a preset training interval is reached, sampling e training samples from the cache, inputting them into the deep Q network model for training, and updating the parameters of the deep Q network model; and after E rounds of training, applying the deep Q network model with updated parameters to determine the oversell policy.

Description

Container cloud cluster resource utilization optimization method based on deep reinforcement learning
Technical Field
The invention relates to the technical field of cloud computing resource management, in particular to a container cloud cluster resource utilization optimization method based on deep reinforcement learning.
Background
Docker is an open-source application container engine. With the wide application of Docker container technology in application development, testing and release, Google put forward Kubernetes (k8s), a distributed architecture scheme based on Docker container technology, in 2015. K8s provides complete cluster management capabilities, such as multi-level security and admission control, transparent service registration and service discovery mechanisms, and multi-granularity resource management. In addition, k8s provides both a built-in load balancer and an extensible automatic resource scheduling capability; the latter is provided by the built-in scheduler and involves the following steps: (1) node pre-selection: excluding nodes that do not meet the conditions at all, for example nodes failing memory-size or port requirements; (2) node prioritization: selecting the optimal node according to priority; (3) binding: binding the Pod to the optimal node screened in the previous step. One of the screening conditions in (1) is the node's remaining resources: when the remaining resources of a node are smaller than the resource request of the Pod, the node is directly excluded by the k8s scheduler, i.e. no more Pods can be scheduled onto it.
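As a plain illustration of this static, request-based filtering (a minimal sketch under assumed field names, not the actual k8s scheduler code):

```python
def preselect_nodes(nodes, pod_request):
    """Static node pre-selection: keep only nodes whose requested-resource
    headroom (capacity minus the sum of existing Pod requests) can hold the
    new Pod's request. Actual node load is not considered."""
    feasible = []
    for node in nodes:
        cpu_left = node["cpu_capacity"] - node["cpu_requested"]
        mem_left = node["mem_capacity"] - node["mem_requested"]
        if cpu_left >= pod_request["cpu"] and mem_left >= pod_request["memory"]:
            feasible.append(node)
    return feasible
```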
It can be seen that k8s uses static scheduling: Pods are bin-packed according to the resources requested by their containers rather than according to the actual load of the nodes. Although static scheduling is simple and effective, it easily leads to low cluster resource utilization, which is a common problem in the industry.
Disclosure of Invention
The invention provides a container cloud cluster resource utilization optimization method based on deep reinforcement learning for overcoming the defect of low resource utilization rate of clusters in the prior art.
In order to solve the above technical problem, the technical solution of the invention is as follows:
A container cloud cluster resource utilization optimization method based on deep reinforcement learning comprises the following steps:
S1: preprocessing original load data, and assembling the preprocessed original load data into an input state s;
S2: constructing a deep Q network model for determining the oversell policy, inputting the input state s into the deep Q network model, which with a certain probability randomly selects an action a or otherwise selects the action a judged optimal by the deep Q network model, and executes one oversell-ratio prediction;
S3: evaluating the selected action a through a reward function to obtain a reward r and enter the next state s′;
S4: forming the input state s, the action a, the reward r and the next state s′ into a quadruple and placing it into a cache as a training sample;
S5: when a preset training interval is reached, sampling e training samples from the cache, inputting them into the deep Q network model for training, and updating the parameters of the deep Q network model, wherein e is a positive integer;
S6: after E rounds of training, applying the deep Q network model with updated parameters to determine the oversell policy.
Preferably, in step S1, preprocessing the raw load data includes a binning operation, specifically as follows:
let the load update period be T; in the k-th time period, the original load data of node n is l_n^k, and the dimension of l_n^k equals the number of sampling points within one update period T; the number of bins is set to B, and the corresponding boundary values b_i take values in {b_i | 0 ≤ i ≤ B, i ∈ N}; assume there are M clusters, and the m-th cluster has N_m nodes;
the binning operation then maps the original load data l_n^k to a B-dimensional count vector, whose i-th component is
Σ_j I( b_{i-1} ≤ l_n^k[j] / C_{m,n} < b_i )
where C_{m,n} is the actual resource capacity of the n-th node in cluster m, I(·) is the indicator function, and l_n^k[j] is the j-th load sample of the period: whenever a load sample falls within the i-th binning interval, the i-th component of the count vector is incremented by 1.
Preferably, the input state s includes node information C_n, node container information and node historical load; the node information C_n comprises the ID of the cluster where the n-th node is located, the ID of the city where it is located, the actual resource capacity and the current oversell ratio; the node container information represents the number of online-service Pods, the number of offline-service Pods and the total resource request of the Pods on the n-th node in the k-th load update period; the node historical load represents the binned historical load of the node over the last 7 days.
Preferably, in step S2, after the input state s is input into the deep Q network model, the model calculates a Q value for each action a according to the input state s, and judges from the Q value whether the current action a is optimal for the deep Q network model; the Q value is calculated as:
Q(s, a, θ) = r′_current + γ · r′_future
where the Q value represents the immediate reward r′_current obtainable by the deep Q network model in state s when executing action a, plus the predicted value r′_future of future rewards; γ is the discount factor weighing immediate against future rewards; θ denotes the parameters of the deep Q network model.
Preferably, in step S3, the selected action a is evaluated through a reward function, in which the reward r is computed as the reciprocal of the weighted decision cost:
r = 1 / (w_o · o_k + w_u · u_k)
where w_o and w_u are the trade-off factors of the excess loss and the shortage loss, respectively, and w_o + w_u = 1; o_k denotes the excess loss, i.e. the high-load alarm risk a node incurs when its load is higher than L_target; u_k denotes the shortage loss, i.e. the resource waste a node incurs when its load is lower than L_target; L_target is the preset target load level of a node; h_o and h_u are the half-lives of the excess loss and the shortage loss, respectively; and an estimate of the node load after overselling is used in evaluating these losses.
Preferably, regarding the estimate of the oversold node load: for compressible resources, the average resource utilization of cluster m is used to estimate the load state of the oversold node; for incompressible resources, since a process is terminated directly by the operating system when a node cannot provide the memory it requires, the maximum utilization of cluster m must be used to estimate the load state of the oversold node.
Preferably, the deep Q network model includes a target Q network and an online Q network, whose parameters are denoted θ and θ′, respectively;
in step S2, the online Q network is used to select the action a;
in step S5, when a preset training interval is reached, e training samples are sampled from the cache and input into the online Q network for training, and the parameter θ′ is updated;
in step S6, after E rounds of training of the online Q network, the updated parameter θ′ is used to update the parameter θ of the target Q network, and the target Q network is applied to determine the oversell policy.
Preferably, in step S5, the parameter θ′ is updated by applying a gradient descent algorithm to (y − Q(s, a, θ′))², where y is given by:
y = r + γ · max_{a′} Q(s′, a′, θ)
and y represents the current target Q value while a′ represents the action executed in the next state s′.
Preferably, the container cloud cluster resource utilization optimization method further comprises the following step:
S7: deploying the deep Q network model with updated parameters outside the cluster as the oversell policy interface to provide service, and deploying an event interception module inside the cluster, wherein the event interception module uses the admission controller provided by k8s to intercept Pod creation and deletion events and node heartbeat events.
Preferably, the event interception module performs the following steps:
1) the k8s server attempts to store the node's real resource state into the database through a heartbeat event;
2) the node heartbeat event is intercepted, and the node's real resource state is replaced with the oversell calculation result;
3) Pod creation and deletion events in the cluster are intercepted, and the corresponding node is marked as a "dirty node";
4) the dirty node is added to a node pool;
5) an independent thread monitors the node states in a local cache;
6) when no Pod creation or deletion event occurs on a node, the node is marked as an "old node" and added to the node pool;
7) the nodes in the node pool call the oversell policy interface concurrently;
8) for nodes with high-load alarms, the oversell-value policy interface is called.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects: the invention adopts a deep Q network model as the oversell policy model for determining the oversell policy; the data input to the deep Q network model is preprocessed, reducing the storage space needed for the original load data and the input dimension of the subsequent reinforcement learning model; and the action output by the deep Q network model is evaluated through the reward function, the model being trained with the goal of reducing both the high-load risk caused by overselling and the waste of resources, thereby effectively improving the resource utilization of the cluster.
Drawings
FIG. 1 is a flow chart of a method for optimizing the resource utilization of a container cloud cluster based on deep reinforcement learning.
Fig. 2 is a flow chart of the design of the event interception module according to the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
This embodiment provides a container cloud cluster resource utilization optimization method based on deep reinforcement learning; FIG. 1 is a flowchart of the method in this embodiment.
The method provided by this embodiment specifically comprises the following steps:
s1: preprocessing the original load data, and assembling the preprocessed original load data into an input state s.
In this embodiment, preprocessing the original load data according to the load update period includes a binning operation, which reduces the space required for storing the original load data and the input dimension of the subsequent reinforcement learning model. The specific steps are as follows:
let the load update period be T; in the k-th time period, the original load data of node n is l_n^k, and the dimension of l_n^k equals the number of sampling points within one update period T; the number of bins is set to B, and the corresponding boundary values b_i take values in {b_i | 0 ≤ i ≤ B, i ∈ N}; assume there are M clusters, and the m-th cluster has N_m nodes.
The binning operation then maps the original load data l_n^k to a B-dimensional count vector, whose i-th component is
Σ_j I( b_{i-1} ≤ l_n^k[j] / C_{m,n} < b_i )
where C_{m,n} is the actual resource capacity of the n-th node in cluster m, I(·) is the indicator function, and l_n^k[j] is the j-th load sample of the period: whenever a load sample falls within the i-th binning interval, the i-th component of the count vector is incremented by 1.
The input state s in this embodiment includes node information C_n, node container information and node historical load. The node information C_n comprises the ID of the cluster where the n-th node is located, the ID of the city where it is located, the actual resource capacity and the current oversell ratio. The node container information represents the number of online-service Pods, the number of offline-service Pods and the total resource request of the Pods on the n-th node in the k-th load update period. The node historical load represents the binned historical load of the node over the last 7 days.
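A minimal sketch of the binning preprocessing and state assembly described above, using numpy; the function names, field ordering and the equal-width bin boundaries are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

BOUNDARIES = np.linspace(0.0, 1.0, 11)  # B = 10 equal-width bins over utilization (illustrative)

def bin_load(raw_load, capacity, boundaries=BOUNDARIES):
    """Histogram the raw load samples l_n^k of one update period T.
    Each sample is normalized by the node capacity C_{m,n}; whenever the
    normalized load falls into the i-th interval [b_{i-1}, b_i), the i-th
    count is incremented by 1, yielding a B-dimensional vector."""
    utilization = np.asarray(raw_load, dtype=np.float64) / capacity
    counts, _ = np.histogram(utilization, bins=boundaries)
    return counts

def assemble_state(node_info, container_info, history_loads, capacity):
    """Concatenate node information C_n (cluster ID, city ID, capacity,
    current oversell ratio), node container information (online Pods,
    offline Pods, total Pod resource requests) and the binned load history
    of the last 7 days into one flat input state s."""
    history = [bin_load(l, capacity) for l in history_loads]
    return np.concatenate([
        np.asarray(node_info, dtype=np.float32),
        np.asarray(container_info, dtype=np.float32),
        np.concatenate(history).astype(np.float32),
    ])
```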
S2: and constructing a deep Q network model for determining the overstock strategy, inputting an input state s into the deep Q network model, randomly selecting an action a by the deep Q network model with a certain probability, or selecting the action a which enables the deep Q network model to be optimal, and executing one-time overstock ratio prediction.
This embodiment uses the deep Q network (DQN) model proposed by the Google DeepMind team, in which a neural network replaces the value function of reinforcement learning. In the reinforcement learning model, the deep Q network model acts as the agent that interacts with the environment: based on its observation of the environment, the agent makes a decision and performs the corresponding action. The deep Q network model comprises a target Q network and an online Q network, whose parameters are denoted θ and θ′, respectively.
In this step, the input state s is fed into the online Q network, which with a certain probability randomly selects an action a, or otherwise selects the action a judged optimal by the deep Q network model, and executes one oversell-ratio prediction. The action space covers all possible oversell ratios; since the oversell ratio is a continuous value, for simplicity it is discretized into B levels, where B is the bin count. The action a selected by the online Q network is therefore the agent's predicted oversell ratio.
In addition, in this embodiment, after the input state s is input into the deep Q network model, the model calculates a Q value for each action a according to the input state s, and judges from the Q value whether the current action a is optimal for the deep Q network model. The Q value is calculated as:
Q(s, a, θ) = r′_current + γ · r′_future
where the Q value represents the immediate reward r′_current obtainable by the deep Q network model in state s when executing action a, plus the predicted value r′_future of future rewards; γ is the discount factor weighing immediate against future rewards; θ denotes the parameters of the deep Q network model.
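A minimal PyTorch sketch of the Q network and the ε-greedy action selection described in this step; the layer sizes, the exploration rate ε and the linear mapping from the discrete action index to an oversell ratio are illustrative assumptions:

```python
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps an input state s to one Q value per discrete action,
    each action being one of the B discretized oversell ratios."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def select_action(online_q: QNetwork, state: torch.Tensor,
                  num_actions: int, epsilon: float = 0.1) -> int:
    """ε-greedy selection: with probability ε pick a random action
    (exploration), otherwise the action with the largest Q value."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        return int(online_q(state.unsqueeze(0)).argmax(dim=1).item())

def action_to_ratio(action: int, num_actions: int, max_ratio: float = 2.0) -> float:
    """Illustrative mapping from the discrete action index to an oversell
    ratio in [1.0, max_ratio]; the patent only states that the continuous
    ratio is discretized into B levels."""
    return 1.0 + (max_ratio - 1.0) * action / (num_actions - 1)
```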
S3: the selected action a is evaluated by the reward function, yielding the reward r and entering the next state s'.
When the agent selects an action a, i.e. executes one oversell-ratio prediction, this action must be evaluated by the reward function, and the state of the oversold node is therefore estimated from the relevant statistical characteristics of the cluster. To this end, this embodiment collects key statistics (a profile) for each cluster, including the average resource utilization, the maximum resource utilization and the minimum resource utilization of cluster m.
In this step, the selected action a is evaluated by the reward function. The current decision cost of the agent is a weighted sum of the excess loss and the shortage loss, and this embodiment uses the reciprocal of the decision cost as the reward:
r = 1 / (w_o · o_k + w_u · u_k)
where w_o and w_u are the trade-off factors of the excess loss and the shortage loss, respectively, and w_o + w_u = 1. The two trade-off factors also differ between resource types; for example, overselling memory carries a greater risk than overselling CPU, so the corresponding excess-loss trade-off factor w_o is set larger for memory.
o_k denotes the excess loss, i.e. the high-load alarm risk a node incurs when its load is higher than L_target; u_k denotes the shortage loss, i.e. the resource waste a node incurs when its load is lower than L_target. L_target is the preset target load level of a node, set according to practical experience; node resource utilization near this level is considered ideal. h_o and h_u are the half-lives of the excess loss and the shortage loss, respectively.
The estimate of the oversold node load used in these losses is computed differently depending on the resource type. For compressible resources, the average resource utilization of cluster m is used to estimate the load state of the oversold node. For incompressible resources, since the operating system terminates a process directly when a node cannot provide the memory it requires, the maximum utilization of cluster m is used instead, which guides the agent to act more conservatively. Taking CPU (a compressible resource) as an example, the estimate of the oversold node load is computed from the cluster's average resource utilization.
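A sketch of the reward evaluation: the reciprocal-of-weighted-cost form follows the text above, while the exponential expressions for o_k and u_k and the load-estimate combination below are assumptions made only for illustration (the patent gives the exact formulas as images not reproduced here):

```python
def estimate_oversold_load(current_load, oversell_ratio, cluster_avg_util,
                           cluster_max_util, compressible=True):
    """Estimate the node load after overselling: compressible resources
    (e.g. CPU) use the cluster's average utilization, incompressible
    resources (e.g. memory) use the cluster's maximum utilization, which
    is more conservative. The additive combination is an assumption."""
    util = cluster_avg_util if compressible else cluster_max_util
    return current_load + oversell_ratio * util

def reward(load_est, l_target, w_o, w_u, h_o, h_u):
    """Reciprocal of the weighted decision cost w_o*o_k + w_u*u_k.
    o_k penalizes load above L_target (alarm risk), u_k penalizes load
    below L_target (wasted resources); both are assumed here to grow
    exponentially with the deviation, governed by half-lives h_o and h_u."""
    if load_est > l_target:
        o_k, u_k = 2.0 ** ((load_est - l_target) / h_o), 0.0
    else:
        o_k, u_k = 0.0, 2.0 ** ((l_target - load_est) / h_u)
    return 1.0 / (w_o * o_k + w_u * u_k + 1e-8)
```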
s4: the input state s, action a, prize r, next state s 'are formed into quadruples (s, a, r, s') and placed in a buffer as training samples.
S5: and when a preset training interval is reached, e training samples are sampled from the buffer memory and input into the online Q network for training, and the parameter theta' of the online Q network is updated.
In this step, the parameter θ′ is updated by applying a gradient descent algorithm to (y − Q(s, a, θ′))², where y is given by:
y = r + γ · max_{a′} Q(s′, a′, θ)
and y represents the current target Q value while a′ represents the action executed in the next state s′.
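A sketch of the experience cache (step S4), one training step on the loss (y − Q(s, a, θ′))² (step S5) and the target-network synchronization performed in step S6 below, assuming the QNetwork sketched earlier; the buffer capacity, batch size e and discount γ are illustrative choices:

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

class ReplayBuffer:
    """Fixed-size cache of (s, a, r, s') quadruples."""
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, e: int):
        return random.sample(list(self.buffer), e)

    def __len__(self):
        return len(self.buffer)

def train_step(online_q, target_q, buffer, optimizer, e=32, gamma=0.99):
    """Sample e quadruples and apply one gradient-descent update to the
    online parameters θ' on (y - Q(s, a, θ'))^2, where the target
    y = r + γ·max_a' Q(s', a', θ) is computed with the target network."""
    states, actions, rewards, next_states = zip(*buffer.sample(e))
    states = torch.stack([torch.as_tensor(s, dtype=torch.float32) for s in states])
    next_states = torch.stack([torch.as_tensor(s, dtype=torch.float32) for s in next_states])
    actions = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)

    q_sa = online_q(states).gather(1, actions).squeeze(1)                # Q(s, a, θ')
    with torch.no_grad():
        y = rewards + gamma * target_q(next_states).max(dim=1).values    # target Q value
    loss = F.mse_loss(q_sa, y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def sync_target(online_q, target_q):
    """After E rounds of training, copy the online parameters θ' into θ."""
    target_q.load_state_dict(online_q.state_dict())
```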
S6: and after the online Q network is trained by the E round, assigning the parameter theta' to the parameter theta of the target Q network, updating the target Q network, and applying the target Q network with the updated parameters to determine the overstock strategy.
Further, the target Q network with updated parameters is deployed outside the cluster to serve as the oversell policy interface, and an event interception module is deployed inside the cluster to improve the ability to handle node load changes. The event interception module uses the admission controller provided by k8s to intercept Pod creation and deletion events and node heartbeat events.
Fig. 2 is a schematic flow chart of an event interception module according to the present embodiment.
The event interception module performs the following steps:
1) the k8s server attempts to store the node's real resource state into the database through a heartbeat event;
2) the node heartbeat event is intercepted, and the node's real resource state is replaced with the oversell calculation result (see the sketch after this list);
3) Pod creation and deletion events in the cluster are intercepted, and the corresponding node is marked as a "dirty node";
4) the dirty node is added to a node pool;
5) an independent thread monitors the node states in a local cache;
6) when no Pod creation or deletion event occurs on a node, the node is marked as an "old node" and added to the node pool;
7) the nodes in the node pool call the oversell policy interface concurrently;
8) for nodes with high-load alarms, the oversell-value policy interface is called.
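The heartbeat-interception idea of steps 1)–2) can be illustrated with a mutating admission webhook. The sketch below uses Flask, a hypothetical fetch_oversell_ratio client for the deployed oversell policy interface, and assumes the webhook is registered for node status updates; it is an illustration under these assumptions, not the patent's actual implementation:

```python
import base64
import copy
import json

from flask import Flask, request, jsonify  # illustrative choice of web framework

app = Flask(__name__)

def fetch_oversell_ratio(node_name: str) -> float:
    """Hypothetical client of the deployed oversell policy interface;
    a real implementation would call the DQN service over HTTP."""
    return 1.5  # placeholder: 50% oversell

def oversold_allocatable(node_name: str, reported: dict) -> dict:
    """Scale the node's reported allocatable CPU by the predicted oversell
    ratio (assumes the CPU quantity is expressed in whole cores)."""
    patched = copy.deepcopy(reported)
    patched["cpu"] = str(int(float(reported["cpu"]) * fetch_oversell_ratio(node_name)))
    return patched

@app.route("/mutate-node-status", methods=["POST"])
def mutate_node_status():
    """Admission-review handler for node heartbeat (status) updates:
    replaces status.allocatable with the oversell calculation result
    before it is persisted (steps 1-2 of the event interception module)."""
    review = request.get_json()
    node = review["request"]["object"]
    patch = [{
        "op": "replace",
        "path": "/status/allocatable",
        "value": oversold_allocatable(node["metadata"]["name"],
                                      node["status"]["allocatable"]),
    }]
    return jsonify({
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": review["request"]["uid"],
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
        },
    })
```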
In this embodiment, k8s is used as the management scheme of the container cloud platform, and the container cloud cluster resource utilization optimization method based on deep reinforcement learning is proposed to solve the low cluster resource utilization caused by the static scheduling mechanism built into k8s, improving cluster resource utilization while minimizing interference with running services.
This embodiment fully considers the availability and quality requirements of different services. The proposed overselling scheme, based on the resource capacity state of k8s nodes, is transparent to containers at every stage and can oversell cluster resources while still meeting the service availability requirements of the service provider. The embodiment provides several techniques, including a deep reinforcement learning model for determining the oversell policy, a reward function based on cluster profiles, and an event interception module design combining periodic updates with event-driven triggers. Specifically, the oversell policy model (the deep Q network model) is trained with deep reinforcement learning, taking the states of the nodes in the cluster (node information and current load state) as network inputs, with the goal of reducing both the high-load risk caused by overselling and the waste of resources, thereby effectively improving the resource utilization of the cluster.
The same or similar reference numerals correspond to the same or similar components;
the terms describing the positional relationship in the drawings are merely illustrative, and are not to be construed as limiting the present patent;
it is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are desired to be protected by the following claims.

Claims (10)

1. A container cloud cluster resource utilization optimization method based on deep reinforcement learning, characterized by comprising the following steps:
S1: preprocessing original load data, and assembling the preprocessed original load data into an input state s; the preprocessing of the original load data comprises a binning operation; the number of bins is set to B, and the corresponding boundary values b_i take values in {b_i | 0 ≤ i ≤ B, i ∈ N};
S2: constructing a deep Q network model for determining the oversell policy, inputting the input state s into the deep Q network model, which with a certain probability randomly selects an action a or otherwise selects the action a judged optimal by the deep Q network model, and executes one oversell-ratio prediction; after the input state s is input into the deep Q network model, the model calculates a Q value for each action a according to the input state s, and judges from the Q value whether the current action a is optimal for the deep Q network model; the Q value is calculated as:
Q(s, a, θ) = r′_current + γ · r′_future
where the Q value represents the immediate reward r′_current obtainable by the deep Q network model in state s when executing action a, plus the predicted value r′_future of future rewards; γ is the discount factor weighing immediate against future rewards; θ denotes the parameters of the deep Q network model;
S3: evaluating the selected action a through a reward function to obtain a reward r and enter the next state s′; the reward r is computed as the reciprocal of the weighted decision cost:
r = 1 / (w_o · o_k + w_u · u_k)
where w_o and w_u are the trade-off factors of the excess loss and the shortage loss, respectively, and w_o + w_u = 1; o_k denotes the excess loss, i.e. the high-load alarm risk a node incurs when its load is higher than L_target; u_k denotes the shortage loss, i.e. the resource waste a node incurs when its load is lower than L_target; L_target is the preset target load level of a node; h_o and h_u are the half-lives of the excess loss and the shortage loss, respectively; and an estimate of the node load after overselling is used in evaluating these losses;
S4: forming the input state s, the action a, the reward r and the next state s′ into a quadruple and placing it into a cache as a training sample;
S5: when a preset training interval is reached, sampling e training samples from the cache, inputting them into the deep Q network model for training, and updating the parameters of the deep Q network model, wherein e is a positive integer;
S6: after E rounds of training, applying the deep Q network model with updated parameters to determine the oversell policy.
2. The container cloud cluster resource utilization optimization method of claim 1, wherein in step S1, preprocessing the raw load data includes a binning operation, specifically as follows:
let the load update period be T; in the k-th time period, the original load data of node n is l_n^k, and the dimension of l_n^k equals the number of sampling points within one update period T; the number of bins is set to B, and the corresponding boundary values b_i take values in {b_i | 0 ≤ i ≤ B, i ∈ N}; assume there are M clusters, and the m-th cluster has N_m nodes;
the binning operation then maps the original load data l_n^k to a B-dimensional count vector, whose i-th component is
Σ_j I( b_{i-1} ≤ l_n^k[j] / C_{m,n} < b_i )
where C_{m,n} is the actual resource capacity of the n-th node in cluster m, I(·) is the indicator function, and l_n^k[j] is the j-th load sample of the period: whenever a load sample falls within the i-th binning interval, the i-th component of the count vector is incremented by 1.
3. The container cloud cluster resource utilization optimization method of claim 2, wherein the input state s includes node information C_n, node container information and node historical load; the node information C_n comprises the ID of the cluster where the n-th node is located, the ID of the city where it is located, the actual resource capacity and the current oversell ratio; the node container information represents the number of online-service Pods, the number of offline-service Pods and the total resource request of the Pods on the n-th node in the k-th load update period; the node historical load represents the binned historical load of the node over the last 7 days.
4. The container cloud cluster resource utilization optimization method of claim 2, wherein in step S2, after the input state s is input into the deep Q network model, the model calculates a Q value for each action a according to the input state s, and judges from the Q value whether the current action a is optimal for the deep Q network model; the Q value is calculated as:
Q(s, a, θ) = r′_current + γ · r′_future
where the Q value represents the immediate reward r′_current obtainable by the deep Q network model in state s when executing action a, plus the predicted value r′_future of future rewards; γ is the discount factor weighing immediate against future rewards; θ denotes the parameters of the deep Q network model.
5. The container cloud cluster resource utilization optimization method of claim 4, wherein in step S3, the selected action a is evaluated through a reward function, in which the reward r is computed as the reciprocal of the weighted decision cost:
r = 1 / (w_o · o_k + w_u · u_k)
where w_o and w_u are the trade-off factors of the excess loss and the shortage loss, respectively, and w_o + w_u = 1; o_k denotes the excess loss, i.e. the high-load alarm risk a node incurs when its load is higher than L_target; u_k denotes the shortage loss, i.e. the resource waste a node incurs when its load is lower than L_target; L_target is the preset target load level of a node; h_o and h_u are the half-lives of the excess loss and the shortage loss, respectively; and an estimate of the node load after overselling is used in evaluating these losses.
6. The container cloud cluster resource utilization optimization method of claim 5, wherein, regarding the estimate of the oversold node load: for compressible resources, the average resource utilization of cluster m is used to estimate the load state of the oversold node; for incompressible resources, since a process is terminated directly by the operating system when a node cannot provide the memory it requires, the maximum utilization of cluster m must be used to estimate the load state of the oversold node.
7. The container cloud cluster resource utilization optimization method of claim 6, wherein the deep Q network model comprises a target Q network and an online Q network, whose parameters are denoted θ and θ′, respectively;
in step S2, the online Q network is used to select the action a;
in step S5, when a preset training interval is reached, e training samples are sampled from the cache and input into the online Q network for training, and the parameter θ′ is updated;
in step S6, after E rounds of training of the online Q network, the updated parameter θ′ is used to update the parameter θ of the target Q network, and the target Q network is applied to determine the oversell policy.
8. The container cloud cluster resource utilization optimization method of claim 7, wherein in step S5, the parameter θ′ is updated by applying a gradient descent algorithm to (y − Q(s, a, θ′))², where y is given by:
y = r + γ · max_{a′} Q(s′, a′, θ)
and y represents the current target Q value while a′ represents the action executed in the next state s′.
9. The container cloud cluster resource utilization optimization method of any one of claims 1-8, further comprising the following step:
S7: deploying the deep Q network model with updated parameters outside the cluster as the oversell policy interface to provide service, and deploying an event interception module inside the cluster, wherein the event interception module uses the admission controller provided by k8s to intercept Pod creation and deletion events and node heartbeat events.
10. The container cloud cluster resource utilization optimization method of claim 9, wherein the event interception module performs the following steps:
1) the k8s server attempts to store the node's real resource state into the database through a heartbeat event;
2) the node heartbeat event is intercepted, and the node's real resource state is replaced with the oversell calculation result;
3) Pod creation and deletion events in the cluster are intercepted, and the corresponding node is marked as a "dirty node";
4) the dirty node is added to a node pool;
5) an independent thread monitors the node states in a local cache;
6) when no Pod creation or deletion event occurs on a node, the node is marked as an "old node" and added to the node pool;
7) the nodes in the node pool call the oversell policy interface concurrently;
8) for nodes with high-load alarms, the oversell-value policy interface is called.
CN202011225270.4A 2020-11-05 2020-11-05 Container cloud cluster resource utilization optimization method based on deep reinforcement learning Active CN112416578B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011225270.4A CN112416578B (en) 2020-11-05 2020-11-05 Container cloud cluster resource utilization optimization method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011225270.4A CN112416578B (en) 2020-11-05 2020-11-05 Container cloud cluster resource utilization optimization method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN112416578A CN112416578A (en) 2021-02-26
CN112416578B true CN112416578B (en) 2023-08-15

Family

ID=74828183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011225270.4A Active CN112416578B (en) 2020-11-05 2020-11-05 Container cloud cluster resource utilization optimization method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN112416578B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113485792B (en) * 2021-07-08 2023-05-26 厦门服云信息科技有限公司 Pod scheduling method in kubernetes cluster, terminal equipment and storage medium
CN114389990A (en) * 2022-01-07 2022-04-22 中国人民解放军国防科技大学 Shortest path blocking method and device based on deep reinforcement learning
CN115130929B (en) * 2022-08-29 2022-11-15 中国西安卫星测控中心 Resource pool intelligent generation method based on machine learning classification

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109491790A (en) * 2018-11-02 2019-03-19 中山大学 Industrial Internet of Things edge calculations resource allocation methods and system based on container
CN110427261A (en) * 2019-08-12 2019-11-08 电子科技大学 A kind of edge calculations method for allocating tasks based on the search of depth Monte Carlo tree
CN110688202A (en) * 2019-10-09 2020-01-14 腾讯科技(深圳)有限公司 Service process scheduling method, device, equipment and storage medium
WO2020206705A1 (en) * 2019-04-10 2020-10-15 山东科技大学 Cluster node load state prediction-based job scheduling method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109491790A (en) * 2018-11-02 2019-03-19 中山大学 Industrial Internet of Things edge calculations resource allocation methods and system based on container
WO2020206705A1 (en) * 2019-04-10 2020-10-15 山东科技大学 Cluster node load state prediction-based job scheduling method
CN110427261A (en) * 2019-08-12 2019-11-08 电子科技大学 A kind of edge calculations method for allocating tasks based on the search of depth Monte Carlo tree
CN110688202A (en) * 2019-10-09 2020-01-14 腾讯科技(深圳)有限公司 Service process scheduling method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Mohamed Handaoui et al. "ReLeaSER: A Reinforcement Learning Strategy for Optimizing Utilization Of Ephemeral Cloud Resources." arXiv:2009.11208v3, 2020, pp. 1-9. *

Also Published As

Publication number Publication date
CN112416578A (en) 2021-02-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant