CN113886095A - Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning - Google Patents
- Publication number
- CN113886095A (application number CN202111487809.8A)
- Authority
- CN
- China
- Prior art keywords
- action
- value
- state
- reinforcement learning
- container
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/048—Fuzzy inferencing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45583—Memory management, e.g. access or allocation
Abstract
The invention provides a container memory elastic scaling method based on the combination of fuzzy inference and reinforcement learning. A fuzzy inference system (FIS) maps the continuous high-dimensional state space represented by service-performance and resource-usage variables into a discrete low-dimensional state space. As the container service runs in the cloud environment, a reinforcement learning algorithm continuously optimizes its learning process from the monitoring data, makes the optimal decision, and outputs an elastic coefficient to the load prediction algorithm, guiding the container toward a more appropriate resource allocation. A closed loop is thus formed, solving the problem of dynamically optimizing elastic scaling in the cloud environment.
Description
Technical Field
The invention relates to the technical field of container virtualization, in particular to a container memory elastic expansion method based on the combination of fuzzy reasoning and reinforcement learning.
Background
In recent years, cloud computing has supported numerous application services by virtue of its emerging computing frameworks, strong computing power, and convenient management model. Container virtualization technology, represented by Docker, is gradually replacing conventional virtual machine technology thanks to its light weight. However, the workload of cloud applications varies over time: static resource allocation sized for peak demand wastes significant resources, while provisioning only the average level degrades service performance. Elastic scaling technology was therefore proposed to automatically adjust computing resources according to current or future business demand, improving service efficiency and reducing cost. Many elastic scaling algorithms for container memory already exist; how to dynamically optimize the scaling parameters so that container memory is allocated reasonably while service performance is guaranteed is a problem worthy of attention and research.
Elastic scaling is one of the key technologies of cloud computing resource management: it lets a cloud infrastructure adjust the supplied resources according to the load of cloud applications, satisfying the requirement of on-demand resource allocation. There are two main modes, horizontal elasticity and vertical elasticity. Horizontal elasticity requires the application to provide distributed support so it can be decomposed into multiple computing instances, and adjusts service capacity by adding and removing instances; it therefore applies only to replicable or decomposable applications. Vertical elasticity adjusts the capacity of an application by changing the resource quota of a single compute node or compute instance, which is a fine-grained adjustment. Vertical elasticity applies to any application, performs better when sufficient resources are available, eliminates the instance-startup overhead of horizontal elasticity (such as load balancers or replicated additional instances), and ensures that the application's communication connections are not interrupted when scaling.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a container memory elastic scaling method based on the combination of fuzzy inference and reinforcement learning, forming a closed loop that solves the problem of dynamically optimizing elastic scaling in the cloud environment.
The object of the present invention is achieved by the following technical means. A container memory elastic expansion method based on fuzzy reasoning and reinforcement learning comprises the following steps:
(1) and high-dimensional continuous state space mapping based on a fuzzy inference system: mapping a continuous high-dimensional state space represented by service performance and resource use condition variables into a discrete low-dimensional state space through a fuzzy inference system FIS; when the state space of the container is specifically constructed, elastic expansion is performed by detecting the service quality performance of the container under different load conditions and combining the resource use condition;
(2) elastic scaling dynamic optimization based on reinforcement learning: as the container service runs continuously in the cloud environment, the generated monitoring data serves as the training data set of the reinforcement learning algorithm, which continuously optimizes its own learning process from the monitored data, makes the optimal decision, and outputs an elastic coefficient to the load prediction algorithm, guiding the container toward a more appropriate resource allocation.
Furthermore, the GC duration is selected as the service performance index: the detected service quality is compared with the expected service quality, and their difference e, i.e., the GC-duration difference, serves as the defining criterion of the system state space. A fuzzy inference system is used to construct rules: since the difference may be positive or negative, zero is taken as the central state, the axis extends to negative and positive infinity on both sides, and the state space is divided symmetrically about zero.
Furthermore, a reinforcement learning algorithm is used for vertical elastic expansion decision in a cloud computing resource allocation scene, and the specific learning process is as follows:
(1) First define the state and action spaces, then initialize a Q-value table accordingly, with all Q values set to 0. The agent detects a certain state s in the system, selects the corresponding action a according to the action-selection policy and executes it, i.e., selects an optimized elastic coefficient and outputs it to the prediction algorithm. After execution, the agent receives a feedback reward r and updates the Q value accordingly; after many iterations the table converges to the optimal Q-value table. The Q-value update formula is:

Q(s_t, a_t) ← Q(s_t, a_t) + α · [r + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t)]

where Q(s_t, a_t) represents the value of executing action a_t in state s_t, t denotes the time step, α represents the learning rate, i.e., the proportion of the newly learned reward incorporated into this Q value after the latest action, and γ is the reward decay factor, i.e., the influence of future rewards on the present. The formula updates the Q value with the real reward plus the maximum Q value of the next state multiplied by the decay factor;
(2) and constructing an action space of the intelligent agent: selecting different elastic coefficients to form a system action space, and calculating the Q value benefits of each action, namely different elastic coefficients, in different states through a reinforcement learning algorithm so as to obtain the optimal elastic coefficient;
(3) First, an objective function serves as the standard to measure the resource-allocation benefit, defined as:

F = w_q · Q + w_c · C

where Q denotes the quality of service, C denotes the resource cost, and w_q and w_c are the weights of Q and C respectively;
The garbage-collection duration GC during program operation is used as the concrete index of periodic service quality, and the resource cost C denotes the memory resources allocated to the service, so the objective function is updated as:

F = w_q · GC + w_c · Mem
The objective function represents the value of the system state at the current moment, and the reward function is defined as the difference between the system-state value after executing an action and the value at the previous moment t:

r_t = V_{t+1} − V_t
If the value of the system state increases after an action is executed, the reward is positive and the updated Q value increases, indicating that the action brings a positive benefit; the probability of selecting that action when the same state is encountered later increases. Conversely, if the system's value decreases after an action, the action brings a negative benefit, and the probability of selecting it decreases;
(4) Determine the action policy and start the learning process: an ε-greedy policy is used. First define an ε between 0 and 1. At the beginning of each trial, draw a random value greater than 0 and less than 1; if the value is less than ε, an action is selected at random; otherwise, the action a with the highest average benefit is selected, by the formula:

a = argmax_a Q(s, a)
the invention has the beneficial effects that: the invention provides a container memory elastic expansion optimization method based on the combination of a fuzzy inference system and reinforcement learning, the garbage recycling time can measure the service performance, the service performance can be used as the state representation of an intelligent agent, and the dimension reduction is carried out on a high-dimensional continuous state space based on the fuzzy inference system. And (3) learning dynamic optimization of the elastic coefficient in the elastic expansion and contraction of the container memory from the sequence data by using a Q-learning algorithm according to the monitoring data of the historical time sequence, so that the subsequent elastic expansion and contraction mechanism dynamic adjustment of the memory has foresight. The invention can solve the problem of dynamic adjustment and optimization of the container memory in the cloud computing elastic expansion problem, and saves memory resources while ensuring the service quality.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 is a time differential layout of a state space.
Detailed Description
The invention will be described in detail below with reference to the following drawings:
the invention designs and realizes a container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning, and provides an algorithm for dynamically optimizing an elastic expansion coefficient based on combination of memory resource use conditions and garbage recovery service performance aiming at a memory elastic expansion mechanism algorithm under a container cloud environment. The invention combines a fuzzy inference system with reinforcement learning to realize dynamic elastic expansion coefficient optimization.
The method mainly comprises two parts, namely high-dimensional continuous state space mapping based on a fuzzy inference system and elastic stretching dynamic optimization based on reinforcement Learning (Q-Learning), and the specific working flow is shown in figure 1:
1. high-dimensional continuous state space mapping based on fuzzy inference system
Service performance and resource usage reflect the dynamically changing memory load of the container over time. The invention introduces a fuzzy inference system (FIS) to map the continuous state space expressed by service-performance and resource-usage variables into discrete fuzzy semantics, thereby handling the high-dimensional continuous state space.
The FIS maps a set of inputs to the desired output through fuzzy rules. A FIS with n fuzzy rules can be expressed as:

Rule l: IF x_1 is F_1^l AND … AND x_m is F_m^l THEN y = b_l,   l = 1, …, n

where F_i^l is the semantic value (e.g., "high", "medium", "low") of variable x_i of the environment state under the l-th rule, b_l is the inference output of each rule, and B is the set in which the inference outputs lie. Rules in fuzzy form allow appropriate action to be taken in given situations; when the system contains multiple rules and the same input may satisfy several of them simultaneously, such conflicts are handled by weighted defuzzification. The output y is:

y = (Σ_{l=1}^{n} μ_l · b_l) / (Σ_{l=1}^{n} μ_l)

where μ_l is the firing degree of the l-th rule.
where n represents the number of rules. Using a fuzzy inference system, a high-dimensional continuous state space can thus be converted into a discrete state space. When the container's state space is concretely constructed, elastic scaling is performed by detecting the container's quality-of-service performance under different load conditions in combination with resource usage.
We have chosen the garbage-collection duration as the quality-of-service performance index. The garbage-collection mechanism, Garbage Collection (GC for short), clears "garbage" objects that are no longer used in memory and releases the memory space. GC time consumption intuitively reflects service performance, and this index can be measured while the program is running. Therefore, the GC duration is selected as the service performance index: the detected service quality is compared with the expected service quality, and their difference e, i.e., the GC-duration difference, serves as the defining criterion of the system state space. Because the difference is continuous, a fuzzy inference system is used to construct rules: since the difference may be positive or negative, zero is taken as the central state, the axis extends to negative and positive infinity on both sides, and the state space is divided symmetrically about zero. The time-difference distribution of the state space is shown in fig. 2.
The time axis represents the difference between the GC duration detected for the container's service per unit time and the expected GC duration. The leftmost and rightmost ends represent negative infinity and positive infinity respectively; time points t_{-n}, …, t_{-1}, 0, t_1, …, t_n are arranged from left to right, with points of the same subscript symmetric about the zero point. The time axis can then be divided into intervals according to these points as the concrete representation of the state s.
The fuzzy inference rules then take the form: IF the GC difference e falls in the interval (t_i, t_{i+1}), THEN the system is in state s_i. Based on these fuzzy-inference-system rules, the high-dimensional continuous state space is mapped into a low-dimensional discrete state space, which facilitates the subsequent elastic-scaling dynamic optimization by the reinforcement learning algorithm.
2. elastic stretching dynamic optimization based on reinforcement Learning (Q-Learning)
Reinforcement learning is the science of making optimal decisions. It simulates the human learning process: the agent learns by trial and error, guiding its behavior by the rewards obtained from interacting with the environment, with the goal of maximizing the reward it receives. Reinforcement learning differs from supervised learning in that the reinforcement signal, generated by feedback from the agent's own experience, is an evaluation of how good or bad an action is, so no large labeled data set is required. This freedom from prior knowledge makes reinforcement learning suitable for solving resource allocation in complex cloud environments.
In the research scenario of the invention, the agent is the container acted on by the load prediction algorithm in the elastic scaling mechanism. The goal is to optimize the elastic coefficient in the load prediction algorithm so that the memory supply for the next time period is predicted accurately, keeping the container cloud service performant without wasting excessive resources; the system environment is the load borne by the application service and how that load is handled.
The optimized object is the elastic coefficient, which represents the dynamic rate of change of container memory during elastic scaling. Different scenarios require different elastic rates: the elastic coefficient cannot be set accurately in advance, and only a positively correlated prediction can be made from load fluctuations. Moreover, resource adjustment in a container cloud environment lacks a sufficiently large labeled training data set, so learning models that require massive historical data and long training times are unsuitable. By contrast, under the load prediction algorithm and the elastic scaling mechanism, the container service faces different traffic loads and exhibits different states, and the monitor collects the relevant performance data, which becomes the data required for reinforcement-learning training. Therefore, as the container service runs continuously in the cloud environment, the generated monitoring data serves as the training data set of the reinforcement learning algorithm, which continuously optimizes its own learning process from the monitored data, makes the optimal decision, and outputs an elastic coefficient to the load prediction algorithm, guiding the container toward a more appropriate resource allocation and thereby solving the elastic-coefficient optimization problem of the load prediction algorithm in the cloud environment.
The invention provides a method for making vertical elastic-scaling decisions with the Q-learning algorithm in the cloud-computing resource-allocation scenario, combining Q-learning with the fuzzy inference system (FIS) introduced in the previous section for reducing the dimensionality of the high-dimensional continuous state space. Q-learning is described in detail below.
The core Q-learning algorithm is an off-policy temporal-difference learning algorithm: it uses two control policies, one for selecting new actions (e.g., an ε-greedy policy) and another for updating the value function (a greedy policy). It is therefore called off-policy.
The core of the Q-learning algorithm is to build a Q-value table, i.e., a state-action value table, in which the Q value represents the value generated by executing each action in a given state; the action that yields the greatest benefit is then chosen according to the Q values. For example, in the table below, the leftmost column contains three states s_1, s_2, s_3 and the top row contains two actions a_1, a_2; Q(s_i, a_j) represents the value of executing action a_j in state s_i. The number of states and actions determines the dimensionality of the Q-value table; the larger the dimensionality, the more complex the convergence of the Q values.

Q-value table example:

        a_1          a_2
s_1   Q(s_1,a_1)   Q(s_1,a_2)
s_2   Q(s_2,a_1)   Q(s_2,a_2)
s_3   Q(s_3,a_1)   Q(s_3,a_2)
The learning process of the Q-learning algorithm: first define the state space and action space, then initialize the Q-value table accordingly, with all Q values set to 0. The agent detects a certain state s in the system, selects the corresponding action a according to the action-selection policy and executes it, i.e., selects an optimized elastic coefficient and outputs it to the prediction algorithm. After execution, the agent receives a feedback reward r and updates the Q value accordingly; after many iterations the table converges to the optimal Q-value table. The Q-value update formula is:

Q(s_t, a_t) ← Q(s_t, a_t) + α · [r + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t)]

where α represents the learning rate, i.e., the proportion of the newly learned reward incorporated into this Q value after the latest action, and γ is the reward decay factor, i.e., the influence of future rewards on the present. The formula updates the Q value with the real reward plus the maximum Q value of the next state multiplied by the decay factor. The Q-learning algorithm can thus select the locally optimal action from existing knowledge while learning a globally optimal policy by continuously refining the Q-value table through feedback rewards.
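The update rule can be sketched in Python as follows. The dictionary representation of the Q table and the default values of α and γ are illustrative assumptions, not values fixed by the patent.

```python
def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """One Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q maps (state, action) pairs to values; `actions` holds the candidate
    elastic coefficients. Unseen pairs default to the initial value 0."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return Q[(state, action)]

# From an empty table, a reward of 1.0 moves Q(s=0, a=1.0) to alpha * 1.0:
Q = {}
q_update(Q, 0, 1.0, 1.0, 1, actions=[0.5, 1.0, 1.5])  # Q[(0, 1.0)] -> 0.1
```

Repeating such updates over the monitoring stream is what drives the table toward the optimal Q values described above.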
By utilizing historical data in the system and service operation process, the estimation learning process of the maximum Q values of different elastic parameters under different container cloud service environments can be completed. The main work in the model building process is described below.
(1) The state space of the agent has been described in the previous section.
(2) Constructing the agent's action space: the purpose of the reinforcement learning method is to optimize the elastic coefficient in the load prediction algorithm, so different elastic coefficients are directly chosen to form the system's action space, and the Q-learning algorithm computes the Q-value benefit of each action, i.e., each elastic coefficient, in each state to obtain the optimal elastic coefficient. The candidate coefficients take the base elastic coefficient p as a symmetric center, scaled up and down by different magnification factors k_1, …, k_n. The action space A can thus be expressed as:

A = {p/k_n, …, p/k_1, p, p·k_1, …, p·k_n}
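Constructing such a symmetric action space can be sketched as follows; the base coefficient and the magnification factors are illustrative assumptions.

```python
def build_action_space(base=1.0, factors=(1.25, 1.5, 2.0)):
    """Candidate elastic coefficients, symmetric in ratio about the base
    coefficient: for every magnification factor k, include both base/k
    (scale down) and base*k (scale up), plus the base itself."""
    coeffs = {base}
    for k in factors:
        coeffs.add(base * k)
        coeffs.add(base / k)
    return sorted(coeffs)

A = build_action_space()  # 7 coefficients from 0.5 up to 2.0
```

The Q-learning agent then treats each coefficient in A as one discrete action.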
(3) Designing the system's reward function: to better study the reasonable allocation of cloud resources and rationalize the parameters, an objective function first serves as the standard for measuring resource-allocation benefit, defined as:

F = w_q · Q + w_c · C

where Q denotes the quality of service, C denotes the resource cost, and w_q and w_c are the weights of Q and C. These two weights are predefined by the cloud provider, reflecting its preferences between service performance and resource cost. For example, a w_q higher than w_c indicates stricter requirements on service quality, so more resources may be needed to guarantee the same workload; if w_c is higher, the provider is more sensitive to resource cost, so fewer resources should be used to handle the workload. In practice, service quality and resource cost are essentially inversely related, which is beyond doubt, so a more reasonable resource-allocation plan should minimize the objective function value.
The measurement indexes of quality of service Q mainly include availability, throughput, latency, packet-loss rate, and so on. In this algorithm, the garbage-collection duration during program operation, selected in the previous section, is still used as the concrete index of periodic service quality. The resource cost C denotes the memory resources allocated to the service, not the memory actually used, because the ultimate goal is to optimize the resource-allocation policy rather than to reduce the program's resource consumption. The objective function can then be updated as:

F = w_q · GC + w_c · Mem
To reduce resource cost, a good resource-allocation strategy should allocate on demand according to the service performance during program operation. The objective function thus represents the value of the system state at the current moment, and the reward function is defined as the difference between the system-state value after executing an action and the value at the previous moment t:

r_t = V_{t+1} − V_t
If the value of the system state increases after an action is performed, the reward is positive and the updated Q value increases, indicating that the action brings a positive benefit; the probability of selecting that action when the same state is encountered later increases. Conversely, if the system's value decreases after an action, the action brings a negative benefit, and the probability of selecting it decreases.
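The reward computation can be sketched as follows, assuming the state value rises as the objective F falls (lower F is better, per the minimization goal above); the weight values and the example numbers are illustrative assumptions.

```python
def objective(gc_seconds, mem_mb, w_q=0.7, w_c=0.3):
    """Objective F = w_q * GC + w_c * Mem; smaller is better.
    The weights encode provider preferences (illustrative values)."""
    return w_q * gc_seconds + w_c * mem_mb

def reward(prev_gc, prev_mem, gc, mem):
    """Reward = increase in state value = decrease in the objective F
    between the previous moment and the moment after the action."""
    return objective(prev_gc, prev_mem) - objective(gc, mem)

# GC duration drops from 2.0 s to 1.0 s at an unchanged 100 MB allocation:
r = reward(2.0, 100.0, 1.0, 100.0)  # positive (~0.7): beneficial action
```

A larger allocation that fails to improve GC time yields a negative reward, steering the agent away from wasteful coefficients.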
(4) Determine the action policy and start the learning process. The agent is initially a complete stranger to the environment and does not know how to act; it must explore and learn step by step before finally making the best decisions. This is the ever-present exploitation & exploration trade-off of reinforcement learning. At first the agent can only proceed by trial and error, i.e., exploration; of course, trial and error here never scales memory resources in the wrong direction, but varies the degree of elastic scaling under the premise of positive correlation, while exploitation directly adopts known behaviors that yield good feedback. To obtain a larger long-term reward by sacrificing some short-term reward, an ε-greedy policy is used: first define an ε; at the beginning of each trial, draw a random value greater than 0 and less than 1; if the value is less than ε, an action is selected at random; otherwise, the action with the highest current average benefit is selected. The formula for a is:

a = argmax_a Q(s, a)
The value of ε may decay as learning proceeds. At the beginning ε is set close to 1, so that the selected actions are almost random and the agent becomes familiar with the system environment as quickly as possible, i.e., explores as much as possible. As the learning process advances, ε decays and the action with the greatest value reward is chosen with higher probability, i.e., the learned results are used to make the best decision as much as possible.
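The ε-greedy selection with decay can be sketched as follows; the decay rate and floor are illustrative assumptions.

```python
import random

def select_action(Q, state, actions, epsilon):
    """Epsilon-greedy: explore a random action with probability epsilon,
    otherwise exploit a = argmax_a Q(state, a)."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

def decay_epsilon(epsilon, rate=0.99, floor=0.05):
    """Shrink epsilon each trial, keeping a small exploration floor
    (rate and floor are illustrative assumptions)."""
    return max(floor, epsilon * rate)

# With epsilon = 0 the choice is purely greedy:
a = select_action({(0, 1.5): 0.4, (0, 1.0): 0.1}, 0, [1.0, 1.5], 0.0)  # -> 1.5
```

Starting ε near 1 and decaying it per trial reproduces the explore-then-exploit schedule described above.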
It should be understood that equivalent substitutions and changes made by those skilled in the art according to the technical solution and inventive concept of the present invention shall fall within the protection scope of the appended claims.
Claims (3)
1. A container memory elastic expansion method based on fuzzy reasoning and reinforcement learning is characterized in that: the method comprises the following steps:
(1) and high-dimensional continuous state space mapping based on a fuzzy inference system: mapping a continuous high-dimensional state space represented by service performance and resource use condition variables into a discrete low-dimensional state space through a fuzzy inference system FIS; when the state space of the container is specifically constructed, elastic expansion is performed by detecting the service quality performance of the container under different load conditions and combining the resource use condition;
(2) elastic scaling dynamic optimization based on reinforcement learning: as the container service runs continuously in the cloud environment, the generated monitoring data serves as the training data set of the reinforcement learning algorithm, which continuously optimizes its own learning process from the monitored data, makes the optimal decision, and outputs an elastic coefficient to the load prediction algorithm, guiding the container toward a more appropriate resource allocation.
2. The container memory elastic expansion method based on the combination of fuzzy inference and reinforcement learning of claim 1, characterized in that: the GC duration is selected as the service performance index; the detected service quality is compared with the expected service quality, and their difference e, i.e., the GC-duration difference, serves as the defining criterion of the system state space; a fuzzy inference system is used to construct rules: since the difference may be positive or negative, zero is taken as the central state, the axis extends to negative and positive infinity on both sides, and the state space is divided symmetrically about zero.
3. The container memory elastic expansion method based on the combination of fuzzy inference and reinforcement learning of claim 1, characterized in that: in a cloud computing resource allocation scenario, a reinforcement learning algorithm makes the vertical elastic scaling decision; the specific learning process is as follows:
(1) first define the state and action spaces, then initialize a Q-value table from them with all Q values set to 0; the agent detects that the system is in some state s, selects the corresponding action a according to the action selection strategy, and executes it, i.e. it selects an optimized elasticity coefficient and outputs it to the prediction algorithm; after execution the agent receives the feedback reward r and updates the Q value accordingly; after many iterations the table converges to the optimal Q-value table; the Q-value update formula is:

Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t)]

where Q(s_t, a_t) represents the value of selecting and executing action a_t in state s_t, t denotes the time step, α represents the learning rate, i.e. the proportion of the newly learned reward incorporated into this Q value after the most recent action, and γ is the reward decay factor, i.e. the influence of future rewards on the present; the formula updates the Q value with the real reward plus the maximum Q value selectable in the next state multiplied by the decay factor;
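The update rule above is plain tabular Q-learning; a minimal sketch follows. The candidate elasticity coefficients, state labels, and hyperparameter values are illustrative assumptions, not values from the patent.

```python
# Sketch of the tabular Q-learning update:
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
# Actions, states, and hyperparameters are illustrative assumptions.
from collections import defaultdict

ACTIONS = [0.8, 1.0, 1.2, 1.5]   # candidate elasticity coefficients
Q = defaultdict(float)            # Q[(state, action)], initialized to 0
ALPHA, GAMMA = 0.1, 0.9           # learning rate, reward decay factor

def q_update(s, a, r, s_next):
    """One Q-learning step after taking action a in state s and observing r, s'."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
```

With all Q values starting at 0, a single update from reward r moves Q(s, a) to α·r, and repeated visits converge toward the discounted optimal return.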
(2) construct the action space of the agent: different elasticity coefficients form the system's action space, and the reinforcement learning algorithm computes the Q-value benefit of each action, i.e. each elasticity coefficient, in each state, so as to obtain the optimal elasticity coefficient;
(3) an objective function is first taken as the standard for measuring resource allocation benefit, defined as:

U = w_q · P − w_c · C

where P denotes the quality of service, C denotes the resource cost, and w_q and w_c denote the weights of P and C respectively;

the garbage collection duration GC during program execution is used as the concrete index of periodic service quality, and the resource cost C is the memory allocated to the container, so the objective function is updated as:

U_t = w_q · P(GC_t) − w_c · Mem_t

where P(GC_t) is the service quality derived from the GC duration at moment t and Mem_t is the memory allocated at moment t;
the objective function U_t represents the value of the system state at the current moment; the reward function is defined as the difference between the value of the system state after an action is executed and its value at the previous moment t:

r_t = U_{t+1} − U_t
if the value of the system state increases after an action is executed, the reward is positive, the updated Q value increases, and the action brings a positive benefit, so the probability of selecting that action when the state is encountered again increases; conversely, if the system's value decreases after an action is executed, the action brings a negative benefit, and the probability of selecting that action decreases;
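The objective and reward in step (3) can be sketched as below. Since a shorter GC duration means better service quality, the sketch negates the GC time as a stand-in for P(GC_t); the weights and this QoS transform are illustrative assumptions.

```python
# Sketch of the system-state value U_t = w_q * P - w_c * C and the reward
# r_t = U_{t+1} - U_t. Weights and the GC->QoS transform are assumptions.
W_Q, W_C = 0.7, 0.3   # illustrative weights for quality and cost

def utility(gc_seconds, mem_gb):
    """Objective value of a system state; higher is better.
    Shorter GC duration raises quality, more allocated memory raises cost."""
    return W_Q * (-gc_seconds) - W_C * mem_gb

def reward(gc_prev, mem_prev, gc_next, mem_next):
    """Reward = state value after the action minus state value before it."""
    return utility(gc_next, mem_next) - utility(gc_prev, mem_prev)
```

An action that halves the GC duration at unchanged memory yields a positive reward, so its Q value grows, exactly the behavior the claim describes.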
(4) determine the action strategy and start the learning process: the ε-greedy policy is used; an exploration rate ε is first defined, and at the beginning of each trial a random number greater than 0 and less than 1 is drawn; if that number is less than ε, an action is selected at random; otherwise the action a with the highest average return is selected, according to the formula:

a = argmax_{a∈A} Q(s, a)
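The ε-greedy selection in step (4) can be sketched as follows; parameter names mirror the Q-table sketch and the default ε is an illustrative assumption.

```python
# Sketch of epsilon-greedy action selection: with probability epsilon pick
# a random elasticity coefficient (explore), otherwise pick the action with
# the highest Q value for the current state (exploit).
import random

def select_action(Q, state, actions, epsilon=0.1, rng=random):
    """Return a random action with probability epsilon, else argmax_a Q[(state, a)]."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))
```

Setting ε near 1 early and decaying it over time is a common refinement, trading exploration for exploitation as the Q table converges.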
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111487809.8A CN113886095A (en) | 2021-12-08 | 2021-12-08 | Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113886095A true CN113886095A (en) | 2022-01-04 |
Family
ID=79016511
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111487809.8A Pending CN113886095A (en) | 2021-12-08 | 2021-12-08 | Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113886095A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112311578A (en) * | 2019-07-31 | 2021-02-02 | 中国移动通信集团浙江有限公司 | VNF scheduling method and device based on deep reinforcement learning |
CN112000459A (en) * | 2020-03-31 | 2020-11-27 | 华为技术有限公司 | Method for expanding and contracting service and related equipment |
CN113760497A (en) * | 2021-01-05 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | Scheduling task configuration method and device |
Non-Patent Citations (3)
Title |
---|
ARABNEJAD ET AL.: "A Comparison of Reinforcement Learning Techniques for Fuzzy Cloud Auto-Scaling", 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing * |
FABIANA ROSSI ET AL.: "Horizontal and Vertical Scaling of Container-based Applications using Reinforcement Learning", 2019 IEEE 12th International Conference on Cloud Computing (CLOUD) * |
CAO YU, YANG JUN: "A Deep-Learning-Based Elastic Scaling Algorithm for Cloud Platforms", Computer and Modernization * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115460217B (en) * | 2022-11-10 | 2023-07-14 | 军事科学院系统工程研究院网络信息研究所 | Cloud service high availability decision-making method based on reinforcement learning |
CN116126534A (en) * | 2023-01-28 | 2023-05-16 | 哈尔滨工业大学(威海) | Cloud resource dynamic expansion method and system |
CN116610454A (en) * | 2023-07-17 | 2023-08-18 | 中国海洋大学 | MADDPG algorithm-based hybrid cloud resource elastic expansion system and operation method |
CN116610454B (en) * | 2023-07-17 | 2023-10-17 | 中国海洋大学 | MADDPG algorithm-based hybrid cloud resource elastic expansion system and operation method |
CN116610534A (en) * | 2023-07-18 | 2023-08-18 | 贵州海誉科技股份有限公司 | Improved predictive elastic telescoping method based on Kubernetes cluster resources |
CN116610534B (en) * | 2023-07-18 | 2023-10-03 | 贵州海誉科技股份有限公司 | Improved predictive elastic telescoping method based on Kubernetes cluster resources |
CN117891619A (en) * | 2024-03-18 | 2024-04-16 | 山东吉谷信息科技有限公司 | Host resource synchronization method and system based on virtualization platform |
CN117891619B (en) * | 2024-03-18 | 2024-06-11 | 山东吉谷信息科技有限公司 | Host resource synchronization method and system based on virtualization platform |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113886095A (en) | Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning | |
Gazori et al. | Saving time and cost on the scheduling of fog-based IoT applications using deep reinforcement learning approach | |
Elgendy et al. | Joint computation offloading and task caching for multi-user and multi-task MEC systems: reinforcement learning-based algorithms | |
Qi et al. | Knowledge-driven service offloading decision for vehicular edge computing: A deep reinforcement learning approach | |
CN112134916A (en) | Cloud edge collaborative computing migration method based on deep reinforcement learning | |
CN111641681A (en) | Internet of things service unloading decision method based on edge calculation and deep reinforcement learning | |
CN115686846B (en) | Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation | |
CN114615744A (en) | Knowledge migration reinforcement learning network slice general-purpose sensing calculation resource collaborative optimization method | |
CN116126534A (en) | Cloud resource dynamic expansion method and system | |
Li et al. | DQN-enabled content caching and quantum ant colony-based computation offloading in MEC | |
Li et al. | A modular neural network-based population prediction strategy for evolutionary dynamic multi-objective optimization | |
CN112036651A (en) | Electricity price prediction method based on quantum immune optimization BP neural network algorithm | |
Aslam et al. | Using artificial neural network for VM consolidation approach to enhance energy efficiency in green cloud | |
Chai et al. | A computation offloading algorithm based on multi-objective evolutionary optimization in mobile edge computing | |
CN112131089B (en) | Software defect prediction method, classifier, computer device and storage medium | |
CN117851056A (en) | Time-varying task scheduling method and system based on constraint near-end policy optimization | |
Ma et al. | Dynamic neural network-based resource management for mobile edge computing in 6g networks | |
CN112312299A (en) | Service unloading method, device and system | |
CN116959244A (en) | Vehicle network channel congestion control method and system based on regional danger | |
Huang et al. | Learning-aided fine grained offloading for real-time applications in edge-cloud computing | |
Li et al. | Dependency-aware task offloading based on deep reinforcement learning in mobile edge computing networks | |
Xin et al. | Genetic based fuzzy Q-learning energy management for smart grid | |
Tong et al. | D2op: A fair dual-objective weighted scheduling scheme in internet of everything | |
CN114385359B (en) | Cloud edge task time sequence cooperation method for Internet of things | |
CN111917854B (en) | Cooperation type migration decision method and system facing MCC |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20220104 |