CN113886095A - Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning - Google Patents

Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning

Info

Publication number
CN113886095A
CN113886095A
Authority
CN
China
Prior art keywords
action, value, state, reinforcement learning, container
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111487809.8A
Other languages
Chinese (zh)
Inventor
刘东海
徐育毅
庞辉富
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Youyun Software Co ltd
Beijing Guangtong Youyun Technology Co ltd
Original Assignee
Hangzhou Youyun Software Co ltd
Beijing Guangtong Youyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Youyun Software Co ltd, Beijing Guangtong Youyun Technology Co ltd filed Critical Hangzhou Youyun Software Co ltd
Priority to CN202111487809.8A
Publication of CN113886095A
Legal status: Pending (Current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 - Arrangements for executing specific programs
    • G06F9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 - Hypervisors; Virtual machine monitors
    • G06F9/45558 - Hypervisor-specific management and integration aspects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/04 - Inference or reasoning models
    • G06N5/048 - Fuzzy inferencing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 - Arrangements for executing specific programs
    • G06F9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 - Hypervisors; Virtual machine monitors
    • G06F9/45558 - Hypervisor-specific management and integration aspects
    • G06F2009/45583 - Memory management, e.g. access or allocation

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Automation & Control Theory (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a container memory elastic scaling method based on the combination of fuzzy inference and reinforcement learning. A fuzzy inference system (FIS) maps the continuous, high-dimensional state space represented by service-performance and resource-usage variables into a discrete, low-dimensional state space. While the container service runs in the cloud environment, the learning process is continuously optimized according to the monitoring data, the optimal decision is made, and the elasticity coefficient is output to the load prediction algorithm, guiding the container toward a more appropriate resource allocation. A closed loop is thereby formed, solving the problem of dynamically optimizing elastic scaling in the cloud environment.

Description

Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning
Technical Field
The invention relates to the technical field of container virtualization, in particular to a container memory elastic expansion method based on the combination of fuzzy reasoning and reinforcement learning.
Background
In recent years, cloud computing has supported numerous application services thanks to its computing framework, strong computing power and convenient management model. Container virtualization technology, represented by Docker, is gradually replacing conventional virtual machine technology because of its light weight. However, the workload of cloud applications varies over time: static resource allocation sized for peak demand wastes resources, while provisioning only average resources degrades service performance and service levels. Elastic scaling technology therefore automatically adjusts computing resources to current or predicted business demand, improving resource efficiency and reducing service cost. Many elastic scaling algorithms for container memory already exist; how to dynamically optimize the scaling parameters so that container memory is allocated reasonably while service performance is guaranteed remains a problem worth attention and research.
Elastic scaling is one of the key technologies of cloud computing resource management. It enables the cloud infrastructure to adjust supplied resources according to the load demands of cloud applications, satisfying the requirement of on-demand resource allocation. Elastic scaling has two main modes, horizontal and vertical. Horizontal elasticity requires the application to provide distributed support so that it can be decomposed into multiple computing instances; the load capacity of the service is then adjusted by adding and removing instances. Vertical elasticity adjusts the load capacity of an application by changing the resource quota of a single compute node or instance, which is a fine-grained adjustment, and it performs better when sufficient resources are available. Vertical elasticity applies to any application; it avoids the overhead of horizontal elasticity, such as starting load balancers or replicated additional instances, and it ensures that the application's communication connections are not interrupted while scaling, whereas horizontal elasticity applies only to applications that can be replicated or decomposed.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art and provides a container memory elastic expansion method based on the combination of fuzzy reasoning and reinforcement learning; the method forms a closed loop and solves the problem of dynamically optimizing elastic scaling in the cloud environment.
The object of the present invention is achieved by the following technical means. A container memory elastic expansion method based on fuzzy reasoning and reinforcement learning comprises the following steps:
(1) High-dimensional continuous state space mapping based on a fuzzy inference system: a fuzzy inference system (FIS) maps the continuous, high-dimensional state space represented by service-performance and resource-usage variables into a discrete, low-dimensional state space; when the state space of the container is constructed, elastic scaling is performed by detecting the quality-of-service performance of the container under different load conditions in combination with the resource usage;
(2) Elastic scaling dynamic optimization based on reinforcement learning: as the container service keeps running in the cloud environment, the monitoring data it generates serve as the training data set of the reinforcement learning algorithm; the algorithm continuously optimizes its learning process according to the monitored data, makes the optimal decision, and outputs an elasticity coefficient to the load prediction algorithm, guiding the container toward a more appropriate resource allocation.
Furthermore, the GC duration is selected as the service performance index; the detected quality of service is compared with the expected quality of service, and their difference e, i.e. different GC difference values, is used as the criterion for defining the system state space. A fuzzy inference system is used to construct the rules: since the difference can be positive or negative, zero is taken as the central state, the axis extends toward negative and positive infinity on both sides, and different state spaces are divided symmetrically about zero.
Furthermore, a reinforcement learning algorithm is used to make vertical elastic scaling decisions in the cloud computing resource allocation scenario; the specific learning process is as follows:
(1) First define the state and action spaces, then initialize a Q value table according to them, with all Q values initialized to 0. The agent detects and selects a state s in the system, selects the corresponding action a according to the action selection strategy and executes it, i.e. it selects an optimized elasticity coefficient pole and outputs it to the prediction algorithm. After execution the agent receives a feedback reward r and updates the Q value accordingly; after many cycles the table converges to the optimal Q value table. The Q value update formula is:
Q(s_t, a_t) ← (1 - α)·Q(s_t, a_t) + α·[ r_t + γ·max_a Q(s_{t+1}, a) ]
where Q(s_t, a_t) represents the value of selecting and executing action a_t in state s_t, t denotes the time step, α represents the learning rate, i.e. the proportion in which the newly learned reward value is blended into the current Q value after the most recent action, and γ is the reward decay factor, i.e. the influence of future rewards on the present; the formula states that the maximum Q value selectable in the next state, multiplied by the decay factor and added to the real reward, is used to update the Q value;
(2) Constructing the action space of the agent: different elasticity coefficients are selected to form the system action space, and the Q value benefit of each action, i.e. each elasticity coefficient, in each state is calculated by the reinforcement learning algorithm so as to obtain the optimal elasticity coefficient;
(3) Measuring the resource allocation benefit against an objective function, defined as the weighted combination
U = w_q·q + w_c·c
where q denotes the quality of service, c denotes the resource cost, and w_q and w_c are the weights of q and c;
the garbage collection duration GC during program execution is used as the index that concretely measures the periodic quality of service, and the resource cost c denotes the memory resources allocated to the container, with which the objective function is updated;
the objective function U represents the value of the system state at the current moment, and the reward function is defined as the difference between the value of the system state after an action is executed and the value at the previous moment t:
R = U_{t+1} - U_t
If the value of the system state increases after an action is executed, the reward is positive and the updated Q value increases, meaning the action brings a positive benefit, so the probability of selecting that action when the same state is encountered later increases; conversely, if the value of the system decreases after an action, the action brings a negative benefit, and the probability of selecting that action decreases;
(4) Determining the action strategy and starting the learning process: an ε-greedy strategy is used. First an ε is defined; at the beginning of each experiment a random value greater than 0 and less than 1 is drawn. If this value is less than ε, an action is selected at random; otherwise the action a with the highest average benefit is selected, according to the formula:
a = argmax_a Q(s, a)
The beneficial effects of the invention are as follows: the invention provides a container memory elastic scaling optimization method based on the combination of a fuzzy inference system and reinforcement learning. The garbage collection time measures the service performance and serves as the state representation of the agent, and the high-dimensional continuous state space is reduced in dimension by the fuzzy inference system. Using the Q-learning algorithm on the historical time-series monitoring data, the dynamic optimization of the elasticity coefficient in container memory elastic scaling is learned from the sequence data, so that subsequent dynamic adjustment of the memory elastic scaling mechanism becomes forward-looking. The invention solves the problem of dynamically adjusting and optimizing container memory within the cloud computing elastic scaling problem, saving memory resources while guaranteeing the quality of service.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 shows the time-difference layout of the state space.
Detailed Description
The invention will be described in detail below with reference to the following drawings:
the invention designs and realizes a container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning, and provides an algorithm for dynamically optimizing an elastic expansion coefficient based on combination of memory resource use conditions and garbage recovery service performance aiming at a memory elastic expansion mechanism algorithm under a container cloud environment. The invention combines a fuzzy inference system with reinforcement learning to realize dynamic elastic expansion coefficient optimization.
The method mainly comprises two parts, namely high-dimensional continuous state space mapping based on a fuzzy inference system and elastic stretching dynamic optimization based on reinforcement Learning (Q-Learning), and the specific working flow is shown in figure 1:
1. High-dimensional continuous state space mapping based on a fuzzy inference system
Service performance and resource usage reflect that the memory load of the container changes dynamically over time. The invention introduces a fuzzy inference system (FIS) to map the continuous state space expressed by service-performance and resource-usage variables into discrete fuzzy semantics, thereby handling the high-dimensional continuous state space.
The FIS maps a set of inputs to the desired output through fuzzy rules. An FIS with n fuzzy rules can be expressed in the general form
Rule i: IF x_1 is A_1^i and ... and x_m is A_m^i THEN y_i = b_i,  i = 1, ..., n
where A_j^i is the semantic value (for example "high", "medium", "low") taken by variable x_j of the environment state s under the i-th rule, b_i is the inference output of the i-th rule, and B is the set in which the inference outputs reside. Rules of this form allow an appropriate action to be taken in a particular situation. When a fuzzy inference system contains multiple rules and the same input may satisfy several rules simultaneously, such conflicts are handled through weighted defuzzification. The output y is
y = ( Σ_{i=1}^{N} w_i·b_i ) / ( Σ_{i=1}^{N} w_i )
where N represents the number of rules and w_i is the degree to which the input activates rule i. Using a fuzzy inference system, a high-dimensional continuous state space can thus be converted into a discrete state space. When the state space of the container is specifically constructed, elastic scaling is performed by detecting the quality-of-service performance of the container under different load conditions in combination with the resource usage.
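By way of illustration, the following Python sketch evaluates a small fuzzy inference step with the weighted-average defuzzification described above; the triangular membership functions, rule outputs and numeric ranges are assumptions introduced for the example and are not taken from the patent.

def tri(x, a, b, c):
    """Triangular membership function peaking at b on the interval [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Each rule: (membership function over the input, crisp rule output b_i).
rules = [
    (lambda e: tri(e, -2.0, -1.0, 0.0), -1.0),   # "difference is low"  -> output -1
    (lambda e: tri(e, -1.0,  0.0, 1.0),  0.0),   # "difference is zero" -> output  0
    (lambda e: tri(e,  0.0,  1.0, 2.0),  1.0),   # "difference is high" -> output +1
]

def fis_output(e):
    """Weighted defuzzification: y = sum(w_i * b_i) / sum(w_i)."""
    weights = [mu(e) for mu, _ in rules]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no rule fired
    return sum(w * b for w, (_, b) in zip(weights, rules)) / total

print(fis_output(0.3))  # a single input that fires two rules simultaneously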
The garbage collection duration is chosen as the quality-of-service performance index. The garbage collection mechanism (Garbage Collection, GC for short) clears "garbage" objects that are no longer used in memory and releases the corresponding memory space. The time consumed by GC directly reflects the service performance, and this index can be measured while the program is running. Therefore the GC duration is selected as the service performance index: the detected quality of service is compared with the expected quality of service, and their difference e, i.e. different GC difference values, serves as the criterion for defining the system state space. Because the difference is continuous, a fuzzy inference system is used to construct the rules: since the difference can be positive or negative, zero is taken as the central state, the axis extends toward negative and positive infinity on both sides, and different state spaces are divided symmetrically about zero. For example, the time-difference layout of the state space is shown in FIG. 2.
In FIG. 2 the time axis represents the difference between the GC time measured for the container service per unit time and the expected GC time; the two ends of the axis extend to negative and positive infinity, and from left to right the axis is marked with time points (for example -t_n, ..., -t_1, 0, t_1, ..., t_n), where points with the same subscript are symmetric about the zero point. The time intervals delimited by these points then serve as the concrete representation of the state s.
The fuzzy inference rules then take the form: IF the difference e lies in the interval between two adjacent time points, THEN the system is in the corresponding discrete state.
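A minimal sketch of how the GC-time difference e could be mapped to such a discrete state follows; the interval boundaries T1 and T2 and the state names are assumed values, since the concrete rule table appears in the original only as a figure.

T1, T2 = 0.1, 0.5   # illustrative interval boundaries (seconds), symmetric about zero

def discrete_state(e):
    # e = measured GC duration minus expected GC duration in the last period
    if e <= -T2:
        return "much_better_than_expected"
    if e <= -T1:
        return "better_than_expected"
    if e < T1:
        return "as_expected"
    if e < T2:
        return "worse_than_expected"
    return "much_worse_than_expected"

print(discrete_state(0.3))   # -> worse_than_expected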
based on the fuzzy inference system rule, the high-dimensional continuous state space is mapped into the low-dimensional discrete state space, so that the subsequent reinforcement learning algorithm is convenient to perform elastic expansion dynamic optimization,
2. Elastic scaling dynamic optimization based on reinforcement learning (Q-Learning)
Reinforcement learning is the science of making optimal decisions. It imitates the way humans learn: the agent learns by trial and error, interacting with the environment and guiding its own behavior by the rewards it obtains, with the goal of maximizing the reward. Reinforcement learning differs from supervised learning in that the reinforcement signal, generated from feedback on the agent's own experience, is an evaluation of how good or bad an action was, so no large labeled data set is required. This freedom from prior knowledge makes reinforcement learning suitable for the resource allocation problem in a complex cloud environment.
In the research scenario of the invention, the agent is the container acted upon by the load prediction algorithm in the elastic scaling mechanism. The goal is to optimize the elasticity coefficient in the load prediction algorithm so that the memory resource supply for the next time period is predicted accurately, keeping the container cloud service at high performance without wasting excessive resources; the system environment is the load borne by the application service and the way that load is handled.
The optimized object is the elasticity coefficient, which represents the rate of dynamic change of the container memory when the container scales elastically. Different scenarios have different requirements on the scaling rate, so the elasticity coefficient cannot be set accurately in advance; only a positively correlated prediction can be made according to the load fluctuation. Moreover, resource adjustment in a container cloud environment lacks a sufficiently large labeled training data set, so learning models that require large amounts of historical data and long training time are unsuitable. In contrast, under the action of the load prediction algorithm and the elastic scaling mechanism, the container service faces different traffic loads and exhibits different states, and the monitor collects the related performance data, which becomes the data required for reinforcement learning training. Therefore, as the container service keeps running in the cloud environment, the generated monitoring data can serve as the training data set of the reinforcement learning algorithm; the algorithm continuously optimizes its learning process according to the monitored data, makes the optimal decision, and outputs the elasticity coefficient to the load prediction algorithm, guiding the container toward a more appropriate resource allocation and thereby solving the problem of optimizing the elasticity coefficient of the load prediction algorithm in the cloud environment.
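The closed loop described above can be sketched as follows; the callables monitor, fis_state, choose_action, update_q, predict_memory and apply_limit are hypothetical stand-ins for the monitor, the fuzzy inference system, the Q-learning policy, and the load prediction and scaling components, and none of these names come from the patent.

import time

def control_loop(monitor, fis_state, choose_action, update_q,
                 predict_memory, apply_limit, period_s=30, steps=100):
    # One pass per monitoring period: observe -> map to state -> learn -> act.
    state, action, prev_value = None, None, None
    for _ in range(steps):
        metrics = monitor()                          # e.g. GC duration, memory usage
        next_state = fis_state(metrics)              # FIS maps metrics to a discrete state
        if state is not None:
            reward = metrics["value"] - prev_value   # reward = change in state value
            update_q(state, action, reward, next_state)
        state, prev_value = next_state, metrics["value"]
        action = choose_action(state)                # action = an elasticity coefficient (pole)
        limit = predict_memory(metrics, action)      # load prediction using the chosen coefficient
        apply_limit(limit)                           # vertical scaling of the container memory
        time.sleep(period_s)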
The invention proposes making vertical elastic scaling decisions with the Q-learning algorithm in the cloud computing resource allocation scenario, combining the Q-learning algorithm with the fuzzy inference system (FIS) introduced in the previous section for reducing the dimension of the high-dimensional continuous state space. Q-learning is described in detail below.
At its core, Q-learning is an off-policy temporal-difference learning algorithm: two control strategies are used, one for selecting new actions, such as an ε-greedy strategy, and another for updating the value function, such as the greedy strategy. For this reason it is also called an off-policy method.
The core of the Q-learning algorithm is to construct a Q value table, i.e. a state-action value table, where the Q value represents the value generated by executing each action in a given state; the action that can obtain the maximum benefit is then selected according to the Q values. For example, in the table below the leftmost column contains three states s1, s2, s3, the top row contains two actions a1, a2, and Q(s_i, a_j) denotes the value of selecting and executing action a_j in state s_i. The number of states and the number of actions determine the dimensions of the Q value table; the larger the dimensions, the higher the complexity of converging the Q values.
Q value table example:
         a1           a2
s1       Q(s1, a1)    Q(s1, a2)
s2       Q(s2, a1)    Q(s2, a2)
s3       Q(s3, a1)    Q(s3, a2)
The learning process of the Q-learning algorithm is as follows. First define the state space and the action space, then initialize a Q value table according to them, with all Q values set to 0. The agent detects and selects a state s in the system, selects the corresponding action a according to the action selection strategy and executes it, i.e. it selects an optimized elasticity coefficient pole and outputs it to the prediction algorithm. After execution the agent receives a feedback reward r and updates the Q value accordingly; after many cycles the table converges to the optimal Q value table. The Q value update formula is:
Q(s_t, a_t) ← (1 - α)·Q(s_t, a_t) + α·[ r_t + γ·max_a Q(s_{t+1}, a) ]
where α represents the learning rate, i.e. the proportion in which the newly learned reward value is blended into the current Q value after the most recent action, and γ is the reward decay factor, i.e. the influence of future rewards on the present. The formula states that the maximum Q value selectable in the next state, multiplied by the decay factor and added to the real reward, is used to update the Q value. The Q-learning algorithm can therefore select the locally optimal action according to existing knowledge, and by continuously refining the Q value table through feedback rewards it can learn a globally optimal policy.
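A minimal tabular Q-learning sketch of the update just described follows; the state labels, candidate elasticity coefficients, learning rate and decay factor are illustrative assumptions.

from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9                  # learning rate and reward decay factor (assumed)
ACTIONS = [0.5, 0.75, 1.0, 1.25, 1.5]    # candidate elasticity coefficients (assumed)

q_table = defaultdict(float)             # Q[(state, action)], initialized to 0

def update_q(state, action, reward, next_state):
    # Q(s,a) <- (1 - alpha)*Q(s,a) + alpha*(r + gamma * max_a' Q(s',a'))
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] = ((1 - ALPHA) * q_table[(state, action)]
                                + ALPHA * (reward + GAMMA * best_next))

# Example: one update after observing a single transition.
update_q("as_expected", 1.0, reward=0.2, next_state="worse_than_expected")
print(q_table[("as_expected", 1.0)])     # approximately 0.02 after one update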
Using the historical data generated while the system and the service run, the learning process that estimates the maximum Q values of the different elasticity parameters under different container cloud service environments can be completed. The main work in building the model is described below.
(1) The state space of the agent has been described in the previous section.
(2) Constructing the action space of the agent: the purpose of using reinforcement learning is to optimize the elasticity coefficient in the load prediction algorithm, so different elasticity coefficients directly form the system action space, and the Q value benefit of each action, i.e. each elasticity coefficient, in each state is calculated with the Q-learning algorithm to obtain the optimal elasticity coefficient. The current elasticity coefficient is taken as the symmetric center, and different magnification factors are applied upward and downward from it, so the action space A consists of the current coefficient scaled up and down by these factors.
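A small sketch of how such an action space could be generated around the current coefficient follows; the magnification factors are assumed values, since the concrete set appears in the original only as a figure.

def build_action_space(pole, factors=(1.25, 1.5, 2.0)):
    # Candidate elasticity coefficients scaled down and up around the current value.
    down = [pole / f for f in reversed(factors)]
    up = [pole * f for f in factors]
    return down + [pole] + up

print(build_action_space(1.0))  # [0.5, 0.666..., 0.8, 1.0, 1.25, 1.5, 2.0]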
(3) Designing the reward function of the system. To better study the reasonable allocation of cloud resources and set the parameters rationally, the resource allocation benefit is first measured against an objective function, defined as the weighted combination
U = w_q·q + w_c·c
where q denotes the quality of service, c denotes the resource cost, and w_q and w_c are the weights of q and c. These two weights are predefined by the cloud provider and reflect its preferences for service performance versus resource cost. For example, a higher w_q indicates stricter requirements on q, so more resources may be needed to guarantee the same workload's q; conversely, a higher w_c means higher sensitivity to resource cost, so fewer resources should be used to handle the workload. In practice the quality of service and the resource cost are essentially inversely related, so a more reasonable resource allocation plan should control, i.e. minimize, the objective function value.
The measurement indexes of the quality of service q mainly include availability, throughput, latency, packet loss rate and the like. In this algorithm, the garbage collection duration during program execution, selected in the previous section, is still used as the index that concretely measures the periodic quality of service. The resource cost c denotes the memory resources allocated to the container, not the memory actually used, because the ultimate goal is to optimize the resource allocation strategy rather than to reduce the program's resource consumption. The objective function is then updated with these quantities.
To reduce resource cost, a good resource allocation strategy should allocate on demand according to the service performance during program operation. The objective function U represents the value of the system state at the current moment, and the reward function is defined as the difference between the value of the system state after an action is executed and the value at the previous moment:
R = U_{t+1} - U_t
If the value of the system state increases after an action is executed, the reward is positive and the updated Q value increases, indicating that the action brings a positive benefit, so the probability of selecting such an action when such a state is encountered later increases. Conversely, if the value of the system decreases after an action, the action brings a negative benefit, and the probability of selecting such an action decreases.
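As a small illustration of the objective and reward just described, the sketch below scores the GC duration and the allocated memory, combines them with the weights, and derives the reward as the difference of consecutive state values; the weights, the reference constants and the larger-is-better sign convention are assumptions introduced for the example.

W_Q, W_C = 0.7, 0.3          # weights for service quality and resource cost (assumed)
GC_REF, MEM_REF = 1.0, 4096  # reference GC time (s) and memory (MiB), assumed

def state_value(gc_seconds, allocated_mib):
    quality = GC_REF / (GC_REF + gc_seconds)        # shorter GC  -> higher quality score
    resource = MEM_REF / (MEM_REF + allocated_mib)  # less memory -> higher resource score
    return W_Q * quality + W_C * resource           # weighted combination of the two terms

def reward(value_before, value_after):
    """Reward = value of the system state after the action minus the value before."""
    return value_after - value_before

u0 = state_value(gc_seconds=2.0, allocated_mib=2048)
u1 = state_value(gc_seconds=0.5, allocated_mib=2560)  # GC improved, slightly more memory
print(reward(u0, u1))   # positive: the action brought a net benefit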
(4) Determining the action strategy and starting the learning process. At the beginning the agent is completely unfamiliar with the environment and does not know how to act; it must explore and learn step by step and finally make the best decisions. This is the exploration-exploitation trade-off that reinforcement learning always emphasizes. Initially the agent can only explore by trial and error; of course, the trial and error here never scales the memory resources in the direction opposite to the load, it only changes the degree of elastic scaling under the premise of positive correlation, while exploitation directly adopts the known behavior that yields good feedback. To obtain a larger long-term reward by sacrificing some short-term reward, an ε-greedy strategy is used. First an ε is defined. At the beginning of each experiment a random value greater than 0 and less than 1 is drawn; if this value is less than ε, an action is selected at random, otherwise the action with the highest current average benefit is selected:
a = argmax_a Q(s, a)
The value of ε may decay as learning proceeds. At the beginning ε is set large, so the selected actions are almost random and the agent becomes familiar with the system environment as quickly as possible, i.e. it explores as much as possible; as the learning process progresses, ε decays and the action with the greatest value reward is selected with higher probability, i.e. the learned results are used to make the best decision as much as possible.
It should be understood that equivalent substitutions and changes made by those skilled in the art to the technical solution and the inventive concept of the present invention shall fall within the protection scope of the appended claims.

Claims (3)

1. A container memory elastic expansion method based on fuzzy reasoning and reinforcement learning is characterized in that: the method comprises the following steps:
(1) High-dimensional continuous state space mapping based on a fuzzy inference system: a fuzzy inference system (FIS) maps the continuous, high-dimensional state space represented by service-performance and resource-usage variables into a discrete, low-dimensional state space; when the state space of the container is constructed, elastic scaling is performed by detecting the quality-of-service performance of the container under different load conditions in combination with the resource usage;
(2) Elastic scaling dynamic optimization based on reinforcement learning: as the container service keeps running in the cloud environment, the monitoring data it generates serve as the training data set of the reinforcement learning algorithm; the algorithm continuously optimizes its learning process according to the monitored data, makes the optimal decision, and outputs an elasticity coefficient to the load prediction algorithm, guiding the container toward a more appropriate resource allocation.
2. The container memory elastic expansion method based on the combination of fuzzy inference and reinforcement learning of claim 1, characterized in that: the GC duration is selected as the service performance index; the detected quality of service is compared with the expected quality of service, and their difference e, i.e. different GC difference values, is used as the criterion for defining the system state space; a fuzzy inference system is used to construct the rules: since the difference can be positive or negative, zero is taken as the central state, the axis extends toward negative and positive infinity on both sides, and different state spaces are divided symmetrically about zero.
3. The container memory elastic expansion method based on the combination of fuzzy inference and reinforcement learning of claim 1, characterized in that: in the cloud computing resource allocation scenario, a reinforcement learning algorithm is used to make vertical elastic scaling decisions, and the specific learning process is as follows:
(1) First define the state and action spaces, then initialize a Q value table according to them, with all Q values initialized to 0; the agent detects and selects a certain state s in the system, selects the corresponding action a according to the action selection strategy and executes it, i.e. it selects an optimized elasticity coefficient pole and outputs it to the prediction algorithm; after execution the agent receives a feedback reward r and updates the Q value accordingly, and after many cycles the table converges to the optimal Q value table; the Q value update formula is:
Q(s_t, a_t) ← (1 - α)·Q(s_t, a_t) + α·[ r_t + γ·max_a Q(s_{t+1}, a) ]
wherein Q(s_t, a_t) represents the value of selecting and executing action a_t in state s_t, t denotes the time step, α represents the learning rate, i.e. the proportion in which the newly learned reward value is blended into the current Q value after the most recent action, and γ is the reward decay factor, i.e. the influence of future rewards on the present; the formula means that the maximum Q value selected in the next state, multiplied by the decay factor and added to the real reward, is used to update the Q value;
(2) Constructing the action space of the agent: different elasticity coefficients are selected to form the system action space, and the Q value benefit of each action, i.e. each elasticity coefficient, in each state is calculated by the reinforcement learning algorithm so as to obtain the optimal elasticity coefficient;
(3) Measuring the resource allocation benefit against an objective function, defined as the weighted combination
U = w_q·q + w_c·c
wherein q denotes the quality of service, c denotes the resource cost, and w_q and w_c are the weights of q and c;
the garbage collection duration GC during program execution is used as the index that concretely measures the periodic quality of service, and the resource cost c denotes the memory resources allocated to the container, with which the objective function is updated;
the objective function U represents the value of the system state at the current moment, and the reward function is defined as the difference between the value of the system state after an action is executed and the value at the previous moment t:
R = U_{t+1} - U_t
if the value of the system state increases after an action is executed, the reward is positive and the updated Q value increases, meaning the action brings a positive benefit, so the probability of selecting that action when the same state is encountered later increases; conversely, if the value of the system decreases after an action, the action brings a negative benefit, and the probability of selecting that action decreases;
(4) Determining the action strategy and starting the learning process: an ε-greedy strategy is used; first an ε is defined; at the beginning of each experiment a random value greater than 0 and less than 1 is drawn; if this value is less than ε, an action is selected at random; otherwise the action a with the highest average benefit is selected, according to the formula:
a = argmax_a Q(s, a)
CN202111487809.8A 2021-12-08 2021-12-08 Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning Pending CN113886095A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111487809.8A CN113886095A (en) 2021-12-08 2021-12-08 Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111487809.8A CN113886095A (en) 2021-12-08 2021-12-08 Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning

Publications (1)

Publication Number Publication Date
CN113886095A (en) 2022-01-04

Family

ID=79016511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111487809.8A Pending CN113886095A (en) 2021-12-08 2021-12-08 Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning

Country Status (1)

Country Link
CN (1) CN113886095A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112311578A (en) * 2019-07-31 2021-02-02 中国移动通信集团浙江有限公司 VNF scheduling method and device based on deep reinforcement learning
CN112000459A (en) * 2020-03-31 2020-11-27 华为技术有限公司 Method for expanding and contracting service and related equipment
CN113760497A (en) * 2021-01-05 2021-12-07 北京沃东天骏信息技术有限公司 Scheduling task configuration method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Arabnejad et al., "A Comparison of Reinforcement Learning Techniques for Fuzzy Cloud Auto-Scaling", 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing *
Fabiana Rossi et al., "Horizontal and Vertical Scaling of Container-based Applications using Reinforcement Learning", 2019 IEEE 12th International Conference on Cloud Computing (CLOUD) *
Cao Yu, Yang Jun (曹宇, 杨军), "A cloud platform elastic scaling algorithm based on deep learning" (一种基于深度学习的云平台弹性伸缩算法), Computer and Modernization (《计算机与现代化》) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115460217B (en) * 2022-11-10 2023-07-14 军事科学院系统工程研究院网络信息研究所 Cloud service high availability decision-making method based on reinforcement learning
CN116126534A (en) * 2023-01-28 2023-05-16 哈尔滨工业大学(威海) Cloud resource dynamic expansion method and system
CN116610454A (en) * 2023-07-17 2023-08-18 中国海洋大学 MADDPG algorithm-based hybrid cloud resource elastic expansion system and operation method
CN116610454B (en) * 2023-07-17 2023-10-17 中国海洋大学 MADDPG algorithm-based hybrid cloud resource elastic expansion system and operation method
CN116610534A (en) * 2023-07-18 2023-08-18 贵州海誉科技股份有限公司 Improved predictive elastic telescoping method based on Kubernetes cluster resources
CN116610534B (en) * 2023-07-18 2023-10-03 贵州海誉科技股份有限公司 Improved predictive elastic telescoping method based on Kubernetes cluster resources
CN117891619A (en) * 2024-03-18 2024-04-16 山东吉谷信息科技有限公司 Host resource synchronization method and system based on virtualization platform
CN117891619B (en) * 2024-03-18 2024-06-11 山东吉谷信息科技有限公司 Host resource synchronization method and system based on virtualization platform

Similar Documents

Publication Publication Date Title
CN113886095A (en) Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning
Gazori et al. Saving time and cost on the scheduling of fog-based IoT applications using deep reinforcement learning approach
Elgendy et al. Joint computation offloading and task caching for multi-user and multi-task MEC systems: reinforcement learning-based algorithms
Qi et al. Knowledge-driven service offloading decision for vehicular edge computing: A deep reinforcement learning approach
CN112134916A (en) Cloud edge collaborative computing migration method based on deep reinforcement learning
CN111641681A (en) Internet of things service unloading decision method based on edge calculation and deep reinforcement learning
CN115686846B (en) Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation
CN114615744A (en) Knowledge migration reinforcement learning network slice general-purpose sensing calculation resource collaborative optimization method
CN116126534A (en) Cloud resource dynamic expansion method and system
Li et al. DQN-enabled content caching and quantum ant colony-based computation offloading in MEC
Li et al. A modular neural network-based population prediction strategy for evolutionary dynamic multi-objective optimization
CN112036651A (en) Electricity price prediction method based on quantum immune optimization BP neural network algorithm
Aslam et al. Using artificial neural network for VM consolidation approach to enhance energy efficiency in green cloud
Chai et al. A computation offloading algorithm based on multi-objective evolutionary optimization in mobile edge computing
CN112131089B (en) Software defect prediction method, classifier, computer device and storage medium
CN117851056A (en) Time-varying task scheduling method and system based on constraint near-end policy optimization
Ma et al. Dynamic neural network-based resource management for mobile edge computing in 6g networks
CN112312299A (en) Service unloading method, device and system
CN116959244A (en) Vehicle network channel congestion control method and system based on regional danger
Huang et al. Learning-aided fine grained offloading for real-time applications in edge-cloud computing
Li et al. Dependency-aware task offloading based on deep reinforcement learning in mobile edge computing networks
Xin et al. Genetic based fuzzy Q-learning energy management for smart grid
Tong et al. D2op: A fair dual-objective weighted scheduling scheme in internet of everything
CN114385359B (en) Cloud edge task time sequence cooperation method for Internet of things
CN111917854B (en) Cooperation type migration decision method and system facing MCC

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20220104)