CN113886095A - Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning - Google Patents

Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning

Info

Publication number
CN113886095A
CN113886095A
Authority
CN
China
Prior art keywords
action, value, state, reinforcement learning, container
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111487809.8A
Other languages
Chinese (zh)
Inventor
刘东海
徐育毅
庞辉富
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Youyun Software Co ltd
Beijing Guangtong Youyun Technology Co ltd
Original Assignee
Hangzhou Youyun Software Co ltd
Beijing Guangtong Youyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Youyun Software Co ltd, Beijing Guangtong Youyun Technology Co ltd filed Critical Hangzhou Youyun Software Co ltd
Priority to CN202111487809.8A
Publication of CN113886095A
Legal status: Pending (Current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 - Arrangements for executing specific programs
    • G06F9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 - Hypervisors; Virtual machine monitors
    • G06F9/45558 - Hypervisor-specific management and integration aspects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/04 - Inference or reasoning models
    • G06N5/048 - Fuzzy inferencing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 - Arrangements for executing specific programs
    • G06F9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 - Hypervisors; Virtual machine monitors
    • G06F9/45558 - Hypervisor-specific management and integration aspects
    • G06F2009/45583 - Memory management, e.g. access or allocation

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Automation & Control Theory (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a container memory elastic scaling method based on the combination of fuzzy inference and reinforcement learning. A fuzzy inference system (FIS) maps the continuous, high-dimensional state space represented by service-performance and resource-usage variables into a discrete, low-dimensional state space. While the container service runs in the cloud environment, the learning process is continuously optimized according to the monitoring data, the optimal decision is made, and the elasticity coefficient is output to the load prediction algorithm, guiding the container toward a more appropriate resource allocation. A closed loop is thereby formed, solving the problem of dynamically optimizing elastic scaling in the cloud environment.

Description

Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning
Technical Field
The invention relates to the technical field of container virtualization, in particular to a container memory elastic expansion method based on the combination of fuzzy reasoning and reinforcement learning.
Background
In recent years, cloud computing has supported numerous application services thanks to its computing framework, strong computing power and convenient management model. Container virtualization technology, represented by Docker, is gradually replacing conventional virtual machine technology because of its light weight. However, the workload of cloud applications varies over time: static resource allocation sized for peak demand wastes resources, while provisioning only average resources degrades service performance and service levels. Elastic scaling technology therefore automatically adjusts computing resources to current or predicted business demand, improving resource efficiency and reducing service cost. Many elastic scaling algorithms for container memory already exist; how to dynamically optimize the scaling parameters so that container memory is allocated reasonably while service performance is guaranteed remains a problem worth attention and research.
Elastic scaling is one of the key technologies of cloud computing resource management. It enables the cloud infrastructure to adjust supplied resources according to the load demands of cloud applications, satisfying the requirement of on-demand resource allocation. Elastic scaling has two main modes, horizontal and vertical. Horizontal elasticity requires the application to provide distributed support so that it can be decomposed into multiple computing instances; the load capacity of the service is then adjusted by adding and removing instances. Vertical elasticity adjusts the load capacity of an application by changing the resource quota of a single compute node or instance, which is a fine-grained adjustment, and it performs better when sufficient resources are available. Vertical elasticity applies to any application; it avoids the overhead of horizontal elasticity, such as starting load balancers or replicated additional instances, and it ensures that the application's communication connections are not interrupted while scaling, whereas horizontal elasticity applies only to applications that can be replicated or decomposed.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art and provides a container memory elastic expansion method based on the combination of fuzzy reasoning and reinforcement learning; the method forms a closed loop and solves the problem of dynamically optimizing elastic scaling in the cloud environment.
The object of the present invention is achieved by the following technical means. A container memory elastic expansion method based on fuzzy reasoning and reinforcement learning comprises the following steps:
(1) High-dimensional continuous state space mapping based on a fuzzy inference system: a fuzzy inference system (FIS) maps the continuous, high-dimensional state space represented by service-performance and resource-usage variables into a discrete, low-dimensional state space; when the state space of the container is constructed, elastic scaling is performed by detecting the quality-of-service performance of the container under different load conditions in combination with the resource usage;
(2) Elastic scaling dynamic optimization based on reinforcement learning: as the container service keeps running in the cloud environment, the monitoring data it generates serve as the training data set of the reinforcement learning algorithm; the algorithm continuously optimizes its learning process according to the monitored data, makes the optimal decision, and outputs an elasticity coefficient to the load prediction algorithm, guiding the container toward a more appropriate resource allocation.
Furthermore, the GC duration is selected as the service performance index; the detected quality of service is compared with the expected quality of service, and their difference e, i.e. different GC difference values, is used as the criterion for defining the system state space. A fuzzy inference system is used to construct the rules: since the difference can be positive or negative, zero is taken as the central state, the axis extends toward negative and positive infinity on both sides, and different state spaces are divided symmetrically about zero.
Furthermore, a reinforcement learning algorithm is used to make vertical elastic scaling decisions in the cloud computing resource allocation scenario; the specific learning process is as follows:
(1) First define the state and action spaces, then initialize a Q value table according to them, with all Q values initialized to 0. The agent detects and selects a state s in the system, selects the corresponding action a according to the action selection strategy and executes it, i.e. it selects an optimized elasticity coefficient pole and outputs it to the prediction algorithm. After execution the agent receives a feedback reward r and updates the Q value accordingly; after many cycles the table converges to the optimal Q value table. The Q value update formula is:
Q(s_t, a_t) ← (1 - α)·Q(s_t, a_t) + α·[ r_t + γ·max_a Q(s_{t+1}, a) ]
where Q(s_t, a_t) represents the value of selecting and executing action a_t in state s_t, t denotes the time step, α represents the learning rate, i.e. the proportion in which the newly learned reward value is blended into the current Q value after the most recent action, and γ is the reward decay factor, i.e. the influence of future rewards on the present; the formula states that the maximum Q value selectable in the next state, multiplied by the decay factor and added to the real reward, is used to update the Q value;
(2) Constructing the action space of the agent: different elasticity coefficients are selected to form the system action space, and the Q value benefit of each action, i.e. each elasticity coefficient, in each state is calculated by the reinforcement learning algorithm so as to obtain the optimal elasticity coefficient;
(3) Measuring the resource allocation benefit against an objective function, defined as the weighted combination
U = w_q·q + w_c·c
where q denotes the quality of service, c denotes the resource cost, and w_q and w_c are the weights of q and c;
the garbage collection duration GC during program execution is used as the index that concretely measures the periodic quality of service, and the resource cost c denotes the memory resources allocated to the container, with which the objective function is updated;
the objective function U represents the value of the system state at the current moment, and the reward function is defined as the difference between the value of the system state after an action is executed and the value at the previous moment t:
R = U_{t+1} - U_t
If the value of the system state increases after an action is executed, the reward is positive and the updated Q value increases, meaning the action brings a positive benefit, so the probability of selecting that action when the same state is encountered later increases; conversely, if the value of the system decreases after an action, the action brings a negative benefit, and the probability of selecting that action decreases;
(4) Determining the action strategy and starting the learning process: an ε-greedy strategy is used. First an ε is defined; at the beginning of each experiment a random value greater than 0 and less than 1 is drawn. If this value is less than ε, an action is selected at random; otherwise the action a with the highest average benefit is selected, according to the formula:
a = argmax_a Q(s, a)
The beneficial effects of the invention are as follows: the invention provides a container memory elastic scaling optimization method based on the combination of a fuzzy inference system and reinforcement learning. The garbage collection time measures the service performance and serves as the state representation of the agent, and the high-dimensional continuous state space is reduced in dimension by the fuzzy inference system. Using the Q-learning algorithm on the historical time-series monitoring data, the dynamic optimization of the elasticity coefficient in container memory elastic scaling is learned from the sequence data, so that subsequent dynamic adjustment of the memory elastic scaling mechanism becomes forward-looking. The invention solves the problem of dynamically adjusting and optimizing container memory within the cloud computing elastic scaling problem, saving memory resources while guaranteeing the quality of service.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 shows the time-difference layout of the state space.
Detailed Description
The invention will be described in detail below with reference to the following drawings:
the invention designs and realizes a container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning, and provides an algorithm for dynamically optimizing an elastic expansion coefficient based on combination of memory resource use conditions and garbage recovery service performance aiming at a memory elastic expansion mechanism algorithm under a container cloud environment. The invention combines a fuzzy inference system with reinforcement learning to realize dynamic elastic expansion coefficient optimization.
The method mainly comprises two parts, namely high-dimensional continuous state space mapping based on a fuzzy inference system and elastic stretching dynamic optimization based on reinforcement Learning (Q-Learning), and the specific working flow is shown in figure 1:
1. High-dimensional continuous state space mapping based on a fuzzy inference system
Service performance and resource usage reflect that the memory load of the container changes dynamically over time. The invention introduces a fuzzy inference system (FIS) to map the continuous state space expressed by service-performance and resource-usage variables into discrete fuzzy semantics, thereby handling the high-dimensional continuous state space.
The FIS maps a set of inputs to the desired output through fuzzy rules. An FIS with n fuzzy rules can be expressed in the general form
Rule i: IF x_1 is A_1^i and ... and x_m is A_m^i THEN y_i = b_i,  i = 1, ..., n
where A_j^i is the semantic value (for example "high", "medium", "low") taken by variable x_j of the environment state s under the i-th rule, b_i is the inference output of the i-th rule, and B is the set in which the inference outputs reside. Rules of this form allow an appropriate action to be taken in a particular situation. When a fuzzy inference system contains multiple rules and the same input may satisfy several rules simultaneously, such conflicts are handled through weighted defuzzification. The output y is
y = ( Σ_{i=1}^{N} w_i·b_i ) / ( Σ_{i=1}^{N} w_i )
where N represents the number of rules and w_i is the degree to which the input activates rule i. Using a fuzzy inference system, a high-dimensional continuous state space can thus be converted into a discrete state space. When the state space of the container is specifically constructed, elastic scaling is performed by detecting the quality-of-service performance of the container under different load conditions in combination with the resource usage.
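By way of illustration, the following Python sketch evaluates a small fuzzy inference step with the weighted-average defuzzification described above; the triangular membership functions, rule outputs and numeric ranges are assumptions introduced for the example and are not taken from the patent.

def tri(x, a, b, c):
    """Triangular membership function peaking at b on the interval [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Each rule: (membership function over the input, crisp rule output b_i).
rules = [
    (lambda e: tri(e, -2.0, -1.0, 0.0), -1.0),   # "difference is low"  -> output -1
    (lambda e: tri(e, -1.0,  0.0, 1.0),  0.0),   # "difference is zero" -> output  0
    (lambda e: tri(e,  0.0,  1.0, 2.0),  1.0),   # "difference is high" -> output +1
]

def fis_output(e):
    """Weighted defuzzification: y = sum(w_i * b_i) / sum(w_i)."""
    weights = [mu(e) for mu, _ in rules]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no rule fired
    return sum(w * b for w, (_, b) in zip(weights, rules)) / total

print(fis_output(0.3))  # a single input that fires two rules simultaneously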
The garbage collection duration is chosen as the quality-of-service performance index. The garbage collection mechanism (Garbage Collection, GC for short) clears "garbage" objects that are no longer used in memory and releases the corresponding memory space. The time consumed by GC directly reflects the service performance, and this index can be measured while the program is running. Therefore the GC duration is selected as the service performance index: the detected quality of service is compared with the expected quality of service, and their difference e, i.e. different GC difference values, serves as the criterion for defining the system state space. Because the difference is continuous, a fuzzy inference system is used to construct the rules: since the difference can be positive or negative, zero is taken as the central state, the axis extends toward negative and positive infinity on both sides, and different state spaces are divided symmetrically about zero. For example, the time-difference layout of the state space is shown in FIG. 2.
In FIG. 2 the time axis represents the difference between the GC time measured for the container service per unit time and the expected GC time; the two ends of the axis extend to negative and positive infinity, and from left to right the axis is marked with time points (for example -t_n, ..., -t_1, 0, t_1, ..., t_n), where points with the same subscript are symmetric about the zero point. The time intervals delimited by these points then serve as the concrete representation of the state s.
The fuzzy inference rules then take the form: IF the difference e lies in the interval between two adjacent time points, THEN the system is in the corresponding discrete state.
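A minimal sketch of how the GC-time difference e could be mapped to such a discrete state follows; the interval boundaries T1 and T2 and the state names are assumed values, since the concrete rule table appears in the original only as a figure.

T1, T2 = 0.1, 0.5   # illustrative interval boundaries (seconds), symmetric about zero

def discrete_state(e):
    # e = measured GC duration minus expected GC duration in the last period
    if e <= -T2:
        return "much_better_than_expected"
    if e <= -T1:
        return "better_than_expected"
    if e < T1:
        return "as_expected"
    if e < T2:
        return "worse_than_expected"
    return "much_worse_than_expected"

print(discrete_state(0.3))   # -> worse_than_expected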
based on the fuzzy inference system rule, the high-dimensional continuous state space is mapped into the low-dimensional discrete state space, so that the subsequent reinforcement learning algorithm is convenient to perform elastic expansion dynamic optimization,
2. Elastic scaling dynamic optimization based on reinforcement learning (Q-Learning)
Reinforcement learning is the science of making optimal decisions. It imitates the way humans learn: the agent learns by trial and error, interacting with the environment and guiding its own behavior by the rewards it obtains, with the goal of maximizing the reward. Reinforcement learning differs from supervised learning in that the reinforcement signal, generated from feedback on the agent's own experience, is an evaluation of how good or bad an action was, so no large labeled data set is required. This freedom from prior knowledge makes reinforcement learning suitable for the resource allocation problem in a complex cloud environment.
In the research scenario of the invention, the agent is the container acted upon by the load prediction algorithm in the elastic scaling mechanism. The goal is to optimize the elasticity coefficient in the load prediction algorithm so that the memory resource supply for the next time period is predicted accurately, keeping the container cloud service at high performance without wasting excessive resources; the system environment is the load borne by the application service and the way that load is handled.
The optimized object is the elasticity coefficient, which represents the rate of dynamic change of the container memory when the container scales elastically. Different scenarios have different requirements on the scaling rate, so the elasticity coefficient cannot be set accurately in advance; only a positively correlated prediction can be made according to the load fluctuation. Moreover, resource adjustment in a container cloud environment lacks a sufficiently large labeled training data set, so learning models that require large amounts of historical data and long training time are unsuitable. In contrast, under the action of the load prediction algorithm and the elastic scaling mechanism, the container service faces different traffic loads and exhibits different states, and the monitor collects the related performance data, which becomes the data required for reinforcement learning training. Therefore, as the container service keeps running in the cloud environment, the generated monitoring data can serve as the training data set of the reinforcement learning algorithm; the algorithm continuously optimizes its learning process according to the monitored data, makes the optimal decision, and outputs the elasticity coefficient to the load prediction algorithm, guiding the container toward a more appropriate resource allocation and thereby solving the problem of optimizing the elasticity coefficient of the load prediction algorithm in the cloud environment.
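The closed loop described above can be sketched as follows; the callables monitor, fis_state, choose_action, update_q, predict_memory and apply_limit are hypothetical stand-ins for the monitor, the fuzzy inference system, the Q-learning policy, and the load prediction and scaling components, and none of these names come from the patent.

import time

def control_loop(monitor, fis_state, choose_action, update_q,
                 predict_memory, apply_limit, period_s=30, steps=100):
    # One pass per monitoring period: observe -> map to state -> learn -> act.
    state, action, prev_value = None, None, None
    for _ in range(steps):
        metrics = monitor()                          # e.g. GC duration, memory usage
        next_state = fis_state(metrics)              # FIS maps metrics to a discrete state
        if state is not None:
            reward = metrics["value"] - prev_value   # reward = change in state value
            update_q(state, action, reward, next_state)
        state, prev_value = next_state, metrics["value"]
        action = choose_action(state)                # action = an elasticity coefficient (pole)
        limit = predict_memory(metrics, action)      # load prediction using the chosen coefficient
        apply_limit(limit)                           # vertical scaling of the container memory
        time.sleep(period_s)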
The invention proposes making vertical elastic scaling decisions with the Q-learning algorithm in the cloud computing resource allocation scenario, combining the Q-learning algorithm with the fuzzy inference system (FIS) introduced in the previous section for reducing the dimension of the high-dimensional continuous state space. Q-learning is described in detail below.
At its core, Q-learning is an off-policy temporal-difference learning algorithm: two control strategies are used, one for selecting new actions, such as an ε-greedy strategy, and another for updating the value function, such as the greedy strategy. For this reason it is also called an off-policy method.
The core of the Q-learning algorithm is to construct a Q value table, i.e. a state-action value table, where the Q value represents the value generated by executing each action in a given state; the action that can obtain the maximum benefit is then selected according to the Q values. For example, in the table below the leftmost column contains three states s1, s2, s3, the top row contains two actions a1, a2, and Q(s_i, a_j) denotes the value of selecting and executing action a_j in state s_i. The number of states and the number of actions determine the dimensions of the Q value table; the larger the dimensions, the higher the complexity of converging the Q values.
Q value table example:
         a1           a2
s1       Q(s1, a1)    Q(s1, a2)
s2       Q(s2, a1)    Q(s2, a2)
s3       Q(s3, a1)    Q(s3, a2)
The learning process of the Q-learning algorithm is as follows. First define the state space and the action space, then initialize a Q value table according to them, with all Q values set to 0. The agent detects and selects a state s in the system, selects the corresponding action a according to the action selection strategy and executes it, i.e. it selects an optimized elasticity coefficient pole and outputs it to the prediction algorithm. After execution the agent receives a feedback reward r and updates the Q value accordingly; after many cycles the table converges to the optimal Q value table. The Q value update formula is:
Q(s_t, a_t) ← (1 - α)·Q(s_t, a_t) + α·[ r_t + γ·max_a Q(s_{t+1}, a) ]
where α represents the learning rate, i.e. the proportion in which the newly learned reward value is blended into the current Q value after the most recent action, and γ is the reward decay factor, i.e. the influence of future rewards on the present. The formula states that the maximum Q value selectable in the next state, multiplied by the decay factor and added to the real reward, is used to update the Q value. The Q-learning algorithm can therefore select the locally optimal action according to existing knowledge, and by continuously refining the Q value table through feedback rewards it can learn a globally optimal policy.
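A minimal tabular Q-learning sketch of the update just described follows; the state labels, candidate elasticity coefficients, learning rate and decay factor are illustrative assumptions.

from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9                  # learning rate and reward decay factor (assumed)
ACTIONS = [0.5, 0.75, 1.0, 1.25, 1.5]    # candidate elasticity coefficients (assumed)

q_table = defaultdict(float)             # Q[(state, action)], initialized to 0

def update_q(state, action, reward, next_state):
    # Q(s,a) <- (1 - alpha)*Q(s,a) + alpha*(r + gamma * max_a' Q(s',a'))
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] = ((1 - ALPHA) * q_table[(state, action)]
                                + ALPHA * (reward + GAMMA * best_next))

# Example: one update after observing a single transition.
update_q("as_expected", 1.0, reward=0.2, next_state="worse_than_expected")
print(q_table[("as_expected", 1.0)])     # approximately 0.02 after one update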
Using the historical data generated while the system and the service run, the learning process that estimates the maximum Q values of the different elasticity parameters under different container cloud service environments can be completed. The main work in building the model is described below.
(1) The state space of the agent has been described in the previous section.
(2) Constructing the action space of the agent: the purpose of using reinforcement learning is to optimize the elasticity coefficient in the load prediction algorithm, so different elasticity coefficients directly form the system action space, and the Q value benefit of each action, i.e. each elasticity coefficient, in each state is calculated with the Q-learning algorithm to obtain the optimal elasticity coefficient. The current elasticity coefficient is taken as the symmetric center, and different magnification factors are applied upward and downward from it, so the action space A consists of the current coefficient scaled up and down by these factors.
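A small sketch of how such an action space could be generated around the current coefficient follows; the magnification factors are assumed values, since the concrete set appears in the original only as a figure.

def build_action_space(pole, factors=(1.25, 1.5, 2.0)):
    # Candidate elasticity coefficients scaled down and up around the current value.
    down = [pole / f for f in reversed(factors)]
    up = [pole * f for f in factors]
    return down + [pole] + up

print(build_action_space(1.0))  # [0.5, 0.666..., 0.8, 1.0, 1.25, 1.5, 2.0]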
(3) Designing the reward function of the system. To better study the reasonable allocation of cloud resources and set the parameters rationally, the resource allocation benefit is first measured against an objective function, defined as the weighted combination
U = w_q·q + w_c·c
where q denotes the quality of service, c denotes the resource cost, and w_q and w_c are the weights of q and c. These two weights are predefined by the cloud provider and reflect its preferences for service performance versus resource cost. For example, a higher w_q indicates stricter requirements on q, so more resources may be needed to guarantee the same workload's q; conversely, a higher w_c means higher sensitivity to resource cost, so fewer resources should be used to handle the workload. In practice the quality of service and the resource cost are essentially inversely related, so a more reasonable resource allocation plan should control, i.e. minimize, the objective function value.
The measurement indexes of the quality of service q mainly include availability, throughput, latency, packet loss rate and the like. In this algorithm, the garbage collection duration during program execution, selected in the previous section, is still used as the index that concretely measures the periodic quality of service. The resource cost c denotes the memory resources allocated to the container, not the memory actually used, because the ultimate goal is to optimize the resource allocation strategy rather than to reduce the program's resource consumption. The objective function is then updated with these quantities.
To reduce resource cost, a good resource allocation strategy should allocate on demand according to the service performance during program operation. The objective function U represents the value of the system state at the current moment, and the reward function is defined as the difference between the value of the system state after an action is executed and the value at the previous moment:
R = U_{t+1} - U_t
If the value of the system state increases after an action is executed, the reward is positive and the updated Q value increases, indicating that the action brings a positive benefit, so the probability of selecting such an action when such a state is encountered later increases. Conversely, if the value of the system decreases after an action, the action brings a negative benefit, and the probability of selecting such an action decreases.
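As a small illustration of the objective and reward just described, the sketch below scores the GC duration and the allocated memory, combines them with the weights, and derives the reward as the difference of consecutive state values; the weights, the reference constants and the larger-is-better sign convention are assumptions introduced for the example.

W_Q, W_C = 0.7, 0.3          # weights for service quality and resource cost (assumed)
GC_REF, MEM_REF = 1.0, 4096  # reference GC time (s) and memory (MiB), assumed

def state_value(gc_seconds, allocated_mib):
    quality = GC_REF / (GC_REF + gc_seconds)        # shorter GC  -> higher quality score
    resource = MEM_REF / (MEM_REF + allocated_mib)  # less memory -> higher resource score
    return W_Q * quality + W_C * resource           # weighted combination of the two terms

def reward(value_before, value_after):
    """Reward = value of the system state after the action minus the value before."""
    return value_after - value_before

u0 = state_value(gc_seconds=2.0, allocated_mib=2048)
u1 = state_value(gc_seconds=0.5, allocated_mib=2560)  # GC improved, slightly more memory
print(reward(u0, u1))   # positive: the action brought a net benefit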
(4) Determining the action strategy and starting the learning process. At the beginning the agent is completely unfamiliar with the environment and does not know how to act; it must explore and learn step by step and finally make the best decisions. This is the exploration-exploitation trade-off that reinforcement learning always emphasizes. Initially the agent can only explore by trial and error; of course, the trial and error here never scales the memory resources in the direction opposite to the load, it only changes the degree of elastic scaling under the premise of positive correlation, while exploitation directly adopts the known behavior that yields good feedback. To obtain a larger long-term reward by sacrificing some short-term reward, an ε-greedy strategy is used. First an ε is defined. At the beginning of each experiment a random value greater than 0 and less than 1 is drawn; if this value is less than ε, an action is selected at random, otherwise the action with the highest current average benefit is selected:
a = argmax_a Q(s, a)
The value of ε may decay as learning proceeds. At the beginning ε is set large, so the selected actions are almost random and the agent becomes familiar with the system environment as quickly as possible, i.e. it explores as much as possible; as the learning process progresses, ε decays and the action with the greatest value reward is selected with higher probability, i.e. the learned results are used to make the best decision as much as possible.
It should be understood that equivalent substitutions and changes made by those skilled in the art to the technical solution and the inventive concept of the present invention shall fall within the protection scope of the appended claims.

Claims (3)

1. A container memory elastic expansion method based on fuzzy reasoning and reinforcement learning is characterized in that: the method comprises the following steps:
(1) High-dimensional continuous state space mapping based on a fuzzy inference system: a fuzzy inference system (FIS) maps the continuous, high-dimensional state space represented by service-performance and resource-usage variables into a discrete, low-dimensional state space; when the state space of the container is constructed, elastic scaling is performed by detecting the quality-of-service performance of the container under different load conditions in combination with the resource usage;
(2) Elastic scaling dynamic optimization based on reinforcement learning: as the container service keeps running in the cloud environment, the monitoring data it generates serve as the training data set of the reinforcement learning algorithm; the algorithm continuously optimizes its learning process according to the monitored data, makes the optimal decision, and outputs an elasticity coefficient to the load prediction algorithm, guiding the container toward a more appropriate resource allocation.
2. The container memory elastic expansion method based on the combination of fuzzy inference and reinforcement learning of claim 1, characterized in that: the GC duration is selected as the service performance index; the detected quality of service is compared with the expected quality of service, and their difference e, i.e. different GC difference values, is used as the criterion for defining the system state space; a fuzzy inference system is used to construct the rules: since the difference can be positive or negative, zero is taken as the central state, the axis extends toward negative and positive infinity on both sides, and different state spaces are divided symmetrically about zero.
3. The container memory elastic expansion method based on the combination of fuzzy inference and reinforcement learning of claim 1, characterized in that: in the cloud computing resource allocation scenario, a reinforcement learning algorithm is used to make vertical elastic scaling decisions, and the specific learning process is as follows:
(1) First define the state and action spaces, then initialize a Q value table according to them, with all Q values initialized to 0; the agent detects and selects a certain state s in the system, selects the corresponding action a according to the action selection strategy and executes it, i.e. it selects an optimized elasticity coefficient pole and outputs it to the prediction algorithm; after execution the agent receives a feedback reward r and updates the Q value accordingly, and after many cycles the table converges to the optimal Q value table; the Q value update formula is:
Q(s_t, a_t) ← (1 - α)·Q(s_t, a_t) + α·[ r_t + γ·max_a Q(s_{t+1}, a) ]
wherein Q(s_t, a_t) represents the value of selecting and executing action a_t in state s_t, t denotes the time step, α represents the learning rate, i.e. the proportion in which the newly learned reward value is blended into the current Q value after the most recent action, and γ is the reward decay factor, i.e. the influence of future rewards on the present; the formula means that the maximum Q value selected in the next state, multiplied by the decay factor and added to the real reward, is used to update the Q value;
(2) Constructing the action space of the agent: different elasticity coefficients are selected to form the system action space, and the Q value benefit of each action, i.e. each elasticity coefficient, in each state is calculated by the reinforcement learning algorithm so as to obtain the optimal elasticity coefficient;
(3) Measuring the resource allocation benefit against an objective function, defined as the weighted combination
U = w_q·q + w_c·c
wherein q denotes the quality of service, c denotes the resource cost, and w_q and w_c are the weights of q and c;
the garbage collection duration GC during program execution is used as the index that concretely measures the periodic quality of service, and the resource cost c denotes the memory resources allocated to the container, with which the objective function is updated;
the objective function U represents the value of the system state at the current moment, and the reward function is defined as the difference between the value of the system state after an action is executed and the value at the previous moment t:
R = U_{t+1} - U_t
if the value of the system state increases after an action is executed, the reward is positive and the updated Q value increases, meaning the action brings a positive benefit, so the probability of selecting that action when the same state is encountered later increases; conversely, if the value of the system decreases after an action, the action brings a negative benefit, and the probability of selecting that action decreases;
(4) Determining the action strategy and starting the learning process: an ε-greedy strategy is used; first an ε is defined; at the beginning of each experiment a random value greater than 0 and less than 1 is drawn; if this value is less than ε, an action is selected at random; otherwise the action a with the highest average benefit is selected, according to the formula:
a = argmax_a Q(s, a)
CN202111487809.8A 2021-12-08 2021-12-08 Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning Pending CN113886095A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111487809.8A CN113886095A (en) 2021-12-08 2021-12-08 Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111487809.8A CN113886095A (en) 2021-12-08 2021-12-08 Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning

Publications (1)

Publication Number Publication Date
CN113886095A (en) 2022-01-04

Family

ID=79016511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111487809.8A Pending CN113886095A (en) 2021-12-08 2021-12-08 Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning

Country Status (1)

Country Link
CN (1) CN113886095A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112311578A (en) * 2019-07-31 2021-02-02 中国移动通信集团浙江有限公司 VNF scheduling method and device based on deep reinforcement learning
CN112000459A (en) * 2020-03-31 2020-11-27 华为技术有限公司 Method for expanding and contracting service and related equipment
CN113760497A (en) * 2021-01-05 2021-12-07 北京沃东天骏信息技术有限公司 Scheduling task configuration method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Arabnejad et al., "A Comparison of Reinforcement Learning Techniques for Fuzzy Cloud Auto-Scaling", 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing *
Fabiana Rossi et al., "Horizontal and Vertical Scaling of Container-based Applications using Reinforcement Learning", 2019 IEEE 12th International Conference on Cloud Computing (CLOUD) *
Cao Yu, Yang Jun (曹宇, 杨军), "A cloud platform elastic scaling algorithm based on deep learning" (一种基于深度学习的云平台弹性伸缩算法), Computer and Modernization (《计算机与现代化》) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115460217B (en) * 2022-11-10 2023-07-14 军事科学院系统工程研究院网络信息研究所 Cloud service high availability decision-making method based on reinforcement learning
CN116126534A (en) * 2023-01-28 2023-05-16 哈尔滨工业大学(威海) Cloud resource dynamic expansion method and system
CN116610454A (en) * 2023-07-17 2023-08-18 中国海洋大学 MADDPG algorithm-based hybrid cloud resource elastic expansion system and operation method
CN116610454B (en) * 2023-07-17 2023-10-17 中国海洋大学 MADDPG algorithm-based hybrid cloud resource elastic expansion system and operation method
CN116610534A (en) * 2023-07-18 2023-08-18 贵州海誉科技股份有限公司 Improved predictive elastic telescoping method based on Kubernetes cluster resources
CN116610534B (en) * 2023-07-18 2023-10-03 贵州海誉科技股份有限公司 Improved predictive elastic telescoping method based on Kubernetes cluster resources
CN117891619A (en) * 2024-03-18 2024-04-16 山东吉谷信息科技有限公司 Host resource synchronization method and system based on virtualization platform
CN117891619B (en) * 2024-03-18 2024-06-11 山东吉谷信息科技有限公司 Host resource synchronization method and system based on virtualization platform

Similar Documents

Publication Publication Date Title
CN113886095A (en) Container memory elastic expansion method based on combination of fuzzy reasoning and reinforcement learning
Gazori et al. Saving time and cost on the scheduling of fog-based IoT applications using deep reinforcement learning approach
Elgendy et al. Joint computation offloading and task caching for multi-user and multi-task MEC systems: reinforcement learning-based algorithms
Qi et al. Knowledge-driven service offloading decision for vehicular edge computing: A deep reinforcement learning approach
CN112134916A (en) Cloud edge collaborative computing migration method based on deep reinforcement learning
CN111641681A (en) Internet of things service unloading decision method based on edge calculation and deep reinforcement learning
CN115686846B (en) Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation
CN114615744A (en) Knowledge migration reinforcement learning network slice general-purpose sensing calculation resource collaborative optimization method
CN116126534A (en) Cloud resource dynamic expansion method and system
Li et al. DQN-enabled content caching and quantum ant colony-based computation offloading in MEC
Li et al. A modular neural network-based population prediction strategy for evolutionary dynamic multi-objective optimization
CN112036651A (en) Electricity price prediction method based on quantum immune optimization BP neural network algorithm
Aslam et al. Using artificial neural network for VM consolidation approach to enhance energy efficiency in green cloud
Chai et al. A computation offloading algorithm based on multi-objective evolutionary optimization in mobile edge computing
CN112131089B (en) Software defect prediction method, classifier, computer device and storage medium
CN117851056A (en) Time-varying task scheduling method and system based on constraint near-end policy optimization
Ma et al. Dynamic neural network-based resource management for mobile edge computing in 6g networks
CN112312299A (en) Service unloading method, device and system
CN116959244A (en) Vehicle network channel congestion control method and system based on regional danger
Huang et al. Learning-aided fine grained offloading for real-time applications in edge-cloud computing
Li et al. Dependency-aware task offloading based on deep reinforcement learning in mobile edge computing networks
Xin et al. Genetic based fuzzy Q-learning energy management for smart grid
Tong et al. D2op: A fair dual-objective weighted scheduling scheme in internet of everything
CN114385359B (en) Cloud edge task time sequence cooperation method for Internet of things
CN111917854B (en) Cooperation type migration decision method and system facing MCC

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20220104)