CN112328355A - Self-adaptive optimal memory reservation estimation method for long-life container - Google Patents
- Publication number
- CN112328355A (application number CN202011073505.2A)
- Authority
- CN
- China
- Prior art keywords
- memory
- actor
- reservation
- stage
- memory reservation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45587—Isolation or security of virtual machine instances
- G06F2009/45595—Network integration; Enabling network access in virtual machine instances
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a self-adaptive optimal memory reservation estimation method for long-life containers, applied at different stages of a Spark distributed cluster in a data center, comprising the following steps. S1: at the initial stage of the server cluster, execute the MEER+ strategy, collect historical data of application runs on the servers, and use the historical data to estimate the optimal memory reservation of the cluster at the initial stage. S2: at the stable stage of the server cluster, execute the DEEP-MEER strategy, use the historical data to obtain an optimal memory reservation model for the stable stage, and use the model to estimate the optimal memory reservation at the current stage. The method adopts different optimal memory reservation estimation strategies for the Spark distributed cluster at different life-cycle stages: at the initial stage it approaches the optimal value by refining the step size, improving estimation accuracy, and at the stable stage it builds a reinforcement learning model from the abundant historical data, thereby ensuring stable application performance.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a self-adaptive optimal memory reservation estimation method for a long-life container.
Background
With the development of big data, more and more memory computing works including machine learning, stream processing, interactive query, graph computing and the like are deployed on a shared cluster of a data center. Such a workload handles a large amount of data, and is called a Long Running Application (LRA). It can be observed from the results of public enterprise cluster tracking and analysis that LRAs have become the major workload of everyday online services in data centers today.
Existing big data processing systems, such as Spark and Flink, rely primarily on resource managers, such as YARN, Mesos, Omega, Borg, and Kubernetes, to allocate resources for applications. These managers schedule resources by packing CPU and memory into containers. Unlike conventional short-lived containers used to process batch jobs, the containers of LRAs remain active until the application finishes executing, and are therefore referred to as long-lived containers.
It is particularly common in data centers for the same application to be executed repeatedly on different data. In this working mode, only the data content changes; the resource occupation pattern does not. Therefore, it is of great significance to explore the resource occupation pattern and find the optimal resource allocation strategy. If the resource manager reserves too much memory for an application, unnecessary waste results: the application occupies only part of the memory, and because the container is fixed before the application executes, the remaining memory has no chance to be used by the application and cannot be reallocated to other applications; until the container is released, it remains an idle memory fragment. Conversely, if the memory reservation is too small, the performance of the application cannot be guaranteed, and in more serious cases the application crashes or fails to complete. Therefore, it is meaningful to estimate an optimal memory reservation that guarantees application performance without wasting resources.
In the prior art, publication number CN110187967A, published on August 30, 2019, discloses a memory prediction method and device suitable for a dependency analysis tool. The method specifically includes: extracting source code files from a Java package file; parsing the extracted source code files to generate abstract syntax trees, and obtaining the number of instance objects of each class of node in the abstract syntax tree generated by each source code file; calculating the memory occupied by each class of node instance object; calculating the memory occupied by the abstract syntax tree generated by each source code file; and calculating the memory required by the whole Java package. This method only calculates the memory occupied by the Java package itself; in-memory computing also requires memory for caching input data and intermediate variables, whose amount is related to the size of the input data set and cannot be obtained through source-code analysis. Therefore, it is not suitable for predicting the memory of an entire long-life container.
Disclosure of Invention
The invention provides a self-adaptive optimal memory reservation estimation method for a long-life container, aiming at overcoming the defect that a resource manager cannot accurately and effectively estimate optimal memory reservation by using historical operating data of an application program for the long-life container in a data center in the prior art.
The primary objective of the present invention is to solve the above technical problems, and the technical solution of the present invention is as follows:
an adaptive optimal memory reservation estimation method for a long-life container, which is applied to different stages of a data center Spark distributed server cluster, comprises the following steps:
s1: executing a MEER + strategy at the initial stage of the Spark distributed cluster, collecting historical data of application program operation in a server, and estimating the optimal memory reservation of the Spark distributed cluster at the initial stage by using the historical data;
s2: and executing a DEEP-MEER strategy in a stable stage of the Spark distributed cluster, obtaining an optimal memory reservation model in the stable stage by using known historical data, and estimating the optimal memory reservation in the current stage by using the model.
In this scheme, the MEER+ policy execution flow includes three processes, respectively recorded as: a trial run stage, an iterative search stage, and an approach stage,
the trial run stage refers to the first time the application is submitted: the application runs under an excess reservation, and the trial run generates initial memory occupation and program running data, which are recorded by the history server and the measurement system; based on the memory occupation data, the expected value of the memory usage is calculated using the histogram analysis model and then transmitted to the resource manager;
the iterative search stage means that when the application is resubmitted, the resource manager takes the last estimate M_{n-1} as the memory reservation of the current run; MEER records the memory occupation and the program running time during execution, calculates the memory occupation expectation M_n, and evaluates the performance; if the performance satisfies any one of the termination conditions, the search is terminated, and M_{n-2} is the finally estimated optimal reservation; otherwise, MEER takes the calculated expected value M_n as the new memory reservation of the application for the next execution; there are three termination conditions; condition one: the execution time is too long; condition two: garbage collection is too time-consuming; condition three: the memory utilization rate reaches the expected target; except for condition three, the application has to endure one time-consuming, inefficient run in order to terminate the iterative search;
the approach stage means that the optimal memory reservation estimation has two branches according to which termination condition was met; if the termination condition met in the iterative search stage is condition one or condition two, MEER+ executes the inner process, and the memory reservation calculation formula is:
M_n = M_{n-1} + M_f, where M_f < M_{t-1} - M_t, (1)
where M_f is an increment or decrement added to correct the estimation result, and M_t is an estimated memory reservation that meets the termination condition; the approach stage terminates when no termination condition is met any more, and the final optimal memory reservation is the estimation result M_{n-1} of the last iteration; if the termination condition met in the iterative search stage is condition three, MEER+ executes the outer process, and the memory reservation calculation formula is:
M_n = M_{n-1} - M_f, where M_f < M_t - M, (2)
and stopping when condition one or condition two is met; the final optimal memory reservation is the memory reservation used in the last iteration, i.e., the estimation result of the previous iteration.
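The trial run, iterative search, and approach stages above can be sketched as a simple search loop. The following is a minimal illustration, not the patent's implementation: `run_app`, the status labels, and the fixed step `m_f` are all assumed stand-ins for the history server, the measurement system, and the three termination-condition checks.

```python
def meer_plus(run_app, over_reservation, m_f):
    """Hedged sketch of MEER+: run_app(reservation) -> (usage_expectation,
    status), with status in {'ok', 'too_slow', 'gc_heavy', 'target_met'}
    standing in for conditions one, two, and three."""
    m_prev, _ = run_app(over_reservation)          # trial run under excess reservation
    while True:                                    # iterative search stage
        m_next, status = run_app(m_prev)
        if status != 'ok':
            break
        m_prev = m_next                            # feed expectation back as next reservation
    if status in ('too_slow', 'gc_heavy'):
        # inner process (formula 1): grow by m_f until performance recovers
        m = m_prev + m_f
        while run_app(m)[1] in ('too_slow', 'gc_heavy'):
            m += m_f
        return m
    # outer process (formula 2): shrink by m_f until performance degrades,
    # then keep the last reservation that still ran cleanly
    m = m_prev
    while run_app(m - m_f)[1] not in ('too_slow', 'gc_heavy'):
        m -= m_f
    return m
```

Against a simulated application whose true requirement is known, the sketch converges to a reservation just above that requirement from either branch.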
In this scheme, the optimal memory reservation estimation model used for executing the MEER+ strategy is a histogram analysis model: for each run of the application, the measurement system records the current memory occupation once every second and draws a corresponding histogram. The two endpoints of each rectangle on the horizontal axis represent a range of memory usage, and the height of each rectangle represents the frequency, i.e., the number of occurrences of memory occupation between the two endpoints. Probability density estimation is performed on the memory occupation using the histogram analysis method, and the memory occupation expectation is calculated.
In this scheme, in the histogram analysis method, the probability that the memory occupation at a certain moment falls within the interval whose mean value is x_i is defined as:

P(x_i) = Freq(x_i) / Count, (3)

where Count is the sum of all rectangle heights, i.e., frequencies, and Freq(x_i) denotes the frequency of the interval with mean value x_i.
In this scheme, the memory occupation expectation is calculated as:

E = Σ_{i=1}^{N} x_i · P(x_i), (4)

where N is the number of rectangles in the frequency distribution histogram.
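Formulas (3) and (4) amount to binning the per-second measurements, labelling each rectangle by the mean of its two endpoints, and taking the frequency-weighted mean. A minimal sketch, in which the bin width and variable names are illustrative choices rather than anything specified by the patent:

```python
import collections

def memory_expectation(samples, bin_width):
    """Histogram-based expectation of memory usage, in the spirit of
    formulas (3) and (4): samples are per-second occupation readings."""
    freq = collections.Counter()
    for s in samples:
        bin_idx = int(s // bin_width)
        mid = (bin_idx + 0.5) * bin_width   # mean of the rectangle's two endpoints
        freq[mid] += 1
    count = sum(freq.values())              # Count: sum of all frequencies
    # E = sum over bins of x_i * P(x_i), with P(x_i) = Freq(x_i) / Count
    return sum(x * f / count for x, f in freq.items())
```

For example, three readings near 10 MB and one near 30 MB with a 20 MB bin width give an expectation of 15 MB.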
In this scheme, the execution mode of the DEEP-MEER strategy is as follows: when the application is submitted, the resource manager adopts the last optimal memory reservation estimate M_{n-1} as the memory reservation of the current run; the history server and the measurement system record the memory occupation data of the current run, the expected value M_n of the memory usage is calculated using the Actor-Critic model, and the expected value is then communicated to the resource manager.
In this scheme, the optimal memory reservation estimation model used by the DEEP-MEER strategy is the Actor-Critic model, which comprises: an Actor unit, a Critic unit, and a cluster running application programs;
the Actor unit is responsible for providing a strategy: it selects an action according to probability and modifies the probability of the actions to be selected according to the score provided by the Critic unit. The Actor unit implements a random strategy that maps the system state to corresponding actions, and comprises three layers of neurons: an input layer, a hidden layer, and an output layer. The input of the input layer comes from the environment state; the activation function of the hidden layer is ReLU(); the neurons of the output layer each correspond to an action of setting a certain memory reservation for the current application; the output of the output layer is finally converted into values between 0 and 1 through a Softmax() function, with the sum of all output values equal to 1;
the Critic unit is a value function of an evaluation strategy, evaluates the action selected by the Actor unit and provides feedback to help the Actor unit to adjust the strategy, an output layer of the Critic unit is only provided with one neuron, the neuron is a score given by the Critic unit, once the Critic unit calculates the score, the score is combined with a reward returned by an environment, and finally a loss value is calculated and used for guiding the Actor unit and the Critic unit to update parameters;
the cluster running the application program interacts with the Actor and the Critic, and the functions of the cluster comprise: firstly, executing an action selected by an Actor unit, namely, reserving a memory appointed by the Actor to run and submit an application program; second, return the state changed by performing the action and measure the size of the reward value that benefits from the action.
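The Actor/Critic structure described above (input layer, ReLU hidden layer, softmax output for the Actor, a single output neuron for the Critic) can be sketched in plain NumPy. The layer sizes and random initial weights below are illustrative, and no training logic is shown:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, purely for reproducibility

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max())     # subtract max for numerical stability
    return e / e.sum()

class Actor:
    """Three-layer policy network: state -> ReLU hidden -> softmax over actions."""
    def __init__(self, n_state, n_hidden, n_actions):
        self.w1 = rng.normal(scale=0.1, size=(n_state, n_hidden))
        self.w2 = rng.normal(scale=0.1, size=(n_hidden, n_actions))
    def probs(self, state):
        # each output is the probability of one memory-reservation action
        return softmax(relu(state @ self.w1) @ self.w2)

class Critic:
    """Same structure, but the output layer has a single neuron (the score)."""
    def __init__(self, n_state, n_hidden):
        self.w1 = rng.normal(scale=0.1, size=(n_state, n_hidden))
        self.w2 = rng.normal(scale=0.1, size=(n_hidden, 1))
    def value(self, state):
        return (relu(state @ self.w1) @ self.w2).item()
```

Biases and the backpropagation step are omitted; the sketch only shows why the Actor's outputs sum to 1 and can be read as action probabilities.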
In this scheme, the Actor-Critic model calculation process includes the following steps:
S1: initialize the state S_0;
S2: execute the Critic unit and calculate V(S_0);
S3: let i = 1;
S4: execute the Actor unit: based on S_{i-1}, calculate the probability P_{i-1} of each action and determine the action A_{i-1} with the maximum probability;
S5: perform action A_{i-1}, obtaining S_i and R_i;
S6: execute the Critic unit and calculate V(S_i);
S7: calculate the TD error δ_{i-1};
S8: calculate the loss value Loss(δ_{i-1});
S9: update the parameters ω of the Actor and the Critic under the guidance of the loss value;
S10: if i ≤ N, set i = i + 1 and go back to S4.
Initializing S_0 means running the application with an excessive memory reservation to obtain the initial environment state; V refers to the value function.
In this embodiment, the TD error is defined as:
δ_t = R_{t+1} + γV(S_{t+1}) - V(S_t). (5)
the loss function is freely selected according to the requirement; the parameter updating means that the neural network carries out back propagation according to a chain rule, calculates the derivative of the composite function, then propagates the gradient of the output neuron back to the input neuron, and adjusts the learnable parameters of the network according to the calculated gradient; n is the set iteration number;
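Steps S1 through S10, with the TD error of formula (5) driving the update, can be outlined as follows; `env_step`, `actor_probs`, `critic_value`, and `update` are assumed placeholders for the cluster interaction, the two networks, and the backpropagation step, none of which are specified in code by the patent:

```python
def actor_critic_loop(env_step, actor_probs, critic_value, update,
                      s0, n_iters, gamma=0.9):
    """Sketch of steps S1-S10 of the Actor-Critic calculation process."""
    s = s0                                  # S1: initial state S_0
    v = critic_value(s)                     # S2: V(S_0)
    for _ in range(n_iters):                # S3/S10: i = 1..N
        probs = actor_probs(s)              # S4: action probabilities
        a = max(range(len(probs)), key=probs.__getitem__)
        s_next, r = env_step(a)             # S5: perform A_{i-1}, get S_i and R_i
        v_next = critic_value(s_next)       # S6: V(S_i)
        delta = r + gamma * v_next - v      # S7: TD error, formula (5)
        update(delta)                       # S8/S9: loss value and parameter update
        s, v = s_next, v_next
    return s
```

With a stub environment and a constant-zero value function, the loop reduces δ to the raw reward, which makes the data flow of S4 through S9 easy to trace.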
the Actor-Critic model defines model parameters, and the model parameters comprise: a reward value, a state parameter and an action, the reward value being used to determine a benefit of performing a given action in a given state, the reward value at time t being defined by the formula:
wherein M istAnd TtReferring to reserved memory and program running time at time t, respectively, each neuron of the output layer of the Actor unit corresponds toSetting a specific memory;
the state parameters include:
Δt, the increase in program running time compared with the trial run;
E, the expected value of the memory occupation;
max_1 ~ max_n, the n memory-occupation values with the highest frequency in the histogram analysis;
p_1 ~ p_n, the frequencies corresponding to max_1 ~ max_n.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention adopts different optimal memory reservation estimation strategies for the server cluster in the data center at different life cycle stages, and improves the estimation accuracy by thinning the step length to approach the optimal value, thereby ensuring the stability of the application program performance.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the architecture and workflow of the MEER + policy of the present invention.
FIG. 3 is a frequency domain histogram of memory usage in the present invention.
FIG. 4 is a schematic diagram of the architecture and workflow of the DEEP-MEER strategy of the present invention.
FIG. 5 is a schematic structural diagram of an Actor-Critic model according to the present invention.
Fig. 6 is a schematic diagram of the average relative error of the MEER strategy, the MEER + strategy and the Deep-MEER strategy over four workloads in the embodiment of the present invention.
FIG. 7 is a schematic diagram of relative estimation errors of the MEER strategy, the MEER + strategy and the Deep-MEER strategy on four workloads in the embodiment of the present invention.
Fig. 8 is a schematic diagram illustrating changes in memory utilization during the Page Rank operation process in the embodiment of the present invention.
FIG. 9 is a diagram illustrating the variation of the runtime of an application with the number of repeated executions when the MEER policy, the MEER + policy, and the Deep-MEER policy are applied to a benchmark workload according to an embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Example 1
As shown in fig. 1, an adaptive optimal memory reservation estimation method for a long-life container, which is applied to different stages of a Spark distributed cluster, includes the following steps:
s1: executing a MEER + strategy at the initial stage of the Spark distributed cluster, collecting historical data of application program operation in a server, and obtaining the optimal memory reservation of the Spark distributed cluster at the initial stage by using the historical data;
it should be noted that Spark distributed cluster is in the initial stage. When the Spark distributed cluster just starts to work, although no program running experience exists and no estimation basis exists, the MEER + strategy can be adopted to sacrifice some application program performance for preliminary estimation. Meanwhile, the system generates a large amount of historical data of application program operation, so that original data is provided for model training, and the estimation model based on reinforcement learning can be trained.
S2: executing a DEEP-MEER strategy in a stable stage of the Spark distributed cluster, obtaining an optimal memory reservation model in the stable stage by using known historical data, and estimating the optimal memory reservation in the current stage by using the model;
Fig. 2 is a schematic diagram illustrating the architecture and workflow of the MEER+ policy. In this scheme, the MEER+ policy execution flow includes three processes, respectively recorded as: a trial run stage, an iterative search stage, and an approach stage,
the trial run stage refers to the first time the application is submitted: the application runs under an excess reservation, and the trial run generates initial memory occupation and program running data, which are recorded by the history server and the measurement system; based on the memory occupation data, the expected value of the memory usage is calculated using the histogram analysis model and then transmitted to the resource manager;
the iterative search stage means that when the application is resubmitted, the resource manager takes the last estimate M_{n-1} as the memory reservation of the current run; MEER records the memory occupation and the program running time during execution, calculates the memory occupation expectation M_n, and evaluates the performance; if the performance satisfies any one of the termination conditions, the search is terminated, and M_{n-2} is the finally estimated optimal reservation; otherwise, MEER takes the calculated expected value M_n as the new memory reservation of the application for the next execution. There are three termination conditions; condition one: the execution time is too long; condition two: garbage collection is too time-consuming; condition three: the memory utilization rate reaches the expected target. Obviously, except for condition three, to terminate the iterative search the application must undergo one time-consuming, inefficient run;
the approach stage means that the optimal memory reservation estimation has two branches according to which termination condition was met; if the termination condition met in the iterative search stage is condition one or condition two, MEER+ executes the inner process, and the memory reservation calculation formula is:
M_n = M_{n-1} + M_f, where M_f < M_{t-1} - M_t, (1)
where M_f is an increment or decrement added to correct the estimation result, and M_t is an estimated memory reservation that meets the termination condition; the approach stage terminates when no termination condition is met any more, and the final optimal memory reservation is the estimation result M_{n-1} of the last iteration; if the termination condition met in the iterative search stage is condition three, MEER+ executes the outer process, and the memory reservation calculation formula is:
M_n = M_{n-1} - M_f, where M_f < M_t - M, (2)
stopping when condition one or condition two is met; the final optimal memory reservation is the memory reservation M_{n-2} used in the last iteration, i.e., the estimation result of the previous iteration.
In this scheme, the optimal memory reservation model used in executing the MEER+ strategy is a histogram analysis model. For each run of the application, the measurement system records the current memory occupation once every second and draws a corresponding histogram, as shown in fig. 3. The two endpoints of each rectangle on the horizontal axis represent a range of memory usage, and for convenience of calculation the mean of the two endpoints is marked on the horizontal axis; the height of each rectangle represents the frequency, i.e., the number of occurrences of memory occupation between the two endpoints. Probability density estimation is performed on the memory occupation using the histogram analysis method, and the memory occupation expectation is calculated.
In this scheme, in the histogram analysis method, the probability that the memory occupation at a certain moment falls within the interval whose mean value is x_i may be defined as:

P(x_i) = Freq(x_i) / Count, (3)

where Count is the sum of all rectangle heights, i.e., frequencies, and Freq(x_i) denotes the frequency of the interval with mean value x_i.
In this scheme, the memory occupation expectation is calculated as:

E = Σ_{i=1}^{N} x_i · P(x_i), (4)

where N is the number of rectangles in the frequency distribution histogram.
It should be noted that the expectation reflects the likely average memory cost of future application runs on the server cluster, and therefore has a very high reference value for estimating a reservation that covers most memory requirements. By adding appropriate parameters, a functional relation with the memory occupation expectation as the independent variable and the reserved memory as the dependent variable is formed, constructing a model for estimating the optimal memory reservation.
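The patent does not write out this functional relation; one simple form, shown purely as a hypothetical illustration with made-up tuning parameters alpha and beta, is a linear mapping from the expectation E to a reservation:

```python
def reserve_from_expectation(e, alpha=1.2, beta=256):
    """Hypothetical linear relation: reservation = alpha * E + beta.
    alpha adds headroom proportional to expected usage; beta is a fixed
    safety margin (e.g. in MB). Both values are illustrative only."""
    return alpha * e + beta
```

Any monotone function of E with tunable headroom would serve the same role; the point is only that the expectation is the independent variable.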
As shown in fig. 4, in the stable stage of the server, the DEEP-MEER policy is executed as follows: when the application is submitted, the resource manager uses the last optimal memory reservation estimate M_{n-1} as the memory reservation of the current run; the history server and the measurement system record the memory occupation data of the current run and calculate the expected value M_n of the memory usage using the Actor-Critic model, and the expected value is then communicated to the resource manager.
In this scheme, the optimal memory reservation estimation model used by the DEEP-MEER strategy is the Actor-Critic model, which comprises: an Actor unit, a Critic unit, and a cluster running application programs;
the Actor unit executes a random strategy, the random strategy is a probability value for mapping the system state to a corresponding action, the Actor unit selects an action according to the probability, and then modifies the probability of the action to be selected according to a feedback score provided by the Critic unit; the Actor unit includes an input layer, a hidden layer, and an output layer, as shown in fig. 5, each circle represents a neuron, and each line between two neurons represents a weight. Let ω be a set of such weight parameters. The value of each neuron is a weighted sum of the neurons of the previous layer, except that the value of the input neuron is provided by the environment. The input of the input layer is from a cluster state of an operating application program, the activation function of the hidden layer is Relu (), the neurons of the output layer respectively correspond to how many memory reserved actions are set for the application program, the output result of the output layer of the Actor unit is converted into a value between 0 and 1 through a Softmax () function, the sum of all output values is equal to 1, and each output value represents the probability that the corresponding action should be selected.
The Critic unit is the value function of the evaluation strategy: it evaluates the action selected by the Actor unit and provides feedback to help the Actor unit adjust its strategy. The basic structure and parametric form of the Critic unit differ from those of the Actor unit only in that its output layer has a single neuron, whose value is the score given by the Critic unit. Once the Critic unit calculates the score, it combines the score with the reward returned by the environment and finally calculates a loss value that guides the parameter updates of both the Actor unit and the Critic unit.
The cluster running the application program interacts with the Actor and the Critic; its functions are: first, executing the action selected by the Actor unit, i.e. running the submitted application program with the memory reservation specified by the Actor; second, returning the state changed by performing the action and measuring the reward value obtained from that action.
In this scheme, the Actor-Critic model calculation process includes the following steps:
Note that initialization of S_0 means running the application program with an excess memory reservation to obtain the initial environment state, and V denotes the value function. The TD error, a common error signal for adjusting the policy, is defined as:
δ_t = R_{t+1} + γV(S_{t+1}) − V(S_t). (5)
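Equation (5) is a one-line computation; a minimal sketch:

```python
def td_error(reward_next, v_next, v_current, gamma=0.99):
    """TD error of equation (5): delta_t = R_{t+1} + gamma * V(S_{t+1}) - V(S_t).

    The default discount factor gamma is illustrative; the scheme does
    not fix its value.
    """
    return reward_next + gamma * v_next - v_current
```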
The loss function used to calculate the loss value can be chosen freely as needed. Parameter updating means that the neural network performs back-propagation according to the chain rule: it calculates the derivative of the composite function, propagates the gradients of the output neurons back to the input neurons, and adjusts the learnable parameters of the network according to the calculated gradients.
The Actor-Critic model defines model parameters comprising: a reward value, state parameters, and actions. The objective of the Actor-Critic model is to save memory while maintaining good program running performance; therefore, the more memory is saved and the less time the application program takes to complete, the higher the reward. The reward value at time t is defined by the formula:
wherein M_t and T_t respectively denote the reserved memory and the program running time at time t, and each neuron of the output layer of the Actor unit corresponds to a specific memory setting;
the state parameters include: Δ t represents increased program run time compared to the trial run;
e represents the expected value of the memory occupation;
max_1 ~ max_n represent the n memory-occupancy values with the highest frequencies in the histogram analysis;
p_1 ~ p_n denote the frequencies corresponding to max_1 ~ max_n.
The actions are exemplified as follows: each neuron of the output layer of the Actor unit corresponds to a specific memory setting. If the first neuron obtains the highest probability, the action of setting the reserved memory to 0.5 GB is selected; if the second neuron obtains the highest probability, the action of setting the reserved memory to 1 GB is selected; and so on for the actions represented by the remaining neurons.
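The neuron-to-action mapping above can be sketched as follows. The uniform 0.5 GB step is an assumption inferred from the example (0.5 GB, 1 GB, ...), not something the scheme fixes:

```python
def select_reservation(action_probs, step_gb=0.5):
    """Map the highest-probability output neuron to its memory setting.

    Assumes, as in the example above, that neuron k corresponds to
    (k + 1) * step_gb gigabytes of reserved memory.
    """
    best = max(range(len(action_probs)), key=lambda k: action_probs[k])
    return (best + 1) * step_gb
```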
In the following embodiment, the present invention is verified and analyzed in terms of estimation accuracy, generalization capability, memory utilization, and application performance.
First, estimation accuracy
The relative error of the final estimate of each estimation method on four typical workloads is shown in fig. 6. Both the MEER+ strategy and the Deep-MEER strategy show better estimation accuracy on most workloads than reference [1] ([1] Xu, G., Xu, C.: MEER: Online Estimation of Optimal Memory Reservations for Long-lived Containers in In-Memory Cluster Computing. In: 39th IEEE International Conference on Distributed Computing Systems (ICDCS), pp. 23-34 (2019)). FIG. 7 shows the average relative error of each strategy across all workloads. It can be seen that MEER+ has the highest accuracy, with the Deep-MEER strategy second; both perform better than the estimation strategy proposed in reference [1].
Second, generalization ability
The experimental results in FIG. 6 also show that the Deep-MEER strategy of the present invention has good generalization ability. In the experiment, the Actor-Critic model was trained with data from the Page Rank and Triangle Count workloads, yet when the trained model was used to estimate Shortest Paths and SVD++, the estimation results still performed well. This illustrates that once model training is complete, the model can be used for optimal memory reservation estimation of any workload.
Third, memory utilization
The invention helps save memory resources and improve memory utilization. FIG. 8 plots memory footprint as a function of running time. The memory utilization of the MEER+ strategy and the Deep-MEER strategy is higher than that of reference [1], and both their average and peak memory utilization are superior to reference [1].
Fourth, application program performance
The invention ensures that the application program experiences no sharp performance fluctuation in the stable stage, which helps improve user experience. As shown in fig. 9, both reference [1] and the MEER+ strategy used by the present invention for the preliminary estimation cause performance jitter, while the application always maintains satisfactory performance when the Deep-MEER strategy is used.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.
Claims (10)
1. An adaptive optimal memory reservation estimation method for long-life containers, which is applied to different stages of a Spark distributed cluster of a data center, is characterized by comprising the following steps:
s1: executing a MEER + strategy at the initial stage of the Spark distributed cluster, collecting historical data of application program operation in a server, and estimating the optimal memory reservation of the Spark distributed cluster at the initial stage by using the historical data;
s2: and executing a DEEP-MEER strategy in a stable stage of the Spark distributed cluster, obtaining an optimal memory reservation model in the stable stage by using known historical data, and estimating the optimal memory reservation in the current stage by using the model.
2. The adaptive optimal memory reservation estimation method for long-life containers as claimed in claim 1, wherein the MEER+ strategy execution flow includes three processes: a trial run stage, an iterative search stage, and an approach stage,
the trial run stage means that when the application program is submitted for the first time, it runs under an excess reservation; the trial run produces initial memory occupancy and program running data, which are recorded by the history server and the measurement system; an expected value of the memory usage is calculated from the memory occupancy data using the histogram analysis model, and this expected value is then passed to the resource manager;
the iterative search stage means that, when the application program is resubmitted, the resource manager takes the last estimate M_{n-1} as the memory reservation of the current run; MEER records the memory occupancy and the program running time during the run, calculates the memory occupancy expectation M_n, and evaluates the performance; if the performance satisfies any one of the termination conditions, the search terminates and M_{n-2} is the final estimated optimal reservation; otherwise, MEER takes the calculated expected value M_n as the new memory reservation of the application program for the next execution; there are three termination conditions, condition one: the execution time is too long; condition two: garbage collection is too time-consuming; condition three: the memory utilization reaches the expected target;
the approach stage means that the optimal memory reservation estimation has two branches according to which termination condition was met; if the termination condition met in the iterative search stage is condition one or condition two, the memory reservation calculation formula of MEER+ is:
M_n = M_{n-1} + M_f, where M_f < M_{t-1} − M_t, (1)
wherein M_f is an increment or decrement added to correct the estimation result, and M_t is the estimated memory reservation that met the termination condition; the approach stage terminates when no termination condition is met any more, and the final optimal memory reservation is the estimation result of the last iteration, M_{n-1}; if the termination condition met in the iterative search stage is condition three, MEER+ executes the outer process, and the memory reservation calculation formula is:
M_n = M_{n-1} − M_f, where M_f < M_t − M, (2)
stopping when condition one or condition two is met; the final optimal memory reservation is the memory reservation value M_{n-2}, i.e. the estimation result of the previous iteration.
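The iterative-search phase of claim 2 can be sketched as a loop. For simplicity the three termination conditions are collapsed into one boolean returned by a hypothetical `run_and_measure` callable; both the callable and the iteration cap are illustrative assumptions:

```python
def meer_plus_search(run_and_measure, initial_reservation, max_iters=20):
    """MEER+ iterative-search phase (illustrative sketch).

    run_and_measure(reservation) runs the application under the given
    reservation and returns (expected_usage, terminated), where
    `terminated` is true when any of conditions one to three holds.
    On termination the reservation used two runs earlier, M_{n-2},
    is kept as the final estimate, per claim 2.
    """
    prev2, prev = None, initial_reservation
    for _ in range(max_iters):
        expected, terminated = run_and_measure(prev)
        if terminated:
            return prev2 if prev2 is not None else prev
        prev2, prev = prev, expected  # expectation M_n becomes the next reservation
    return prev
```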
3. The method as claimed in claim 2, wherein the optimal memory reservation estimation model used in the MEER+ strategy is a histogram analysis model; for each run of the application program, the measurement system records the current memory occupancy every second and the corresponding histogram is drawn, in which the two endpoints of a rectangle on the horizontal axis represent a memory usage interval and the height of each rectangle represents the frequency of occurrence, i.e. how often the memory occupancy falls between the two endpoints; the memory occupancy is estimated by a probability density estimation method, and the memory occupancy expectation is calculated.
4. The adaptive optimal memory reservation estimation method for long-life containers of claim 3, wherein, in the histogram analysis, the probability that the memory occupancy average at a certain moment is x_i, i.e. falls within the interval of x_i, is defined as:
P(x_i) = Freq(x_i) / Count, (3)
wherein Count is the sum of all rectangle heights, i.e. of all frequencies, and Freq(x_i) denotes the frequency of the mean value x_i.
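Combining claims 3 and 4, the memory occupancy expectation follows directly from the per-bin probabilities; a minimal sketch (the dict representation of the histogram is an assumption for illustration):

```python
def memory_expectation(freq_by_bin_mean):
    """Expected memory occupancy from a histogram (sketch of claims 3-4).

    freq_by_bin_mean maps each rectangle's mean occupancy x_i to its
    frequency Freq(x_i); P(x_i) = Freq(x_i) / Count, and the
    expectation is the probability-weighted sum of the x_i.
    """
    count = sum(freq_by_bin_mean.values())  # Count: sum of all rectangle heights
    return sum(x * f / count for x, f in freq_by_bin_mean.items())
```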
6. The adaptive optimal memory reservation estimation method for long-life containers according to claim 4, wherein the DEEP-MEER strategy is executed as follows: when the application program is submitted, the resource manager adopts the last optimal memory reservation estimate M_{n-1} as the memory reservation of the current run; the history server and the measurement system record the memory occupancy data of the current run, the Actor-Critic model calculates the expected value M_n of the memory usage, and the expected value is then passed to the resource manager.
7. The adaptive optimal memory reservation estimation method for the long-life container according to claim 6, wherein the optimal memory reservation estimation model used by the DEEP-MEER strategy is an Actor-Critic model, and the Actor-Critic model comprises: an Actor unit, a Critic unit, and a cluster running the application programs;
the Actor unit is responsible for providing the strategy: it selects an action according to probability and modifies the probability of selecting that action according to the score provided by the Critic unit, thereby realizing a random strategy that maps the system state to a corresponding action;
the Critic unit is the value function of the evaluation strategy: it evaluates the action selected by the Actor unit and provides feedback to help the Actor unit adjust its strategy; the output layer of the Critic unit has only one neuron, whose value is the score given by the Critic unit; once the Critic unit calculates the score, it combines the score with the reward returned by the environment and finally calculates a loss value used to guide the parameter updates of the Actor unit and the Critic unit;
the cluster running the application program interacts with the Actor and the Critic; its functions are: first, executing the action selected by the Actor unit, i.e. running the submitted application program with the memory reservation specified by the Actor; second, returning the state changed by performing the action and measuring the reward value obtained from that action.
8. The adaptive optimal memory reservation estimation method for long-lived containers of claim 7, wherein the Actor unit comprises three layers of neurons: an input layer, a hidden layer, and an output layer; the input of the input layer comes from the environment state, the activation function of the hidden layer is Relu(), each neuron of the output layer corresponds to a specific memory reservation action for the current application program, the output of the output layer is finally converted into values between 0 and 1 by the Softmax() function, and the sum of all output values equals 1.
9. The adaptive optimal memory reservation estimation method for long-life containers according to claim 8, wherein the Actor-Critic model calculation procedure comprises the following steps:
S1: initialize the state S_0;
S2: execute the Critic unit to calculate V(S_0);
S3: set i = 1;
S4: execute the Actor unit: based on S_{i-1}, calculate the probability P_{i-1} of each action and determine the action A_{i-1} with the maximum probability;
S5: perform action A_{i-1} to obtain S_i and R_i;
S6: execute the Critic unit to calculate V(S_i);
S7: calculate the TD error δ_{i-1};
S8: calculate the loss value Loss(δ_{i-1});
S9: update the parameters ω of the Actor and the Critic under the guidance of the loss value;
S10: if i ≤ N, set i = i + 1 and return to S4;
the initialization of S_0 means running the application program with an excess memory reservation to obtain the initial environment state; V denotes the value function.
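Steps S1 to S10 above can be sketched as one loop. The networks and the update rule are passed in as callables, and a squared-TD-error loss is assumed purely for illustration (the claims leave the loss function free):

```python
def actor_critic_loop(env_reset, env_step, actor, critic, update, n_steps, gamma=0.99):
    """Steps S1-S10 as a loop (illustrative sketch).

    env_reset: run the application with excess reservation -> S_0;
    env_step(action): perform the action -> (next_state, reward);
    actor(state): action probabilities; critic(state): V(state);
    update(loss): adjust the parameters omega of both networks.
    """
    state = env_reset()                          # S1: initial state S_0
    v = critic(state)                            # S2: V(S_0)
    for _ in range(n_steps):                     # S3, S10: i = 1..N
        probs = actor(state)                     # S4: action probabilities
        action = max(range(len(probs)), key=lambda k: probs[k])
        next_state, reward = env_step(action)    # S5: obtain S_i and R_i
        v_next = critic(next_state)              # S6: V(S_i)
        delta = reward + gamma * v_next - v      # S7: TD error, eq. (5)
        update(delta ** 2)                       # S8, S9: loss guides the update
        state, v = next_state, v_next
    return state
```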
10. The adaptive optimal memory reservation estimation method for long-lived containers as claimed in claim 9, wherein the TD error is defined as:
δ_t = R_{t+1} + γV(S_{t+1}) − V(S_t). (5)
the loss function is chosen freely as needed; parameter updating means that the neural network performs back-propagation according to the chain rule: it calculates the derivative of the composite function, propagates the gradients of the output neurons back to the input neurons, and adjusts the learnable parameters of the network according to the calculated gradients; N is the set number of iterations;
the Actor-Critic model defines model parameters comprising: a reward value, state parameters, and an action; the reward value is used to measure the benefit of performing a given action in a given state, and the reward value at time t is defined by the formula:
wherein M_t and T_t respectively denote the reserved memory and the program running time at time t, and each neuron of the output layer of the Actor unit corresponds to a specific memory setting;
the state parameters include: Δ t represents increased program run time compared to the trial run;
e represents the expected value of the memory occupation;
max_1 ~ max_n represent the n memory-occupancy values with the highest frequencies in the histogram analysis;
p_1 ~ p_n denote the frequencies corresponding to max_1 ~ max_n.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011073505.2A CN112328355B (en) | 2020-10-09 | 2020-10-09 | Adaptive optimal memory reservation estimation method for long-life container |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112328355A true CN112328355A (en) | 2021-02-05 |
CN112328355B CN112328355B (en) | 2024-04-23 |
Family
ID=74314814
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011073505.2A Active CN112328355B (en) | 2020-10-09 | 2020-10-09 | Adaptive optimal memory reservation estimation method for long-life container |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112328355B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160371108A1 (en) * | 2015-06-16 | 2016-12-22 | Vmware, Inc. | Reservation for a multi-machine application |
CN108415776A (en) * | 2018-03-06 | 2018-08-17 | 华中科技大学 | A kind of memory in distributed data processing system estimates the method with configuration optimization |
CN110390345A (en) * | 2018-04-20 | 2019-10-29 | 复旦大学 | A kind of big data cluster adaptive resource dispatching method based on cloud platform |
CN111176832A (en) * | 2019-12-06 | 2020-05-19 | 重庆邮电大学 | Performance optimization and parameter configuration method based on memory computing framework Spark |
CN111666149A (en) * | 2020-05-06 | 2020-09-15 | 西北工业大学 | Ultra-dense edge computing network mobility management method based on deep reinforcement learning |
Non-Patent Citations (1)
Title |
---|
Meng Hongtao et al.: "Research on Spark Memory Management and Cache Strategy", Computer Science, vol. 44, no. 6, pp. 31-36 *
Legal Events

Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||