CN112328355A - Self-adaptive optimal memory reservation estimation method for long-life container - Google Patents
- Publication number
- CN112328355A (application number CN202011073505.2A)
- Authority
- CN
- China
- Prior art keywords
- memory
- actor
- reservation
- stage
- memory reservation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45587—Isolation or security of virtual machine instances
- G06F2009/45595—Network integration; Enabling network access in virtual machine instances
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a self-adaptive optimal memory reservation estimation method for long-life containers, applied at different stages of a Spark distributed cluster in a data center, comprising the following steps. S1: at the initial stage of the server cluster, execute the MEER+ strategy, collect historical data of application runs on the servers, and use the historical data to estimate the optimal memory reservation of the cluster at the initial stage. S2: at the stable stage of the server cluster, execute the DEEP-MEER strategy, use the historical data to obtain an optimal memory reservation model for the stable stage, and use the model to estimate the optimal memory reservation at the current stage. The method adopts different optimal memory reservation estimation strategies for the Spark distributed cluster at different life-cycle stages: at the initial stage it approaches the optimal value by refining the step size, improving estimation accuracy, and at the stable stage it builds a reinforcement learning model from the abundant historical data, thereby ensuring stable application performance.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a self-adaptive optimal memory reservation estimation method for a long-life container.
Background
With the development of big data, more and more memory computing works including machine learning, stream processing, interactive query, graph computing and the like are deployed on a shared cluster of a data center. Such a workload handles a large amount of data, and is called a Long Running Application (LRA). It can be observed from the results of public enterprise cluster tracking and analysis that LRAs have become the major workload of everyday online services in data centers today.
Existing big data processing systems, such as Spark and Flink, rely primarily on resource managers, such as YARN, Mesos, Omega, Borg, and Kubernetes, to allocate resources for applications. These managers schedule resources by packing CPU and memory into containers. Unlike conventional short-lived containers used to process batch jobs, the containers of LRAs remain active until the application finishes executing, and are therefore referred to as long-lived containers.
It is particularly common in data centers for the same application to be executed repeatedly on different data. In this working mode, only the data content changes; the resource occupation pattern does not. Therefore, it is of great significance to explore the resource occupation pattern and find the optimal resource allocation strategy. If the resource manager reserves too much memory for an application, unnecessary waste results: the application occupies only part of the memory, and because the container is fixed before the application executes, the remaining memory has no chance to be used by the application and cannot be reallocated to other applications; until the container is released, it remains an idle memory fragment. Conversely, if the memory reservation is too small, the performance of the application cannot be guaranteed, and in more serious cases the application crashes or fails to complete. Therefore, it is meaningful to estimate an optimal memory reservation that guarantees application performance without wasting resources.
In the prior art, publication number CN110187967A, published on August 30, 2019, discloses a memory prediction method and device suitable for a dependency analysis tool. The method specifically includes: extracting source code files from a Java package file; parsing the extracted source code files to generate abstract syntax trees, and obtaining the number of instance objects of each class of node in the abstract syntax tree generated by each source code file; calculating the memory occupied by each class of node instance object; calculating the memory occupied by the abstract syntax tree generated by each source code file; and calculating the memory required by the whole Java package. This method only calculates the memory occupied by the Java package itself; in-memory computing also requires memory for caching input data and intermediate variables, whose amount is related to the size of the input data set and cannot be obtained through source-code analysis. Therefore, it is not suitable for predicting the memory of an entire long-life container.
Disclosure of Invention
The invention provides a self-adaptive optimal memory reservation estimation method for a long-life container, aiming at overcoming the defect that a resource manager cannot accurately and effectively estimate optimal memory reservation by using historical operating data of an application program for the long-life container in a data center in the prior art.
The primary objective of the present invention is to solve the above technical problems, and the technical solution of the present invention is as follows:
an adaptive optimal memory reservation estimation method for a long-life container, which is applied to different stages of a data center Spark distributed server cluster, comprises the following steps:
s1: executing a MEER + strategy at the initial stage of the Spark distributed cluster, collecting historical data of application program operation in a server, and estimating the optimal memory reservation of the Spark distributed cluster at the initial stage by using the historical data;
s2: and executing a DEEP-MEER strategy in a stable stage of the Spark distributed cluster, obtaining an optimal memory reservation model in the stable stage by using known historical data, and estimating the optimal memory reservation in the current stage by using the model.
In this scheme, the MEER+ policy execution flow includes three processes, respectively recorded as: a trial run stage, an iterative search stage, and an approach stage,
the trial run stage refers to the first time the application is submitted: the application runs under an excess reservation, and the trial run generates initial memory occupation and program running data, which are recorded by the history server and the measurement system; based on the memory occupation data, the expected value of the memory usage is calculated using the histogram analysis model and then transmitted to the resource manager;
the iterative search stage means that when the application is resubmitted, the resource manager takes the last estimate M_{n-1} as the memory reservation of the current run; MEER records the memory occupation and the program running time during execution, calculates the memory occupation expectation M_n, and evaluates the performance; if the performance satisfies any one of the termination conditions, the search is terminated, and M_{n-2} is the finally estimated optimal reservation; otherwise, MEER takes the calculated expected value M_n as the new memory reservation of the application for the next execution; there are three termination conditions; condition one: the execution time is too long; condition two: garbage collection is too time-consuming; condition three: the memory utilization rate reaches the expected target; except for condition three, the application has to endure one time-consuming, inefficient run in order to terminate the iterative search;
the approach stage means that the optimal memory reservation estimation has two branches according to which termination condition was met; if the termination condition met in the iterative search stage is condition one or condition two, MEER+ executes the inner process, and the memory reservation calculation formula is:
M_n = M_{n-1} + M_f, where M_f < M_{t-1} - M_t, (1)
where M_f is an increment or decrement added to correct the estimation result, and M_t is an estimated memory reservation that meets the termination condition; the approach stage terminates when no termination condition is met any more, and the final optimal memory reservation is the estimation result M_{n-1} of the last iteration; if the termination condition met in the iterative search stage is condition three, MEER+ executes the outer process, and the memory reservation calculation formula is:
M_n = M_{n-1} - M_f, where M_f < M_t - M, (2)
and stopping when condition one or condition two is met; the final optimal memory reservation is the memory reservation used in the last iteration, i.e., the estimation result of the previous iteration.
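The trial run, iterative search, and approach stages above can be sketched as a simple search loop. The following is a minimal illustration, not the patent's implementation: `run_app`, the status labels, and the fixed step `m_f` are all assumed stand-ins for the history server, the measurement system, and the three termination-condition checks.

```python
def meer_plus(run_app, over_reservation, m_f):
    """Hedged sketch of MEER+: run_app(reservation) -> (usage_expectation,
    status), with status in {'ok', 'too_slow', 'gc_heavy', 'target_met'}
    standing in for conditions one, two, and three."""
    m_prev, _ = run_app(over_reservation)          # trial run under excess reservation
    while True:                                    # iterative search stage
        m_next, status = run_app(m_prev)
        if status != 'ok':
            break
        m_prev = m_next                            # feed expectation back as next reservation
    if status in ('too_slow', 'gc_heavy'):
        # inner process (formula 1): grow by m_f until performance recovers
        m = m_prev + m_f
        while run_app(m)[1] in ('too_slow', 'gc_heavy'):
            m += m_f
        return m
    # outer process (formula 2): shrink by m_f until performance degrades,
    # then keep the last reservation that still ran cleanly
    m = m_prev
    while run_app(m - m_f)[1] not in ('too_slow', 'gc_heavy'):
        m -= m_f
    return m
```

Against a simulated application whose true requirement is known, the sketch converges to a reservation just above that requirement from either branch.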
In this scheme, the optimal memory reservation estimation model used for executing the MEER+ strategy is a histogram analysis model: for each run of the application, the measurement system records the current memory occupation once every second and draws a corresponding histogram. The two endpoints of each rectangle on the horizontal axis represent a range of memory usage, and the height of each rectangle represents the frequency, i.e., the number of occurrences of memory occupation between the two endpoints. Probability density estimation is performed on the memory occupation using the histogram analysis method, and the memory occupation expectation is calculated.
In this scheme, in the histogram analysis method, the probability that the memory occupation at a certain moment falls within the interval whose mean value is x_i is defined as:

P(x_i) = Freq(x_i) / Count, (3)

where Count is the sum of all rectangle heights, i.e., frequencies, and Freq(x_i) denotes the frequency of the interval with mean value x_i.
In this scheme, the memory occupation expectation is calculated as:

E = Σ_{i=1}^{N} x_i · P(x_i), (4)

where N is the number of rectangles in the frequency distribution histogram.
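Formulas (3) and (4) amount to binning the per-second measurements, labelling each rectangle by the mean of its two endpoints, and taking the frequency-weighted mean. A minimal sketch, in which the bin width and variable names are illustrative choices rather than anything specified by the patent:

```python
import collections

def memory_expectation(samples, bin_width):
    """Histogram-based expectation of memory usage, in the spirit of
    formulas (3) and (4): samples are per-second occupation readings."""
    freq = collections.Counter()
    for s in samples:
        bin_idx = int(s // bin_width)
        mid = (bin_idx + 0.5) * bin_width   # mean of the rectangle's two endpoints
        freq[mid] += 1
    count = sum(freq.values())              # Count: sum of all frequencies
    # E = sum over bins of x_i * P(x_i), with P(x_i) = Freq(x_i) / Count
    return sum(x * f / count for x, f in freq.items())
```

For example, three readings near 10 MB and one near 30 MB with a 20 MB bin width give an expectation of 15 MB.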
In this scheme, the execution mode of the DEEP-MEER strategy is as follows: when the application is submitted, the resource manager adopts the last optimal memory reservation estimate M_{n-1} as the memory reservation of the current run; the history server and the measurement system record the memory occupation data of the current run, the expected value M_n of the memory usage is calculated using the Actor-Critic model, and the expected value is then communicated to the resource manager.
In this scheme, the optimal memory reservation estimation model used by the DEEP-MEER strategy is the Actor-Critic model, which comprises: an Actor unit, a Critic unit, and a cluster running application programs;
the Actor unit is responsible for providing a strategy: it selects an action according to probability and modifies the probability of the actions to be selected according to the score provided by the Critic unit. The Actor unit implements a random strategy that maps the system state to corresponding actions, and comprises three layers of neurons: an input layer, a hidden layer, and an output layer. The input of the input layer comes from the environment state; the activation function of the hidden layer is ReLU(); the neurons of the output layer each correspond to an action of setting a certain memory reservation for the current application; the output of the output layer is finally converted into values between 0 and 1 through a Softmax() function, with the sum of all output values equal to 1;
the Critic unit is a value function of an evaluation strategy, evaluates the action selected by the Actor unit and provides feedback to help the Actor unit to adjust the strategy, an output layer of the Critic unit is only provided with one neuron, the neuron is a score given by the Critic unit, once the Critic unit calculates the score, the score is combined with a reward returned by an environment, and finally a loss value is calculated and used for guiding the Actor unit and the Critic unit to update parameters;
the cluster running the application program interacts with the Actor and the Critic, and the functions of the cluster comprise: firstly, executing an action selected by an Actor unit, namely, reserving a memory appointed by the Actor to run and submit an application program; second, return the state changed by performing the action and measure the size of the reward value that benefits from the action.
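The Actor/Critic structure described above (input layer, ReLU hidden layer, softmax output for the Actor, a single output neuron for the Critic) can be sketched in plain NumPy. The layer sizes and random initial weights below are illustrative, and no training logic is shown:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, purely for reproducibility

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max())     # subtract max for numerical stability
    return e / e.sum()

class Actor:
    """Three-layer policy network: state -> ReLU hidden -> softmax over actions."""
    def __init__(self, n_state, n_hidden, n_actions):
        self.w1 = rng.normal(scale=0.1, size=(n_state, n_hidden))
        self.w2 = rng.normal(scale=0.1, size=(n_hidden, n_actions))
    def probs(self, state):
        # each output is the probability of one memory-reservation action
        return softmax(relu(state @ self.w1) @ self.w2)

class Critic:
    """Same structure, but the output layer has a single neuron (the score)."""
    def __init__(self, n_state, n_hidden):
        self.w1 = rng.normal(scale=0.1, size=(n_state, n_hidden))
        self.w2 = rng.normal(scale=0.1, size=(n_hidden, 1))
    def value(self, state):
        return (relu(state @ self.w1) @ self.w2).item()
```

Biases and the backpropagation step are omitted; the sketch only shows why the Actor's outputs sum to 1 and can be read as action probabilities.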
In this scheme, the Actor-Critic model calculation process includes the following steps:
S1: initialize the state S_0;
S2: execute the Critic unit and calculate V(S_0);
S3: let i = 1;
S4: execute the Actor unit: based on S_{i-1}, calculate the probability P_{i-1} of each action and determine the action A_{i-1} with the maximum probability;
S5: perform action A_{i-1}, obtaining S_i and R_i;
S6: execute the Critic unit and calculate V(S_i);
S7: calculate the TD error δ_{i-1};
S8: calculate the loss value Loss(δ_{i-1});
S9: update the parameters ω of the Actor and the Critic under the guidance of the loss value;
S10: if i ≤ N, set i = i + 1 and go back to S4.
Initializing S_0 means running the application with an excessive memory reservation to obtain the initial environment state; V refers to the value function.
In this embodiment, the TD error is defined as:
δ_t = R_{t+1} + γV(S_{t+1}) - V(S_t). (5)
the loss function is freely selected according to the requirement; the parameter updating means that the neural network carries out back propagation according to a chain rule, calculates the derivative of the composite function, then propagates the gradient of the output neuron back to the input neuron, and adjusts the learnable parameters of the network according to the calculated gradient; n is the set iteration number;
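Steps S1 through S10, with the TD error of formula (5) driving the update, can be outlined as follows; `env_step`, `actor_probs`, `critic_value`, and `update` are assumed placeholders for the cluster interaction, the two networks, and the backpropagation step, none of which are specified in code by the patent:

```python
def actor_critic_loop(env_step, actor_probs, critic_value, update,
                      s0, n_iters, gamma=0.9):
    """Sketch of steps S1-S10 of the Actor-Critic calculation process."""
    s = s0                                  # S1: initial state S_0
    v = critic_value(s)                     # S2: V(S_0)
    for _ in range(n_iters):                # S3/S10: i = 1..N
        probs = actor_probs(s)              # S4: action probabilities
        a = max(range(len(probs)), key=probs.__getitem__)
        s_next, r = env_step(a)             # S5: perform A_{i-1}, get S_i and R_i
        v_next = critic_value(s_next)       # S6: V(S_i)
        delta = r + gamma * v_next - v      # S7: TD error, formula (5)
        update(delta)                       # S8/S9: loss value and parameter update
        s, v = s_next, v_next
    return s
```

With a stub environment and a constant-zero value function, the loop reduces δ to the raw reward, which makes the data flow of S4 through S9 easy to trace.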
the Actor-Critic model defines model parameters, and the model parameters comprise: a reward value, a state parameter and an action, the reward value being used to determine a benefit of performing a given action in a given state, the reward value at time t being defined by the formula:
wherein M istAnd TtReferring to reserved memory and program running time at time t, respectively, each neuron of the output layer of the Actor unit corresponds toSetting a specific memory;
the state parameters include:
Δt, the increase in program running time compared with the trial run;
E, the expected value of the memory occupation;
max_1 ~ max_n, the n memory-occupation values with the highest frequency in the histogram analysis;
p_1 ~ p_n, the frequencies corresponding to max_1 ~ max_n.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention adopts different optimal memory reservation estimation strategies for the server cluster in the data center at different life cycle stages, and improves the estimation accuracy by thinning the step length to approach the optimal value, thereby ensuring the stability of the application program performance.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the architecture and workflow of the MEER + policy of the present invention.
FIG. 3 is a frequency domain histogram of memory usage in the present invention.
FIG. 4 is a schematic diagram of the architecture and workflow of the DEEP-MEER strategy of the present invention.
FIG. 5 is a schematic structural diagram of an Actor-Critic model according to the present invention.
Fig. 6 is a schematic diagram of the average relative error of the MEER strategy, the MEER + strategy and the Deep-MEER strategy over four workloads in the embodiment of the present invention.
FIG. 7 is a schematic diagram of relative estimation errors of the MEER strategy, the MEER + strategy and the Deep-MEER strategy on four workloads in the embodiment of the present invention.
Fig. 8 is a schematic diagram illustrating changes in memory utilization during the Page Rank operation process in the embodiment of the present invention.
FIG. 9 is a diagram illustrating the variation of the runtime of an application with the number of repeated executions when the MEER policy, the MEER + policy, and the Deep-MEER policy are applied to a benchmark workload according to an embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Example 1
As shown in fig. 1, an adaptive optimal memory reservation estimation method for a long-life container, which is applied to different stages of a Spark distributed cluster, includes the following steps:
s1: executing a MEER + strategy at the initial stage of the Spark distributed cluster, collecting historical data of application program operation in a server, and obtaining the optimal memory reservation of the Spark distributed cluster at the initial stage by using the historical data;
it should be noted that Spark distributed cluster is in the initial stage. When the Spark distributed cluster just starts to work, although no program running experience exists and no estimation basis exists, the MEER + strategy can be adopted to sacrifice some application program performance for preliminary estimation. Meanwhile, the system generates a large amount of historical data of application program operation, so that original data is provided for model training, and the estimation model based on reinforcement learning can be trained.
S2: executing a DEEP-MEER strategy in a stable stage of the Spark distributed cluster, obtaining an optimal memory reservation model in the stable stage by using known historical data, and estimating the optimal memory reservation in the current stage by using the model;
Fig. 2 is a schematic diagram illustrating the architecture and workflow of the MEER+ policy. In this scheme, the MEER+ policy execution flow includes three processes, respectively recorded as: a trial run stage, an iterative search stage, and an approach stage,
the trial run stage refers to the first time the application is submitted: the application runs under an excess reservation, and the trial run generates initial memory occupation and program running data, which are recorded by the history server and the measurement system; based on the memory occupation data, the expected value of the memory usage is calculated using the histogram analysis model and then transmitted to the resource manager;
the iterative search stage means that when the application is resubmitted, the resource manager takes the last estimate M_{n-1} as the memory reservation of the current run; MEER records the memory occupation and the program running time during execution, calculates the memory occupation expectation M_n, and evaluates the performance; if the performance satisfies any one of the termination conditions, the search is terminated, and M_{n-2} is the finally estimated optimal reservation; otherwise, MEER takes the calculated expected value M_n as the new memory reservation of the application for the next execution. There are three termination conditions; condition one: the execution time is too long; condition two: garbage collection is too time-consuming; condition three: the memory utilization rate reaches the expected target. Obviously, except for condition three, to terminate the iterative search the application must undergo one time-consuming, inefficient run;
the approach stage means that the optimal memory reservation estimation has two branches according to which termination condition was met; if the termination condition met in the iterative search stage is condition one or condition two, MEER+ executes the inner process, and the memory reservation calculation formula is:
M_n = M_{n-1} + M_f, where M_f < M_{t-1} - M_t, (1)
where M_f is an increment or decrement added to correct the estimation result, and M_t is an estimated memory reservation that meets the termination condition; the approach stage terminates when no termination condition is met any more, and the final optimal memory reservation is the estimation result M_{n-1} of the last iteration; if the termination condition met in the iterative search stage is condition three, MEER+ executes the outer process, and the memory reservation calculation formula is:
M_n = M_{n-1} - M_f, where M_f < M_t - M, (2)
stopping when condition one or condition two is met; the final optimal memory reservation is the memory reservation M_{n-2} used in the last iteration, i.e., the estimation result of the previous iteration.
In this scheme, the optimal memory reservation model used in executing the MEER+ strategy is a histogram analysis model. For each run of the application, the measurement system records the current memory occupation once every second and draws a corresponding histogram, as shown in fig. 3. The two endpoints of each rectangle on the horizontal axis represent a range of memory usage, and for convenience of calculation the mean of the two endpoints is marked on the horizontal axis; the height of each rectangle represents the frequency, i.e., the number of occurrences of memory occupation between the two endpoints. Probability density estimation is performed on the memory occupation using the histogram analysis method, and the memory occupation expectation is calculated.
In this scheme, in the histogram analysis method, the probability that the memory occupation at a certain moment falls within the interval whose mean value is x_i may be defined as:

P(x_i) = Freq(x_i) / Count, (3)

where Count is the sum of all rectangle heights, i.e., frequencies, and Freq(x_i) denotes the frequency of the interval with mean value x_i.
In this scheme, the memory occupation expectation is calculated as:

E = Σ_{i=1}^{N} x_i · P(x_i), (4)

where N is the number of rectangles in the frequency distribution histogram.
It should be noted that the expectation reflects the likely average memory cost of future application runs on the server cluster, and therefore has a very high reference value for estimating a reservation that covers most memory requirements. By adding appropriate parameters, a functional relation with the memory occupation expectation as the independent variable and the reserved memory as the dependent variable is formed, constructing a model for estimating the optimal memory reservation.
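The patent does not write out this functional relation; one simple form, shown purely as a hypothetical illustration with made-up tuning parameters alpha and beta, is a linear mapping from the expectation E to a reservation:

```python
def reserve_from_expectation(e, alpha=1.2, beta=256):
    """Hypothetical linear relation: reservation = alpha * E + beta.
    alpha adds headroom proportional to expected usage; beta is a fixed
    safety margin (e.g. in MB). Both values are illustrative only."""
    return alpha * e + beta
```

Any monotone function of E with tunable headroom would serve the same role; the point is only that the expectation is the independent variable.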
As shown in fig. 4, in the stable stage of the server, the DEEP-MEER policy is executed as follows: when the application is submitted, the resource manager uses the last optimal memory reservation estimate M_{n-1} as the memory reservation of the current run; the history server and the measurement system record the memory occupation data of the current run and calculate the expected value M_n of the memory usage using the Actor-Critic model, and the expected value is then communicated to the resource manager.
In this scheme, the optimal memory reservation estimation model used by the DEEP-MEER strategy is the Actor-Critic model, which comprises: an Actor unit, a Critic unit, and a cluster running application programs;
the Actor unit executes a random strategy, the random strategy is a probability value for mapping the system state to a corresponding action, the Actor unit selects an action according to the probability, and then modifies the probability of the action to be selected according to a feedback score provided by the Critic unit; the Actor unit includes an input layer, a hidden layer, and an output layer, as shown in fig. 5, each circle represents a neuron, and each line between two neurons represents a weight. Let ω be a set of such weight parameters. The value of each neuron is a weighted sum of the neurons of the previous layer, except that the value of the input neuron is provided by the environment. The input of the input layer is from a cluster state of an operating application program, the activation function of the hidden layer is Relu (), the neurons of the output layer respectively correspond to how many memory reserved actions are set for the application program, the output result of the output layer of the Actor unit is converted into a value between 0 and 1 through a Softmax () function, the sum of all output values is equal to 1, and each output value represents the probability that the corresponding action should be selected.
The Critic unit is the value function of the evaluation strategy: it evaluates the action selected by the Actor unit and provides feedback to help the Actor unit adjust its strategy. The basic structure and parametric form of the Critic unit differ from those of the Actor unit only in that its output layer has a single neuron, whose value is the score given by the Critic unit. Once the Critic unit calculates the score, it combines the score with the reward returned by the environment and finally calculates a loss value that guides the parameter updates of both the Actor unit and the Critic unit.
The cluster running the application program interacts with the Actor and the Critic; its functions are: first, executing the action selected by the Actor unit, i.e. running the submitted application program with the memory reservation specified by the Actor; second, returning the state changed by performing the action and measuring the reward value obtained from that action.
In this scheme, the Actor-Critic model calculation process includes the following steps:
Note that initialization of S_0 means running the application program with an excess memory reservation to obtain the initial environment state, and V denotes the value function. The TD error, a common error signal for adjusting the policy, is defined as:
δ_t = R_{t+1} + γV(S_{t+1}) − V(S_t). (5)
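Equation (5) is a one-line computation; a minimal sketch:

```python
def td_error(reward_next, v_next, v_current, gamma=0.99):
    """TD error of equation (5): delta_t = R_{t+1} + gamma * V(S_{t+1}) - V(S_t).

    The default discount factor gamma is illustrative; the scheme does
    not fix its value.
    """
    return reward_next + gamma * v_next - v_current
```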
The loss function used to calculate the loss value can be chosen freely as needed. Parameter updating means that the neural network performs back-propagation according to the chain rule: it calculates the derivative of the composite function, propagates the gradients of the output neurons back to the input neurons, and adjusts the learnable parameters of the network according to the calculated gradients.
The Actor-Critic model defines model parameters comprising: a reward value, state parameters, and actions. The objective of the Actor-Critic model is to save memory while maintaining good program running performance; therefore, the more memory is saved and the less time the application program takes to complete, the higher the reward. The reward value at time t is defined by the formula:
wherein M_t and T_t respectively denote the reserved memory and the program running time at time t, and each neuron of the output layer of the Actor unit corresponds to a specific memory setting;
the state parameters include: Δ t represents increased program run time compared to the trial run;
e represents the expected value of the memory occupation;
max_1 ~ max_n represent the n memory-occupancy values with the highest frequencies in the histogram analysis;
p_1 ~ p_n denote the frequencies corresponding to max_1 ~ max_n.
The actions are exemplified as follows: each neuron of the output layer of the Actor unit corresponds to a specific memory setting. If the first neuron obtains the highest probability, the action of setting the reserved memory to 0.5 GB is selected; if the second neuron obtains the highest probability, the action of setting the reserved memory to 1 GB is selected; and so on for the actions represented by the remaining neurons.
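The neuron-to-action mapping above can be sketched as follows. The uniform 0.5 GB step is an assumption inferred from the example (0.5 GB, 1 GB, ...), not something the scheme fixes:

```python
def select_reservation(action_probs, step_gb=0.5):
    """Map the highest-probability output neuron to its memory setting.

    Assumes, as in the example above, that neuron k corresponds to
    (k + 1) * step_gb gigabytes of reserved memory.
    """
    best = max(range(len(action_probs)), key=lambda k: action_probs[k])
    return (best + 1) * step_gb
```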
In the following embodiment, the present invention is verified and analyzed in terms of estimation accuracy, generalization capability, memory utilization, and application performance.
First, estimation accuracy
The relative error of the final estimate of each estimation method on four typical workloads is shown in fig. 6. Both the MEER+ strategy and the Deep-MEER strategy show better estimation accuracy on most workloads than reference [1] ([1] Xu, G., Xu, C.: MEER: Online Estimation of Optimal Memory Reservations for Long-lived Containers in In-Memory Cluster Computing. In: 39th IEEE International Conference on Distributed Computing Systems (ICDCS), pp. 23-34 (2019)). FIG. 7 shows the average relative error of each strategy across all workloads. It can be seen that MEER+ has the highest accuracy, with the Deep-MEER strategy second; both perform better than the estimation strategy proposed in reference [1].
Second, generalization ability
The experimental results in FIG. 6 also show that the Deep-MEER strategy of the present invention has good generalization ability. In the experiment, the Actor-Critic model was trained with data from the Page Rank and Triangle Count workloads, yet when the trained model was used to estimate Shortest Paths and SVD++, the estimation results still performed well. This illustrates that once model training is complete, the model can be used for optimal memory reservation estimation of any workload.
Third, memory utilization
The invention helps save memory resources and improve memory utilization. FIG. 8 plots memory footprint as a function of running time. The memory utilization of the MEER+ strategy and the Deep-MEER strategy is higher than that of reference [1], and both their average and peak memory utilization are superior to reference [1].
Fourth, application program performance
The invention ensures that the application program experiences no sharp performance fluctuation in the stable stage, which helps improve user experience. As shown in fig. 9, both reference [1] and the MEER+ strategy used by the present invention for the preliminary estimation cause performance jitter, while the application always maintains satisfactory performance when the Deep-MEER strategy is used.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.
Claims (10)
1. An adaptive optimal memory reservation estimation method for long-life containers, which is applied to different stages of a Spark distributed cluster of a data center, is characterized by comprising the following steps:
s1: executing a MEER + strategy at the initial stage of the Spark distributed cluster, collecting historical data of application program operation in a server, and estimating the optimal memory reservation of the Spark distributed cluster at the initial stage by using the historical data;
s2: and executing a DEEP-MEER strategy in a stable stage of the Spark distributed cluster, obtaining an optimal memory reservation model in the stable stage by using known historical data, and estimating the optimal memory reservation in the current stage by using the model.
2. The adaptive optimal memory reservation estimation method for long-life containers as claimed in claim 1, wherein the MEER+ strategy execution flow includes three processes: a trial run stage, an iterative search stage, and an approach stage,
the trial run stage means that when the application program is submitted for the first time, it runs under an excess reservation; the trial run produces initial memory occupancy and program running data, which are recorded by the history server and the measurement system; an expected value of the memory usage is calculated from the memory occupancy data using the histogram analysis model, and this expected value is then passed to the resource manager;
the iterative search stage means that, when the application program is resubmitted, the resource manager takes the last estimate M_{n-1} as the memory reservation of the current run; MEER records the memory occupancy and the program running time during the run, calculates the memory occupancy expectation M_n, and evaluates the performance; if the performance satisfies any one of the termination conditions, the search terminates and M_{n-2} is the final estimated optimal reservation; otherwise, MEER takes the calculated expected value M_n as the new memory reservation of the application program for the next execution; there are three termination conditions, condition one: the execution time is too long; condition two: garbage collection is too time-consuming; condition three: the memory utilization reaches the expected target;
the approach stage means that the optimal memory reservation estimation has two branches according to which termination condition was met; if the termination condition met in the iterative search stage is condition one or condition two, the memory reservation calculation formula of MEER+ is:
M_n = M_{n-1} + M_f, where M_f < M_{t-1} − M_t, (1)
wherein M_f is an increment or decrement added to correct the estimation result, and M_t is the estimated memory reservation that met the termination condition; the approach stage terminates when no termination condition is met any more, and the final optimal memory reservation is the estimation result of the last iteration, M_{n-1}; if the termination condition met in the iterative search stage is condition three, MEER+ executes the outer process, and the memory reservation calculation formula is:
M_n = M_{n-1} − M_f, where M_f < M_t − M, (2)
stopping when condition one or condition two is met; the final optimal memory reservation is the memory reservation value M_{n-2}, i.e. the estimation result of the previous iteration.
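The iterative-search phase of claim 2 can be sketched as a loop. For simplicity the three termination conditions are collapsed into one boolean returned by a hypothetical `run_and_measure` callable; both the callable and the iteration cap are illustrative assumptions:

```python
def meer_plus_search(run_and_measure, initial_reservation, max_iters=20):
    """MEER+ iterative-search phase (illustrative sketch).

    run_and_measure(reservation) runs the application under the given
    reservation and returns (expected_usage, terminated), where
    `terminated` is true when any of conditions one to three holds.
    On termination the reservation used two runs earlier, M_{n-2},
    is kept as the final estimate, per claim 2.
    """
    prev2, prev = None, initial_reservation
    for _ in range(max_iters):
        expected, terminated = run_and_measure(prev)
        if terminated:
            return prev2 if prev2 is not None else prev
        prev2, prev = prev, expected  # expectation M_n becomes the next reservation
    return prev
```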
3. The method as claimed in claim 2, wherein the optimal memory reservation estimation model used in the MEER+ strategy is a histogram analysis model; for each run of the application program, the measurement system records the current memory occupancy every second and the corresponding histogram is drawn, in which the two endpoints of a rectangle on the horizontal axis represent a memory usage interval and the height of each rectangle represents the frequency of occurrence, i.e. how often the memory occupancy falls between the two endpoints; the memory occupancy is estimated by a probability density estimation method, and the memory occupancy expectation is calculated.
4. The adaptive optimal memory reservation estimation method for long-life containers of claim 3, wherein, in the histogram analysis, the probability that the memory occupancy average at a certain moment is x_i, i.e. falls within the interval of x_i, is defined as:
P(x_i) = Freq(x_i) / Count, (3)
wherein Count is the sum of all rectangle heights, i.e. of all frequencies, and Freq(x_i) denotes the frequency of the mean value x_i.
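Combining claims 3 and 4, the memory occupancy expectation follows directly from the per-bin probabilities; a minimal sketch (the dict representation of the histogram is an assumption for illustration):

```python
def memory_expectation(freq_by_bin_mean):
    """Expected memory occupancy from a histogram (sketch of claims 3-4).

    freq_by_bin_mean maps each rectangle's mean occupancy x_i to its
    frequency Freq(x_i); P(x_i) = Freq(x_i) / Count, and the
    expectation is the probability-weighted sum of the x_i.
    """
    count = sum(freq_by_bin_mean.values())  # Count: sum of all rectangle heights
    return sum(x * f / count for x, f in freq_by_bin_mean.items())
```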
6. The adaptive optimal memory reservation estimation method for long-life containers according to claim 4, wherein the DEEP-MEER strategy is executed as follows: when the application program is submitted, the resource manager adopts the last optimal memory reservation estimate M_{n-1} as the memory reservation of the current run; the history server and the measurement system record the memory occupancy data of the current run, the Actor-Critic model calculates the expected value M_n of the memory usage, and the expected value is then passed to the resource manager.
7. The adaptive optimal memory reservation estimation method for the long-life container according to claim 6, wherein the optimal memory reservation estimation model used by the DEEP-MEER strategy is an Actor-Critic model, and the Actor-Critic model comprises: an Actor unit, a Critic unit, and a cluster running the application programs;
the Actor unit is responsible for providing the strategy: it selects an action according to probability and modifies the probability of selecting that action according to the score provided by the Critic unit, thereby realizing a random strategy that maps the system state to a corresponding action;
the Critic unit is the value function of the evaluation strategy: it evaluates the action selected by the Actor unit and provides feedback to help the Actor unit adjust its strategy; the output layer of the Critic unit has only one neuron, whose value is the score given by the Critic unit; once the Critic unit calculates the score, it combines the score with the reward returned by the environment and finally calculates a loss value used to guide the parameter updates of the Actor unit and the Critic unit;
the cluster running the application program interacts with the Actor and the Critic; its functions are: first, executing the action selected by the Actor unit, i.e. running the submitted application program with the memory reservation specified by the Actor; second, returning the state changed by performing the action and measuring the reward value obtained from that action.
8. The adaptive optimal memory reservation estimation method for long-lived containers of claim 7, wherein the Actor unit comprises three layers of neurons: an input layer, a hidden layer, and an output layer; the input of the input layer comes from the environment state, the activation function of the hidden layer is Relu(), each neuron of the output layer corresponds to a specific memory reservation action for the current application program, the output of the output layer is finally converted into values between 0 and 1 by the Softmax() function, and the sum of all output values equals 1.
9. The adaptive optimal memory reservation estimation method for long-life containers according to claim 8, wherein the Actor-Critic model calculation procedure comprises the following steps:
S1: initialize the state S_0;
S2: execute the Critic unit to calculate V(S_0);
S3: set i = 1;
S4: execute the Actor unit: based on S_{i-1}, calculate the probability P_{i-1} of each action and determine the action A_{i-1} with the maximum probability;
S5: perform action A_{i-1} to obtain S_i and R_i;
S6: execute the Critic unit to calculate V(S_i);
S7: calculate the TD error δ_{i-1};
S8: calculate the loss value Loss(δ_{i-1});
S9: update the parameters ω of the Actor and the Critic under the guidance of the loss value;
S10: if i ≤ N, set i = i + 1 and return to S4;
the initialization of S_0 means running the application program with an excess memory reservation to obtain the initial environment state; V denotes the value function.
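Steps S1 to S10 above can be sketched as one loop. The networks and the update rule are passed in as callables, and a squared-TD-error loss is assumed purely for illustration (the claims leave the loss function free):

```python
def actor_critic_loop(env_reset, env_step, actor, critic, update, n_steps, gamma=0.99):
    """Steps S1-S10 as a loop (illustrative sketch).

    env_reset: run the application with excess reservation -> S_0;
    env_step(action): perform the action -> (next_state, reward);
    actor(state): action probabilities; critic(state): V(state);
    update(loss): adjust the parameters omega of both networks.
    """
    state = env_reset()                          # S1: initial state S_0
    v = critic(state)                            # S2: V(S_0)
    for _ in range(n_steps):                     # S3, S10: i = 1..N
        probs = actor(state)                     # S4: action probabilities
        action = max(range(len(probs)), key=lambda k: probs[k])
        next_state, reward = env_step(action)    # S5: obtain S_i and R_i
        v_next = critic(next_state)              # S6: V(S_i)
        delta = reward + gamma * v_next - v      # S7: TD error, eq. (5)
        update(delta ** 2)                       # S8, S9: loss guides the update
        state, v = next_state, v_next
    return state
```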
10. The adaptive optimal memory reservation estimation method for long-lived containers as claimed in claim 9, wherein the TD error is defined as:
δ_t = R_{t+1} + γV(S_{t+1}) − V(S_t). (5)
the loss function is chosen freely as needed; parameter updating means that the neural network performs back-propagation according to the chain rule: it calculates the derivative of the composite function, propagates the gradients of the output neurons back to the input neurons, and adjusts the learnable parameters of the network according to the calculated gradients; N is the set number of iterations;
the Actor-Critic model defines model parameters comprising: a reward value, state parameters, and an action; the reward value is used to measure the benefit of performing a given action in a given state, and the reward value at time t is defined by the formula:
wherein M_t and T_t respectively denote the reserved memory and the program running time at time t, and each neuron of the output layer of the Actor unit corresponds to a specific memory setting;
the state parameters include: Δ t represents increased program run time compared to the trial run;
e represents the expected value of the memory occupation;
max_1 ~ max_n represent the n memory-occupancy values with the highest frequencies in the histogram analysis;
p_1 ~ p_n denote the frequencies corresponding to max_1 ~ max_n.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011073505.2A CN112328355B (en) | 2020-10-09 | 2020-10-09 | Adaptive optimal memory reservation estimation method for long-life container |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112328355A true CN112328355A (en) | 2021-02-05 |
CN112328355B CN112328355B (en) | 2024-04-23 |
Family
ID=74314814
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011073505.2A Active CN112328355B (en) | 2020-10-09 | 2020-10-09 | Adaptive optimal memory reservation estimation method for long-life container |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112328355B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160371108A1 (en) * | 2015-06-16 | 2016-12-22 | Vmware, Inc. | Reservation for a multi-machine application |
CN108415776A (en) * | 2018-03-06 | 2018-08-17 | 华中科技大学 | A kind of memory in distributed data processing system estimates the method with configuration optimization |
CN110390345A (en) * | 2018-04-20 | 2019-10-29 | 复旦大学 | A kind of big data cluster adaptive resource dispatching method based on cloud platform |
CN111176832A (en) * | 2019-12-06 | 2020-05-19 | 重庆邮电大学 | Performance optimization and parameter configuration method based on memory computing framework Spark |
CN111666149A (en) * | 2020-05-06 | 2020-09-15 | 西北工业大学 | Ultra-dense edge computing network mobility management method based on deep reinforcement learning |
Non-Patent Citations (1)
Title |
---|
Meng Hongtao et al.: "Research on Spark Memory Management and Cache Strategy", Computer Science, vol. 44, no. 6, pp. 31-36 *
Legal Events

Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||