CN117076106A - Elastic expansion method and system for cloud server resource management

Elastic expansion method and system for cloud server resource management

Info

Publication number
CN117076106A
CN117076106A (application CN202310920219.2A)
Authority
CN
China
Prior art keywords
load
strategy
data
expansion strategy
load data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310920219.2A
Other languages
Chinese (zh)
Inventor
温林峰
徐敏贤
叶可江
须成忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202310920219.2A
Publication of CN117076106A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5066Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to the field of cloud computing resource scheduling, and in particular to an elastic expansion method and system for cloud server resource management. In the method, a load pressure line is generated by means of random gradient descent and offset, the load state of the load data is judged based on the pressure line, and an applicable elastic expansion strategy is selected according to the judgment result.

Description

Elastic expansion method and system for cloud server resource management
Technical Field
The application relates to the field of cloud computing resource scheduling, in particular to an elastic expansion method and system for cloud server resource management.
Background
With the development and popularization of cloud-native technology, the elastic expansion strategy of a cloud server has become an important measure of its load capacity. A good elastic expansion strategy should balance resource saving and service performance, that is, guarantee the user experience of cloud server users while using the minimum amount of resources, and it should also remain robust under highly variable loads. Current cloud server elastic expansion strategies mainly fall into passive elastic expansion strategies and active elastic expansion strategies.
Passive elastic expansion strategy: for cloud server online applications without an obvious periodic pattern, it is often difficult to predict the load of the next period in advance. A passive elastic expansion strategy mainly collects the load data of a microservice over the most recent period, sets an appropriate target threshold, and triggers an expansion or contraction operation as soon as some metric exceeds the threshold, so as to keep resource utilization stable. Most cloud data centers currently use passive elastic expansion strategies, for example automatically scaling microservices with Kubernetes, the open-source container orchestration engine developed by Google.
Active elastic expansion strategy: for cloud server online applications with an obvious periodic pattern, historical load data are collected, the load is mathematically modeled and profiled, the load at the next moment is predicted, the resource demand is analyzed, and the corresponding resources are allocated or reclaimed in time, which effectively mitigates the delay problem of expansion and contraction. Existing active elastic expansion strategies are divided into prediction models based on machine learning, deep learning, and reinforcement learning, such as LightGBM, DNN, and Q-learning.
Adopting a passive elastic expansion strategy generally leads to low resource utilization and serious redundancy. For example, Kubernetes, favored by cloud service vendors at home and abroad, can, because of its simplistic horizontal scaling design, only calculate the required number of replicas by comparing a real-time perceived load value against a predefined resource water-level threshold. This approach lacks a risk control mechanism and is not well suited to industrial production. A passive elastic expansion strategy can only respond in real time and cannot predict service resource requirements at future moments, so the workload change of the cloud server always precedes the elastic expansion adjustment; even vertical scaling with a fast response time cannot avoid a certain delay, which causes problems such as reduced service quality and an increased SLA violation rate.
When an active elastic expansion strategy is used for automatic scaling, prediction errors are inevitable: if the load is underestimated, the capacity expansion is insufficient, the service quality drops, and the user SLA violation rate rises; moreover, a prediction-based elastic expansion strategy does not respond as quickly as a passive strategy when facing sudden traffic. In addition, at the early stage of policy execution, the lack of a large amount of historical load data for mathematical analysis makes it difficult to perform resource management efficiently.
Therefore, the prior art has shortcomings and needs further improvement.
Disclosure of Invention
The embodiment of the application provides an elastic expansion method and an elastic expansion system for cloud server resource management, which at least solve the technical problems of low resource utilization caused by a passive elastic expansion strategy and prediction errors caused by an active elastic expansion strategy.
According to an embodiment of the present application, there is provided an elastic expansion method for cloud server resource management, including the following steps:
collecting load data of historical work;
generating a load pressure line in a random gradient descending and shifting mode, and judging a load state of load data based on the load pressure line, wherein the load state comprises a stable state and an unstable state;
selecting an elastic expansion strategy to allocate resources according to the current load state, wherein the elastic expansion strategy comprises a vertical expansion strategy and a horizontal expansion strategy;
based on the allocated resources, a specific horizontal expansion strategy or a specific vertical expansion strategy is executed through the cloud server, so that resource allocation management is realized.
Further, a load pressure line is generated in a random gradient descent and offset mode, a load state where load data are located is judged based on the load pressure line, and the load state comprises a stable state and an unstable state and further comprises:
carrying out data preprocessing on the load data, wherein the data preprocessing comprises deleting abnormal data, and calculating the average value of each parameter with the same time stamp;
and performing supervised learning conversion on the load data, wherein the supervised learning conversion converts the load data into a labeled supervised learning sequence through a time window.
Further, according to the current load state, selecting an elastic expansion strategy to allocate resources, wherein the elastic expansion strategy comprises a vertical expansion strategy and a horizontal expansion strategy, and the vertical expansion strategy comprises:
performing resource allocation by adopting a differential vertical expansion strategy;
The resource allocation formula of the item-by-item differential vertical expansion strategy is as follows:
C_N = (1 + ρ)(α·C_L1 + β·C_L2 + γ·C_L3 + δ·C_L4 + ε·C_L5)
wherein C_N is the resource allocation amount, the five coefficients α, β, γ, δ and ε weight the five resource usage samples C_L1, C_L2, C_L3, C_L4 and C_L5 collected in the last time window, and ρ represents the margin of resource allocation in vertical scaling;
the above five differential coefficients satisfy the relationship:
α + 4λ = β + 3λ = γ + 2λ = δ + λ = ε
and λ is a difference set by the cloud server administrator according to requirements; when λ is set to 0, the formula reduces to the Kubernetes style of resource allocation, and λ is not a negative number.
Further, according to the current load state, selecting an elastic expansion strategy to allocate resources, wherein the elastic expansion strategy comprises a vertical expansion strategy and a horizontal expansion strategy, and the vertical expansion strategy comprises:
a vertical expansion strategy based on load data prediction is adopted;
the vertical expansion strategy based on load data prediction is specifically as follows:
carrying out mathematical modeling and portrayal on load data by collecting the load data of historical work;
predicting load data at the next moment based on mathematical modeling and portrayal;
and analyzing the demand of the resources based on the load data at the next moment, and timely distributing or recovering the corresponding resources.
Further, based on mathematical modeling and portrayal, predicting load data at a next time is specifically:
and predicting load data at the next moment by adopting a distributed gradient boosting framework (LightGBM) based on a decision tree algorithm.
Further, according to the current load state, selecting an elastic expansion strategy to allocate resources, wherein the elastic expansion strategy comprises a vertical expansion strategy and a horizontal expansion strategy, and the vertical expansion strategy comprises:
carrying out fine-grained load data management by adopting a horizontal expansion strategy in cooperation with vertical scaling;
the fine-grained load data management by the horizontal expansion strategy in cooperation with vertical scaling is specifically as follows:
adopting a pre-configured horizontal expansion strategy combined with vertical scaling for fine-grained resource management, and performing resource allocation in the vertical direction at a preset decay rate after pre-configuring in the horizontal direction, so as to balance the flexibility of resource adjustment against cost saving;
resource allocation is performed based on an exponential backoff method, which is as follows:
C_N(t) = C_A · φ^(−n)
wherein C_N is the resource allocation amount, C_A is the total resource allocation after horizontal expansion, φ is the base of the exponential backoff algorithm, and n is the number of backoff rounds that have elapsed by moment t.
Further, based on the mathematical modeling and representation, predicting the next time load data includes:
and predicting load data by a gradient boosting framework method, and selecting a vertical expansion strategy or a horizontal expansion strategy for resource allocation according to the prediction result.
An elastic expansion system for cloud server resource management, comprising:
the data collection module is used for collecting load data of historical work;
the load state judging module is used for generating a load pressure line in a random gradient descending and shifting mode, judging the load state of the load data based on the load pressure line, wherein the load state comprises a stable state and an unstable state;
the strategy selection module is used for selecting an elastic expansion strategy to allocate resources according to the current load state, wherein the elastic expansion strategy comprises a vertical expansion strategy and a horizontal expansion strategy;
and the resource scheduler is used for executing a specific horizontal expansion strategy or a specific vertical expansion strategy on the cloud server based on the allocated resources to realize resource allocation management.
A computer readable medium storing one or more programs executable by one or more processors to implement the steps in the elastic expansion method for cloud server resource management according to any of the above.
A terminal device, comprising: a processor, a memory, and a communication bus; a memory having stored thereon a computer readable program executable by a processor;
the communication bus realizes the connection communication between the processor and the memory;
the steps in the elastic expansion method for cloud server resource management according to any of the above are implemented when the computer readable program is executed by the processor.
According to the elastic expansion method and system for cloud server resource management provided above, a load pressure line is generated by means of random gradient descent and offset, the load state of the load data is judged based on the pressure line, and an applicable elastic expansion method is selected according to the judgment result.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of an elastic expansion method for cloud server resource management according to the present application;
FIG. 2 is a flowchart of an embodiment of an elastic expansion method for cloud server resource management according to the present application;
FIG. 3 shows key code of the present application;
FIG. 4 is a first test result of randomly intercepting a piece of data in a Kubernetes cluster according to the present application;
FIG. 5 is a second test result of randomly intercepting a piece of data in a Kubernetes cluster according to the present application;
FIG. 6 is a third test result of randomly intercepting a piece of data in a Kubernetes cluster according to the present application;
FIG. 7 is a fourth test result of randomly intercepting a piece of data in a Kubernetes cluster according to the present application;
FIG. 8 is a graph of the response times of the various methods;
FIG. 9 is a graph of resource utilization for various methods of the present application;
FIG. 10 illustrates quantitative performance scoring using the dynamic time warping algorithm according to the present application;
FIG. 11 is a schematic diagram of an elastic expansion system for cloud server resource management;
FIG. 12 is a diagram of an embodiment of an elastic expansion system for cloud server resource management;
FIG. 13 is a diagram of a terminal device according to the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, a technical solution in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
According to an embodiment of the present application, an elastic expansion method for cloud server resource management is provided, referring to fig. 1, including the following steps:
s100: collecting load data of historical work;
s200: generating a load pressure line in a random gradient descending and shifting mode, and judging a load state of load data based on the load pressure line, wherein the load state comprises a stable state and an unstable state;
s300, selecting an elastic expansion strategy to allocate resources according to the current load state, wherein the elastic expansion strategy comprises a vertical expansion strategy and a horizontal expansion strategy;
and S400, executing a specific horizontal expansion strategy or a specific vertical expansion strategy through the cloud server based on the allocated resources to realize resource allocation management.
According to the elastic expansion method for cloud server resource management of this embodiment, a load pressure line is generated by means of random gradient descent and offset, the load state of the load data is judged based on the pressure line, and an applicable elastic expansion method is selected according to the judgment result.
Specifically, the application provides an efficient, intelligent elastic expansion strategy for real-time adjustment of microservice resources based on a machine learning algorithm. It mainly adopts a state detection method to analyze different microservices and executes different elastic expansion strategies according to the analysis results. The application aims to solve the resource management problem in cloud servers with this intelligent microservice elastic expansion strategy.
Specifically, the load state of the load data is judged according to the load pressure line. The pressure line in this embodiment can adapt to the load through parameter adjustment. A representation of the pressure line function provided in this embodiment is as follows:
the pressure line is a linear function, and the general formula of the function of the pressure line in this embodiment is:
f(t)=kt+b+αc v
wherein k is the pressure line slope, t is the time argument, b is the pressure line constant term, c v The discrete coefficient is the ratio of the standard deviation sigma of data to the corresponding average mu, and represents the discrete degree in a sample interval, and is also the reserved wide margin of a pressure line, and the larger the discrete degree is, the larger the allocated buffer space is, and the smaller the buffer space is on the contrary. Alpha is the adjustment parameter of the pressure line marginA number.
The slope and constant term of the pressure line can be determined by polynomial fitting. First, a set of fitting data is given:
(t_i, Load_i), where i = 0, 1, 2, ..., m−1
Fitting the first-order function b + kt is then turned into minimizing the mean square error E; a set of fitting coefficients is considered preferable if it minimizes E. The mean square error is calculated as:
E = Σ_{i=0}^{m−1} (b + k·t_i − Load_i)²
Then the two partial derivatives of E are computed and each is set to 0, so that k and b can be solved from the simultaneous equations:
∂E/∂b = 2·Σ_{i=0}^{m−1} (b + k·t_i − Load_i) = 0
∂E/∂k = 2·Σ_{i=0}^{m−1} (b + k·t_i − Load_i)·t_i = 0
in addition, the value of the pressure line adjustment parameter needs to be determined, and the following problem definition is performed:
assuming that the load state determiner is not used for determining, there will be n times when the load is underestimated to reduce the performance, and the objective is to find the n times when the load is underestimated. If the load state judgment device is used for judging, the load is unstable at p times, namely potential load underestimation points. Ideally, the pi times are in one-to-one correspondence with the pi times. Assuming that M points in the Pi moments are in one-to-one correspondence with M points in the pi, the judging accuracy of the load state judging device is thatThe load underestimation point finding rate is +.>The judgment accuracy represents the accuracy of the load state judgment device, if the judgment accuracy is low, the system judges a plurality of unnecessary load points as potentialUnderestimating the points results in unnecessary waste of resources. The finding rate represents the finding probability of the real load underestimated point, if the finding rate is too low, the system cannot find most of the load underestimated points, resulting in performance degradation. The goal of the load state judging device is to ensure the finding rate of the underestimated load point at the same time>And judging accuracy->In a high position.
The choice of this parameter strongly influences the result, so an appropriate value must be determined. A large number of experiments show that the finding rate and the accuracy move in nearly opposite directions, making it difficult to keep both high at the same time, so a trade-off must be made. The smaller the parameter, the higher the accuracy and the lower the finding rate, which suits aggressive resource allocation systems that sacrifice some performance for higher resource utilization. The larger the parameter, the lower the accuracy and the higher the finding rate, which suits conservative resource allocation systems that spend extra resources for higher performance. Both cases are extremes; preferably the parameter α takes an intermediate value (20 < α < 40). This range balances the impact on performance against resource usage, and in particular, when the finding rate equals the judgment accuracy, a balance between performance impact and resource use is reached.
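As a concrete illustration of the pressure line, the following Python sketch fits k and b by least squares (the closed form of the normal equations above, which stochastic gradient descent would also converge to) and adds the coefficient-of-variation offset. The function names, the use of NumPy, and the stability test are illustrative assumptions, not code from the patent.

```python
import numpy as np

def fit_pressure_line(t, load, alpha=30.0):
    """Fit the pressure line f(t) = k*t + b + alpha*c_v from samples (t_i, Load_i)."""
    t = np.asarray(t, dtype=float)
    load = np.asarray(load, dtype=float)
    k, b = np.polyfit(t, load, deg=1)        # slope and intercept solving dE/dk = dE/db = 0
    c_v = load.std() / load.mean()           # coefficient of variation: sigma / mu
    return lambda x: k * x + b + alpha * c_v

def looks_stable(pressure_line, t_now, observed_load):
    """One plausible reading of the state test: the load is 'stable' (predictable)
    while the observed value stays below the pressure line."""
    return observed_load <= pressure_line(t_now)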
The step S100 specifically includes:
Collection of historical workload information: the resource usage collector Metrics Server deployed in the Kubernetes cluster is used to collect historical workload data. Metrics Server is a Kubernetes cluster monitoring and performance analysis tool that collects metric data from the nodes.
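For illustration, a sketch of such collection through the Kubernetes Metrics API (metrics.k8s.io) using the official Python client might look as follows; the namespace and the idea of flattening the response into per-container samples are assumptions, not the patent's implementation.

```python
from kubernetes import client, config

def collect_pod_metrics(namespace="default"):
    """Read current CPU/memory usage of pods from the Metrics API (metrics.k8s.io)."""
    config.load_kube_config()               # use load_incluster_config() inside a pod
    api = client.CustomObjectsApi()
    metrics = api.list_namespaced_custom_object(
        group="metrics.k8s.io", version="v1beta1",
        namespace=namespace, plural="pods")
    samples = []
    for pod in metrics["items"]:
        for container in pod["containers"]:
            samples.append({
                "pod": pod["metadata"]["name"],
                "container": container["name"],
                "cpu": container["usage"]["cpu"],        # e.g. "12m"
                "memory": container["usage"]["memory"],  # e.g. "34Mi"
            })
    return samples
```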
The step S200 specifically further includes:
s201: carrying out data preprocessing on the load data, wherein the data preprocessing comprises deleting abnormal data, and calculating the average value of each parameter with the same time stamp;
s202: and performing supervised learning conversion on the load data, wherein the supervised learning conversion is performed on the load data to obtain a supervised learning sequence with labels.
The abnormal data are deleted first, and then the supervised learning conversion is performed: a time window is used to convert the data into a labeled supervised learning sequence, which improves prediction accuracy.
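A minimal sketch of this supervised conversion, assuming the cleaned load values form a one-dimensional series and the time window holds five samples (both assumptions for illustration):

```python
import numpy as np

def to_supervised(series, window=5):
    """Convert a cleaned 1-D load series into labelled (X, y) pairs: each label is
    the observation that follows a window of `window` consecutive samples."""
    series = np.asarray(series, dtype=float)
    X = [series[i:i + window] for i in range(len(series) - window)]
    y = [series[i + window] for i in range(len(series) - window)]
    return np.array(X), np.array(y)

X, y = to_supervised([1, 2, 3, 4, 5, 6, 7, 8], window=5)
print(X.shape, y.shape)   # (3, 5) (3,)
```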
The vertical expansion strategy has the advantage of fast response time; for a sudden load, vertical scaling can respond in time to reduce the impact of degraded service performance. Furthermore, vertical scaling allows finer-grained resource management of the cloud server. The vertical expansion strategy comprises an item-by-item differential vertical expansion strategy and a vertical expansion strategy based on load prediction.
If the microservice load is relatively stable, a more aggressive resource management method is selected to save resources, such as a load-prediction machine learning method, namely LightGBM (Light Gradient Boosting Machine). In contrast, if the microservice load is relatively unstable, a more conservative resource management method is adopted to ensure that the SLA (Service-Level Agreement) rules are not violated, such as a responsive elastic expansion policy using the item-by-item difference method.
All of the above act in the vertical direction. If the current load exceeds a specific threshold and can no longer be supported by resources in the vertical direction, the horizontal expansion strategy is triggered so that resource utilization stays within a certain range. To avoid thrashing, a cooldown period is also set to prevent frequent horizontal scaling from affecting service performance. Horizontal expansion operates at a coarser granularity; normally, after running for a while the service returns to smooth operation, or the resources become excessive once the peak has passed. Therefore, after horizontal expansion, a vertical resource rollback fine adjustment is performed: the vertical resource limit is lowered slowly using the proposed exponential backoff method, and unused resources are reclaimed.
The step S300 specifically includes:
and resource allocation is carried out by adopting a differential vertical expansion strategy.
Item-by-item differential vertical expansion strategy: for cloud server online applications without an obvious periodic pattern, the load of the next period is hard to predict in advance. The passive elastic expansion strategy mainly collects the microservice's load data over the most recent period, sets an appropriate target threshold, and triggers expansion or contraction as soon as some metric exceeds the threshold, keeping resource utilization stable.
Existing cloud data centers use popular tools and frameworks, for example automatically scaling microservices with Kubernetes developed by Google. Kubernetes monitors the historical resource usage C_L over a past time window and then allocates the resource amount C_N for the next time window as:
C_N = C_L(1 + ρ)
wherein ρ is a safety coefficient flexibly set by the cloud server administrator according to requirements.
Obviously, the design of the Kubernetes scaling strategy is too simple: the required resources are calculated only by comparing the real-time perceived load value with a predefined resource water-level threshold, so resource utilization is generally low and redundancy is serious, and the lack of a risk control mechanism makes this approach unsuitable for industrial production.
In the application, the item-by-item differential form of the vertical expansion strategy can, to a certain extent, alleviate the impact caused by under-provisioning and over-provisioning of resources. For example, with the sliding time window set to five, the resource allocation formula is:
C_N = (1 + ρ)(α·C_L1 + β·C_L2 + γ·C_L3 + δ·C_L4 + ε·C_L5)
wherein C_N is the resource allocation amount, the five coefficients α, β, γ, δ and ε weight the five resource usage samples C_L1, C_L2, C_L3, C_L4 and C_L5 collected in the last time window, and ρ is the resource allocation margin. The five differential coefficients satisfy the relationship:
α + 4λ = β + 3λ = γ + 2λ = δ + λ = ε
where λ is a difference set by the cloud server administrator according to requirements; setting λ to 0 reduces the formula to the Kubernetes style of resource allocation. λ should not be set too large, otherwise the allocation becomes over-correlated with C_L5, and λ must not be negative, otherwise the allocation becomes strongly correlated with C_L1.
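A small sketch of this weighted allocation; normalizing the coefficients so that λ = 0 recovers the plain average-based Kubernetes allocation is an assumption made for illustration:

```python
def differential_allocation(usage, lam=0.02, rho=0.1):
    """Item-by-item differential vertical allocation over a 5-sample window.

    usage : [C_L1, ..., C_L5], oldest to newest resource usage in the last window.
    lam   : difference lambda between consecutive coefficients (lam >= 0).
    rho   : allocation margin, applied as a (1 + rho) factor.
    """
    assert len(usage) == 5 and 0 <= lam <= 0.1
    # alpha chosen so the weights sum to 1; with lam = 0 this is the mean * (1 + rho)
    base = (1.0 - 10 * lam) / 5.0
    weights = [base + i * lam for i in range(5)]     # alpha..epsilon, epsilon is largest
    return (1.0 + rho) * sum(w * c for w, c in zip(weights, usage))

print(differential_allocation([100, 110, 120, 150, 160]))
```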
The step S300 specifically includes:
a vertical expansion strategy based on load data prediction is adopted;
the vertical expansion strategy based on load data prediction is specifically as follows:
s301: carrying out mathematical modeling and portrayal on load data by collecting the load data of historical work;
s302: predicting load data at the next moment based on mathematical modeling and portrayal;
s3021: load data prediction is carried out by a gradient boosting framework method, and a vertical or horizontal expansion strategy is selected for resource allocation according to the prediction result;
s303: and analyzing the demand of the resources based on the load data at the next moment, and timely distributing or recovering the corresponding resources.
Vertical expansion strategy based on load prediction: for cloud server online applications with an obvious periodic pattern, historical load data are collected, the load is mathematically modeled and profiled, the load at the next moment is predicted, the resource demand is analyzed, and the corresponding resources are allocated or reclaimed in time, which effectively mitigates the delay problem of scaling. The application adopts LightGBM (Light Gradient Boosting Machine), a distributed gradient boosting framework based on decision tree algorithms, to perform load prediction analysis. LightGBM is an open-source gradient boosting framework, one of the implementations of the GBDT algorithm, and supports efficient parallel training.
GBDT (Gradient Boosting Decision Tree) is a machine learning model whose main idea is to train weak classifiers (decision trees) iteratively to obtain an optimal model; it has advantages such as good training performance and resistance to overfitting. However, GBDT needs to traverse the entire training data multiple times in each iteration. If the whole training set is loaded into memory, its size is limited; if it is not kept in memory, repeatedly reading and writing the training data consumes a very large amount of time. Especially for industrial-scale massive data, the ordinary GBDT algorithm cannot meet the requirements.
The existing GBDT tool XGBoost (an optimized distributed gradient boosting library) builds decision trees with a pre-sorting method, which has obvious drawbacks: first, the space consumption is large; second, the time cost is also considerable; finally, it is not cache friendly. To avoid the drawbacks of XGBoost and to speed up GBDT training without sacrificing accuracy, LightGBM optimizes the conventional GBDT algorithm with: gradient-based one-side sampling, exclusive feature bundling, a depth-limited leaf-wise growth strategy, direct support for categorical features, efficient parallelism, and cache hit-rate optimization. With these optimizations, LightGBM can train models with little time overhead and obtain higher prediction accuracy, allowing GBDT to be used better and faster in cloud load prediction practice.
In the application, after load prediction with LightGBM, the relevant resource configuration can be performed in advance according to the predicted load, and the LightGBM model is periodically maintained and updated.
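A minimal sketch of such a predictor, reusing the sliding-window conversion described above; the hyperparameters and the regression setup are illustrative assumptions rather than the patent's actual configuration:

```python
import numpy as np
import lightgbm as lgb

def train_load_predictor(series, window=5):
    """Train a LightGBM regressor mapping the last `window` load samples
    to the load at the next moment."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05)
    model.fit(X, y)
    return model

def predict_next(model, recent_window):
    """Predict the next-moment load from the most recent `window` samples."""
    features = np.asarray(recent_window, dtype=float).reshape(1, -1)
    return float(model.predict(features)[0])
```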
The step S300 specifically includes:
Fine-grained load data management is performed by the horizontal expansion strategy in cooperation with vertical scaling.
Specifically, the horizontal expansion strategy is simple and effective and is therefore widely used in production, but horizontal scaling systems have some design drawbacks. First, horizontal scaling increases or decreases the number of service replicas to reach a target resource utilization over a future period, and when the load changes suddenly and fluctuates, it is inefficient and may cause over- or under-provisioning. Second, horizontal scaling takes a certain amount of time to carry out, so some cloud resource management systems try to perform horizontal scaling operations before a load burst, which is extremely difficult: on the one hand, if horizontal expansion is triggered in advance by a wrong judgment, resource utilization is low and the economics are poor; on the other hand, if horizontal expansion is not performed in time, service performance degrades or the service even becomes unavailable.
The application introduces a cooldown period into the auto-scaling system: no scaling operation is executed for a period of time after the last one, which reduces the adverse effects of system thrashing. Second, the horizontal expansion strategy in the application cooperates with the vertical expansion strategy for fine-grained resource management: it mainly adopts a pre-configuration method in which, after pre-configuring in the horizontal direction, the capacity is reduced in the vertical direction at a preset decay rate, balancing the flexibility of resource adjustment against cost saving.
Resource allocation is performed based on an exponential backoff method, as follows:
C_N(t) = C_A · φ^(−n)
wherein C_N is the resource allocation amount, C_A is the total resource allocation after horizontal expansion, φ is the base of the exponential backoff algorithm, and n is the number of backoff rounds that have elapsed by moment t.
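A sketch of this rollback schedule; the decay form C_A·φ^(−n) is reconstructed from the description and should be read as an illustrative assumption:

```python
def backoff_limit(total_after_scale_out, round_index, phi=1.25):
    """Vertical resource limit after `round_index` backoff rounds following a
    horizontal scale-out: C_A * phi ** (-n), reclaiming head-room gradually."""
    return total_after_scale_out * phi ** (-round_index)

# e.g. starting from a 4000m CPU limit, shrink it once per cooldown round
print([round(backoff_limit(4000, n)) for n in range(6)])   # [4000, 3200, 2560, 2048, 1638, 1311]
```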
The technical scheme of the application is specifically described as follows:
referring to fig. 2, the main flow of the method of the present patent is as follows:
step one, collection of historical workload information or data: the data is collected using a resource usage collector Metrics Server built in Kubernetes.
Step two, data preprocessing: workload preprocessing, including deleting outlier data, calculating an average value for each parameter with the same timestamp, etc.
Step three, supervising the learning conversion: the data is converted into the supervised learning sequence with the labels by using the time window, so that the prediction accuracy is improved.
Step four, load data state judgment: a load pressure line is generated by random gradient descent and offset, and whether the load data is in a stable state is judged.
Step five, the policy selector: an appropriate elastic expansion strategy, either vertical or horizontal, is selected according to the current state of the load. The main idea is shown in the policy selector pseudocode of FIG. 3, and an illustrative sketch is given after this step list.
Load prediction: and carrying out load prediction by a machine learning method LightGBM, and configuring resources according to a prediction result.
Step six, responsive scaling: resources are allocated through the item-by-item differential vertical expansion allocation formula, which improves the performance of the responsive elastic expansion strategy.
Step seven, the optimizer: the specific horizontal and vertical expansion strategies are executed on the cloud server.
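Under the same caveats, the selection logic of steps four to seven can be sketched as follows; `pressure_line`, `predict_next` and `differential_allocation` refer to the illustrative helpers sketched earlier, and the thresholds and function names are placeholders rather than the patent's pseudocode of FIG. 3:

```python
def select_and_allocate(history, pressure_line, predictor, vertical_limit,
                        cooldown_expired, window=5, rho=0.1):
    """Choose between predictive and responsive vertical scaling, falling back to
    horizontal scaling when the vertical limit cannot absorb the target."""
    recent = history[-window:]
    now = len(history) - 1
    stable = history[-1] <= pressure_line(now)     # load judged predictable

    if stable:
        # Aggressive path: LightGBM prediction plus a small margin.
        target = predict_next(predictor, recent) * (1 + rho)
    else:
        # Conservative path: responsive item-by-item differential allocation.
        target = differential_allocation(recent, lam=0.02, rho=rho)

    if target > vertical_limit and cooldown_expired:
        return ("horizontal", target)              # scale out, then back off vertically
    return ("vertical", min(target, vertical_limit))
```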
The application provides a cloud server elastic expansion strategy based on machine learning and state detection, which can effectively solve the resource management problem of microservices in a cloud server cluster and improve resource utilization while guaranteeing that QoS is not lost.
To this end, the method provided by the application detects the load state of the cloud server cluster, and according to the detection result selects between two scheduling methods, an active strategy and a passive strategy, and between two scaling modes, horizontal and vertical.
In addition, the application provides an item-by-item differential resource allocation formula for responsive allocation, which uses the timestamps of the time-series data to assign different weights to historical load data at different moments, giving the responsive elastic expansion strategy a partial predictive effect. The application also provides a horizontal scaling pre-configuration method that shrinks resources at a certain decay rate in cooperation with vertical scaling, achieving finer-grained resource management.
Compared with the prior art, the method adopts an elastic expansion strategy based on machine learning and microservice state detection. The algorithm uses the intrinsic characteristics of the microservice load time series to perform state detection, judges whether the load at the current moment is hard to predict, and then selects an applicable elastic expansion method according to the judgment result. The method integrates active and passive elastic expansion strategies well; compared with traditional microservice elastic expansion strategies, it addresses the low prediction accuracy and limited applicability of most active strategies and the fact that passive strategies can only respond in real time and cannot predict future cluster resource demand. To a certain extent, resource utilization can be improved without loss of QoS.
The experiments use workload datasets from Alibaba cloud data centers. A load generator based on Locust (an open-source load testing tool) and a cloud server cluster scheduler based on a machine learning model are used, and several scheduling algorithms commonly used in the field are compared.
A prediction-based elastic expansion strategy has difficulty predicting accurately when the load is highly variable: a larger buffer setting wastes resources, while a smaller buffer setting causes frequent violations of SLA (Service-Level Agreement) rules. The load pressure line generated by random gradient descent can judge from the load characteristics whether the current load is hard to predict and switch the elastic expansion strategy before a load underestimation point arrives; in tests, the hit accuracy for load underestimation points reaches 71.03%.
FIGS. 4 to 7 show experimental results comparing the vertical scaling performance of the method of the present application with HyScale, Showar, and XGBoost.
FIG. 8 shows the response times of the methods, comparing the maximum and average response times of the four methods under the same load; the average response time of the method of the present application is 18.5% lower than HyScale, 21.42% lower than Showar, and 15.38% lower than XGBoost.
FIG. 9 shows the resource utilization of each method, comparing CPU utilization when the four methods run under the same load. The method of the application improves slightly over the responsive strategy; because it adopts a conservative allocation when the load is unstable, its resource utilization is lower than that of the active strategy.
FIG. 10 is a quantitative comparison, based on FIGS. 4-7, of how well each method fits the load, using the DTW algorithm. In FIG. 10, the dynamic time warping algorithm is used for quantitative performance scoring (the lower the score, the better the method); the performance difference is made visible by quantifying the similarity of the curves. Because the points on two curves are not in one-to-one correspondence, there is some offset, and the numbers of points generally differ, the ordinary Euclidean distance does not work, and the dynamic time warping (DTW) algorithm is more applicable and works better. DTW is typically used to measure the similarity of two speech signals, which do not match exactly because each phoneme is pronounced with a different duration in each utterance, so the signals are stretched or compressed to align them as well as possible. The scoring results are: the method of the application 41.6 points, HyScale 84.0 points, Showar 70.3 points, and XGBoost 55.3 points. These results show that the method outperforms existing methods in the field of cloud server elastic scaling.
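For reference, a compact dynamic-programming DTW distance of the kind used for such curve scoring can be sketched as follows (a generic textbook implementation, not the scoring code used in the experiments):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D curves of possibly different
    lengths; a smaller value means the curves are more similar."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# e.g. score how closely an allocation curve tracks the actual load curve
print(dtw_distance([1, 2, 3, 4], [1, 1, 2, 3, 4]))   # 0.0: the curves warp onto each other
```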
Example 2
According to another embodiment of the present application, there is provided an elastic expansion system for cloud server resource management, referring to fig. 11, including:
a data collection module 100 for collecting load data of historical work;
the load state judging module 200 is configured to generate a load pressure line in a random gradient descent and offset manner, and judge a load state in which the load data is located based on the load pressure line, where the load state includes a stable state and an unstable state;
the policy selection module 300 is configured to select an elastic expansion policy according to a current load state to perform resource allocation, where the elastic expansion policy includes a vertical expansion policy and a horizontal expansion policy;
the resource scheduler 400 is configured to execute a specific horizontal scaling policy or a specific vertical scaling policy on the cloud server based on the allocated resources, so as to implement resource allocation management.
In particular, the design objective of the present application is to provide an efficient and stable resource management system that service providers can apply to configure resources reasonably. FIG. 12 shows the main components of the application, and FIG. 3 shows the key code of this patent.
As shown in fig. 12, the system model of the present patent includes the following components: load generator, workload analyzer, cluster scheduler.
Load generator: the load generator uses the actual load data to simulate and can generate a series of HTTP access requests to verify the validity of the system. And selecting partial data from the Alibaba load set, preprocessing and converting the data, and performing pressure test by means of Locust.
Workload analyzer: the work load analyzer is used for analyzing load characteristics so as to adopt an optimal resource scheduling strategy, firstly, the characteristics of inaccurate prediction points are mined out on a load prediction graph, and then, the state of the current load is judged in a load pressure line mode. The pressure lines are calculated using regression and offset, and the relevant parameters can be dynamically adjusted and adapted.
Cluster scheduler: the cluster scheduler is responsible for receiving a resource scheduling instruction of the workload analyzer and acts on the cloud server cluster, so that the resource allocation is more reasonable. The scheduling strategy mainly comprises two types of vertical telescoping and horizontal telescoping.
According to the elastic expansion system for cloud server resource management disclosed by this embodiment of the application, a load pressure line is generated by means of random gradient descent and offset, the load state of the load data is judged based on the pressure line, and an applicable elastic expansion method is selected according to the judgment result.
Example 3
Based on the above-mentioned elastic expansion method for cloud server resource management, the present embodiment provides a computer readable storage medium storing one or more programs executable by one or more processors to implement the steps in the elastic expansion method for cloud server resource management according to the above-mentioned embodiments.
Example 4
A terminal device, comprising: a processor, a memory, and a communication bus; a memory having stored thereon a computer readable program executable by a processor; the communication bus realizes the connection communication between the processor and the memory; the steps in the elastic expansion method for cloud server resource management are realized when the processor executes the computer readable program.
Based on the above elastic expansion method of cloud server resource management, the present application provides a terminal device, as shown in fig. 13, which includes at least one processor (processor) 20; a display screen 21; and a memory (memory) 22, which may also include a communication interface (Communications Interface) 23 and a bus 24. Wherein the processor 20, the display 21, the memory 22 and the communication interface 23 may communicate with each other via a bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 may transmit information. The processor 20 may invoke logic instructions in the memory 22 to perform the methods of the embodiments described above.
Further, the logic instructions in the memory 22 described above may be implemented in the form of software functional units and stored in a computer readable storage medium when sold or used as a stand alone product.
The memory 22, as a computer readable storage medium, may be configured to store a software program, a computer executable program, such as program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 performs functional applications and data processing, i.e. implements the methods of the embodiments described above, by running software programs, instructions or modules stored in the memory 22.
The memory 22 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created according to the use of the terminal device, etc. In addition, the memory 22 may include high-speed random access memory, and may also include nonvolatile memory. For example, a plurality of media capable of storing program codes such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or a transitory storage medium may be used.
In addition, the specific processes that the storage medium and the plurality of instruction processors in the terminal device load and execute are described in detail in the above method, and are not stated here.
The foregoing is merely a preferred embodiment of the present application and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present application, which are intended to be comprehended within the scope of the present application.

Claims (10)

1. An elastic expansion method for cloud server resource management, characterized by comprising the following steps:
collecting load data of historical work;
generating a load pressure line in a random gradient descending and shifting mode, and judging a load state of the load data based on the load pressure line, wherein the load state comprises a stable state and an unstable state;
selecting an elastic expansion strategy to allocate resources according to the current load state, wherein the elastic expansion strategy comprises a vertical expansion strategy and a horizontal expansion strategy;
based on the allocated resources, the cloud server executes the specific horizontal expansion strategy or the vertical expansion strategy to realize resource allocation management.
2. The elastic expansion method according to claim 1, wherein the generating a load pressure line by means of random gradient descent and offset and determining the load state of the load data based on the load pressure line, the load state including a stable state and an unstable state, further includes:
performing data preprocessing on the load data, wherein the data preprocessing comprises deleting abnormal data, and calculating an average value of each parameter with the same time stamp;
and performing supervised learning conversion on the load data, wherein the supervised learning conversion is used for converting the load data into a supervised learning sequence with labels through a time window.
3. The elastic expansion method according to claim 1, wherein selecting an elastic expansion policy for resource allocation according to the current load state, the elastic expansion policy including a vertical expansion policy and a horizontal expansion policy, includes:
performing resource allocation by adopting the vertical expansion strategy in a term-by-term difference mode;
the resource allocation formula of the item-by-item differential vertical expansion strategy being specifically:
C_N = (1 + ρ)(α·C_L1 + β·C_L2 + γ·C_L3 + δ·C_L4 + ε·C_L5)
wherein C_N is the resource allocation amount, the five coefficients α, β, γ, δ and ε weight the five resource usage samples C_L1, C_L2, C_L3, C_L4 and C_L5 collected in the last time window, and ρ represents the margin of resource allocation in vertical scaling;
the five differential coefficients satisfying the relationship:
α + 4λ = β + 3λ = γ + 2λ = δ + λ = ε;
and λ is a difference set by the cloud server administrator according to requirements; when λ is set to 0, the formula reduces to the Kubernetes style of resource allocation, and λ is not a negative number.
4. The elastic expansion method according to claim 1, wherein selecting an elastic expansion policy for resource allocation according to the current load state, the elastic expansion policy including a vertical expansion policy and a horizontal expansion policy, includes:
a vertical expansion strategy based on the load data prediction is adopted;
the vertical expansion strategy based on the load data prediction is specifically as follows:
mathematically modeling and portraying said load data by collecting said load data for historical operation;
predicting load data at a next moment based on the mathematical modeling and representation;
and analyzing the demand of the resources based on the load data at the next moment, and timely distributing or recovering the corresponding resources.
5. The elastic expansion method of claim 4, wherein said predicting load data at the next moment based on said mathematical modeling and representation is specifically:
and predicting the load data at the next moment by adopting a distributed gradient boosting framework (LightGBM) based on a decision tree algorithm.
6. The elastic expansion method according to claim 4, wherein selecting an elastic expansion policy for resource allocation according to the current load state, the elastic expansion policy including a vertical expansion policy and a horizontal expansion policy, includes:
carrying out fine-grained load data management by adopting the horizontal expansion strategy and the vertical expansion strategy;
the load data management of fine granularity by adopting the horizontal expansion strategy to cooperate with the vertical expansion is specifically as follows:
adopting the pre-configured horizontal expansion strategy, carrying out fine-granularity resource management by combining the vertical expansion, and carrying out resource allocation in the vertical direction at a preset decay rate after pre-configuration in the horizontal direction, thereby taking into account the flexibility of resource adjustment and the cost saving;
the resource allocation is performed based on an exponential backoff method, which is as follows:
C_N(t) = C_A · φ^(−n)
wherein C_N is the resource allocation amount, C_A is the total resource allocation after horizontal expansion, φ is the base of the exponential backoff algorithm, and n is the number of backoff rounds that have elapsed by moment t.
7. The elastic expansion method of claim 4, wherein said predicting the load data at the next moment based on said mathematical modeling and representation comprises:
and predicting load data by a gradient boosting framework method, and selecting the vertical expansion strategy or the horizontal expansion strategy for resource allocation according to the prediction result.
8. An elastic expansion system for cloud server resource management, comprising:
the data collection module is used for collecting load data of historical work;
the load state judging module is used for generating a load pressure line in a random gradient descending and shifting mode, judging the load state of the load data based on the load pressure line, wherein the load state comprises a stable state and an unstable state;
the strategy selection module is used for selecting an elastic expansion strategy to allocate resources according to the current load state, wherein the elastic expansion strategy comprises a vertical expansion strategy and a horizontal expansion strategy;
and the resource scheduler is used for executing the specific horizontal expansion strategy or the vertical expansion strategy on the cloud server based on the allocated resources to realize resource allocation management.
9. A computer readable medium, characterized in that the computer readable storage medium stores one or more programs executable by one or more processors to implement the steps in the elastic expansion method of any one of claims 1-7.
10. A terminal device, comprising: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
the processor, when executing the computer readable program, implements the steps of the elastic expansion method of any one of claims 1-7.
CN202310920219.2A 2023-07-25 2023-07-25 Elastic telescoping method and system for cloud server resource management Pending CN117076106A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310920219.2A CN117076106A (en) 2023-07-25 2023-07-25 Elastic telescoping method and system for cloud server resource management

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310920219.2A CN117076106A (en) 2023-07-25 2023-07-25 Elastic telescoping method and system for cloud server resource management

Publications (1)

Publication Number Publication Date
CN117076106A true CN117076106A (en) 2023-11-17

Family

ID=88707046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310920219.2A Pending CN117076106A (en) 2023-07-25 2023-07-25 Elastic telescoping method and system for cloud server resource management

Country Status (1)

Country Link
CN (1) CN117076106A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination