CN117971337A - Hybrid cloud automatic configuration method based on LSTM model - Google Patents

Hybrid cloud automatic configuration method based on LSTM model

Info

Publication number
CN117971337A
CN117971337A
Authority
CN
China
Prior art keywords
model
lstm
parameter
performance
monitoring data
Prior art date
Legal status
Pending
Application number
CN202311718289.6A
Other languages
Chinese (zh)
Inventor
王小乾
潘晓东
吴晓清
李伟泽
郭海燕
Current Assignee
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd filed Critical Tianyi Cloud Technology Co Ltd
Priority to CN202311718289.6A priority Critical patent/CN117971337A/en
Publication of CN117971337A publication Critical patent/CN117971337A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of IT and software development, and in particular discloses a hybrid cloud automatic configuration method based on an LSTM model. Based on an LSTM neural network, the method acquires performance data under different parameters from the monitoring data of a component, performs preprocessing operations such as cleaning, balancing and normalization, and uses the LSTM model to build a prediction model that predicts the likely performance data under different parameters, so that the configuration can be adjusted dynamically. The method can dynamically adjust the parameters of hybrid cloud components with real-time effect, can continuously and dynamically shrink the performance interval until it is optimal, and is effective in improving resource utilization and component performance.

Description

Hybrid cloud automatic configuration method based on LSTM model
Technical Field
The invention relates to the technical field of IT and software development, in particular to a hybrid cloud automatic configuration method based on an LSTM model.
Background
The hybrid cloud management platform supports unified scheduling and management of resource pools with multiple CPU architectures, provides users with rich and diverse computing power, and is broadly compatible under a "one cloud, multiple cores" model. It is secure and reliable and, on the basis of abstraction and unification, retains the distinctive advantages of the Tianyi Cloud management product: integration, upgradeability, evolvability and lightweight operation and maintenance, providing strong momentum for enterprise innovation. The hybrid cloud management platform is composed of many service components; the components have many configuration files, the configuration items are complex, and each component occupies resources differently. It is therefore necessary to consider how to adjust the configuration dynamically to improve the performance and resource utilization of the management platform. This is an important technical problem for which a suitable method needs to be found.
Disclosure of Invention
LSTM (Long Short-Term Memory) is a deep learning model commonly used to process sequence data. It can effectively capture long-term dependencies in a time series and generate predictions from them. Compared with the traditional recurrent neural network (RNN), LSTM has stronger memory capability and better noise immunity. The invention predicts the resource occupancy of hybrid cloud components based on an LSTM model, dynamically adjusts the component configuration accordingly, and thereby improves the resource utilization and performance of the components.
The invention aims to provide a hybrid cloud automatic configuration method based on an LSTM model, which comprises the following steps:
Step 1, collecting monitoring data of all components of a hybrid cloud platform, and constructing an LSTM training model through the monitoring data;
Step 2, obtaining a model prediction result according to the LSTM training model, and constructing different parameter sets and a performance interval set of the corresponding optimal performance interval according to the LSTM training model;
Step 3, automatically configuring parameters of the component according to the model prediction result, the different parameter groups and the corresponding optimal performance interval sets.
Further, in the step 1, monitoring data of each component of the hybrid cloud platform is collected, and an LSTM training model is constructed according to the monitoring data, including:
collecting monitoring data of components in a hybrid cloud platform, and selecting an LSTM model suitable for the monitoring data according to the monitoring data; and training the LSTM model according to the monitoring data to obtain the LSTM training model.
Further, the monitoring data includes a log and a key index.
Further, the key indexes comprise a time stamp, a server name, a CPU utilization rate and a memory utilization rate.
Further, after preprocessing the monitoring data, constructing an LSTM training model by the preprocessed monitoring data;
and the preprocessing is to remove abnormal data in the monitoring data to obtain effective data, and then perform normalization processing after performing unified format processing on the effective data to obtain available monitoring data.
Further, the abnormal data is a negative value, a sudden high value and/or a sudden low value.
Further, in the step 2, obtaining a model prediction result according to the LSTM training model includes:
And predicting the LSTM training model by using a test set sample to obtain a prediction output, adding the prediction output into the test set sample, and continuing the next-round prediction by using the test set sample added with the prediction output until the model prediction result is obtained.
Further, in the step 2, obtaining a model prediction result according to the LSTM training model, further includes: and evaluating the model prediction result, and judging whether different parameter sets and corresponding optimal performance interval sets can be constructed by using the LSTM training model according to the evaluation result.
Further, the method for evaluating the model prediction result comprises the following steps: calculating MSE, RMSE and MAE values of the test set samples, calculating MSE, RMSE and MAE values of the future predictions, and calculating the R2 score of the test set samples to evaluate the prediction result, wherein the R2 score ranges from 0 to 1; the closer the R2 score is to 1, the better the model fits, and the farther it is from 1, the worse the fit.
Further, when the test result is that the fitting degree is good, the LSTM training model is judged to be capable of constructing different parameter sets and the corresponding optimal performance interval sets.
Further, in the step 2, a performance interval set of different parameter sets and the corresponding optimal performance intervals is constructed according to the LSTM training model, including:
And performing performance test on the components, limiting resources and resource scenes, performing performance test verification on the performance parameter combinations of the components, determining optimal parameter combinations according to the model prediction results, and storing the optimal parameter combinations in a configuration change area as preconditions for automatic configuration change of the components.
Further, the resources include, but are not limited to, CPU count and memory count.
Further, the step 2 further includes automatically adjusting configuration parameters for the component according to the model prediction result, including:
defining resources and resource scenes by performing performance test on the components, performing performance test verification on the performance parameter combination of the components, determining an optimal parameter combination according to the model prediction result, and storing the optimal parameter combination in a configuration change area as a precondition for automatic configuration change of the components;
defining an automatic adjustment state identifier, which is used for marking whether the component needs automatic adjustment and for recording automatic adjustment success and/or failure information, the automatic adjustment state identifier taking the value 0 or 1; defining a parameter change monitor, which is used for monitoring changes of a parameter change area and setting the automatic adjustment state identifier when parameters are stored in the parameter change area; defining a component parameter reloading monitor, which is used for receiving instructions from the parameter change monitor and notifying the component to reload its parameters;
and completing the configuration validation operation through an automatic hot reload mechanism, marking the automatic adjustment state identifier with the result of the parameter change, collecting new monitoring data from the time point at which the automatic adjustment state identifier was updated, labeling the new monitoring data to form a new monitoring data set, and resetting the automatic adjustment state identifier to 0.
Further, the step 2 further includes automatically adjusting configuration parameters for the component according to the model prediction result, and further includes automatically adjusting an optimal performance interval of the parameters, specifically, continuously and dynamically shrinking the performance interval until the optimal performance interval is reached.
Further, the performance interval is continuously dynamically shrunk in a positive and negative step size mode until the performance interval is optimal.
Further, the performance interval is continuously and dynamically shrunk in a positive and negative step size mode until the performance interval is optimal, specifically comprising the following steps:
the default performance interval corresponding to the parameter combination is the first version, and the subsequent versions are sequentially increased; matching to the performance interval using the model prediction; and calculating the interval mean value and the contraction step length of the model prediction result, and judging the contraction direction of the performance interval according to the calculation result.
Further, in the step 3, the automatic configuration of the parameters of the component according to the model prediction result and the different parameter groups and the corresponding performance interval sets of the optimal performance intervals includes:
matching the model prediction result obtained by the LSTM prediction model in the performance interval set to obtain a matching result, and judging whether the matching result is consistent with the current parameters of the component;
when the judging result is consistent, the parameters of the components do not need to be automatically configured;
When the judging result is inconsistent, storing a change parameter combination into the parameter change area, and after the parameter change monitor monitors the change of the parameter change area, setting the automatic adjustment state identifier to be 1 and informing the component that the parameter change is required;
after the component parameter reloading monitor receives the change instruction, the automatic hot reload mechanism reloads the parameters so that the new parameters take effect;
After the new parameters take effect, judging whether the automatic updating is successful; when the automatic updating fails, resetting the automatic adjustment state identifier to 0, updating the automatic adjustment state identifier, recording a failure record and sending an alarm notification;
and when the automatic updating is successful, resetting the automatic adjustment state identifier to 0, updating the automatic adjustment state identifier, and adding the newly collected monitoring data into the monitoring data set to form a new monitoring data set.
The invention has the advantages that:
The invention discloses a hybrid cloud automatic configuration method based on an LSTM model. Through the LSTM model, the method can accurately predict the future performance trend of a component, handle nonlinear, non-stationary and multivariable relations, and rapidly adapt to newly generated monitoring data. The state of each stage of automatic adjustment is tracked through the automatic adjustment flag; a component parameter reloading monitor is introduced to receive instructions from the parameter change monitor and notify the component to complete the automatic hot reload operation, which is safe and efficient. The performance interval is automatically shrunk and optimized in a positive and negative step size mode: by calculating the shrink amounts of the left and right ends of the interval, the performance interval corresponding to the component parameters is dynamically adjusted to achieve optimal performance.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a schematic diagram of a scenario of a multi-component complex configuration to which the present invention is applicable;
FIG. 2 is one of the flowcharts of the hybrid cloud auto-configuration method based on the LSTM model of the present invention;
FIG. 3 is a second flowchart of the hybrid cloud automatic configuration method based on LSTM model of the present invention;
FIG. 4 is a third flowchart of the hybrid cloud auto-configuration method based on LSTM model according to the present invention.
Detailed Description
The technical scheme of the invention is further described below through the attached drawings and the embodiments.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention.
The invention provides a hybrid cloud automatic configuration method based on an LSTM model, which comprises the following steps:
Step 1, collecting monitoring data of all components of a hybrid cloud platform, and constructing an LSTM training model through the monitoring data;
Step 2, obtaining a model prediction result according to the LSTM training model, and constructing different parameter sets and a performance interval set of the corresponding optimal performance interval according to the LSTM training model;
Step 3, automatically configuring parameters of the component according to the model prediction result, the different parameter groups and the corresponding optimal performance interval sets.
It will be appreciated that the LSTM model performs well in processing time series data, which allows it to effectively use historical monitoring data to predict future performance. Such predictive capability may help enterprises configure resources more accurately in a hybrid cloud environment to cope with future load changes. Through the LSTM model, an enterprise may automatically configure individual components of the hybrid cloud. The method not only can reduce human errors, but also can automatically optimize configuration according to historical data so as to realize higher performance and efficiency. The method can also provide different parameter groups and corresponding performance interval sets of optimal performance intervals, and more configuration options are provided for enterprises. The enterprise can select the configuration most suitable for the enterprise according to the actual demand and the business target. Because the training and prediction processes of the LSTM model can be performed in a cloud environment, the method has good expandability. The system can be continuously improved along with the increase of monitoring data, and can also support a larger-scale mixed cloud environment. The method is not only suitable for the current mixed cloud environment, but also can be adjusted and expanded according to the needs so as to adapt to future demands and technical development. In general, the LSTM model-based hybrid cloud automatic configuration method can improve the efficiency and performance of enterprises and reduce human errors and operation cost.
In some embodiments of the present application, in the step 1, monitoring data of each component of the hybrid cloud platform is collected, and constructing an LSTM training model according to the monitoring data includes:
collecting monitoring data of components in a hybrid cloud platform, and selecting an LSTM model suitable for the monitoring data according to the monitoring data; and training the LSTM model according to the monitoring data to obtain the LSTM training model.
It can be appreciated that this approach can improve model accuracy because the LSTM model is trained from monitoring data that directly reflects the performance and status of the components of the hybrid cloud platform. The method can realize real-time monitoring and prediction. The state of the platform can be known in real time by collecting the monitoring data of each component of the hybrid cloud platform, and the future state can be predicted by utilizing the prediction function of the LSTM model, so that the hybrid cloud platform is better managed and optimized. The method realizes automation and intellectualization of the hybrid cloud platform. By collecting and analyzing the monitoring data, the LSTM model can be automatically trained and adjusted, thereby realizing intelligent prediction and management. This approach has high flexibility and scalability. Because the LSTM model can train for different monitoring data, the LSTM model can be easily expanded to different hybrid cloud platforms and environments. This approach can improve efficiency. By automatically and intelligently managing and predicting the state of the hybrid cloud platform, the need for manual intervention can be reduced, thereby improving overall efficiency. In general, the method for constructing the LSTM training model by collecting the monitoring data of each component of the hybrid cloud platform can improve the accuracy of the model, realize real-time monitoring and prediction, improve the efficiency and have high flexibility and expandability.
In some embodiments of the application, the monitoring data includes a log and a key indicator.
Further, the key indexes comprise a time stamp, a server name, a CPU utilization rate and a memory utilization rate.
It can be appreciated that by considering both logs and key metrics, the performance and status of the hybrid cloud platform can be more fully understood. The log may provide detailed error information, abnormal conditions, and information of other critical events, while the key indicators may provide real-time performance data, which in combination enable the model to more accurately reflect the state of the hybrid cloud platform. By collecting the monitoring data containing the time stamp, the state of the hybrid cloud platform at a specific time point can be known in real time, and meanwhile, the future state can be predicted by utilizing the prediction function of the LSTM model. This real-time monitoring and predictive capability may help administrators better understand the performance of the hybrid cloud platform and optimize it accordingly. CPU and memory utilization are important indicators for evaluating the performance of the hybrid cloud platform. By incorporating these two metrics into the monitoring data and using them to train the LSTM model, the performance of the hybrid cloud platform can be more accurately predicted and optimized, especially when processing large amounts of data or complex tasks. Because the LSTM model can train for different monitoring data, the LSTM model can be easily expanded to different hybrid cloud platforms and environments. Meanwhile, if new monitoring indicators or log types need to be added, they need only be incorporated into the collected monitoring data and then the model is retrained. This allows for a high flexibility and scalability of the method.
In some embodiments of the present application, after preprocessing the monitoring data, constructing an LSTM training model according to the preprocessed monitoring data;
and the preprocessing is to remove abnormal data in the monitoring data to obtain effective data, and then perform normalization processing after performing unified format processing on the effective data to obtain available monitoring data.
Further, the abnormal data is a negative value, a sudden high value and/or a sudden low value.
It will be appreciated that by preprocessing the monitored data, abnormal data, such as negative values, abrupt high values, or abrupt low values, may be cleared, which may negatively impact the accuracy of the model. The monitoring data subjected to effective data screening and unified format processing can more accurately reflect the performance and state of the hybrid cloud platform. The stability of the model can be improved by constructing an LSTM training model by using the preprocessed monitoring data. Abnormal data may cause problems with over-fitting or under-fitting of the model, while pre-processed monitoring data may reduce these problems, making the model more stable and reliable. The monitoring data after normalization processing has consistent scale, which is helpful to improve the training effect and the prediction accuracy of the LSTM model. The normalization process can convert the monitoring data of different scales to the same scale, so that the model can better capture the similarity and the relevance between the data.
In some embodiments of the present application, in the step 2, obtaining model prediction results according to the LSTM training model includes:
And predicting the LSTM training model by using a test set sample to obtain a prediction output, adding the prediction output into the test set sample, and continuing the next-round prediction by using the test set sample added with the prediction output until the model prediction result is obtained.
It will be appreciated that by using the LSTM model for prediction, more accurate prediction results may be obtained. The LSTM model is a deep learning model suited to sequence data and can effectively process time series, allowing the performance of the hybrid cloud platform to be predicted more accurately. By adding the prediction output to the test set sample and using the augmented test set sample for the next round of prediction, the prediction results can be updated in real time. This approach can promptly reflect state changes of the hybrid cloud platform and provide administrators with a more timely, accurate and effective management tool. Multi-round prediction with stepwise updating of the prediction results improves the reliability of the prediction, reducing the errors and instability that a single prediction may carry. By adding the prediction output to the test set sample, the model's training data grows continuously and its prediction performance improves. The method supports scalable model training and prediction and is suitable for the growing data volume and complexity of the hybrid cloud platform.
In some embodiments of the present application, in the step 2, obtaining a model prediction result according to the LSTM training model further includes: and evaluating the model prediction result, and judging whether different parameter sets and corresponding optimal performance interval sets can be constructed by using the LSTM training model according to the evaluation result.
It can be appreciated that by evaluating the model prediction results, it can be determined whether the model can provide accurate predictions in the construction of different parameter sets and their corresponding optimal performance interval sets. If the prediction result of the model is not accurate enough, the model can be adjusted and optimized in time. By evaluating the model prediction results, the performance of the model in the construction of different parameter sets and the corresponding optimal performance interval sets can be known. According to the evaluation result, parameters can be adjusted or model structures can be optimized in a targeted manner so as to improve the performance of the model. By evaluating the prediction results of the model, it is possible to judge whether the model is stable and reliable. If the prediction of the model is not sufficiently stable or reliable, further analysis and improvement can be performed.
Further, the method for evaluating the model prediction result comprises the following steps: calculating MSE, RMSE and MAE values of the test set samples, calculating MSE, RMSE and MAE values of the future predictions, and calculating the R2 score of the test set samples to evaluate the prediction result, wherein the R2 score ranges from 0 to 1; the closer the R2 score is to 1, the better the model fits, and the farther it is from 1, the worse the fit.
Further, when the test result is that the fitting degree is good, the LSTM training model is judged to be capable of constructing different parameter sets and the corresponding optimal performance interval sets.
It will be appreciated that the MSE (mean squared error), RMSE (root mean squared error) and MAE (mean absolute error) values of the test set samples, as well as the MSE, RMSE and MAE values of future predictions, are calculated to assess the prediction accuracy and stability of the model. The R2 score (coefficient of determination) of the test set samples is calculated as an important indicator of how well the model fits. The R2 score ranges from 0 to 1; the closer it is to 1, the better the model fits, and the farther it is from 1, the worse the fit. When the test result shows a good degree of fit, it can be judged that the LSTM training model is able to construct the different parameter sets and their corresponding optimal performance interval sets. This means the model generalizes well, can adapt to different parameter combinations, and can make predictions and decisions within different optimal performance intervals. In summary, the method can effectively evaluate the prediction accuracy and fit of the LSTM training model and determine the optimal performance intervals of different parameter groups, providing a useful reference for practical application.
In some embodiments of the present application, in the step 2, constructing a performance interval set of different parameter sets and their corresponding optimal performance intervals according to the LSTM training model includes:
And performing performance test on the components, limiting resources and resource scenes, performing performance test verification on the performance parameter combinations of the components, determining optimal parameter combinations according to the model prediction results, and storing the optimal parameter combinations in a configuration change area as preconditions for automatic configuration change of the components.
It can be understood that by performing performance tests on the components and defining the resources and resource scenarios, an optimal parameter combination can be found, optimizing the configuration of the components and improving the overall performance and efficiency of the system. The optimal parameter combination determined from the model prediction result is stored in the configuration change area and serves as a precondition for automatic configuration change of the component, enabling automated configuration changes and reducing the cost and error rate of manual intervention. The method also enhances maintainability and scalability: it can be applied to different components and systems, has broad application prospects, and can be conveniently extended and maintained to adapt to continuously changing environments and requirements. In summary, the method in the embodiment of the application can effectively improve the performance and efficiency of the system, reduce cost and error rate, and enhance maintainability and scalability, and therefore has significant practical value.
Further, the resources include, but are not limited to, CPU count and memory count.
In some embodiments of the present application, the step 2 further includes automatically adjusting configuration parameters for the component according to the model prediction result, including:
defining resources and resource scenes by performing performance test on the components, performing performance test verification on the performance parameter combination of the components, determining an optimal parameter combination according to the model prediction result, and storing the optimal parameter combination in a configuration change area as a precondition for automatic configuration change of the components;
defining an automatic adjustment state identifier, which is used for marking whether the component needs automatic adjustment and for recording automatic adjustment success and/or failure information, the automatic adjustment state identifier taking the value 0 or 1; defining a parameter change monitor, which is used for monitoring changes of a parameter change area and setting the automatic adjustment state identifier when parameters are stored in the parameter change area; defining a component parameter reloading monitor, which is used for receiving instructions from the parameter change monitor and notifying the component to reload its parameters;
and completing the configuration validation operation through an automatic hot reload mechanism, marking the automatic adjustment state identifier with the result of the parameter change, collecting new monitoring data from the time point at which the automatic adjustment state identifier was updated, labeling the new monitoring data to form a new monitoring data set, and resetting the automatic adjustment state identifier to 0.
It will be appreciated that this provides automated adjustment: the configuration parameters of the components are adjusted automatically based on the model prediction results and performance test verification, improving the degree of automation and the efficiency of the system. By defining the automatic adjustment state identifier, the parameter change monitor and the component parameter reloading monitor, changes to component parameters are monitored and responded to in real time, improving the stability and reliability of the system. The automatic hot reload mechanism completes the configuration validation operation quickly, reducing system response time and improving the user experience. By collecting new monitoring data and forming a new monitoring data set, the running state of the system can be analyzed in depth, providing data support for further optimization. By performance testing the components and defining resources and resource scenarios, parameters can be adjusted flexibly for different scenarios and requirements, improving the adaptability and scalability of the system.
In some embodiments of the present application, the step 2 further includes automatically adjusting the configuration parameters for the component according to the model prediction result, and further includes automatically adjusting an optimal performance interval of the parameters, specifically, continuously shrinking the performance interval dynamically until the optimal performance interval.
It will be appreciated that it enables the system to self-adjust based on real-time or near real-time data and model predictions, thereby maintaining optimal performance. This approach is very beneficial for many systems and applications that require optimized performance, such as applications that require processing large amounts of data or complex calculations.
In some embodiments of the present application, the performance interval is continuously and dynamically shrunk in positive and negative steps until it is optimal.
In some embodiments of the present application, the performance interval is continuously shrunk in a positive and negative step manner until the performance interval is optimal, specifically:
the default performance interval corresponding to the parameter combination is the first version, and the subsequent versions are sequentially increased; matching to the performance interval using the model prediction; and calculating the interval mean value and the contraction step length of the model prediction result, and judging the contraction direction of the performance interval according to the calculation result.
It will be appreciated that by continuing the dynamic contraction performance interval, the model may be made more stable during training and prediction, reducing the risk of over-fitting or under-fitting. The positive and negative step size mode allows bidirectional searching in the performance interval, so that the searching range can be enlarged to find possible better solutions, and the searching range can be reduced to refine the current optimal solution. The performance interval is contracted, so that the searching process is focused, unnecessary calculation and experiments are reduced, and the optimization efficiency is improved. By calculating the interval mean value and the contraction step length of the model prediction result, the performance of the model can be quantitatively evaluated, and data support is provided for subsequent decisions. And the contraction direction of the performance interval is judged according to the calculation result, so that the optimization process is more intelligent, the search direction is automatically adjusted, and the manual intervention is reduced. This strategy can be applied to a variety of different types of models and problems, with wide applicability. Through continuous iteration and optimization, the model can be more accurate and reliable in terms of prediction and decision. In conclusion, the strategy of dynamically shrinking the performance interval by adopting the positive and negative step sizes provides an effective method for the super-parameter optimization problem, and can improve the optimization efficiency while guaranteeing the performance of the model, so that the prediction result of the model is more accurate and reliable.
In some embodiments of the present application, in the step 3, automatically configuring parameters of the component according to the model prediction result and the different parameter groups and the performance interval sets of the corresponding optimal performance intervals, including:
matching the model prediction result obtained by the LSTM prediction model in the performance interval set to obtain a matching result, and judging whether the matching result is consistent with the current parameters of the component;
when the judging result is consistent, the parameters of the components do not need to be automatically configured;
When the judging result is inconsistent, storing a change parameter combination into the parameter change area, and after the parameter change monitor monitors the change of the parameter change area, setting the automatic adjustment state identifier to be 1 and informing the component that the parameter change is required;
after the component parameter reloading monitor receives the change instruction, the automatic hot reload mechanism reloads the parameters so that the new parameters take effect;
After the new parameters take effect, judging whether the automatic updating is successful; when the automatic updating fails, resetting the automatic adjustment state identifier to 0, updating the automatic adjustment state identifier, recording a failure record and sending an alarm notification;
and when the automatic updating is successful, resetting the automatic adjustment state identifier to 0, updating the automatic adjustment state identifier, and adding the newly collected monitoring data into the monitoring data set to form a new monitoring data set.
It can be appreciated that the method of this embodiment can automatically adjust component parameters, reducing the need for manual intervention and the configuration problems caused by human error. By matching the LSTM prediction result with the performance interval set, the optimal parameter configuration can be found, improving system performance. It also provides real-time monitoring and response: when a parameter change fails, the failure is immediately recorded and an alarm notification is sent, so that problems can be found and resolved in time and stable operation of the system is ensured. It offers flexibility and extensibility: the method can easily be applied to different components and systems and can be extended and adjusted as needed. Through automatic configuration and monitoring, dependence on specialists is reduced, lowering labor costs. The method of this embodiment can improve the performance and stability of the system, reduce maintenance costs and improve working efficiency.
A preferred embodiment of the application:
A hybrid cloud automatic configuration method based on an LSTM model comprises the following steps:
1. Collecting monitoring data of each component of hybrid cloud platform
Collect the monitoring data needed for training, including logs and key indicators.
Among the key indicators, CPU utilization represents the ratio of the time the CPU spends processing work to the total time; memory utilization represents the ratio of the amount of memory used to the total amount of memory; disk utilization represents the ratio of the actual disk usage to the disk capacity; network bandwidth represents the amount of data transmitted per second; and the database connection count represents the maximum number of client connections the database can accept simultaneously.
Further, indicators such as response time and request success rate can also be collected.
2. Cleaning and preprocessing collected monitoring data
The monitoring data collected in step 1 is cleaned and preprocessed to remove noise and unnecessary information and converted into a data set that can be used for training. This mainly comprises: removing duplicated data, removing null values, removing abnormal values, converting the data format, and normalizing the data; the monitoring data is normalized using the Z-Score method.
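As an illustration of this preprocessing step, the following is a minimal sketch assuming the monitoring data has been loaded into a pandas DataFrame with the illustrative column names timestamp, server, cpu_usage and memory_usage (these names are not specified by the patent):
import pandas as pd


def preprocess_monitoring_data(df: pd.DataFrame) -> pd.DataFrame:
    # Clean the raw monitoring data and Z-Score normalize the numeric columns
    # Remove duplicated rows and rows with null or missing values
    df = df.drop_duplicates().dropna()

    # Remove abnormal values: negative utilization or utilization above 100%
    # (sudden spikes could additionally be filtered with a rolling z-score, omitted here)
    for col in ("cpu_usage", "memory_usage"):
        df = df[(df[col] >= 0) & (df[col] <= 100)]

    # Unify the format: timestamp -> Unix seconds, server name -> numeric id
    df["timestamp"] = pd.to_datetime(df["timestamp"]).astype("int64") // 10**9
    df["server"] = df["server"].str.replace("server", "", regex=False).astype(int)

    # Z-Score normalization: (x - mean) / std
    for col in ("timestamp", "cpu_usage", "memory_usage"):
        df[col] = (df[col] - df[col].mean()) / df[col].std()
    return df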
3. Train the LSTM model on the monitoring data
According to the type and characteristics of the monitoring data, select the LSTM model to be used, define the LSTM model using a deep learning framework (such as TensorFlow or PyTorch), and build the training model from an input layer, intermediate layers and an output layer.
Adjust the hyperparameters according to the training results on the monitoring data and gradually optimize the performance of the model.
Use the trained LSTM model to predict on the monitoring test data set: feed the prepared test set samples into the LSTM model to obtain a prediction output, add the prediction output to the test set, and continue with the next prediction.
The prediction results of the model are further evaluated to measure the performance of the model. The selected indicators are MSE, MAE, RMSE and R squared: first calculate the MSE, RMSE and MAE values of the test set, then calculate the MSE, RMSE and MAE values of the future predictions, and calculate the R2 score of the test set to evaluate the prediction result. The R2 score ranges from 0 to 1; the closer it is to 1, the better the model fits, and the farther it is from 1, the worse the fit.
4. Automatic adjustment of configuration parameters for components by model prediction results
Set thresholds according to the model prediction results, automatically adjust the component parameters, and continuously feed the adjusted data back in as prediction data for continued learning and optimization. According to the characteristics of each component, select parameters that affect component performance and support hot loading, so that changes can take effect in real time.
4.1 Constructing different parameter sets and corresponding optimal performance intervals
First, performance tests are run on the component: the resources are defined, including but not limited to the number of CPUs and the amount of memory, and under the specified resource scenario, performance test verification is performed for each combination of the component's performance parameters; the storage structure is shown in Table 1:
TABLE 1
Determine the optimal performance interval of each group of parameters according to the performance test results, and store the parameter combination and performance interval in the storage structure.
4.2 Determine the optimal parameter combination according to the prediction result of step 3 and store it in the configuration change area as a precondition for automatic configuration change of the component.
Let the predicted CPU utilization of the next timing period in predicted_data be PRE_CPU_USE and the predicted memory utilization be PRE_MEN_USE; the following conditions must be satisfied simultaneously:
CPU_USE_MIN≤PRE_CPU_USE≤CPU_USE_MAX;
MEN_USE_MIN≤PRE_MEN_USE≤MEN_USE_MAX;
where CPU_USE_MIN, CPU_USE_MAX, MEN_USE_MIN and MEN_USE_MAX correspond to the same group of performance parameters.
Performance parameters satisfying the above conditions are stored in the parameter change area.
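For illustration only, the matching of a prediction against the stored parameter groups could look like the following sketch; the dictionary keys reuse the identifiers above, but the function name and data layout are assumptions rather than structures defined by the patent:
def match_parameter_group(pre_cpu_use, pre_men_use, performance_intervals):
    # Return the parameter group whose performance interval contains the predicted values
    for entry in performance_intervals:
        if (entry["CPU_USE_MIN"] <= pre_cpu_use <= entry["CPU_USE_MAX"]
                and entry["MEN_USE_MIN"] <= pre_men_use <= entry["MEN_USE_MAX"]):
            return entry["PARAM"]  # this combination is stored in the parameter change area
    return None  # no interval matched; keep the current configuration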
4.3 Defining the automatic adjustment flags
Define an automatic adjustment flag ALTER_TAG to mark whether the component parameters need to be automatically adjusted; it takes the values 0 and 1, where 0 indicates no automatic adjustment and 1 indicates automatic adjustment.
Define the automatic adjustment state identifier {ALTER_STATUS, ALTER_MSG, ALTER_TIME}, where ALTER_STATUS takes the values 0 and 1 (0 indicates that the automatic adjustment failed, 1 that it succeeded), ALTER_MSG is a character field recording information on the success or failure of the automatic adjustment, and ALTER_TIME records the time at which the change finished.
Define a monitoring data update identifier ALTER_PARAM_TIME in time format, recording the time at which the automatically adjusted parameters take effect; from this time onward, the monitoring data is the new data collected after the new parameters take effect, forming a new monitoring data set that re-enters step 3 for training.
4.4 Set up the listener mechanism to monitor component changes
Define a parameter change monitor that monitors changes in the parameter change area and sets the automatic adjustment flag when parameters are stored in the parameter change area.
Define a component parameter reloading monitor that receives instructions from the parameter change monitor and notifies the component to reload its parameters.
4.5 Complete the configuration validation operation through the automatic hot reload mechanism
First, the parameter change monitor detects a change in the parameter change area, sets the automatic adjustment flag defined in 4.3 to 1, and notifies the component that a parameter change is required.
Second, after the component parameter reloading monitor receives the instruction from the parameter change monitor, it checks the state of each part of the component; once the reload conditions are met, the parameters to be changed are reloaded through the component's automatic hot reload mechanism so that they take effect, and the contents of the automatic adjustment state identifier (ALTER_STATUS, ALTER_MSG and ALTER_TIME) are updated.
Finally, the automatic adjustment state identifier defined in 4.3 is marked with the result of the parameter change, the monitoring data update identifier ALTER_PARAM_TIME is updated to form a new monitoring data set, and the automatic adjustment flag is reset to 0.
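Purely as a sketch of how steps 4.3 to 4.5 could fit together, the following polling implementation uses the identifiers defined above; the helper reload_component and the five-second polling interval are assumptions for illustration, not part of the patented components:
import time
from datetime import datetime

state = {"ALTER_TAG": 0, "ALTER_STATUS": 0, "ALTER_MSG": "", "ALTER_TIME": None, "ALTER_PARAM_TIME": None}
parameter_change_area = {}  # filled in step 4.2 with the parameter combination to apply


def reload_component(params):
    # Placeholder for the component's hot reload entry point (assumed)
    pass


def parameter_change_monitor_loop():
    # Poll the parameter change area; trigger the component hot reload when parameters appear
    while True:
        if parameter_change_area:
            state["ALTER_TAG"] = 1  # the component needs automatic adjustment
            try:
                reload_component(dict(parameter_change_area))  # component parameter reload
                state["ALTER_STATUS"], state["ALTER_MSG"] = 1, "automatic adjustment succeeded"
            except Exception as exc:
                state["ALTER_STATUS"], state["ALTER_MSG"] = 0, f"automatic adjustment failed: {exc}"
            state["ALTER_TIME"] = datetime.now()
            state["ALTER_PARAM_TIME"] = state["ALTER_TIME"]  # new monitoring data starts here
            parameter_change_area.clear()
            state["ALTER_TAG"] = 0  # reset the automatic adjustment flag
        time.sleep(5)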
5. Continue dynamically shrinking the performance interval until it is optimal
This continuous tracking and evaluation process is iterative and optimizing: with the parameters adjusted in step 4, new monitoring data is collected from the time point of the monitoring data update identifier ALTER_PARAM_TIME defined in step 4.3, and steps 1-3 are repeated with it as the training data set to obtain more accurate prediction data.
More preferably, based on the storage structure of 4.1, a parameter performance interval version is introduced, and the performance interval is refined as the prediction accuracy on new monitoring data improves.
For the performance intervals of the same group of PARAM parameters across multiple versions, the version numbers carry time-order information: the more recent the version, the more accurate the performance interval, which enables automatic adjustment of the parameter performance interval. Specifically, the performance interval is optimized in a positive and negative step size mode.
First, define a performance interval version and set a shrink step parameter S, which marks how far each end of the interval shrinks. To avoid excessive shrinking, set a shrink limit S_limit of ±5%, giving a shrink limit interval of [PCU_version - S_limit, PCU_version + S_limit], where PCU_version denotes the predicted CPU utilization for that version;
if the interval has not yet been shrunk to the limit, the shrink step S of each end of the interval is calculated as follows:
wherein CUA_version represents the upper limit of CPU utilization of the version's performance interval, CUI_version represents the lower limit of CPU utilization of the version's performance interval, S_right represents the shrink amount of the right end of the version's performance interval, and S_left represents the shrink amount of the left end of the version's performance interval.
If S_right > 0, the right end of the performance interval shrinks: CUA_version = CUA_version - S_right;
If S_left > 0, the left end of the performance interval shrinks: CUI_version = CUI_version + S_left;
In this way a new performance interval is obtained and stored in a new version; by iterating and shrinking until the shrink limit is reached, the optimal performance interval for the automatically adjusted parameters is determined.
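The shrink procedure can be sketched as follows; since the exact step formula is not reproduced in the text above, the sketch assumes that each end of the interval shrinks toward the shrink limit interval [PCU_version - S_limit, PCU_version + S_limit]:
def shrink_interval(cui, cua, pcu, s_limit=0.05):
    # Shrink one version's CPU performance interval [cui, cua] toward [pcu - s_limit, pcu + s_limit]
    s_right = cua - (pcu + s_limit)  # assumed right-hand shrink step
    s_left = (pcu - s_limit) - cui   # assumed left-hand shrink step
    if s_right > 0:
        cua = cua - s_right          # shrink the right end
    if s_left > 0:
        cui = cui + s_left           # shrink the left end
    return cui, cua                  # stored as the next performance interval version


# Example: predicted CPU utilization 0.30, current interval [0.10, 0.60]
# shrink_interval(0.10, 0.60, 0.30) returns (0.25, 0.35), i.e. the shrink limit interval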
FIG. 1 is a schematic diagram of a multi-component complex-configuration scenario: the logs and performance data of multiple components are collected together and fed into an LSTM model for performance data prediction, and the component configuration parameters are dynamically adjusted according to the prediction result. The steps are described in detail below.
Fig. 2-4 show flowcharts of the hybrid cloud automatic configuration method based on the LSTM model, which comprises the following steps:
step 1, collecting monitoring data of all components of the hybrid cloud platform
In this example, the following sample of hybrid cloud component monitoring data is used. The important indicators related to component performance that are collected include the timestamp, server name, CPU utilization and memory utilization, as shown in Table 2:
TABLE 2
Step 2, cleaning and preprocessing the data set
FIG. 2 shows a flow chart of the monitoring data cleaning and preprocessing according to the present invention, comprising the steps of:
Step 2.1, judge whether the data set contains duplicated data; in this example there is none, so proceed to the next step.
Step 2.2, judge whether the data set contains null values or missing data; any such rows are removed, and processing proceeds to the next step.
Step 2.3, judge whether abnormal data exists; in this example, the memory occupancy of server2 in the second row is negative and the CPU occupancy of server1 in the seventh row is abnormal, so these rows are removed. Part of the sample data after removal is shown in Table 3:
TABLE 3 Table 3
Time stamp Server name CPU utilization Memory utilization
2021-10-01 10:00:01.325 server1 28.2% 64.5%
2021-10-01 10:00:02 server3 12.8% 55.2%
2021-10-01 10:01:00 server1 31.4% 64.8%
2021-10-01 10:01:00 server2 20.1% 79.8%
2021-10-01 10:01:00 server3 15.6% 56.7%
2021-10-01 10:02:00 server2 19.7% 80.3%
2021-10-01 10:02:00 server3 13.9% 57.4%
2021-10-01 10:03:00 server1 29.9% 63.2%
2021-10-01 10:03:00 server2 17.7% 81.2%
Step 2.4, unify the data format and convert the time format into a numeric timestamp; the time format is converted using the datetime library in Python:
from datetime import datetime
timestamp_str = "2021-10-01 10:00:00"
timestamp = datetime.strptime(timestamp_str, "%Y-%m-%d %H:%M:%S")
The time format is converted to a timestamp number, i.e. the number of seconds since 1970-01-01 00:00:00 (the Unix epoch):
import time
timestamp_num = int(time.mktime(timestamp.timetuple()))
For the server name, the "server" prefix is removed uniformly and replaced by a numeric identifier.
CPU and memory utilization are represented by floating point numbers, and example processed sample data are shown in Table 4:
TABLE 4 Table 4
Time stamp Server name CPU utilization Memory utilization
1633053600 1 0.282 0.645
1633053600 3 0.128 0.552
1633053660 1 0.314 0.648
1633053660 2 0.201 0.798
1633053660 3 0.156 0.567
1633053720 2 0.197 0.803
1633053720 3 0.139 0.574
1633053780 1 0.299 0.632
1633053780 2 0.177 0.812
Step 2.5, normalize the monitoring data using Z-Score
The Z-Score has the advantages of simple calculation, easy interpretation, preservation of the original data distribution, elimination of dimensional influence and convenient outlier detection. It is computed from the mean value μ and the standard deviation σ according to the formula:
z = (x - μ) / σ
Normalization processing is performed on the sample example data, and the results shown in table 5 are obtained:
TABLE 5
Time stamp Server name CPU utilization Memory utilization
-1.3553436926138287 1 1.0758071581665287 -0.2503626871522986
-1.3553436926138287 3 -1.2359272933355012 -1.1775908692163433
-0.4170288293569923 1 1.5561675636734447 -0.22045210063410356
-0.4170288293569923 2 -0.14010511827285033 1.2750772252756464
-0.4170288293569923 3 -0.8156119385169502 -1.0280379366253691
0.5212860338998442 2 -0.2001501689612148 1.3249282028059715
0.5212860338998442 3 -1.070803403942499 -0.9582465680829141
0.5212860338998442 1 1.3309986235920777 -0.3799752287311437
0.5212860338998442 2 -0.5003754224030372 1.4146599623605565
Grouping according to the server to obtain each group of data is as follows: examples of server 1 data are shown in table 6:
TABLE 6
Time stamp CPU utilization Memory utilization
-1.3553436926138287 1.0758071581665287 -0.2503626871522986
-0.4170288293569923 1.5561675636734447 -0.22045210063410356
0.5212860338998442 1.3309986235920777 -0.3799752287311437
Examples of server 2 data are shown in table 7:
TABLE 7
Time stamp CPU utilization Memory utilization
-0.4170288293569923 -0.14010511827285033 1.2750772252756464
0.5212860338998442 -0.2001501689612148 1.3249282028059715
0.5212860338998442 -0.5003754224030372 1.4146599623605565
Examples of the server 3 data are shown in table 8:
TABLE 8
Time stamp CPU utilization Memory utilization
-1.3553436926138287 -1.2359272933355012 -1.1775908692163433
-0.4170288293569923 -0.8156119385169502 -1.0280379366253691
0.5212860338998442 -1.070803403942499 -0.9582465680829141
Step 3, train the LSTM model on the monitoring data
Step 3.1 model construction and training
For the monitoring data, the three features in each time slice, namely CPU occupancy, memory occupancy and timestamp, are used as the input and converted into the input format of the LSTM network.
And constructing an LSTM network structure, adopting two layers of LSTM, setting the number of LSTM neurons of each layer to be 64, and adding a ReLU activation function behind each layer of LSTM to increase the fitting capacity of the network.
The input and output layers are set, wherein the input layer takes the CPU and the memory utilization rate as input characteristics respectively, and takes the time stamp as another input characteristic, so that the time sequence property in the data can be better described. Thus, the input tensor shape of the LSTM network should be (data, timesteps, features), where data represents the number of samples, timesteps represents the length of the time series, features represents the number of features per time slice, i.e., features = 3. And the prediction index of the output layer comprises CPU utilization rate and memory utilization rate. Thus, in the output layer of the LSTM network, two neurons may be used to represent predicted CPU and memory usage, respectively. The shape of the output tensor should be (2).
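As an illustration of how the preprocessed monitoring data could be shaped into these tensors, the following sketch builds sliding windows; it follows the two-feature input (CPU and memory usage) actually used in the Keras example below, and the window length, the 80/20 train/test split and the name normalized_values are illustrative choices not fixed by the patent:
import numpy as np


def make_sequences(values, timesteps=10):
    # values: array of shape (n, 2) holding [cpu_usage, memory_usage] per time slice
    # returns X with shape (samples, timesteps, 2) and y with shape (samples, 2)
    X, y = [], []
    for i in range(len(values) - timesteps):
        X.append(values[i:i + timesteps])  # input window
        y.append(values[i + timesteps])    # next-step CPU and memory usage
    return np.array(X), np.array(y)


# X, y = make_sequences(normalized_values, timesteps=10)
# split = int(0.8 * len(X))
# X_train, y_train = X[:split], y[:split]
# X_test, y_test = X[split:], y[split:]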
The following is an example of keras building an LSTM model:
# Define the LSTM network model (Sequential, LSTM and Dense come from Keras)
# timesteps, X_train, y_train, X_test and y_test are assumed to have been prepared from the monitoring data
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(units=64, input_shape=(timesteps, 2)))
model.add(Dense(units=32, activation='relu'))
model.add(Dense(units=2, activation='linear'))

# Compile the model
model.compile(optimizer='adam', loss='mse')

# Train the model
model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
In the example code above, only the first two features (CPU and memory utilization) are used as inputs to the LSTM network, and the output layer uses two neurons to represent the predicted CPU and memory usage, respectively. For training, MSE is used as the loss function and Adam as the optimizer.
Step 3.2 model prediction
After training is completed, the trained LSTM model is used to predict future CPU and memory utilization from new monitoring data.
The new monitoring data are read and preprocessed in the same way as the training data, the new data tensor is split into CPU and memory utilization according to the input features, and the predict method of the trained LSTM model is called to obtain the predicted time series.
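A minimal sketch of this step, under the assumption that the new monitoring data have already been windowed and normalized exactly as the training data (the variable names are illustrative):
import numpy as np
# new_windows: preprocessed new monitoring data, shape (samples, timesteps, 2)
new_windows = np.asarray(X_test)        # placeholder for the new, preprocessed data
# predict() returns an array of shape (samples, 2): [cpu_usage, memory_usage]
predicted = model.predict(new_windows)
pred_cpu = predicted[:, 0]              # predicted CPU utilization series
pred_mem = predicted[:, 1]              # predicted memory utilization series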
Step 3.3 model evaluation
The prediction results of the model need to be further evaluated to measure its performance. Common indices are the mean squared error (MSE), the mean absolute error (MAE), the root mean squared error (RMSE) and the coefficient of determination (R squared). The MAE is the average of the absolute differences between the actual and predicted values in the dataset, i.e. a measure of the average residual. The MSE is the average of the squared differences between the original and predicted values, i.e. a measure of the variance of the residuals. The RMSE is the square root of the MSE and measures the standard deviation of the residuals. The coefficient of determination (R squared) is the proportion of the variance of the dependent variable that is explained by the regression model. Lower MAE, MSE and RMSE values indicate a more accurate regression model, while a higher R squared value indicates that the model explains more of the variance.
In this example, the MSE, RMSE and MAE of the test set are calculated first, then the MSE, RMSE and MAE of the future predictions, and finally the R2 score of the test set is used to evaluate the prediction result. The R2 score ranges from 0 to 1: the closer it is to 1, the better the model fits the data; the closer it is to 0, the worse the fit.
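These indices can be computed, for example, with scikit-learn; the sketch assumes y_test holds the true values and X_test the corresponding inputs of the test set:
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
y_pred = model.predict(X_test)                  # predictions on the test set
mse = mean_squared_error(y_test, y_pred)        # mean squared error
rmse = np.sqrt(mse)                             # root mean squared error
mae = mean_absolute_error(y_test, y_pred)       # mean absolute error
r2 = r2_score(y_test, y_pred)                   # coefficient of determination
print(f"MSE={mse:.4f} RMSE={rmse:.4f} MAE={mae:.4f} R2={r2:.4f}")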
Step 4 Automatically adjusting configuration parameters for the component. FIG. 3 shows a flow chart of the automatic parameter adjustment for the component; an example based on this flow is given below.
Once the evaluation indices are satisfactory, the predict() method can be used to forecast future CPU and memory usage. A sample of the resulting predicted_data is as follows:
[{'timestamp':'2022-01-01 00:00:00','cpu_usage':35.23,'memory_usage':72.18},
{'timestamp':'2022-01-01 00:01:00','cpu_usage':37.16,'memory_usage':70.87},
{'timestamp':'2022-01-01 00:02:00','cpu_usage':39.01,'memory_usage':69.12}]
Step 4.1 constructing different parameter sets and the corresponding optimal performance intervals
First, performance tests are carried out on a component. Taking a web application as an example, the resulting set of parameter performance results is shown in Table 9:
TABLE 9
The above data are merely examples and the final component performance data is determined by the performance test results.
Step 4.2 Determine the optimal parameter combination according to the prediction result of step 3, and store it in the configuration change area as a precondition for the automatic configuration change of the component.
Suppose the predicted CPU utilization for the next timing period in predicted_data is PRE_CPU_USE = 37% and the predicted memory utilization is PRE_MEN_USE = 70%. When the following conditions are satisfied simultaneously:
CPU_USE_MIN≤PRE_CPU_USE≤CPU_USE_MAX
MEN_USE_MIN≤PRE_MEN_USE≤MEN_USE_MAX
where CPU_USE_MIN, CPU_USE_MAX, MEN_USE_MIN and MEN_USE_MAX belong to the same group of performance parameters, the parameter combination of example 2 is matched. Since the current performance parameters are set to the parameter combination of example 1 and the performance interval corresponding to the parameters has therefore changed, an automatic update is required, that is, processes=4 and threads=4 are stored in the parameter change area.
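A minimal sketch of this matching step is given below. The parameter table, its interval values and the structure of the parameter change area are assumptions made for illustration (Table 9 is only sketched in the text), not values defined by the method:
# Hypothetical parameter performance set: each entry maps a parameter combination
# to the performance interval in which it was found optimal during testing.
PARAM_PERFORMANCE_SET = [
    {"processes": 2, "threads": 8,
     "cpu_use_min": 0.00, "cpu_use_max": 0.15, "men_use_min": 0.00, "men_use_max": 0.40},
    {"processes": 4, "threads": 4,
     "cpu_use_min": 0.15, "cpu_use_max": 0.65, "men_use_min": 0.40, "men_use_max": 0.80},
]
def match_parameters(pre_cpu_use, pre_men_use, current_params):
    # Return the parameter combination whose interval contains the prediction,
    # or None if no change is needed.
    for entry in PARAM_PERFORMANCE_SET:
        if (entry["cpu_use_min"] <= pre_cpu_use <= entry["cpu_use_max"]
                and entry["men_use_min"] <= pre_men_use <= entry["men_use_max"]):
            candidate = {"processes": entry["processes"], "threads": entry["threads"]}
            return None if candidate == current_params else candidate
    return None
parameter_change_area = {}                      # stand-in for the parameter change area
change = match_parameters(0.37, 0.70, current_params={"processes": 2, "threads": 8})
if change is not None:
    parameter_change_area.update(change)        # e.g. {"processes": 4, "threads": 4}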
Step 4.3 automatic adjustment and reloading of component parameters
After the parameter change monitor detects a change in the parameter change area, the automatic adjustment flag is set to 1 and the component is notified that a parameter change is required. After the component parameter reload monitor receives the change instruction, the automatic hot reload mechanism reloads the component so that the new parameters take effect; in this embodiment this is implemented with the reload mode of uwsgi. If the reload succeeds, the automatic adjustment flag is reset to 0 and the automatic adjustment status identifier is set to { ALTER_STATUS=1, ALTER_MSG="", ALTER_TIME="2022-01-01 00:02:00" }. The monitoring data update identifier ALTER_PARAM_TIME is also updated; this time is based on the moment the first request is processed after the reload, and a new batch of performance data is calculated starting from that request.
If the reload fails, the automatic adjustment flag is reset to 0 and the automatic adjustment status identifier is set to { ALTER_STATUS=0, ALTER_MSG="failure reason", ALTER_TIME="2022-01-01 00:02:00" }; an alarm notification is then sent to the operation and maintenance personnel for intervention and troubleshooting.
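The reload step could be sketched as follows. This assumes uwsgi is started with its touch-reload option pointing at a known file; the file paths, the configuration fragment and whether a graceful reload picks up the changed settings all depend on the concrete deployment, so this is a sketch rather than a drop-in implementation:
import json
import time
from pathlib import Path
CHANGE_AREA = Path("/var/run/autotune/param_change.json")   # illustrative path
UWSGI_RELOAD_FILE = Path("/var/run/autotune/uwsgi.reload")  # file watched by touch-reload
UWSGI_CONFIG = Path("/etc/uwsgi/autotune.ini")              # illustrative config fragment
def apply_parameter_change():
    # Read the pending parameter combination from the parameter change area
    params = json.loads(CHANGE_AREA.read_text())   # e.g. {"processes": 4, "threads": 4}
    # Write the new values into the uwsgi configuration fragment
    lines = ["[uwsgi]"] + [f"{key} = {value}" for key, value in params.items()]
    UWSGI_CONFIG.write_text("\n".join(lines) + "\n")
    # Touching the reload file asks uwsgi to reload its workers gracefully
    UWSGI_RELOAD_FILE.touch()
    # Record the automatic adjustment status identifier for a successful reload
    return {"ALTER_STATUS": 1, "ALTER_MSG": "",
            "ALTER_TIME": time.strftime("%Y-%m-%d %H:%M:%S")}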
Step 5 The performance interval is dynamically contracted until it is optimal. FIG. 4 shows a flow chart of the dynamic contraction of the performance interval according to the present invention; an example is given below in conjunction with this flow chart.
In this example, a VERSION identifier is newly defined with the format cpu_men_time. Taking the data of example 2 in step 4.1 as an example, the default VERSION is 4_16_20220101; since the minimum unit of the timing period is one day, the date part of VERSION is accurate to the day. Taking the predicted CPU mean value of 0.37 as an example, the contraction values of the CPU occupancy interval for the default version 4_16_20220101 of example 2 are calculated, giving S_right = 0.115 and S_left = 0.085.
Since both S_right and S_left are greater than 0, the left and right boundaries contract simultaneously to give a new performance interval, where:
CUA_4_16_20220105 = CUA_4_16_20220101 − S_right = 0.65 − 0.115 = 0.535;
CUI_4_16_20220105 = CUI_4_16_20220101 + S_left = 0.15 + 0.085 = 0.235.
A new performance interval is thus obtained and stored under a new version. By iterating this contraction until the contraction limit is reached, the optimal performance interval for the automatically adjusted parameters is determined.
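A sketch of the two-sided contraction, assuming S_left and S_right have already been computed from the predicted mean and the current interval (the step-size formula itself is not reproduced in this text):
from dataclasses import dataclass
@dataclass
class PerformanceInterval:
    version: str    # e.g. "4_16_20220101" (cpu_men_time format)
    lower: float    # CUI: lower bound of the CPU occupancy interval
    upper: float    # CUA: upper bound of the CPU occupancy interval
def contract(interval, s_left, s_right, new_date):
    # When both contraction values are positive, shrink the interval from both
    # sides and store the result under a new version.
    if s_left > 0 and s_right > 0:
        prefix = interval.version.rsplit("_", 1)[0]     # keep the cpu_men prefix
        return PerformanceInterval(version=f"{prefix}_{new_date}",
                                   lower=interval.lower + s_left,
                                   upper=interval.upper - s_right)
    return interval   # other contraction directions are handled analogously
old = PerformanceInterval("4_16_20220101", lower=0.15, upper=0.65)
new = contract(old, s_left=0.085, s_right=0.115, new_date="20220105")
print(new)   # version 4_16_20220105, lower about 0.235, upper about 0.535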
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that the technical solution of the present invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the present invention.

Claims (10)

1. The automatic configuration method of the hybrid cloud based on the LSTM model is characterized by comprising the following steps of:
Step 1, collecting monitoring data of all components of a hybrid cloud platform, and constructing an LSTM training model through the monitoring data;
Step 2, obtaining a model prediction result according to the LSTM training model, and constructing different parameter sets and a performance interval set of the corresponding optimal performance interval according to the LSTM training model;
Step 3, automatically configuring the parameters of the component according to the model prediction result, the different parameter sets and the corresponding optimal performance interval sets.
2. The method for automatically configuring hybrid cloud based on LSTM model according to claim 1, wherein in step 1, monitoring data of each component of the hybrid cloud platform is collected, and constructing an LSTM training model by using the monitoring data includes:
collecting monitoring data of components in a hybrid cloud platform, and selecting an LSTM model suitable for the monitoring data according to the monitoring data; and training the LSTM model according to the monitoring data to obtain the LSTM training model.
3. The LSTM model-based hybrid cloud auto-configuration method of claim 2, wherein the monitoring data includes a log and key metrics.
4. The hybrid cloud automatic configuration method based on the LSTM model according to claim 3, wherein the monitoring data are preprocessed and the LSTM training model is constructed from the preprocessed monitoring data;
and the preprocessing removes abnormal data from the monitoring data to obtain valid data, converts the valid data into a unified format, and then performs normalization to obtain usable monitoring data.
5. The method according to claim 4, wherein in the step 2, obtaining model prediction results according to the LSTM training model comprises:
Using a test set sample to make predictions with the LSTM training model to obtain a prediction output, adding the prediction output to the test set sample, and continuing the next round of prediction with the test set sample to which the prediction output has been added, until the model prediction result is obtained.
6. The method for automatically configuring hybrid cloud based on LSTM model according to claim 5, wherein in step 2, a model prediction result is obtained according to the LSTM training model, further comprising: and evaluating the model prediction result, and judging whether different parameter sets and corresponding optimal performance interval sets can be constructed by using the LSTM training model according to the evaluation result.
7. The method of claim 6, wherein in step 2, constructing a performance interval set of different parameter sets and their corresponding optimal performance intervals according to the LSTM training model comprises:
Performing performance tests on the components, limiting the resources and resource scenes, performing performance test verification on the performance parameter combinations of the components, determining the optimal parameter combination according to the model prediction result, and storing the optimal parameter combination in the configuration change area as a precondition for the automatic configuration change of the component.
8. The LSTM model-based hybrid cloud automatic configuration method according to claim 7, wherein the step 2 further includes automatically adjusting configuration parameters for the component by the model prediction result, including:
defining resources and resource scenes by performing performance test on the components, performing performance test verification on the performance parameter combination of the components, determining an optimal parameter combination according to the model prediction result, and storing the optimal parameter combination in a configuration change area as a precondition for automatic configuration change of the components;
defining an automatic adjustment state identifier, which is used for marking whether the component needs to be automatically adjusted and for recording automatic adjustment success and/or failure information, the automatic adjustment state identifier being 0 or 1; defining a parameter change monitor, which is used for monitoring changes of the parameter change area and for setting the automatic adjustment state identifier when parameters are stored in the parameter change area; defining a component parameter reload monitor, which is used for receiving the instruction of the parameter change monitor and notifying the component to reload the parameters;
And finishing configuration validation operation by configuring an automatic hot reload mechanism, marking the automatic adjustment state identifier as a result after parameter change, simultaneously collecting new monitoring data from a time point of updating the automatic adjustment state identifier, marking the new monitoring data to form a new monitoring data set, and resetting the mark number of the automatic adjustment state identifier to 0.
9. The hybrid cloud automatic configuration method based on the LSTM model according to claim 8, wherein the step 2 further includes automatically adjusting configuration parameters for the component according to the model prediction result, and further includes automatically adjusting an optimal performance interval of the parameters, specifically, continuously shrinking the performance interval until the optimal performance interval by adopting a positive and negative step size manner;
The specific steps of continuously and dynamically shrinking the performance interval with positive and negative step sizes until it is optimal are as follows:
the default performance interval corresponding to the parameter combination is the first version, and subsequent versions are numbered sequentially; the model prediction result is matched to a performance interval; and the interval mean value and the contraction step size of the model prediction result are calculated, and the contraction direction of the performance interval is judged according to the calculation result.
10. The hybrid cloud automatic configuration method based on the LSTM model according to claim 9, wherein in step 3, automatically configuring the parameters of the component according to the model prediction result and the performance interval set of the different parameter sets and their corresponding optimal performance intervals comprises:
matching the model prediction result obtained by the LSTM prediction model in the performance interval set to obtain a matching result, and judging whether the matching result is consistent with the current parameters of the component;
when the judging result is consistent, the parameters of the components do not need to be automatically configured;
When the judging result is inconsistent, storing a change parameter combination into the parameter change area, and after the parameter change monitor monitors the change of the parameter change area, setting the automatic adjustment state identifier to be 1 and informing the component that the parameter change is required;
after the component parameter reload monitor receives the change instruction, the automatic hot reload mechanism performs a reload so that the new parameters take effect;
After the new parameters take effect, judging whether the automatic updating is successful or not; when the automatic updating fails, resetting the automatic adjustment state identifier to 0, updating the automatic adjustment state identifier, recording a failure record and sending an alarm notification;
And when the automatic update succeeds, resetting the automatic adjustment state identifier to 0, updating the automatic adjustment state identifier, and adding it into the monitoring data set to form a new monitoring data set.
CN202311718289.6A 2023-12-14 2023-12-14 Hybrid cloud automatic configuration method based on LSTM model Pending CN117971337A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311718289.6A CN117971337A (en) 2023-12-14 2023-12-14 Hybrid cloud automatic configuration method based on LSTM model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311718289.6A CN117971337A (en) 2023-12-14 2023-12-14 Hybrid cloud automatic configuration method based on LSTM model

Publications (1)

Publication Number Publication Date
CN117971337A true CN117971337A (en) 2024-05-03

Family

ID=90856942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311718289.6A Pending CN117971337A (en) 2023-12-14 2023-12-14 Hybrid cloud automatic configuration method based on LSTM model

Country Status (1)

Country Link
CN (1) CN117971337A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination