Disclosure of Invention
In order to overcome the defects of the prior art, the resource consumption characteristics of a micro-service at each stage of its life cycle need to be studied, and a scheduling mechanism capable of agile deployment of large-scale micro-services needs to be designed, so that system resources are fully utilized, the resource utilization rate is maximized, and the cost of the cloud service provider is reduced.
The invention discloses a micro-service scheduling method, which comprises the following steps:
acquiring a preset number of Pods to be scheduled from a to-be-scheduled Pod queue;
generating a scheduling strategy with a preset scheduler according to the resources requested by each Pod to be scheduled and the resources allocable by each candidate node, wherein the scheduling strategy comprises the candidate node allocated to each Pod to be scheduled; the resources requested by a Pod to be scheduled comprise the memory and CPU requested by the Pod; the resources allocable by a candidate node comprise the memory and CPU the node can allocate and the number of Pods the node can host;
and taking the candidate node allocated to each Pod to be scheduled in the scheduling strategy as the corresponding working node, and scheduling each Pod to its working node for operation.
Preferably, the scheduler is obtained by:
generating training data through a pre-designed simulator, wherein the training data comprise the resources requested by the Pods to be scheduled, the resources allocable by each candidate node, and the scheduling strategy under the corresponding conditions;
and training a ranker with the training data, and constructing the scheduler based on the trained ranker.
Preferably, the generating a scheduling strategy with a preset scheduler according to the resources requested by each Pod to be scheduled and the resources allocable by each candidate node comprises:
using the scheduler to take, as its input, the CPU and memory occupation ratios in the resources requested by each Pod to be scheduled, the CPU and memory occupation ratios in the resources of all candidate nodes, and the number of Pods each node can host, and taking the corresponding output as the scheduling strategy.
Preferably, the ranker is a pairwise ranker.
Preferably, the method further comprises scaling the scheduled micro-services, wherein the scaling comprises:
collecting resource consumption parameters of the currently running micro-service at a preset time interval; the resource consumption parameters comprise the consumption values of all instances of the currently running micro-service under each preset scaling target metric;
predicting the resource consumption values in a future time period with a resource consumption prediction model according to the collected resource consumption parameters;
and generating a scaling decision according to the relationship between the predicted resource consumption values in the future time period and a preset resource consumption threshold under each preset scaling target metric, and executing the scaling decision.
Preferably, the resource consumption prediction model is obtained by training a BI-LSTM model or an XGBoost model through machine learning; more preferably, it is obtained by training an XGBoost model.
Preferably, there are one or more preset scaling target metrics, and each preset scaling target metric is provided with a corresponding resource consumption threshold and a corresponding resource consumption prediction model;
the generating a scaling decision according to the relationship between the predicted resource consumption values in the future time period and the preset resource consumption threshold under each preset scaling target metric comprises:
acquiring the number of instances of the currently running micro-service as the current total;
for each scaling target metric, calculating the post-scaling total number of instances according to the predicted resource consumption value and the preset resource consumption threshold, and taking the maximum value as the target total after the scaling operation;
when the current total is smaller than the target total, performing a scale-out operation;
when the current total is larger than the target total, performing a scale-in operation;
and when the current total is equal to the target total, performing no scaling operation.
The invention also provides a ranking-based micro-service scheduling device, which comprises:
a first module, configured to acquire a preset number of Pods to be scheduled from a to-be-scheduled Pod queue;
a second module, configured to generate a scheduling strategy with a preset scheduler according to the resources requested by each Pod to be scheduled and the resources allocable by each candidate node, wherein the scheduling strategy comprises the candidate node allocated to each Pod to be scheduled; the resources requested by a Pod to be scheduled comprise the memory and CPU requested by the Pod; the resources allocable by a candidate node comprise the memory and CPU the node can allocate and the number of Pods the node can host;
and a third module, configured to take the candidate node allocated to each Pod to be scheduled in the scheduling strategy as the corresponding working node, and to schedule each Pod to its working node for operation.
Preferably, the micro-service scheduling device further comprises a fourth module for scaling the scheduled micro-services, the fourth module comprising:
a first unit, configured to collect resource consumption parameters of the currently running micro-service at a preset time interval; the resource consumption parameters comprise the consumption values of all instances of the currently running micro-service under each preset scaling target metric;
a second unit, configured to predict the resource consumption values in a future time period with a resource consumption prediction model according to the collected resource consumption parameters;
and a third unit, configured to generate a scaling decision according to the relationship between the predicted resource consumption values in the future time period and a preset resource consumption threshold under each preset scaling target metric, and to execute the scaling decision.
The present invention also provides an electronic device, comprising:
one or more processors;
a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the above micro-service scheduling method.
According to the micro-service scheduling method, a preset number of Pods to be scheduled are taken from the to-be-scheduled Pod queue each time and scheduled as a batch. Batch scheduling greatly improves scheduling efficiency, is simple to implement, and has low resource occupancy, enabling fast and efficient large-scale micro-service scheduling and deployment. Moreover, because a batch of Pods is scheduled simultaneously, the resource occupation of multiple Pods can be considered within one scheduling pass, which largely avoids resource waste.
Detailed Description
The present invention will be further described with reference to the following examples. The following examples are set forth merely to aid in the understanding of the invention. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.
The micro-service scheduling mechanism of the invention comprises a micro-service scheduling module and a micro-service scaling module. The quality of the micro-service scheduling scheme directly determines the effectiveness of resource usage and affects the system resource utilization rate. When the scheduling mechanism is unreasonable, one resource in a node is easily exhausted while another remains idle, causing resource waste and reducing the system resource utilization rate. Moreover, due to over-scheduling of micro-services, nodes cannot tolerate workload changes well, and resource contention and service interruption easily occur.
The work flow of the existing scheduling method is as follows:
when a new Pod is created, the Scheduler running on the master node of the Kubernetes cluster is responsible for allocating the newly joined Pod to a suitable working node. The deployment file of the Pod indicates the minimum required resources (Requested Resources) and the maximum available resources (Limited Resources) of all containers contained in the Pod, and the sum of the resources requested by all the containers is used as the final resource request of the Pod. During scheduling, the scheduler takes the minimum resources requested by the Pod as the scheduling basis.
The scheduling mechanism of the existing scheduling method can be described as follows:
acquiring Pods without an allocated node from the API Server and adding them to the to-be-scheduled queue (Unscheduled queue);
taking one Pod out of the to-be-scheduled queue for a scheduling operation;
screening all candidate nodes according to the resources requested by the Pod and the available resources of each node, i.e., keeping the nodes whose available resources are not less than the sum of the resources requested by all containers in the Pod; all nodes meeting this condition proceed to the next step;
evaluating the suitability of each node with a priority function, and sorting all the nodes in descending order of the evaluation result;
and allocating the Pod to the highest-priority candidate node for operation.
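The one-at-a-time flow above can be sketched as follows. This is an illustrative Python sketch; the helper names `filter_nodes` and `score` and the toy priority function are assumptions of this sketch, not part of the Kubernetes API.

```python
def filter_nodes(pod, nodes):
    """Keep only nodes whose available resources cover the Pod's request."""
    return [n for n in nodes
            if n["cpu"] >= pod["cpu"] and n["mem"] >= pod["mem"]]

def score(pod, node):
    """Toy priority function: prefer the node with the most free resources."""
    return (node["cpu"] - pod["cpu"]) + (node["mem"] - pod["mem"])

def schedule_one(pod, nodes):
    candidates = filter_nodes(pod, nodes)
    if not candidates:
        return None  # Pod goes back to the unscheduled queue
    # Rank candidates in descending order of priority and pick the best.
    return max(candidates, key=lambda n: score(pod, n))

pod = {"cpu": 2, "mem": 4}
nodes = [{"name": "n1", "cpu": 1, "mem": 8},
         {"name": "n2", "cpu": 4, "mem": 8},
         {"name": "n3", "cpu": 2, "mem": 4}]
best = schedule_one(pod, nodes)
```

Note that the toy priority function only looks at leftover capacity; as the text below argues, this ignores the balance between CPU and memory within a node.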
Because the balance among resources within a node and among resources across the cluster is not fully considered in the scheduling methods of existing platforms, the resources of cluster nodes are not fully utilized; this not only wastes resources but also indirectly increases the number of running nodes in the cluster and thus the operating cost.
Aiming at the defects of the prior art, the ranking-based intelligent micro-service scheduling mechanism provided by the invention takes improving the resource utilization rate, reducing the number of running nodes in the cluster, and reducing the operating cost as optimization goals, and realizes resource-optimal intelligent micro-service scheduling from two aspects: the scheduling mode and the scheduling principle.
As shown in fig. 1, in an embodiment of the present invention, a micro-service scheduling method comprises the following steps:
acquiring a preset number of Pods to be scheduled from a to-be-scheduled Pod queue;
generating a scheduling strategy with a preset scheduler according to the resources requested by each Pod to be scheduled and the resources allocable by each candidate node, wherein the scheduling strategy comprises the candidate node allocated to each Pod to be scheduled; the resources requested by a Pod to be scheduled comprise the memory and CPU requested by the Pod; the resources allocable by a candidate node comprise the memory and CPU the node can allocate and the number of Pods the node can host;
and taking the candidate node allocated to each Pod to be scheduled in the scheduling strategy as the corresponding working node, and scheduling each Pod to its working node for operation.
Wherein the scheduler is obtained by:
generating training data through a pre-designed simulator, wherein the training data comprise the resources requested by the Pods to be scheduled, the resources allocable by each candidate node, and the scheduling strategy under the corresponding conditions;
and training a ranker with the training data, and constructing the scheduler based on the trained ranker.
Further, the generating a scheduling strategy with a preset scheduler according to the resources requested by each Pod to be scheduled and the resources allocable by each candidate node comprises:
using the scheduler to take, as its input, the CPU and memory occupation ratios in the resources requested by each Pod to be scheduled, the CPU and memory occupation ratios in the resources of all candidate nodes, and the number of Pods each node can host, and taking the corresponding output as the scheduling strategy.
Further, in order to solve the problem that the solution time of a model-based scheduling mechanism is too long in large-scale scenarios, the invention provides a ranking-based batch scheduling mechanism: a superior scheduling scheme is rapidly computed with a pairwise ranker (Pairwise Ranker), the various resources on nodes with different resource proportions are fully utilized on the premise of guaranteeing service quality, the number of running nodes in the cluster is reduced, and the operating cost of the cloud service provider is reduced.
All components related to cluster management run on the master node of the cluster, including the scheduler, which implements the ranking-based batch scheduling algorithm, and the queue storing the Pods to be scheduled. All Pods carrying the workload (traffic) run on the working nodes. When the resources of the working nodes in the cluster are insufficient, a new node is started from the node pool, added to the working-node cluster, and takes on part of the workload.
As shown in fig. 2, the scheduler acquires p Pods to be scheduled from the to-be-scheduled Pod queue each time, computes a scheduling scheme by combining the resources (CPU and memory) requested by the p Pods with the allocable resources (CPU, memory, and number of hostable Pods) of the current working nodes, and then allocates each Pod to its corresponding node for operation according to the scheduling scheme.
Suppose a to-be-scheduled Pod queue Q, a candidate node set N, and a batch capacity p0; the scheduling method completes the scheduling of all Pods in the queue Q. The ranking-based scheduling method of this embodiment comprises the following steps:
First, take p0 Pods to be scheduled from the queue Q and store them in the set P; when fewer than p0 Pods remain, take them all. Initialize an empty set P', and for each Pod in P store its name pname, requested CPU pcpu, and requested memory pmem into P'. The ranker takes the to-be-scheduled Pod set P' and the candidate node set N as input, computes the scheduling scheme of each Pod in P', and stores the returned result in AL. Finally, schedule each Pod to its corresponding node for operation according to the scheduling scheme in AL, and remove the corresponding Pods from the queue Q.
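The batch loop just described can be sketched as follows. The `rank` callable stands in for the trained pairwise ranker and its return format `{pod_name: node_name}` is an assumption of this sketch.

```python
from collections import deque

def batch_schedule(Q, N, p0, rank):
    """Drain the to-be-scheduled queue Q in batches of at most p0 Pods."""
    assignments = {}
    while Q:
        # Take up to p0 Pods; if fewer remain, take them all.
        P = [Q.popleft() for _ in range(min(p0, len(Q)))]
        # P' keeps only the name, requested CPU and requested memory.
        P_prime = [(p["name"], p["cpu"], p["mem"]) for p in P]
        AL = rank(P_prime, N)   # scheduling scheme for this batch
        assignments.update(AL)  # bind each Pod to its node
    return assignments

# Toy stand-in ranker: always pick the first candidate node.
toy_rank = lambda P_prime, N: {name: N[0] for name, _, _ in P_prime}
Q = deque([{"name": "a", "cpu": 1, "mem": 2},
           {"name": "b", "cpu": 2, "mem": 2}])
result = batch_schedule(Q, ["node-1", "node-2"], p0=1, rank=toy_rank)
```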
If some Pod in a batch fails to be scheduled due to hardware or network problems, the existing scheduling methods usually add the Pod back to the to-be-scheduled Pod queue Q to wait for subsequent scheduling. However, since the invention adopts batch scheduling, rolling back all Pods of the batch would affect the overall scheduling progress. Because a Pod whose scheduling failed does not actually occupy cluster resources, the scheduling method of this embodiment simply reschedules the Pod. At the next pass the scheduler computes the optimal scheme according to the latest state of the cluster nodes, so the influence of the scheduling failure is mitigated to the greatest extent.
The invention provides a scheduling method that computes the scheduling scheme with a ranker. Because agile and efficient large-scale micro-service deployment places high demands on model complexity and computation speed, the invention adopts a pairwise ranker (Pairwise Ranker) as the core model of the scheduler. Compared with a pointwise ranker (Pointwise Ranker) and a listwise ranker (Listwise Ranker), the pairwise ranker effectively reduces both model complexity and computation time.
When computing the scheduling scheme of a Pod, the ranker selects two nodes from the candidate nodes each time for comparison, and computes the probability that one node is superior to the other according to the current resources, obtaining the partial order relation of the two nodes. After all pairwise comparisons are completed, the ordering of the full node set is obtained, and the highest-ranked node is the one the Pod is scheduled to. The partial order relation of two nodes (Nj, Nk) is computed according to Equation 1, i.e., the regression function fw takes as input the resources Rpod requested by the Pod to be scheduled and the resources currently allocable by the two nodes, RNj and RNk, and computes the fitness of the two nodes for the Pod.
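The round-robin comparison above can be sketched as follows. The `prefer` callable stands in for the regression function fw of Equation 1; the toy preference used here (closest CPU:memory ratio) is an assumption for illustration, chosen to mirror the scheduling strategy described later in the text.

```python
from itertools import combinations

def best_node(pod, nodes, prefer):
    """Compare every pair of candidate nodes with `prefer(pod, nj, nk)`
    (True when nj is judged better than nk), tally the wins, and return
    the highest-ranked node."""
    wins = {n["name"]: 0 for n in nodes}
    for nj, nk in combinations(nodes, 2):
        winner = nj if prefer(pod, nj, nk) else nk
        wins[winner["name"]] += 1
    return max(nodes, key=lambda n: wins[n["name"]])

def prefer(pod, nj, nk):
    """Toy stand-in for fw: the node whose CPU:memory ratio is closer
    to the Pod's requested ratio wins."""
    target = pod["cpu"] / pod["mem"]
    return abs(nj["cpu"] / nj["mem"] - target) < abs(nk["cpu"] / nk["mem"] - target)

pod = {"cpu": 2, "mem": 4}
nodes = [{"name": "n1", "cpu": 8, "mem": 8},
         {"name": "n2", "cpu": 4, "mem": 8},
         {"name": "n3", "cpu": 1, "mem": 8}]
chosen = best_node(pod, nodes, prefer)
```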
In order to obtain an accurate and efficient ranker model, the workflow shown in fig. 3 is adopted: first, a simulator generates millions of labeled training samples; then the ranker model is trained offline with these data; finally the ranker model is integrated into the scheduler to compute scheduling schemes online.
Rich and comprehensive training data are particularly important for model training. The invention designs a ranker simulator (i.e., a simulator) that can generate scheduling schemes under different resource configurations according to the scheduling strategy (see, e.g., Zheng Chen, Heng Ji. Collaborative ranking: A case study on entity linking[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2011: 771-781). To generate the million training samples for the pairwise ranker model, the ranges of CPU and memory are first determined according to different CPU-to-memory resource proportions, and a million input records are randomly generated within those ranges. These records are then fed into the simulator to obtain the corresponding node ranking results as output data. Finally, the input data and the corresponding output data are combined into the final million training samples for the ranker model; their data format is shown in fig. 4.
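A minimal sketch of such a simulator follows. The resource ranges, the three-node layout, and the labeling rule (nodes ordered by closeness of their CPU:memory ratio to the Pod's request, per the strategy described below) are assumptions for illustration.

```python
import random

def generate_samples(n, cpu_range=(1, 16), mem_range=(1, 64), seed=0):
    """Randomly draw Pod requests and node resources within the configured
    ranges, then label each sample with the node ordering produced by the
    scheduling strategy."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        pod = {"cpu": rng.randint(*cpu_range), "mem": rng.randint(*mem_range)}
        nodes = [{"id": i,
                  "cpu": rng.randint(*cpu_range),
                  "mem": rng.randint(*mem_range)} for i in range(3)]
        target = pod["cpu"] / pod["mem"]
        # Label: candidate nodes sorted by ratio distance to the Pod's request.
        order = sorted(nodes, key=lambda nd: abs(nd["cpu"] / nd["mem"] - target))
        samples.append({"input": (pod, nodes),
                        "label": [nd["id"] for nd in order]})
    return samples

data = generate_samples(1000)
```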
In order to improve the resource utilization rate and reduce the number of online nodes, the scheduling strategy adopted by the invention is to schedule a Pod onto the node whose allocable CPU-to-memory resource ratio is closest to the CPU-to-memory ratio requested by the Pod, thereby fully utilizing the resources on the node and minimizing the number of online nodes in the cluster.
The scheduler of this embodiment is developed in the Go language, runs as a system-level component in a Pod on the master node of the cluster, and is compatible with the other schedulers in the cluster. When the scheduler is deployed, the Namespace of its Pod needs to be set to kube-system to obtain system-level operating authority. When a micro-service that wants to use this scheduler is deployed, the name of the scheduler needs to be indicated in the configuration file of the micro-service, so that the scheduler can identify the corresponding micro-service and complete the scheduling operation.
After the micro-service is successfully deployed in the cluster and starts running, it can handle the workload, i.e., the service requests of users. In real scenarios, however, the workload is not constant: as described in Akamai's 2014 Online Holiday Shopping Trends and Traffic Report, real-world application workloads fluctuate within a day and across days. Therefore, if the number of micro-service instances in the cluster stays the same, there is a risk of resource contention and service interruption when the workload increases; when the workload is low, resources sit idle and are wasted, reducing the resource utilization rate and increasing the cost of the cloud service provider.
In order to solve the problem caused by the fluctuation of the workload, in another preferred embodiment of the present invention, the method for scheduling microservice further includes scaling the scheduled microservice.
The quality of micro-service scaling directly affects the system resource utilization and the quality of the provided service. When the workload increases, scaling out too early easily causes resource idling and waste, while scaling out too late easily causes resource contention and service interruption; when the workload decreases, scaling in too early easily causes resource contention and degrades service quality, while scaling in too late wastes resources and reduces the resource utilization rate. The existing passive scaling mechanisms based on statistical methods suffer from scaling lag and easily produce resource contention and service interruption, while the observation-based semi-automatic scaling mechanisms rely excessively on human observation to make scaling decisions and additionally increase labor cost.
In Kubernetes, the horizontal scaling mechanism is mainly realized by the HPA (Horizontal Pod Autoscaler), which works as follows:
First, the monitored metrics are specified, i.e., the metrics on which scaling decisions are based, for example the current CPU utilization, memory utilization, or the number of requests arriving per second;
Second, the desired value of each monitored metric is specified, i.e., the state in which the metric is expected to stay, for example a CPU utilization of 50% per micro-service instance, or 100 requests processed per second;
Third, the data collector collects the value of each monitored metric in the cluster in real time, and the HPA computes, from the current value (currentMetricValue) and the desired value (desiredMetricValue) of the metric, whether to scale; the calculation is shown in Equation 2. If several monitored metrics are used at the same time, each is computed separately and the maximum is taken as the final scaling decision;
Fourth, if the HPA decides not to scale, the number of existing micro-service instances stays unchanged; if the HPA decides to scale, a corresponding number of micro-service instances are deployed or deleted.
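The HPA calculation and the take-the-maximum rule can be sketched as follows, using the documented HPA scaling rule desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue) as Equation 2.

```python
import math

def hpa_desired_replicas(current_replicas, current_metric, desired_metric):
    """HPA scaling rule: scale the replica count by the ratio of the
    current metric value to the desired metric value, rounding up."""
    return math.ceil(current_replicas * current_metric / desired_metric)

def hpa_decision(current_replicas, metrics):
    """With several monitored metrics, each is evaluated separately and
    the maximum becomes the final decision."""
    return max(hpa_desired_replicas(current_replicas, cur, want)
               for cur, want in metrics)

# e.g. 4 replicas at 80% CPU (target 50%) and 100 req/s (target 100 req/s)
n = hpa_decision(4, [(80, 50), (100, 100)])
```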
The scaling mechanism in the HPA is a threshold-based passive mechanism: it is triggered only after the actual value of a monitored metric has exceeded the desired value for a period of time. The lag in scaling decisions exposes the system to the risk of resource contention and service interruption during that period; the cloud service provider even risks paying penalties for violating the user's QoS.
The current scaling mechanisms have the following problems: (1) the threshold-based passive scaling mechanism is prone to resource contention, putting the system at risk of service interruption; (2) the observation-based semi-automatic scaling mechanism cannot completely solve the scaling-lag problem, and increases labor cost and the probability of human error.
Aiming at these problems of the existing scaling mechanisms, the invention realizes fully automatic scaling of micro-services and, by actively predicting the monitored metrics, scales the micro-service instances in advance. This scaling mechanism guarantees the users' service quality, improves the cluster resource utilization rate, and reduces the labor and operating costs of the cloud service provider.
The scaling method of this embodiment comprises the following steps:
collecting resource consumption parameters of the currently running micro-service at a preset time interval; the resource consumption parameters comprise the consumption values of all instances of the currently running micro-service under each preset scaling target metric;
predicting the resource consumption values in a future time period with a resource consumption prediction model according to the collected resource consumption parameters;
and generating a scaling decision according to the relationship between the predicted resource consumption values in the future time period and a preset resource consumption threshold under each preset scaling target metric, and executing the scaling decision.
Specifically, the resource consumption parameters of the currently running micro-service are collected; the parameter metrics comprise the instance ID, time, CPU consumption value, and memory consumption value of the currently running micro-service. The resource consumption values in a future time period are then predicted with the resource consumption prediction model according to the collected parameters.
A scaling decision is then generated according to the relationship between the predicted resource consumption values in the future time period and the preset resource consumption threshold under each preset scaling target metric, and is executed; the scaling decision comprises the scaling operation and the time at which to execute it.
There are one or more preset scaling target metrics, and each scaling target metric is provided with a corresponding resource consumption threshold and a corresponding resource consumption prediction model;
the generating a scaling decision according to the relationship between the predicted resource consumption values in the future time period and the preset resource consumption threshold under each preset scaling target metric comprises:
acquiring the number of instances of the currently running micro-service as the current total;
for each scaling target metric, calculating the post-scaling total number of instances under that metric according to the predicted resource consumption value and the preset resource consumption threshold, and taking the maximum value as the target total after the scaling operation;
when the current total is smaller than the target total, performing a scale-out operation;
when the current total is larger than the target total, performing a scale-in operation;
and when the current total is equal to the target total, performing no scaling operation.
Specifically, after the predicted resource consumption values are obtained, the prediction-based elastic scaling decision process is shown as Algorithm 2. The input of the algorithm is the resource consumption threshold Ts preset for the micro-service S and the predicted resource consumption value Ps; the output is the scaling decision for S. First, the number nc of instances of the micro-service S currently in the cluster is acquired. Then, taking the CPU resource as the scaling target metric, the post-scaling total number of instances ndCpu is calculated by Equation 3; likewise, taking the memory resource as the target metric, the post-scaling total ndMem is calculated. The larger of the two is taken as the final post-scaling total nd (lines 2-4). If nd > nc, a scale-out operation should be performed, i.e., (nd - nc) instances of the micro-service S are deployed in the cluster; if nd < nc, a scale-in operation should be performed, i.e., (nc - nd) instances of S are deleted from the cluster; if nd = nc, the number of instances of S in the cluster stays unchanged and no scaling operation is performed. This prediction-based elastic scaling algorithm effectively reduces the probability of resource contention and guarantees service quality.
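The decision process of Algorithm 2 can be sketched as follows. Since the text does not reproduce Equation 3, this sketch assumes an HPA-style rule, nd = ceil(nc * P / T); that formula is an assumption of the sketch, not a statement of the invention's Equation 3.

```python
import math

def scaling_decision(n_c, t_cpu, p_cpu, t_mem, p_mem):
    """Prediction-based scaling decision: compute the post-scaling instance
    total per metric (assumed rule: nd = ceil(nc * P / T)), keep the larger
    total, and compare it with the current instance count n_c."""
    nd_cpu = math.ceil(n_c * p_cpu / t_cpu)  # CPU as the target metric
    nd_mem = math.ceil(n_c * p_mem / t_mem)  # memory as the target metric
    n_d = max(nd_cpu, nd_mem)
    if n_d > n_c:
        return ("scale-out", n_d - n_c)  # deploy nd - nc new instances
    if n_d < n_c:
        return ("scale-in", n_c - n_d)   # delete nc - nd instances
    return ("hold", 0)                   # no scaling operation
```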
Predicting the resource consumption of the micro-service over a future period from historical data is a time-series prediction problem, which has been well studied in the field of machine learning. Related work compares the popular prediction models Bidirectional Long Short-Term Memory (BI-LSTM), Autoregressive Integrated Moving Average (ARIMA), and Long Short-Term Memory (LSTM). The resource consumption prediction model of the invention is obtained by training a BI-LSTM model or an XGBoost model through machine learning. Because XGBoost converges fast, performs excellently in time-series prediction, and can predict resource usage that changes over time, XGBoost is preferably adopted as the resource consumption prediction model.
Assuming two scaling target metrics, CPU resources and memory resources, a BI-LSTM model and an XGBoost model are trained respectively to compare their prediction of resource consumption. The size ratio of the training set to the test set is 5:1 for both models. The results show that the resource usage predicted by the XGBoost model for the future period agrees better with the real resource usage.
When training the BI-LSTM model, ReLU is used as the activation function of the LSTM module, the Adam optimizer as the target optimization function, and the mean squared error (MSE) as the error measure, i.e., the parameters are activation='relu', optimizer='adam', loss='mse', batch_size=16 and epochs=200; the remaining parameters take their default values.
The XGBoost model parameters can be divided into general parameters, model parameters, and learning-task parameters. During training, the general parameters silent and nthread are set to 1 and 5 respectively; among the learning-task parameters, the loss function adopts the mean squared error (MSE), the objective parameter is set to reg:squarederror, and the evaluation metric parameter eval_metric is set to rmse; the learning-rate parameter eta, which influences the learning effect, and the parameter max_depth, which determines the maximum tree depth, are obtained through Bayesian Optimization, and the other parameters use default values. Using Bayesian optimization to tune parameters automatically increases the total training time of the model; how much depends on the size of the training set. Furthermore, to prevent overfitting from too many iterations, the invention sets early_stopping_rounds to 10, i.e., the iteration stops when the value of the loss function fails to drop 10 consecutive times.
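The configuration described above can be collected into an XGBoost-style parameter dict as follows. This is a sketch: the eta and max_depth values shown are placeholders for what Bayesian optimization would return, not values stated in the text.

```python
# XGBoost training configuration sketch (values for eta and max_depth are
# illustrative placeholders; Bayesian optimization would supply them).
xgb_params = {
    # general parameters
    "silent": 1,
    "nthread": 5,
    # learning-task parameters
    "objective": "reg:squarederror",
    "eval_metric": "rmse",
    # model parameters tuned by Bayesian optimization (placeholders)
    "eta": 0.1,
    "max_depth": 6,
}
# Stop training when the loss fails to drop 10 times in a row.
early_stopping_rounds = 10
```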
The preset resource consumption prediction model is normally trained offline on existing data and deployed in the system before the micro-service is deployed. However, because of the limited sources of training data, a large amount of training data for the resource consumption prediction model may not be available before deployment. In that case, a cold-start approach can be used, specifically via the following three methods:
Method 1: before deploying the micro-service, historical resource consumption data of the application is obtained through offline runs, from the developers, or by similar means, and this historical data is used to train a dedicated resource consumption prediction model for the micro-service. After the micro-service is deployed and running, the dedicated model is used for prediction and scaling operations are carried out together with the scaler. Because the predictions are more accurate, scaling in this mode achieves the best effect.
Method 2: since no dedicated resource consumption prediction model is available before deployment, after the micro-service is deployed and running, the generic resource consumption prediction model corresponding to the application type is used for prediction. Once enough historical resource consumption data has been collected and a dedicated model has been trained, the generic model is replaced with the dedicated one. Until the dedicated model is in use, the scaling effect of this mode is somewhat reduced.
Method 3: similar to method 2, micro-service scaling can be handled by a passive scaling mechanism until the dedicated resource consumption prediction model has been trained. During this period, resource contention and service interruption caused by scaling lag may occur; this phenomenon gradually disappears once the dedicated model is in use.
Because elastic scaling of a micro-service is an optional feature configured after the micro-service is deployed and running, and different resource consumption models predict with slightly different accuracy, the elastic scaler offers several modes to choose from. Different types of applications show different resource consumption trends, such as flat, oscillating and periodic. The present invention therefore provides 3 corresponding generic resource prediction models for these 3 typical application classes. If higher prediction accuracy is required, a dedicated resource consumption prediction model can be trained for the application.
For a micro-service S, the developers set a threshold Ts on the CPU resource consumption of S at deployment time. As the workload increases, the resource consumption increases. Fig. 5 compares the scaling effect of the following 3 modes:
Mode 1: no scaling operation at all. As can be seen in Fig. 5, the resource consumption value reaches the threshold Ts at time t1 and continues to increase. This causes resource contention and may even interrupt the service.
Mode 2: a passive scaling mechanism. As can be seen in Fig. 5, the resource consumption value of the same micro-service S reaches the threshold Ts at time t1; however, by its scaling principle, the passive mechanism scales only when the average resource consumption over a past window exceeds the threshold, so it makes the scaling decision and executes the scaling operation at the later time ts. After a micro-service instance is added to share the workload, the resource consumption value of S decreases and falls below the threshold Ts after time t2. The scaling process therefore still leaves the system in a dangerous state for a period of (t2 - t1).
Mode 3: the prediction-based elastic scaling mechanism of the present invention. As can be seen in Fig. 5, the resource prediction module makes predictions from the historical resource consumption data of the micro-service S. At time tas it learns that the resource consumption of S will exceed the threshold Ts after time t1, so the scaling decision is made ahead of time at tas and the scaling operation is executed. Because more micro-service instances in the cluster then share the continuously increasing workload, the resource consumption value of S never exceeds the threshold Ts; service interruption caused by resource contention is avoided, and the system stays in a safe state throughout.
The resource consumption parameters of the running micro-services are collected by a data collector, which can gather both static index data and run-time system data. The data used by the invention is the run-time system data, specifically the instance ID, the time, the CPU and memory consumption of every node, and the CPU and memory consumption of every Pod. The collection interval can be set flexibly; in the present invention, it is set to 30 seconds.
In practice, both public data sets and the data collector provided by the invention tend to supply comprehensive data in order to increase its utility value, and the data therefore contains many items irrelevant to prediction, such as node names and locations. The invalid-data filtering proposed by the invention removes these prediction-irrelevant items. The target metrics set in this embodiment are the CPU consumption value and the memory consumption value; accordingly, when predicting resource consumption in this embodiment, the prediction model depends on the instance ID, the time, the CPU consumption value and the memory consumption value. All other, prediction-irrelevant data items are the invalid data to be filtered out.
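The filtering step amounts to keeping only the model-relevant fields of each collected record. A minimal sketch follows; the field names and the sample record are assumptions for illustration, not the collector's actual schema.

```python
# Invalid-data filtering: keep only the fields the prediction model depends
# on (instance ID, time, CPU and memory consumption) and drop
# prediction-irrelevant items such as node name and location.
PREDICTION_FIELDS = {"instance_id", "time", "cpu", "memory"}  # assumed names

def filter_record(record):
    return {k: v for k, v in record.items() if k in PREDICTION_FIELDS}

raw = {"instance_id": "pod-42", "time": 1700000000,
       "cpu": "250m", "memory": "128Mi",
       "node_name": "worker-3", "location": "zone-a"}  # illustrative record
clean = filter_record(raw)
```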
When data is collected with the data collector provided by the invention, the unit of a collected value depends on its magnitude: the CPU consumption value may be reported in n or m, and the memory consumption value in Mi or Ki, where 1 core = 1000m = 10^9 n and 1Mi = 1024Ki = 1024 x 1024 bytes. To improve prediction accuracy and reduce prediction error, the invention unifies the units before prediction, using m for the CPU and Mi for the memory, and converts the values back to the corresponding units after prediction.
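The unit unification described above can be sketched as two small conversion helpers, following 1 core = 1000m = 10^9 n (hence 1m = 10^6 n) and 1Mi = 1024Ki; the function names are illustrative.

```python
# Unify collector units before prediction: CPU to millicores (m),
# memory to mebibytes (Mi).
def cpu_to_m(value):
    if value.endswith("n"):
        return int(value[:-1]) / 1_000_000   # 1m = 10^6 n
    if value.endswith("m"):
        return float(value[:-1])
    raise ValueError("unexpected CPU unit: " + value)

def mem_to_mi(value):
    if value.endswith("Ki"):
        return float(value[:-2]) / 1024      # 1Mi = 1024Ki
    if value.endswith("Mi"):
        return float(value[:-2])
    raise ValueError("unexpected memory unit: " + value)
```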
Because the individual metrics in the raw data set differ in magnitude, using the raw data directly hinders a fair overall comparison and evaluation of the model. To eliminate the influence of the different scales of different metrics, data normalization is necessary; moreover, normalization also speeds up model convergence and improves the model's prediction accuracy.
Because the training data changes gently when the resource consumption prediction model predicts micro-service resource consumption, the Min-Max normalization method, which suits such concentrated data, is used to preprocess the data and to restore it afterwards. Min-Max normalization maps the data into the interval [0, 1] according to the maximum and minimum of the training data, as shown in formula 4:
x' = (x - min) / (max - min)    (formula 4)
where x is the training value to be converted, min is the minimum of the training data, and max is the maximum of the training data.
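Formula 4 and its inverse (used to restore predictions to the original scale) can be sketched as follows; the sample values are illustrative.

```python
# Min-Max normalization (formula 4): map each value into [0, 1] using the
# minimum and maximum of the training data; invert after prediction.
def min_max(x, lo, hi):
    return (x - lo) / (hi - lo)

def inv_min_max(x_norm, lo, hi):
    return x_norm * (hi - lo) + lo

data = [200.0, 260.0, 300.0, 400.0]
lo, hi = min(data), max(data)
scaled = [min_max(x, lo, hi) for x in data]          # [0.0, 0.3, 0.5, 1.0]
restored = [inv_min_max(x, lo, hi) for x in scaled]  # back to original scale
```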
The elastic scalers are managed uniformly by the Controller Manager, acting as their logic controller, and run on the master nodes of the cluster. An elastic scaler consists of a resource prediction module and an active scaling module: the resource prediction module is implemented in Python and predicts with the XGBoost model, while the active scaling module is implemented in Go. The two modules communicate in a master-slave mode. Every 1 minute, the active scaler queries the resource prediction module for the resource consumption of the micro-service over the next 3 minutes, makes a scaling decision from the returned predictions, and carries out the scaling operation.
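The decision step of the active scaler can be sketched as follows. The real module is written in Go; this Python version only illustrates the logic of one query cycle, and the threshold and predicted values are invented for illustration.

```python
# One decision cycle of the active scaler: given the predicted consumption
# for the next 3 minutes, trigger a scale-out ahead of time if any predicted
# value would cross the threshold.
def scale_decision(predicted_next_3min, threshold):
    """Return True when a proactive scale-out should be triggered."""
    return any(v > threshold for v in predicted_next_3min)

threshold_ts = 800.0               # CPU threshold Ts in millicores (assumed)
predicted = [640.0, 760.0, 830.0]  # predictions for t+1m, t+2m, t+3m
decision = scale_decision(predicted, threshold_ts)
```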
In another embodiment of the present invention, an electronic device includes:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method described above.
In a further embodiment of the invention, a computer-readable medium has stored thereon a computer program which, when being executed by a processor, carries out the above-mentioned method.
It is to be understood that the computer device of the present embodiment includes a central processing unit (CPU) that can perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) or a program loaded from a storage section into a random access memory (RAM). The RAM also stores various programs and data necessary for the operation of the system. The CPU, ROM and RAM are connected to each other via a bus, and an input/output (I/O) interface is also connected to the bus.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program, when executed by a Central Processing Unit (CPU), performs the above-described functions defined in the terminal of the present application.
It should be noted that the computer readable medium shown in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The units or modules described may also be provided in a processor, wherein the names of the modules do not in some cases constitute a limitation of the module itself.
Exemplary embodiments of the present invention are specifically illustrated and described above. It is to be understood that the invention is not limited to the precise construction, arrangements, or instrumentalities described herein; on the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.