CN116450280A

CN116450280A - Container mixed expansion and contraction method and system based on machine learning

Info

Publication number: CN116450280A
Application number: CN202211554550.9A
Authority: CN
Inventors: 景宇; 闫海娜; 刘磊; 杨帆; 甄富; 鞠娜
Original assignee: Tianyi Cloud Technology Co Ltd
Current assignee: Tianyi Cloud Technology Co Ltd
Priority date: 2022-12-06
Filing date: 2022-12-06
Publication date: 2023-07-18

Abstract

The application provides a machine-learning-based hybrid container scaling method and system. The method comprises the following steps:
- metrics-server collects system resource usage data through the kubelet, including the CPU, memory, GPU and disk IO usage of each pod.
- Prometheus Adapter exposes custom metrics collected by Prometheus, including requests per second, concurrency, PPS, latency and interface response success rate.
- For each pod controller, the real-time data are assembled into an n-dimensional array whose elements are the n factors that influence resource usage, namely [pod name, timestamp, CPU usage, memory usage, GPU usage, disk IO usage, requests per second, concurrency, PPS, latency, interface response success rate of the pod]. The arrays are sorted by time and stored in a database.
- The CPU quota and the memory quota set in the pod controller are each recorded, capturing the quota settings at the current scale.

Description

Container mixed expansion and contraction method and system based on machine learning

Technical Field

The application relates to the field of cloud computing, and in particular to a machine-learning-based hybrid container scaling method and system.

Background

In current cloud computing products, elastic scaling typically adjusts computing resources automatically once a threshold is triggered, according to application requirements and policies. This narrows the gap between actual demand and provisioned resources. In essence, elastic scaling balances resource supply against service load, and it plays an important role in a product's functionality and performance.

Kubernetes provides two main container scaling strategies:
- Horizontal scaling automatically adjusts the number of replicas of a pod controller according to the real-time load of the containers.
- Vertical scaling automatically calculates or adjusts the CPU and memory quotas of the pod template in the pod controller based on service load metrics.

Horizontal scaling adjusts the replica count of the pod controller only after a threshold has been triggered, so some lag is unavoidable. Moreover, every replica under the same pod controller has the same quota specification, which makes scaling imprecise and wastes resources. Vertical scaling adjusts the resource quota of the pod, but each adjustment requires the pod to be restarted, which is unacceptable to the business side.

Disclosure of Invention

The embodiments of the application provide a machine-learning-based hybrid container scaling method and system to address the above technical problems.

A machine-learning-based hybrid container scaling method comprises the following steps:

The method comprises the following steps:
- metrics-server collects system resource usage data through the kubelet, including the CPU, memory, GPU and disk IO usage of each pod.
- Prometheus Adapter exposes custom metrics collected by Prometheus, including requests per second, concurrency, PPS, latency and interface response success rate.
- For each pod controller, the real-time data are assembled into an n-dimensional array whose elements are the n factors that influence resource usage, namely [pod name, timestamp, CPU usage, memory usage, GPU usage, disk IO usage, requests per second, concurrency, PPS, latency, interface response success rate of the pod]. The arrays are sorted by time and stored in a database.
- Based on the historical data, two columns are added to the data table: the CPU quota set in the pod controller and the memory quota set in the pod controller. These columns record the quota settings at the current scale.
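The following is a minimal sketch of how the time-ordered records and the two quota columns might be stored. It assumes SQLite, via Python's built-in `sqlite3` module, stands in for "a database". The table name `usage_history` and all column names are hypothetical, not taken from the text.

```python
import sqlite3

# Hypothetical layout: the n-dimensional record from the text, in order,
# followed by the two quota columns recorded from the pod controller.
COLUMNS = [
    ("pod_name", "TEXT"),
    ("ts", "INTEGER"),             # timestamp (Unix seconds, assumed)
    ("cpu_usage", "REAL"),
    ("mem_usage", "REAL"),
    ("gpu_usage", "REAL"),
    ("disk_io_usage", "REAL"),
    ("requests_per_sec", "REAL"),
    ("concurrency", "REAL"),
    ("pps", "REAL"),
    ("latency", "REAL"),
    ("success_rate", "REAL"),      # interface response success rate
    ("cpu_quota", "REAL"),         # CPU quota set in the pod controller
    ("mem_quota", "REAL"),         # memory quota set in the pod controller
]


def create_table(conn: sqlite3.Connection) -> None:
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS usage_history ({cols})")


def insert_record(conn: sqlite3.Connection, record: tuple) -> None:
    # Reject records that do not match the column layout.
    if len(record) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(record)}")
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.execute(f"INSERT INTO usage_history VALUES ({placeholders})", record)


def history_for(conn: sqlite3.Connection, pod_name: str) -> list:
    # Rows for one pod, sorted by time as the method requires.
    cur = conn.execute(
        "SELECT * FROM usage_history WHERE pod_name = ? ORDER BY ts", (pod_name,)
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
create_table(conn)
insert_record(conn, ("web-1", 120, 0.5, 256, 0, 10, 80, 12, 900, 35, 0.999, 1.0, 512))
insert_record(conn, ("web-1", 60, 0.4, 240, 0, 8, 70, 10, 800, 30, 0.998, 1.0, 512))
print([row[1] for row in history_for(conn, "web-1")])  # [60, 120]
```

Sorting with `ORDER BY ts` at query time keeps the table append-only while still returning each pod's history in time order.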

In some embodiments, the method further comprises adding a manual label column to the data table, where 0 denotes abnormal, 1 denotes normal and 2 denotes undetermined. Erroneous data are marked and removed, eliminating outliers.
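The following sketches the label-based cleaning step. It assumes each record is a Python dict with a hypothetical `label` key holding 0, 1 or 2. The text says erroneous data are removed by marking, but it does not say whether rows labelled 2 are kept, so that choice is exposed as an assumed parameter.

```python
# Label values as defined in the text.
ABNORMAL, NORMAL, UNDETERMINED = 0, 1, 2


def drop_abnormal(rows, keep_undetermined=True):
    """Remove rows manually labelled abnormal (0).

    Whether rows labelled 2 ("cannot judge") should also be dropped is not
    specified in the text, so it is left as a switch.
    """
    allowed = {NORMAL, UNDETERMINED} if keep_undetermined else {NORMAL}
    return [row for row in rows if row["label"] in allowed]


rows = [
    {"ts": 1, "cpu_usage": 0.4, "label": NORMAL},
    {"ts": 2, "cpu_usage": 97.0, "label": ABNORMAL},   # e.g. a bad scrape
    {"ts": 3, "cpu_usage": 0.5, "label": UNDETERMINED},
]
print([r["ts"] for r in drop_abnormal(rows)])                           # [1, 3]
print([r["ts"] for r in drop_abnormal(rows, keep_undetermined=False)])  # [1]
```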

In some embodiments, the method further comprises loading and plotting the historical data, then repairing distorted data to improve data quality. Repairs combine historical data from the same period with adjacent data from the current day.
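The text does not say how historical same-period data and same-day adjacent data are combined. One simple possibility, shown here purely as an assumption, is a weighted average of the two means. `repair_point` and `history_weight` are hypothetical names.

```python
from statistics import mean


def repair_point(same_period_history, adjacent_today, history_weight=0.5):
    """Estimate a replacement value for a distorted sample.

    same_period_history: values at the same time on previous days/weeks.
    adjacent_today: valid samples just before/after the distorted one today.
    history_weight: hypothetical blending weight (not given in the text).
    """
    if not same_period_history and not adjacent_today:
        raise ValueError("need at least one source of reference data")
    if not same_period_history:
        return mean(adjacent_today)
    if not adjacent_today:
        return mean(same_period_history)
    return (history_weight * mean(same_period_history)
            + (1 - history_weight) * mean(adjacent_today))


print(repair_point([100, 110], [90, 94]))  # 98.5
```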

In some embodiments, the method further comprises the following steps:
- collecting the resource usage of the pod controller to be predicted;
- calculating, with a machine learning prediction algorithm, the expected quota of the pod controller over the next x hours;
- bounding the expected quota by a minimum quota MinDesiredValue and a maximum quota MaxDesiredValue, both of which vary with the cluster's currently remaining resources, so that an overly small or overly large prediction does not disrupt other components.
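The text does not specify how MinDesiredValue and MaxDesiredValue follow the cluster's remaining resources. The sketch below assumes, hypothetically, that both are fixed fractions of the remaining capacity in a single resource dimension. It then clamps the prediction into that range.

```python
def desired_bounds(cluster_remaining, min_fraction=0.05, max_fraction=0.5):
    """Hypothetical policy: MinDesiredValue and MaxDesiredValue scale with the
    cluster's currently remaining resources (e.g. free CPU cores)."""
    if not 0 <= min_fraction <= max_fraction:
        raise ValueError("require 0 <= min_fraction <= max_fraction")
    return cluster_remaining * min_fraction, cluster_remaining * max_fraction


def bound_prediction(predicted, cluster_remaining):
    """Clamp a predicted expected quota into [MinDesiredValue, MaxDesiredValue]."""
    min_desired, max_desired = desired_bounds(cluster_remaining)
    return min(max(predicted, min_desired), max_desired)


print(bound_prediction(30.0, cluster_remaining=40.0))  # 20.0 (capped)
print(bound_prediction(6.0, cluster_remaining=40.0))   # 6.0 (within bounds)
```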

In some embodiments, the method further comprises computing the distance between the n-dimensional array x of the pod controller to be predicted and each n-dimensional array y in the historical data. The distance is the Euclidean distance:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

where i = 1 to n, x_i is an element of array x, and y_i is an element of array y.
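The following small Python sketch computes the Euclidean distance between two n-dimensional arrays. Only the numeric fields of the record are meaningful here; pod name and timestamp are identifiers, not coordinates. CPU, memory, requests per second and latency have very different units, so normalizing each dimension before computing distances is usually advisable, although the text does not mention it. Python 3.8+ also provides `math.dist`, which computes the same quantity.

```python
import math


def euclidean_distance(x, y):
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2) over two equal-length numeric arrays."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension n")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```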

In some embodiments, further comprising:

The m samples closest to the sample to be predicted are defined as its neighbors. The value of m is tuned iteratively until the value with the lowest error rate is found. The resource quota required by the sample to be classified is the quota required by the majority of its neighbors, and the expected quota DesiredBasicValue is output.
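The following sketches the neighbor vote and the search for m. It assumes quotas are discrete levels, so "the quota required by most samples" becomes a majority vote. It also uses the leave-one-out error rate as the error measure; the text does not name a specific validation scheme, so that choice is an assumption. The helper names are hypothetical.

```python
import math
from collections import Counter


def euclidean_distance(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def knn_quota(sample, history, m):
    """Majority vote over the quotas of the m nearest historical samples.

    history: list of (features, quota) pairs; quota is treated as a label.
    Ties go to the quota seen first among the sorted neighbours, i.e. the
    closer one (Counter.most_common keeps first-encountered order for ties).
    """
    nearest = sorted(history, key=lambda h: euclidean_distance(sample, h[0]))[:m]
    return Counter(quota for _, quota in nearest).most_common(1)[0][0]


def choose_m(history, candidates):
    """Return (m, error_rate) with the lowest leave-one-out error rate."""
    best_m, best_err = None, float("inf")
    for m in candidates:
        errors = 0
        for i, (features, quota) in enumerate(history):
            rest = history[:i] + history[i + 1:]   # hold one sample out
            if knn_quota(features, rest, m) != quota:
                errors += 1
        err = errors / len(history)
        if err < best_err:
            best_m, best_err = m, err
    return best_m, best_err


history = [((1.0, 1.0), 2), ((1.2, 0.9), 2), ((0.9, 1.1), 2),
           ((5.0, 5.0), 4), ((5.1, 4.9), 4), ((4.8, 5.2), 4)]
m, err = choose_m(history, candidates=[1, 3, 5])
print(m, err)                              # 1 0.0
print(knn_quota((1.1, 1.0), history, m))   # 2
```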

In some embodiments, the method further comprises calculating the true target quota from the basic expected quota DesiredBasicValue. Three coefficients are applied to it: a time-dependent periodic factor coefficient s(t), a holiday and special-event factor coefficient h(t), and a coefficient ε(t) for other influencing factors. The result is DesiredValue = DesiredBasicValue × s(t) × h(t) × ε(t), where t is time.

In some embodiments, the method further comprises calculating the difference between the target metric and the actual metric, and having the hybrid scaling controller determine a strategy.

In some embodiments, further comprising:

calculating the ratio of the current quota CurrentMetricValue to the true target quota DesiredValue: ratio = CurrentMetricValue / DesiredValue;

when the ratio is approximately 1, the quota is kept unchanged; otherwise, the pod controller scales out when CurrentMetricValue is smaller than DesiredValue and scales in when CurrentMetricValue is larger than DesiredValue;
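The following sketches the ratio test. The text only says "approximately equal to 1". The tolerance of 0.1 used here is an assumption, borrowed from the default tolerance of the Kubernetes Horizontal Pod Autoscaler.

```python
def scaling_decision(current, desired, tolerance=0.1):
    """Compare CurrentMetricValue with DesiredValue via their ratio."""
    if desired <= 0:
        raise ValueError("DesiredValue must be positive")
    ratio = current / desired
    if abs(ratio - 1.0) <= tolerance:
        return "keep"
    return "scale_out" if current < desired else "scale_in"


print(scaling_decision(9.5, 10.0))   # keep      (ratio 0.95)
print(scaling_decision(5.0, 10.0))   # scale_out (ratio 0.5)
print(scaling_decision(15.0, 10.0))  # scale_in  (ratio 1.5)
```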

In some implementations, the method further comprises calculating the quota to be adjusted:

AdjustValue = DesiredValue - CurrentMetricValue;

when AdjustValue < MinDesiredValue, AdjustValue = MinDesiredValue;

when AdjustValue > MaxDesiredValue, AdjustValue = MaxDesiredValue;

when MinDesiredValue < AdjustValue < MaxDesiredValue and the remaining resources of the nodes in the cluster cannot accommodate AdjustValue, an attempt is made to divide AdjustValue into x quotas according to the remaining resources of the cluster's schedulable nodes. The x quotas sum to AdjustValue;

when MinDesiredValue < AdjustValue < MaxDesiredValue and the cluster can successfully schedule the quota AdjustValue, the method checks whether the pod controller already has a replica set with that specification. If so, that replica set is scaled horizontally. If not, the pod controller is updated with a replica set of the corresponding specification, or one is created. Finally, the pod controller is notified of the timestamped event to be executed; the event comprises the expected adjustment timestamp, the adjustment action and the target amount of resources to adjust.

A machine-learning-based hybrid container scaling system comprises an electronic device for performing the method described above.

The invention has the following advantages:

1. The pod controller supports replica sets with multiple different quotas.

2. The hybrid scaling controller can apply different scaling actions to different replica sets. Vertical scaling is therefore achieved without rebuilding some of the pods, which reduces jitter and fluctuation.

3. Future resource quota demand is predicted, reducing scaling lag.

4. When the remaining quota on the cluster's schedulable nodes is insufficient, the expected quota is divided into x quotas according to the remaining resources of those nodes, so the scale-out can still be completed.

Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person skilled in the art may derive other drawings from them without inventive effort.

FIG. 1 is a block diagram of machine-learning-based hybrid scaling;

FIG. 2 is a pod controller block diagram 1;

FIG. 3 is a pod controller block diagram 2;

FIG. 4 is a pod controller block diagram 3;

FIG. 5 is a flow chart of a machine learning prediction algorithm;

FIG. 6 is a flow chart of the hybrid scaling controller.

Detailed Description

To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments herein without inventive effort fall within the protection scope of the present application.

In this application, the terms "mounted," "connected," "secured," and the like are to be construed broadly unless otherwise specifically indicated or defined. For example, the connection can be fixed connection, detachable connection or integral connection; can be mechanically or electrically connected; the connection may be direct, indirect, or internal, or may be surface contact only, or may be surface contact via an intermediate medium. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art as the case may be.

Furthermore, the terms "first," "second," and the like are used merely to distinguish descriptions and are not to be construed as denoting a specific or particular structure. References to "some embodiments," "other embodiments," and the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this application, such expressions do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described herein, and their features, provided they do not conflict.

In some embodiments, the method further comprises computing the distance between the n-dimensional array x of the pod controller to be predicted and each n-dimensional array y in the historical data. The distance is the Euclidean distance, where n is the dimension of the space:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

where i = 1 to n, x_i is an element of array x, and y_i is an element of array y.

In some embodiments, further comprising:

The m samples closest to the sample to be predicted are defined as its neighbors, and m is tuned iteratively until the value with the lowest error rate is found. The resource quota required by the sample to be classified is the quota required by the majority of its neighbors, and the expected quota DesiredBasicValue is output.

In some embodiments, further comprising:

The ratio of the current quota CurrentMetricValue to the true target quota DesiredValue is calculated as ratio = CurrentMetricValue / DesiredValue. When the ratio is approximately 1, the quota is kept unchanged. Otherwise, the pod controller scales out when CurrentMetricValue < DesiredValue and scales in when CurrentMetricValue > DesiredValue.

In some embodiments, the method further comprises calculating the quota to be adjusted, AdjustValue = DesiredValue - CurrentMetricValue.

When AdjustValue < MinDesiredValue, AdjustValue = MinDesiredValue.

When AdjustValue > MaxDesiredValue, AdjustValue = MaxDesiredValue.
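The following sketches the AdjustValue calculation and its bounds. AdjustValue is negative when scaling in, so applying the bounds literally would turn a scale-in into a scale-out of MinDesiredValue. This sketch therefore assumes the bounds apply to the magnitude and keeps the sign. That interpretation is an assumption; the text does not state it.

```python
import math


def adjust_value(desired, current, min_desired, max_desired):
    """AdjustValue = DesiredValue - CurrentMetricValue, bounded to
    [MinDesiredValue, MaxDesiredValue].

    Assumption: the bounds are applied to the magnitude and the sign is kept,
    so a scale-in (negative AdjustValue) stays a scale-in.
    """
    raw = desired - current
    if raw == 0:
        return 0.0
    magnitude = min(max(abs(raw), min_desired), max_desired)
    return math.copysign(magnitude, raw)


print(adjust_value(10.0, 4.0, 1.0, 4.0))  # 4.0  (6.0 capped at MaxDesiredValue)
print(adjust_value(4.5, 4.0, 1.0, 4.0))   # 1.0  (0.5 raised to MinDesiredValue)
print(adjust_value(2.0, 6.0, 1.0, 4.0))   # -4.0 (scale-in of 4.0)
```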

In some embodiments, when MinDesiredValue < AdjustValue < MaxDesiredValue and the remaining resources of the nodes in the cluster cannot accommodate AdjustValue, an attempt is made to divide AdjustValue into x quotas according to the remaining resources of the cluster's schedulable nodes. The x quotas sum to AdjustValue.
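The following sketches one way to divide AdjustValue into x quotas across schedulable nodes. It assumes a single resource dimension (for example CPU cores) and a hypothetical `node_free` map of remaining capacity per node. Filling the nodes with the most free capacity first keeps x as small as possible. The text does not prescribe a particular splitting rule.

```python
def split_quota(adjust, node_free):
    """Split AdjustValue into x per-node quotas that sum to AdjustValue.

    node_free maps schedulable node name -> remaining capacity (one resource
    dimension). Nodes are filled largest-first, which keeps x as small as
    possible. Returns None if the whole cluster cannot fit AdjustValue.
    """
    if adjust <= 0:
        raise ValueError("only a positive (scale-out) AdjustValue is split")
    if sum(node_free.values()) < adjust:
        return None
    parts, remaining = [], adjust
    for node, free in sorted(node_free.items(), key=lambda kv: kv[1], reverse=True):
        if remaining <= 0:
            break
        take = min(free, remaining)
        if take > 0:
            parts.append((node, take))
            remaining -= take
    return parts


print(split_quota(6, {"node-a": 2, "node-b": 4, "node-c": 1}))
# [('node-b', 4), ('node-a', 2)]
```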

In some embodiments, when MinDesiredValue < AdjustValue < MaxDesiredValue and the cluster can successfully schedule the quota AdjustValue, the method checks whether the pod controller has a replica set with the AdjustValue specification. If it does, that replica set is scaled horizontally directly. If it does not, the pod controller is updated with a replica set of the corresponding specification, or one is created. The pod controller is then notified of the timestamped event to be executed; the event comprises the expected adjustment timestamp, the adjustment action and the target amount of resources to adjust.

The invention also discloses a machine-learning-based hybrid container scaling system, comprising an electronic device for performing the method described above.

This design builds on the Kubernetes horizontal and vertical scaling strategies and adds a machine-learning-based prediction algorithm to perform hybrid scaling, with the aim of making adjustments timely and accurate. A pod controller supports multiple quota settings: each quota is controlled by one replica set, and all pods controlled by one replica set share the same quota. Different adjustment strategies are supported for different replica sets under one pod controller. As a result, the adjusted resources precisely match demand, cluster resource utilization improves, the number of scaling operations decreases, and the strengths of horizontal and vertical scaling are combined. The main method comprises:
- predicting the future resource quota of the pod controller with a machine learning algorithm;
- having the hybrid scaling controller decide a scaling strategy based on that future quota;
- allowing replica sets with multiple quotas to coexist in one pod controller, so that scaling is performed precisely according to the strategy.
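The following sketches the replica-set lookup and the event sent to the pod controller. The `PodController` model (a map from per-pod quota to replica count), the action names and the `ScaleEvent` fields are hypothetical. The three fields mirror the event contents listed in the text. Only scale-out is covered, because the text does not describe what happens on scale-in when no matching replica set exists.

```python
from dataclasses import dataclass, field


@dataclass
class ScaleEvent:
    timestamp: int   # expected adjustment timestamp
    action: str      # adjustment action
    spec: float      # per-pod quota of the replica set concerned
    amount: float    # target amount of resources to adjust


@dataclass
class PodController:
    # Hypothetical model: per-pod quota (spec) -> replica count of that set.
    replica_sets: dict = field(default_factory=dict)


def plan_scale_out(controller, adjust, timestamp):
    """Decide between scaling an existing replica set and creating a new one."""
    if adjust <= 0:
        raise ValueError("this sketch only plans scale-out")
    if adjust in controller.replica_sets:
        action = "scale_replica_set"      # horizontal scaling of existing set
    else:
        action = "create_replica_set"     # new set with this spec
    return ScaleEvent(timestamp=timestamp, action=action, spec=adjust, amount=adjust)


ctrl = PodController(replica_sets={2.0: 3, 4.0: 1})
print(plan_scale_out(ctrl, 2.0, 1700000000).action)  # scale_replica_set
print(plan_scale_out(ctrl, 1.5, 1700000000).action)  # create_replica_set
```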

The implementation steps are as follows:

1. Build the historical database: collect and persist the feature data, which serve as the input to the machine learning prediction algorithm in step 2. System resource usage data are collected in step 1.1, and custom metric data in step 1.2. The machine learning prediction algorithm acquires and stores the data periodically; the data are exposed through the metrics API.

1.1 metrics-server collects system resource usage data through the kubelet, including the CPU, memory, GPU and disk IO usage of each pod.

1.2 Prometheus Adapter exposes the custom metrics collected by Prometheus, including requests per second, concurrency, PPS, latency and interface response success rate.

1.3 For each pod controller, the real-time data are assembled into an n-dimensional array whose elements are the n factors influencing resource usage: [pod name, timestamp, CPU usage, memory usage, GPU usage, disk IO usage, requests per second, concurrency, PPS, latency, interface response success rate of the pod]. The arrays are stored in the database in time order.

1.5 Based on the historical data, two columns are added to the data table: the CPU quota and the memory quota set in the pod controller. These columns record the quota settings at the current scale.

1.6 A manual label column is added to the data table, where 0 denotes abnormal, 1 normal and 2 undetermined. Erroneous data are marked and removed, eliminating outliers.

1.7 The historical data are loaded and plotted, and distorted data are repaired to improve data quality. Repairs combine historical data from the same period with adjacent data from the current day.

2. Collect the resource usage of the pod controller to be predicted, and use the machine learning prediction algorithm to calculate the expected quota of the pod controller over the next x hours. The expected quota is bounded by a minimum quota MinDesiredValue and a maximum quota MaxDesiredValue, both of which vary with the cluster's remaining resources, so that an overly small or overly large prediction does not disrupt other components.

2.1 Compute the distance between the n-dimensional array x of the pod controller to be predicted and each n-dimensional array y in the historical data using the Euclidean distance, where n is the dimension of the space.

2.2 Using the distances from 2.1, define the m samples closest to the sample to be predicted as its neighbors; m is tuned iteratively until the value with the lowest error rate is found. The resource quota required by the sample to be classified is the quota required by the majority of its neighbors, and the expected quota DesiredBasicValue is output.

2.4 Starting from the basic expected quota DesiredBasicValue, calculate the true target quota by applying three coefficients: the time-dependent periodic factor coefficient s(t), the holiday and special-event factor coefficient h(t), and the coefficient ε(t) for other influencing factors:

DesiredValue = DesiredBasicValue × s(t) × h(t) × ε(t)

s(t) is the time-dependent periodic factor. The same pod controller requests different amounts of resources in different periods, and resource usage is clearly periodic, so the coefficient can be tuned by month, week or day. Within a day, it is raised at the daily peak and lowered at the daily trough. If resource demand is higher for a whole month, the coefficient can be raised at monthly granularity, and otherwise lowered. In this calculation, s(t) is initialized as

The larger this value, the more pronounced the seasonal effect; the smaller it is, the weaker the seasonal effect.

h(t) represents fluctuations in the resource forecast caused by holidays or special events. Here d_i denotes the period before and after the i-th holiday, and k_i denotes the holiday's range of influence. For example, if the business running on the pod controller anticipates a Double Eleven-style e-commerce promotion, resource usage will fluctuate.

ε(t) is a noise term: the resource forecast is affected by other uncontrollable factors, such as partial machine failures or policy changes.
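The initialization formula for s(t) is not preserved in this text. The following is only a hypothetical stand-in, matching the qualitative description of tuning by day and by month. It is a step function that raises the coefficient during assumed daily peak hours, lowers it otherwise, and multiplies by a monthly factor. All parameter values are illustrative.

```python
def seasonal_coefficient(hour, month_factor=1.0, peak_hours=range(9, 22),
                         peak=1.2, trough=0.8):
    """Hypothetical s(t): a daily peak/trough multiplier times a monthly factor."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    daily = peak if hour in peak_hours else trough
    return daily * month_factor


print(seasonal_coefficient(10))  # 1.2 (daily peak)
print(seasonal_coefficient(3))   # 0.8 (overnight trough)
print(seasonal_coefficient(10, month_factor=1.1))
```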
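The h(t) formula itself is not preserved either. The sketch below is a hypothetical reading of the description: a day within d_i days of the i-th holiday gets an uplift of k_i, and overlapping windows multiply. The result feeds the product DesiredValue = DesiredBasicValue × s(t) × h(t) × ε(t). The event date and values are illustrative.

```python
from datetime import date


def holiday_coefficient(day, holidays):
    """Hypothetical h(t).

    holidays: list of (holiday_date, d_i, k_i), where d_i is the number of days
    before/after the holiday that are affected and k_i is the relative uplift
    (1.0 means demand doubles). Outside every window h(t) = 1.
    """
    coeff = 1.0
    for holiday_date, d_i, k_i in holidays:
        if abs((day - holiday_date).days) <= d_i:
            coeff *= 1.0 + k_i
    return coeff


def desired_value(desired_basic, s_t, h_t, eps_t=1.0):
    """DesiredValue = DesiredBasicValue * s(t) * h(t) * eps(t)."""
    return desired_basic * s_t * h_t * eps_t


promo = [(date(2023, 11, 11), 2, 1.0)]  # illustrative Double Eleven-style window
h_near = holiday_coefficient(date(2023, 11, 10), promo)   # 2.0
h_far = holiday_coefficient(date(2023, 11, 20), promo)    # 1.0
print(desired_value(4.0, 1.2, h_near))  # about 9.6 (e.g. CPU cores)
```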

3. Using the target quota DesiredValue from step 2, calculate the difference between the target metric and the actual metric, and have the hybrid scaling controller determine a strategy.

3.1 Calculate the ratio of the current quota to the target quota:

ratio = CurrentMetricValue / DesiredValue

When the ratio is approximately 1, the quota is kept unchanged. Otherwise, CurrentMetricValue < DesiredValue means scale out, and CurrentMetricValue > DesiredValue means scale in.

3.2 Calculate the quota to be adjusted:

AdjustValue = DesiredValue - CurrentMetricValue;

3.2.1 When AdjustValue < MinDesiredValue, AdjustValue = MinDesiredValue;

3.2.2 When AdjustValue > MaxDesiredValue, AdjustValue = MaxDesiredValue;

3.2.3 When MinDesiredValue < AdjustValue < MaxDesiredValue and the remaining resources of the nodes in the cluster cannot accommodate AdjustValue, attempt to divide AdjustValue into x quotas according to the remaining resources of the cluster's schedulable nodes. The x quotas sum to AdjustValue. Then go to 3.2.5.

3.2.4 When MinDesiredValue < AdjustValue < MaxDesiredValue and the cluster can successfully schedule the quota AdjustValue, go to 3.2.5.

3.2.5 Check whether the pod controller has a replica set with the AdjustValue specification. If so, scale that replica set horizontally. If not, update the pod controller with a replica set of the corresponding specification, or create one. Then notify the pod controller of the timestamped event to be executed; the event comprises the expected adjustment timestamp, the adjustment action and the target amount of resources to adjust.

4. After the pod controller receives a change event request from the hybrid scaling controller, it performs hybrid scaling on the replica sets under it according to the event timestamp; this strategy improves resource utilization. Where possible, the required capacity is added in a single step, which avoids waste without adding too many replicas.

4.1 When the pod controller already has a replica set with the AdjustValue specification, that replica set is scaled horizontally (out or in), as shown in FIG. 2.
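The following sketches how the pod controller might hold incoming change events and apply them in timestamp order. It assumes the events are kept in an in-memory priority queue. The class name and event strings are hypothetical.

```python
import heapq
import itertools


class TimedEventQueue:
    """Hypothetical pod-controller queue: events run in timestamp order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps insertion order

    def push(self, timestamp, event):
        heapq.heappush(self._heap, (timestamp, next(self._seq), event))

    def pop_due(self, now):
        """Remove and return all events whose timestamp is <= now."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


q = TimedEventQueue()
q.push(30, "scale replica set 2.0 -> 4 replicas")
q.push(10, "create replica set 1.5")
q.push(50, "scale replica set 4.0 -> 0 replicas")
print(q.pop_due(40))  # ['create replica set 1.5', 'scale replica set 2.0 -> 4 replicas']
```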

FIG. 2 Pod controller block diagram 1

4.2 When the pod controller has no replica set with the AdjustValue specification, a replica set with the corresponding specification is created, as shown in FIG. 3.

FIG. 3 Pod controller block diagram 2

4.3 When the pod controller has no replica set with the AdjustValue specification, and no single node in the cluster has enough remaining resources to accommodate AdjustValue, an attempt is made to divide AdjustValue into x quotas according to the remaining resources of the cluster's schedulable nodes, as shown in FIG. 4.

FIG. 4 Pod controller block diagram 3

5. Evaluate the prediction model by comparing the predicted expected resources DesiredValue with the actual resource demand. Then, returning to step 2, adjust the relevant parameters, such as n, x, s(t), h(t) and ε(t), to make the results more accurate.

FIG. 5 Flow chart of the machine learning prediction algorithm
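The text does not name an evaluation metric. One common choice, assumed here, is the mean absolute percentage error (MAPE) between the predicted DesiredValue and the actual demand, which could drive the parameter tuning.

```python
def mape(predicted, actual):
    """Mean absolute percentage error between predicted DesiredValue and the
    resources actually needed (actual values must be non-zero)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    return sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual)) / len(actual)


print(mape([110.0, 90.0], [100.0, 100.0]))  # 0.1
```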

The steps are as follows:

Step1: Build the historical database; supplement, filter, clean and plot the data.

Step2: Collect the resource usage of the pod controller to be predicted.

Step3: Calculate the Euclidean distance between the n-dimensional array of the pod controller to be predicted and each n-dimensional array in the historical data.

Step4: Define the m samples closest to the sample to be predicted as its neighbors, tuning m iteratively. The resource quota required by the sample to be classified is the quota required by the majority of its neighbors; output the expected quota DesiredBasicValue.

Step5: Calculate the true target quota by applying the time-dependent periodic factor coefficient s(t), the holiday and special-event factor coefficient h(t), and the coefficient ε(t) for other influencing factors.

Step6: Judge whether the target quota is reasonable; if not, adjust the coefficients and repeat the steps.

Hybrid scaling controller flow

FIG. 6 is a flow chart of the hybrid scaling controller.

The steps are as follows:

Step1: Calculate the ratio of the current quota to the target quota and judge whether it is approximately 1. If it is, keep the quota unchanged; otherwise, scaling is needed.

Step2: Calculate the quota to be adjusted, AdjustValue.

Step3: Judge whether the remaining resources of the cluster nodes can accommodate AdjustValue. If not, go to Step4; if so, go to Step5.

Step4: Attempt to divide AdjustValue into x quotas according to the remaining resources of the cluster's schedulable nodes, then return to Step2 to calculate the quotas.

Step5: Check whether the pod controller has a replica set with the AdjustValue specification. If so, scale that replica set horizontally. If not, update the pod controller with a replica set of the corresponding specification, or create one.

Step6: Notify the pod controller of the timestamped event to be executed; the event comprises the expected adjustment timestamp, the adjustment action and the target amount of resources to adjust.

Compared with the prior art, the application has the following advantages and effects:

1. One pod controller supports replica sets with multiple quotas. Differences in pod specifications allow resources to be used efficiently while still meeting pod requirements.

2. Horizontal and vertical scaling are combined, so the total quota after scaling is precisely controlled and jitter and fluctuation are reduced. Vertical scaling is achieved while ensuring that some pods are not rebuilt.

3. Future resource quota demand is predicted with a machine learning algorithm, reducing scaling lag.

See the drawings in the technical scheme section.

FIG. 1 is the overall structure diagram for hybrid scaling based on a machine learning prediction algorithm. It builds on the existing Kubernetes horizontal and vertical scaling and adds two components: a hybrid scaling controller, and a machine learning prediction algorithm that predicts the quota the pod controller will need in the future, reducing scaling lag. The pod controller gains replica-count configuration for different quotas: one pod controller holds multiple replica sets, each managing one quota, which improves precision. The components in the figure function as follows:

The monitoring components currently supported by Kubernetes are used mainly to query resource usage, covering both core and custom resource-usage metrics. Core metrics are obtained mainly from components such as the kubelet and provided to the machine learning prediction algorithm by metrics-server. Custom metrics are the metrics collected by Prometheus, obtained through the API provided by Prometheus Adapter; the collected data serve as input to the machine learning prediction algorithm.

The machine learning prediction algorithm periodically collects the monitoring data through the metrics API and stores it in the database. The historical data are cleaned, analyzed and processed, the target resource quota of the pod controller for a future period is predicted, and the result is passed to the hybrid scaling controller. The main design concern is how to combine the monitoring data into an accurate prediction.

The Informer in this design reuses the native Kubernetes Informer, which relies on the List & Watch mechanism to maintain a local cache of the API objects of interest. It picks up state changes to those objects promptly, updates the local cache, and puts the data into the cache after some processing. The collected data also serve as input to the hybrid scaling controller and the pod controller.

The Cache stores information exchanged between the hybrid scaling controller and the pod controller; intermediate scheduling results are held there temporarily.

The hybrid scaling controller obtains the current hybrid scaling object through the Informer and retrieves the machine-learning prediction and the current state of the object. It then calculates the difference between the target quota and the current quota and determines, by a given algorithm, which scaling strategy the pod controller should adopt. The pod controller watches for that strategy.

The pod controller uses the create/update/delete events from the Informer's list/watch of resources to determine which hybrid scaling actions are required. It adjusts itself according to the strategy and issues the operations to the replica sets. The pod controller supports more than one quota, each controlled by one replica set. If the replica set to be scaled already exists in the pod controller, it is scaled horizontally directly; otherwise, a new replica set with the required quota is created.
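The following toy, in-process model of the List & Watch caching pattern shows how a local cache stays in sync: an initial List replaces the cache, and each Watch event updates it. ADDED, MODIFIED and DELETED are real Kubernetes watch event types, but the class and its methods are hypothetical and are not the client-go Informer API. Other event types, such as BOOKMARK, are not handled.

```python
class LocalCache:
    """Toy model of an informer-style local cache (not the client-go API)."""

    def __init__(self):
        self.objects = {}
        self.resource_version = None

    def replace(self, items, resource_version):
        """Handle the initial List: items is an iterable of (name, obj) pairs."""
        self.objects = dict(items)
        self.resource_version = resource_version

    def apply(self, event_type, name, obj, resource_version):
        """Handle one Watch event; only the three object events are supported."""
        if event_type in ("ADDED", "MODIFIED"):
            self.objects[name] = obj
        elif event_type == "DELETED":
            self.objects.pop(name, None)
        else:
            raise ValueError(f"unsupported event type {event_type!r}")
        self.resource_version = resource_version


cache = LocalCache()
cache.replace([("web", {"replicas": 2})], "100")
cache.apply("MODIFIED", "web", {"replicas": 3}, "101")
cache.apply("ADDED", "api", {"replicas": 1}, "102")
cache.apply("DELETED", "web", None, "103")
print(cache.objects, cache.resource_version)  # {'api': {'replicas': 1}} 103
```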

In some embodiments, the machine learning-based container hybrid scaling method is applied to a machine learning-based container hybrid scaling system comprising a monitoring module, a machine learning prediction module, a hybrid scaling controller, a pod controller, a buffer (Cache) and an index (Informer) module.

The monitoring module queries resource usage through the monitoring components currently supported by k8s and covers both core resource-usage metrics and custom resource-usage metrics. Core metrics are gathered mainly from components such as the kubelet and provided to the machine learning prediction algorithm by metrics-server. Custom metrics are those collected by Prometheus and obtained through the API provided by Prometheus Adapter; they are likewise used as input to the machine learning prediction algorithm.

The machine learning prediction module predicts the quota the pod controller will need in the future. Optionally, it periodically collects monitoring data through metrics-api and stores it in a database; the historical data is cleaned, analyzed and processed to predict the pod controller's target resource quota for a future period, and the result is passed to the hybrid scaling controller. The main consideration is the strategy for making accurate predictions from the combined monitoring data.

The hybrid scaling controller computes the difference between the target quota and the current quota. Optionally, it obtains the current hybrid scaling object through the index, retrieves the data predicted by the machine learning module and the object's current state, and uses an algorithm to determine which scaling strategy the Pod controller should apply; the Pod controller watches for this strategy.

The pod controller handles create/update/delete events for resources received through List/Watch. Optionally, it obtains from the Informer which hybrid scaling actions are required, performs the adjustment according to the strategy, and issues the operations to the ReplicaSets. The Pod controller supports more than one quota, and each quota is managed by one ReplicaSet: if a ReplicaSet with the quota to be scaled exists in the pod controller, it is scaled horizontally directly; otherwise a new ReplicaSet is created.

The buffer stores information exchanged between the hybrid scaling controller and the pod controller, and intermediate scheduling results are temporarily held in the Cache.

The index module obtains state changes of the objects, updates the local cache, processes the data to some degree and places it in the cache. The collected data also serve as input to the hybrid scaling controller and the pod controller.

The above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present application, and are intended to be included within the scope of the present application.

Claims

1. A machine learning-based container hybrid scaling method, characterized by comprising the following steps:

metrics-server collects system resource usage data through the kubelet, the system resource usage data comprising the CPU, memory, GPU and disk IO usage of each pod;

Prometheus Adapter collects the custom data types gathered by Prometheus, including: requests per second, concurrency, PPS, latency, and interface response success rate;

each pod controller collects real-time data to form an n-dimensional array whose entries are the n factors affecting resource usage, namely [pod name, timestamp, pod CPU usage, pod memory usage, pod GPU usage, pod disk IO usage, pod requests per second, pod concurrency, pod PPS, pod latency, pod interface response success rate]; the data are sorted by time and stored in a database;

based on the historical data, two columns are added to the data table, namely the CPU quota set in the pod controller and the memory quota set in the pod controller, recording the quota settings at the current scale.
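A minimal sketch of the data table in claim 1, assuming SQLite as the database: the 11 collected fields in the claimed order plus the two added quota columns, with records read back sorted by timestamp. The column names, the table name `pod_metrics`, and the choice of SQLite are illustrative assumptions; the claim fixes only the field order and the time ordering.

```python
import sqlite3
from typing import List, Sequence, Tuple

# Illustrative column names; the claim fixes only the field order.
COLUMNS = [
    "pod_name", "ts", "cpu", "memory", "gpu", "disk_io",
    "rps", "concurrency", "pps", "latency", "success_rate",
    # The two quota columns added from the pod controller's settings.
    "cpu_quota", "memory_quota",
]


def create_table(conn: sqlite3.Connection) -> None:
    # pod_name is text; every other field is stored as a number.
    cols = ", ".join(
        f"{c} TEXT" if c == "pod_name" else f"{c} REAL" for c in COLUMNS
    )
    conn.execute(f"CREATE TABLE IF NOT EXISTS pod_metrics ({cols})")


def store(conn: sqlite3.Connection, row: Sequence) -> None:
    if len(row) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}")
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.execute(f"INSERT INTO pod_metrics VALUES ({placeholders})", tuple(row))


def history(conn: sqlite3.Connection, pod_name: str) -> List[Tuple]:
    # Return one pod's records sorted by timestamp.
    cur = conn.execute(
        "SELECT * FROM pod_metrics WHERE pod_name = ? ORDER BY ts", (pod_name,)
    )
    return cur.fetchall()
```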

2. The method as recited in claim 1, further comprising:

adding a manual label column to the data table, where 0 denotes abnormal, 1 denotes normal, and 2 denotes undeterminable; erroneous data are marked and removed, thereby eliminating outliers from the data.
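The labelling rule of claim 2 can be sketched as a filter over labelled rows. Representing rows as dicts with a `label` key is an assumption for illustration.

```python
from typing import Dict, List

# Label values from the claim.
ABNORMAL, NORMAL, UNDETERMINED = 0, 1, 2


def drop_abnormal(rows: List[Dict]) -> List[Dict]:
    """Remove rows manually labelled as abnormal (0).

    Rows labelled 1 (normal) or 2 (cannot be judged) are kept, because the
    claim only calls for removing data marked as wrong.
    """
    return [row for row in rows if row["label"] != ABNORMAL]
```

Whether rows labelled 2 should also be excluded from model training is not stated in the claim; this sketch keeps them.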

3. The method as recited in claim 1, further comprising:

loading the historical data, plotting it as charts, and repairing distorted data to improve data quality; combining historical data from the same period in earlier cycles to repair data close to the current day;

collecting the resource usage of the pod controller to be predicted; calculating, by a machine learning prediction algorithm, the expected quota of the pod controller over the next x hours; and bounding the expected quota by a minimum quota MinDesiredValue and a maximum quota MaxDesiredValue, which are adjusted according to changes in the cluster's remaining resources, so that an overly small or overly large prediction does not affect the operation of other components.
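The bounding step in claim 3 amounts to clamping the predicted quota into [MinDesiredValue, MaxDesiredValue]. In this sketch the caller supplies both bounds; how they are derived from the cluster's remaining resources is not specified in the claim, so no rule for that is assumed here.

```python
def bound_expected_quota(expected: float, min_desired: float,
                         max_desired: float) -> float:
    """Clamp a predicted quota into [MinDesiredValue, MaxDesiredValue]."""
    if min_desired > max_desired:
        raise ValueError("MinDesiredValue must not exceed MaxDesiredValue")
    return max(min_desired, min(expected, max_desired))
```

For example, with bounds of 1.0 and 2.5 cores, a prediction of 3.0 is reduced to 2.5 and a prediction of 0.2 is raised to 1.0.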

4. A method according to claim 3, further comprising:

collecting the n-dimensional array x of the pod controller to be predicted, and calculating its distance to each n-dimensional array y in the historical data according to the Euclidean distance:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

where i = 1 to n, x_i is an element of array x, and y_i is an element of array y.
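The distance in claim 4 can be computed directly. This sketch assumes that only the numeric fields are used (pod name and timestamp are excluded) and that any feature scaling has already been applied; neither point is specified in the claim. The helper `nearest` is hypothetical and simply illustrates one way the distances could be used against the historical arrays.

```python
import math
from typing import List, Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2) over n numeric features."""
    if len(x) != len(y):
        raise ValueError("x and y must both have n elements")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def nearest(x: Sequence[float], history: List[Sequence[float]]) -> Sequence[float]:
    # Hypothetical helper: the historical array closest to x.
    return min(history, key=lambda y: euclidean_distance(x, y))
```

Because the raw fields have very different units (cores, MiB, requests per second), unscaled distances would be dominated by the largest-valued fields, which is why scaling is assumed.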

5. The method as recited in claim 4, further comprising:

6. The method as recited in claim 5, further comprising: multiplying the basic expected quota DesiredDasiValue by a time-dependent periodic variation factor s(t), a holiday and special-event factor h(t), and a factor ε(t) for other influences, to calculate the actual target quota: DesiredValue = DesiredDasiValue × s(t) × h(t) × ε(t), where t is time.
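The target quota of claim 6 is a product of the basic expected quota and three factors; a factor equal to 1.0 leaves the quota unchanged. As a worked example with hypothetical values, a basic quota of 4.0 cores, a periodic peak factor s(t) = 1.25, a holiday factor h(t) = 1.5 and ε(t) = 1.0 gives 4.0 × 1.25 × 1.5 × 1.0 = 7.5 cores.

```python
def desired_value(base: float, s_t: float, h_t: float, eps_t: float) -> float:
    """DesiredValue = DesiredDasiValue * s(t) * h(t) * eps(t)."""
    return base * s_t * h_t * eps_t
```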

7. The method as recited in claim 6, further comprising: calculating the difference between the target metric and the actual metric, and determining a strategy by the hybrid scaling controller.

8. The method as recited in claim 7, further comprising:

when ratio is approximately 1, the current scale is kept unchanged; otherwise, capacity is expanded when currentMetricValue < DesiredValue and contracted when currentMetricValue > DesiredValue.
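The decision rule of claim 8 can be sketched as follows. The definition of ratio is not given in this text, so the sketch assumes ratio = currentMetricValue / DesiredValue and treats "approximately 1" as |ratio − 1| ≤ 0.1. That tolerance is an assumption, similar in spirit to the tolerance band used by the Kubernetes Horizontal Pod Autoscaler.

```python
def scaling_action(current: float, desired: float, tolerance: float = 0.1) -> str:
    """Decide keep / scale out / scale in from CurrentMetricValue and DesiredValue.

    Assumes ratio = current / desired and treats |ratio - 1| <= tolerance
    as "approximately 1"; both choices are illustrative, not from the claim.
    """
    if desired <= 0:
        raise ValueError("DesiredValue must be positive")
    ratio = current / desired
    if abs(ratio - 1.0) <= tolerance:
        return "keep"
    # Below target: expand capacity; above target: contract it.
    return "scale_out" if current < desired else "scale_in"
```

The tolerance band prevents the controller from oscillating on small prediction noise.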

9. The method of claim 8, further comprising calculating the quota to be adjusted: AdjustValue = DesiredValue - CurrentMetricValue;

when AdjustValue < MinDesiredValue, AdjustValue = MinDesiredValue;

when AdjustValue > MaxDesiredValue, AdjustValue = MaxDesiredValue;
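Claim 9 can be transcribed literally as a difference followed by a clamp. Applied literally, a negative AdjustValue (a contraction) is raised to MinDesiredValue; the claim text here does not say how contraction amounts are bounded, and this sketch does not add such a rule.

```python
def adjust_value(desired: float, current: float,
                 min_desired: float, max_desired: float) -> float:
    """AdjustValue = DesiredValue - CurrentMetricValue, clamped as in claim 9."""
    adjust = desired - current
    if adjust < min_desired:
        adjust = min_desired
    elif adjust > max_desired:
        adjust = max_desired
    return adjust
```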

10. A machine learning-based container hybrid scaling system, comprising an electronic device for performing the method according to any one of claims 1-9.