WO2023226312A1 - Scaling method for a business cluster and related device - Google Patents

Scaling method for a business cluster and related device

Info

Publication number
WO2023226312A1
WO2023226312A1  PCT/CN2022/130526
Authority
WO
WIPO (PCT)
Prior art keywords
instances
scaling
resource group
next cycle
management system
Prior art date
Application number
PCT/CN2022/130526
Other languages
English (en)
French (fr)
Inventor
王楠楠
杨昌鹏
王军
刘弋扬
Original Assignee
华为云计算技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为云计算技术有限公司
Publication of WO2023226312A1 publication Critical patent/WO2023226312A1/zh

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 — Arrangements for executing specific programs
    • G06F 9/455 — Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 — Multiprogramming arrangements
    • G06F 9/48 — Program initiating; Program switching, e.g. by interrupt
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 — Multiprogramming arrangements
    • G06F 9/50 — Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • the present application relates to the field of cloud computing technology, and in particular to a scaling method for a business cluster, a scaling management system, a computer cluster, a computer-readable storage medium, and a computer program product.
  • An application program, also referred to as an application (app), refers to a program written for a specific application purpose for the user, such as a text processor, spreadsheet, accounting application, media player, airline flight simulator, command line game, or image editor.
  • the business cluster includes at least one resource group, and each resource group includes at least one instance running an application.
  • Monitoring indicators are set for the instances in the resource group to monitor the performance of the instances.
  • the monitoring indicator may be, for example, central processing unit (CPU) utilization.
  • When a monitoring indicator reaches the alarm threshold, the scaling of the business cluster is triggered.
  • the scaling of business clusters includes increasing the number of instances in the resource group (that is, increasing the amount of application deployment, referred to as expansion), or reducing the number of instances in the resource group (that is, reducing the amount of application deployment, referred to as scaling down).
  • The above solution scales the business cluster only when the monitoring indicators (such as average CPU utilization) reach the alarm threshold, which causes a certain delay. In extreme cases, the number of instances in the resource group may not be increased in time, creating a risk of adverse effects on the business. For this reason, the scaling policy in the alarm policy is usually conservative. For example, the scaling policy may be set so that if the average CPU utilization is lower than 25% for three consecutive cycles, 2 instances are removed; the scaling policy may also be set so that if the average CPU utilization is higher than 55% in one cycle, 5 instances are added.
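The conservative alarm-based (passive) policy described above can be expressed as a minimal Python sketch. The function name and the thresholds (25%/55%, three consecutive cycles, steps of 2 and 5) come from the example in the text; everything else is an illustrative assumption.

```python
# Sketch of the passive, threshold-based scaling decision described above.
# Thresholds and steps mirror the example in the text; names are hypothetical.

def passive_scaling_decision(cpu_history, scale_in_threshold=0.25,
                             scale_in_cycles=3, scale_out_threshold=0.55):
    """Return the change in instance count for the latest cycle.

    cpu_history: average CPU utilizations, one value per cycle, latest last.
    """
    # Scale out by 5 instances as soon as one cycle exceeds the upper threshold.
    if cpu_history and cpu_history[-1] > scale_out_threshold:
        return 5
    # Scale in by 2 instances only after three consecutive low-utilization cycles.
    if len(cpu_history) >= scale_in_cycles and all(
            u < scale_in_threshold for u in cpu_history[-scale_in_cycles:]):
        return -2
    return 0
```

Note how asymmetric this is: scale-out reacts after a single high cycle, while scale-in requires a sustained low-utilization window, which is exactly the delay the predictive approach below is designed to remove.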
  • This application provides a scaling method for business clusters.
  • This method obtains real-time business traffic and predicts the resource requirements of the next period based on the real-time business traffic, thereby periodically outputting the predicted scaling strategy.
  • When the predicted scaling strategy for the next cycle is updated relative to the scaling policy of the current cycle, the predictive scaling policy is actively executed in the next cycle without passively waiting for monitoring indicators to reach the alarm threshold. This avoids resource waste and improves resource utilization while ensuring good business operation.
  • this application provides a scaling method for business clusters.
  • This method can be executed by the scaling management system.
  • the scaling management system may be a software system, and the software system may be deployed in a computer cluster of the cloud platform.
  • the computer cluster runs the above software system, thereby executing the scaling method of the business cluster in the embodiment of the present application.
  • the scaling management system may also be a hardware system.
  • the hardware system can include one or more computers in the cloud platform. When the hardware system is running, the scaling method of the business cluster in the embodiment of the present application is executed.
  • the business cluster includes at least one resource group, and each resource group includes at least one instance running an application.
  • the scaling management system can obtain the business traffic of the application in the current cycle, and obtain the predicted scaling strategy of the resource group of the application in the next cycle based on the business traffic of the application in the current cycle.
  • the scaling management system executes the predictive scaling policy in the next cycle.
  • the scaling management system obtains the business traffic of the application in the current cycle, predicts the resource requirements of the next cycle based on the real-time business traffic, and thereby periodically outputs the predicted scaling strategy.
  • When the predictive scaling strategy of the next cycle is updated relative to the scaling strategy of the current cycle, the predictive scaling strategy is actively executed in the next cycle. There is no need to passively wait for the monitoring indicator to reach the alarm threshold, to wait for the first alarm threshold to be reached in multiple consecutive cycles before reducing the number of instances in the resource group by a small amount, or to wait for the second alarm threshold to be reached in a cycle before increasing the number of instances in the resource group by the maximum amount. This avoids resource waste and improves resource utilization.
  • the at least one instance includes a container instance or a virtual machine instance. Accordingly, the scaling management system can adjust the number of instances in the resource group in the next cycle according to the predicted scaling policy, thereby realizing elastic scaling of the business cluster to meet different needs of the business.
  • When the scaling management system executes the predicted scaling policy in the next cycle, it can directly adjust the number of instances in the resource group to the target value in the next cycle, or it can adjust the number of instances in the resource group according to the scaling step in the next cycle.
  • the predicted scaling policy may include scaling conditions for the next period, and the scaling management system may also adjust the number of instances in the resource group when the scaling conditions are triggered.
  • the scaling management system can provide a variety of scaling rules to elastically scale the business cluster, which can cover a variety of scenarios and has high availability.
  • the target value includes a minimum number of instances and/or a maximum number of instances.
  • the scaling management system increases the number of instances in the resource group to the minimum number of instances in the next cycle.
  • the scaling management system reduces the number of instances in the resource group to the maximum number of instances in the next cycle.
  • the number of instances in the resource group can be quickly increased to meet business needs and avoid affecting the business.
  • the number of instances in the resource group can be quickly reduced to recycle resources and improve resource utilization.
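The direct-adjustment rule in the bullets above (raise the count to the minimum when below it, lower it to the maximum when above it) can be sketched as follows. The function name and parameter names are illustrative assumptions, not terms from the patent.

```python
# Sketch of direct adjustment to a target range, per the bullets above.

def clamp_instances(current, min_instances=None, max_instances=None):
    """Return the instance count to use in the next cycle.

    If the current count is below the minimum, scale out to the minimum
    at once to protect the business; if it is above the maximum, scale in
    to the maximum at once to reclaim resources.
    """
    if min_instances is not None and current < min_instances:
        return min_instances
    if max_instances is not None and current > max_instances:
        return max_instances
    return current
```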
  • the minimum number of instances and the maximum number of instances are determined based on the service traffic and the average performance value of instances in the resource group.
  • The scaling management system can determine the number of instances that achieves the optimization goal, that is, the reference number, based on the relationship between business traffic, instance number, and performance values, and then determine the minimum number of instances and the maximum number of instances based on the reference number.
  • The scaling management system can periodically adjust the number of instances in the resource group according to the scaling step, thereby achieving elastic scaling of the business cluster in a fine-tuning manner to meet business needs.
  • the scaling management system can also be combined with passive scaling to achieve elastic scaling of the business cluster.
  • the scaling management system can set different scaling steps for different performance values.
  • the scaling management system adjusts the number of instances in the resource group according to the first scaling step in the next cycle.
  • the scaling management system adjusts the number of instances in the resource group according to the second scaling step in the next cycle.
  • the second performance value is greater than the first performance value
  • the second expansion step is greater than the first expansion step.
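The tiered rule above (a larger performance value maps to a larger scaling step) can be sketched like this. The concrete thresholds and step sizes are illustrative assumptions; the patent only states that the second performance value and second step are larger than the first.

```python
# Sketch of choosing a scaling step by performance tier, per the bullets above.
# The (threshold, step) values are hypothetical examples.

def scaling_step(avg_performance, tiers=((0.80, 10), (0.65, 5))):
    """Return the scaling step for the highest performance tier reached.

    tiers: (performance value, step) pairs, sorted from highest threshold
    down; a larger performance value maps to a larger scaling step.
    """
    for threshold, step in tiers:
        if avg_performance >= threshold:
            return step
    return 0  # no tier reached: leave the instance count unchanged
```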
  • the predicted scaling policy for the application's resource group in the next period obtained by the scaling management system may include an alarm threshold for the next period.
  • the alarm thresholds in different periods may be different, for example, they may change with changes in service traffic.
  • the scaling management system may adjust the number of instances in the resource group when the average performance value of the instances in the resource group reaches the alarm threshold in the next period.
  • the number of instances in the resource group can be adjusted in advance to ensure the normal operation of the business without wasting resources.
  • this application provides a scaling management system.
  • the scaling management system is used to scale a business cluster.
  • the business cluster includes at least one resource group, and each resource group includes at least one instance of a running application.
  • the system includes:
  • a communication module used to obtain the business traffic of the application in the current cycle
  • a prediction module configured to obtain the predicted scaling strategy of the resource group of the application in the next period based on the business traffic of the application in the current period;
  • a scaling module configured to execute the predicted scaling policy in the next cycle when the predicted scaling policy of the resource group of the application in the next cycle is updated relative to the scaling policy of the application in the current cycle.
  • the at least one instance includes a container instance or a virtual machine instance, and the scaling module is specifically used to:
  • the number of instances in the resource group is adjusted in the next cycle.
  • the scaling module is specifically used to:
  • the target value includes the minimum number of instances and/or the maximum number of instances
  • the scaling module is specifically used to:
  • the number of instances in the resource group is reduced to the maximum number of instances in the next cycle.
  • the minimum number of instances and the maximum number of instances are determined based on the service traffic and the average performance value of instances in the resource group.
  • the scaling module is specifically used to:
  • the number of instances in the resource group is adjusted according to the scaling step in the next cycle.
  • the scaling module is specifically used to:
  • the number of instances in the resource group is adjusted according to the first scaling step in the next cycle
  • the average performance value of at least one instance in the resource group reaches the second performance value
  • the number of instances in the resource group is adjusted according to the second scaling step in the next cycle, and the second performance value is greater than the first performance value
  • the second scaling step is larger than the first scaling step.
  • the scaling module is specifically used to:
  • the number of instances in the resource group is adjusted.
  • this application provides a computer cluster.
  • the computer cluster includes at least one computer including at least one processor and at least one memory.
  • the at least one processor and the at least one memory communicate with each other.
  • the at least one processor is configured to execute instructions stored in the at least one memory, so that the computer cluster executes the method in the first aspect or any implementation of the first aspect.
  • the present application provides a computer-readable storage medium that stores instructions instructing a computer cluster to execute the above-mentioned first aspect or any implementation of the first aspect.
  • the present application provides a computer program product containing instructions that, when run on a computer cluster, causes the computer cluster to execute the method described in the first aspect or any implementation of the first aspect.
  • Figure 1A is a schematic architectural diagram of a resource scheduling system provided by an embodiment of the present application.
  • Figure 1B is a schematic architectural diagram of a resource scheduling system provided by an embodiment of the present application.
  • Figure 2 is a flow chart of a resource scheduling method provided by an embodiment of the present application.
  • Figure 3 is a flow chart of a resource scheduling method provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of a simulation result provided by an embodiment of the present application.
  • Figure 5 is a hardware structure diagram of a computer cluster provided by an embodiment of the present application.
  • first and second in the embodiments of this application are only used for descriptive purposes and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, features defined as “first” and “second” may explicitly or implicitly include one or more of these features.
  • Applications refer to programs written for a specific application purpose for users, such as text processors, spreadsheets, accounting applications, media players, aviation flight simulators, command line games, and image editors.
  • the program can be deployed in a cloud platform.
  • Cloud platform refers to a platform that provides computing, storage or network capabilities to users in the form of cloud services.
  • the cloud platform can provide computing, storage or network capabilities on demand to deploy applications in a clustered manner.
  • the cloud platform can build a business cluster, and the business cluster includes at least one resource group.
  • Each resource group includes at least one instance running an application.
  • the instance refers to the dynamic code generated by running the application.
  • the dynamic code can be called a process or thread.
  • the process or thread can realize the corresponding application purpose, for example, it can realize video playback, image editing, etc.
  • the cloud platform can adjust the number of instances in the resource group to flexibly adjust the deployment volume of applications. In this way, when business traffic is high, the deployment amount of applications can be increased to cope with high concurrency of applications, and when business traffic is low, the deployment amount of applications can be reduced to reduce resource consumption and improve resource utilization.
  • monitoring indicators are set for instances in the resource group.
  • the monitoring indicators may be, for example, CPU utilization, memory utilization, input output (IO) utilization and other performance values.
  • When a resource group includes one or more instances, the average performance value of the instances in the resource group can also be determined.
  • When monitoring indicators such as the average CPU utilization or other average performance values reach the alarm threshold, the scaling of the business cluster is triggered to adjust the deployment volume of the application.
  • the scaling policy in the alarm policy is usually conservative. For example, the scaling policy can be set so that if the average CPU utilization is lower than 25% in three consecutive cycles, 2 instances will be reduced; if the average CPU utilization is higher than 55% in one cycle, 5 instances will be added.
  • embodiments of the present application provide a scaling method for a business cluster.
  • This method can be executed by the scaling management system.
  • the scaling management system may be a software system, and the software system may be deployed in a computer cluster of the cloud platform.
  • the computer cluster runs the above software system, thereby executing the scaling method of the business cluster in the embodiment of the present application.
  • the scaling management system may also be a hardware system.
  • the hardware system can include one or more computers in the cloud platform. When the hardware system is running, the scaling method of the business cluster in the embodiment of the present application is executed.
  • the business cluster includes at least one resource group, and each resource group includes at least one instance of a running application.
  • The scaling management system obtains the business traffic of the application in the current cycle and, based on the business traffic of the application in the current cycle, obtains the predicted scaling policy of the application's resource group in the next cycle.
  • The scaling management system executes the predictive scaling strategy in the next cycle.
  • the scaling management system obtains the business traffic of the application in the current cycle, predicts the resource requirements of the next cycle based on the real-time business traffic, and thereby periodically outputs the predicted scaling strategy.
  • When the predictive scaling strategy of the next cycle is updated relative to the scaling strategy of the current cycle, the predictive scaling strategy is actively executed in the next cycle. There is no need to passively wait for the monitoring indicator to reach the alarm threshold, to wait for the first alarm threshold to be reached in multiple consecutive cycles before reducing the number of instances in the resource group by a small amount, or to wait for the second alarm threshold to be reached in a cycle before increasing the number of instances in the resource group by the maximum amount. This avoids resource waste and improves resource utilization.
  • the scaling management system 10 includes a policy prediction device 102 and a policy execution device 104 .
  • the above-mentioned policy prediction device 102 and policy execution device 104 can be implemented by software, or can be implemented by hardware.
  • the policy prediction device 102 is connected to the business cluster 20.
  • the business cluster 20 includes at least one resource group, and each resource group includes at least one instance of a running application.
  • the policy execution device 104 accesses the resource platform 30 .
  • the above-mentioned resource platform 30 can be a software platform or a hardware platform.
  • the resource platform 30 can also be divided into a container resource platform (referred to as a container platform) or a virtual machine resource platform (referred to as a virtual machine platform) according to resource types.
  • the container platform is used to increase or decrease the number of container instances in the resource group
  • the virtual machine platform is used to increase or decrease the number of virtual machine instances in the resource group.
  • the policy prediction device 102 is configured to obtain the service traffic of the application in the current period from the service cluster 20, and obtain the predicted scaling strategy of the application in the next period based on the service traffic of the application in the current period.
  • The above period can usually be set to a smaller value, for example, 10 minutes (min), so as to detect changes in the application's business traffic in a timely manner and then adjust the scaling strategy promptly according to changes in business traffic.
  • the policy execution device 104 is configured to execute the predicted scaling policy in the next cycle when the predicted scaling policy of the resource group of the application in the next cycle is updated relative to the scaling policy of the application in the current cycle. For example, the policy execution device 104 may execute the predictive scaling policy in the next cycle, thereby causing the resource platform 30 to increase or decrease the number of instances in the resource group of the application.
  • the interaction process between the strategy prediction device 102 and the strategy execution device 104 has been described in detail above.
  • the structure of the strategy prediction device 102 including the communication module 1022 and the prediction module 1024 will be described below.
  • the strategy prediction device 102 includes a communication module 1022 and a prediction module 1024 .
  • the communication module 1022 is used to obtain the business traffic of the application in the current cycle
  • the prediction module 1024 is used to obtain the predicted scaling strategy of the resource group of the application in the next cycle based on the business traffic of the application in the current cycle.
  • the policy execution device 104 includes a communication module 1042 and a scaling module 1044. Among them, the communication module 1042 is used to obtain the predicted scaling policy of the application's resource group in the next cycle, and the scaling module 1044 is used to apply the predicted scaling policy in the next cycle when there is an update relative to the application's scaling policy in the current cycle, in the next cycle.
  • the prediction scaling strategy is executed periodically.
  • FIG. 1A illustrates that the policy prediction device 102 and the policy execution device 104 are independent devices.
  • the policy prediction device 102 and the policy execution device 104 can be integrated into one device.
  • the scaling management system 10 includes a communication module 1062, a prediction module 1064, and a scaling module 1066.
  • the specific implementation of the prediction module 1064 and the scaling module 1066 can be found in the description of the prediction module 1024 and the scaling module 1044.
  • the communication module 1062 has the functions of the communication module 1022 and the communication module 1042.
  • FIG. 1A and FIG. 1B are only some schematic divisions of the scaling management system 10, and the above-mentioned devices or modules of the scaling management system 10 are divided from a functional perspective. In other possible implementations of the present application, the scaling management system 10 can also be divided into different devices or modules from other perspectives.
  • the above-mentioned device is a software device and the above-mentioned module is a software module
  • the above-mentioned device or module can be deployed centrally in one computer or distributed in different computers in a computer cluster.
  • the above-mentioned device is a hardware device
  • the above-mentioned module is a hardware module
  • the above-mentioned multiple devices or modules may correspond to one computer, or to different computers in a computer cluster.
  • FIGS 1A and 1B introduce the scaling management system 10 of the embodiment of the present application.
  • The method by which the scaling management system 10 shown in Figure 1A executes the scaling of the business cluster of the embodiment of the present application will be introduced below with reference to the accompanying drawings.
  • the scaling management system 10 includes a policy prediction device 102 and a policy execution device 104.
  • the method includes:
  • the policy prediction device 102 obtains the service traffic of the application in the current cycle from the service cluster 20.
  • Business traffic refers to the traffic generated by users using applications to trigger business requests. Based on this, business traffic can be characterized by the number of business requests per unit time; the unit time may be, for example, seconds or minutes. Taking scenarios such as radio and television media, short videos, and online education as examples, the business cluster 20 can be an audio and video transcoding cluster. Considering that users can use many different types of terminals, such as personal computers (PCs), smart phones, and smart watches, to play videos, the audio and video transcoding cluster can transcode the audio and video in the cloud after receiving a business request, and push the transcoded audio and video to the corresponding terminal for playback. The number of business requests received by the audio and video transcoding cluster within a unit time is recorded as the business traffic.
  • Business traffic can change dynamically. For example, some applications have high business traffic during specific time periods (such as three meal periods) and low business traffic during other time periods. In some cases, applications can also generate sudden business traffic. For example, when an application is promoted to other platforms, sudden business traffic can be generated.
  • the policy prediction device 102 can set a smaller period. For example, the cycle can be set to 10 minutes. Based on this, the policy prediction device 102 can obtain the service traffic from the service cluster 20 every 10 minutes. The business traffic obtained by the policy prediction device 102 in the current cycle is the business traffic applied in the current cycle.
  • the business traffic applied in the current cycle belongs to time series data, which can be represented by arrays or traffic curves.
  • the values in the array are the business traffic sampling values at different time points in the current cycle.
  • The traffic curve can be obtained by curve fitting the business traffic sampling values at different time points in the current cycle.
  • the policy prediction device 102 obtains the predicted scaling policy of the resource group of the application in the next cycle based on the service traffic of the application in the current cycle.
  • the policy prediction device 102 can predict the service traffic of the application in the next cycle based on the service traffic of the application in the current cycle, and further predict the predicted scaling policy of the resource group of the application in the next cycle based on the business traffic of the application in the next cycle.
  • the application's business traffic in the current cycle can reflect the traffic change trend, and based on the traffic change trend, the application's business traffic in the next cycle can be predicted. For example, if the application's business traffic in the current cycle increases at a relatively stable growth rate, then the application's business traffic in the next cycle has a high probability of growing at this growth rate.
  • the policy prediction device 102 can input the business traffic applied in the current period into the traffic prediction model to predict the business traffic applied in the next period.
  • the traffic prediction model can be trained through historical traffic data.
  • the historical traffic data can be the business traffic of N days in history. Among them, N is a positive number, for example, N can be 15.
  • The business traffic of N days in history can be divided into multiple time series in hourly units, and every two adjacent time series can be constructed as one piece of sample data, so that a sample set can be obtained.
  • the sample set can be further divided into training set, validation set and test set.
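The sample-construction and split steps above can be sketched in a few lines of Python. The hour boundary of 6 points assumes the 10-minute cycle mentioned earlier, and the 70/15/15 split ratios are illustrative assumptions (the patent does not specify them).

```python
# Sketch of building (input, target) samples from historical traffic and
# splitting them into training/validation/test sets, per the bullets above.

def build_samples(traffic, points_per_hour=6):
    """Split per-cycle traffic values (e.g. 10-min cycles, 6 per hour) into
    hourly series and pair each hour with the next one as (input, target)."""
    hours = [traffic[i:i + points_per_hour]
             for i in range(0, len(traffic) - points_per_hour + 1, points_per_hour)]
    return list(zip(hours[:-1], hours[1:]))

def split_samples(samples, train=0.7, val=0.15):
    """Chronological split into training, validation, and test sets."""
    n_train = int(len(samples) * train)
    n_val = int(len(samples) * val)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])
```

A chronological (rather than shuffled) split is the natural choice here, since the samples are time series and shuffling would leak future traffic into the training set.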
  • the policy prediction device 102 can use a neural network model based on time series, such as a long short term memory (LSTM) network to construct an initial traffic prediction model.
  • The parameters of the initial traffic prediction model can be initialized through Gaussian distribution initialization, random distribution initialization, or the like, and then the sample data in the training set are input into the initial traffic prediction model for parameter iteration.
  • When the trained model meets the training stop conditions, for example, when the loss value of the trained model tends to converge or the loss value is less than a preset value, training can be stopped.
  • the strategy prediction device 102 can input the sample data of the verification set into the trained model, and adjust the hyperparameters according to the performance of the model on the verification set, thereby further optimizing the model.
  • the optimized model can include multiple models, and these multiple models can be tested on the test set to obtain the performance of multiple models. Among them, the performance of the model can be measured by one or more indicators such as accuracy and inference time.
  • the policy prediction device 102 can select a model with better performance as a traffic prediction model based on the performance of each model, so as to periodically predict the business traffic of the application in the next period.
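The model-selection step above (measure candidates on the test set, pick the best performer) can be sketched as follows. The tuple fields and the tie-breaking rule (higher accuracy first, then lower inference time) are illustrative assumptions; the patent only says performance may be measured by one or more indicators such as accuracy and inference time.

```python
# Sketch of selecting the traffic prediction model from tested candidates.
# Field layout and tie-breaking rule are hypothetical.

def select_model(results):
    """results: (name, accuracy, inference_time_ms) tuples from the test set.

    Prefer higher accuracy; break ties with lower inference time.
    """
    return max(results, key=lambda r: (r[1], -r[2]))[0]
```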
  • The strategy prediction device 102 can set an optimization goal, and then find the number of instances that can achieve that optimization goal. Considering that the predicted business traffic may differ somewhat from the real business traffic, the policy prediction device 102 may use this number of instances as a reference to determine the maximum number of instances and the minimum number of instances, and adjust the number of instances in the resource group within the range from the minimum number of instances to the maximum number of instances. For ease of description, the number of instances that achieves the above optimization goal is also called the reference number.
  • the optimization target may be an average CPU utilization of 60% for the instances in the resource group, that is, a target average CPU utilization of 60%.
  • the policy prediction device 102 may determine the reference quantity in combination with the application's business traffic in the next cycle.
  • formula (1) can be expressed as: business traffic = α × number of instances × average CPU utilization, where α is the coefficient.
  • the above coefficient can be determined from the number of business requests processed concurrently when a single instance is fully loaded. Assuming that a single fully loaded instance concurrently processes 150 business requests, it can be assumed that the business traffic is 150, the number of instances is 1, and the average CPU utilization is 100%. Substituting these values into formula (1) determines the coefficient α; in this example, α may be 150.
  • the policy prediction device 102 can input the target average CPU utilization and the application's business traffic in the next period (the prediction result) into the above formula (1) to determine the reference quantity.
  • when the predicted traffic is represented as a curve, the curve can be integrated, or the sampling points on the curve can be averaged, and the result substituted into the above formula (1) to determine the reference quantity.
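The reference-quantity computation described above can be sketched in a few lines. This is an illustrative assumption of how formula (1) might be applied; the function names are hypothetical, while the coefficient 150 and the 60% target utilization follow the example values in the text.

```python
import math

ALPHA = 150  # coefficient: requests one fully loaded instance handles concurrently


def reference_number(predicted_traffic: float, target_cpu: float = 0.6) -> int:
    """Instances needed so that average CPU utilization equals target_cpu,
    per formula (1): traffic = ALPHA * instances * utilization."""
    return math.ceil(predicted_traffic / (ALPHA * target_cpu))


def traffic_from_curve(samples: list) -> float:
    """Average the sampling points of a predicted traffic curve."""
    return sum(samples) / len(samples)
```

For a predicted curve averaging 9000 requests per unit time, `reference_number(traffic_from_curve([8800, 9000, 9200]))` yields 100 instances at the 60% target.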
  • the policy prediction device 102 may determine the maximum number of instances and the minimum number of instances based on the reference number. For example, the policy prediction device 102 can set a scaling ratio and determine the maximum and minimum numbers of instances from the scaling ratio and the reference number. Assuming the scaling ratio is 50% and the reference number is 100, the minimum number of instances can be 50 and the maximum number of instances can be 150.
  • the predicted scaling strategy determined by the policy prediction device 102 may include increasing the number of instances in the resource group to the minimum number of instances in the next cycle when the current number of instances is less than the minimum number of instances.
  • the predictive scaling strategy may also include reducing the number of instances in the resource group to the maximum number of instances in the next cycle when the current number of instances is greater than the maximum number of instances. In other words, when the current number of instances is less than the minimum number of instances or greater than the maximum number of instances, active scaling can be triggered to quickly increase or decrease the number of instances in the resource group without waiting for monitoring indicators to reach alarm thresholds.
  • setting the above minimum number of instances also prevents the application's business from being affected when sudden traffic or similar situations occur while too few instances are running, which would otherwise make it difficult to provide external services.
  • Setting the maximum number of instances can avoid excessively increasing the number of instances and causing waste of resources.
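The bounds-and-clamp behavior described in the bullets above can be sketched as follows. This is a minimal illustration under the text's example values (scaling ratio 50%, reference number 100); the function names are not from the source.

```python
def instance_bounds(reference: int, ratio: float = 0.5) -> tuple:
    """Derive (min_instances, max_instances) from the reference number."""
    return round(reference * (1 - ratio)), round(reference * (1 + ratio))


def proactive_target(current: int, min_n: int, max_n: int) -> int:
    """Raise to min_n or lower to max_n without waiting for an alarm threshold."""
    if current < min_n:
        return min_n
    if current > max_n:
        return max_n
    return current
```

With `instance_bounds(100)` giving (50, 150), a resource group currently at 30 instances is proactively raised to 50, and one at 180 is reduced to 150, in the next cycle.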
  • the predicted scaling strategy determined by the policy prediction device 102 may include: when the current number of instances is greater than or equal to the minimum number of instances and less than or equal to the maximum number of instances, adjusting the number of instances in the resource group according to the scaling step in the next cycle.
  • the prediction scaling strategy determined by the policy prediction device 102 may also be combined with passive scaling.
  • the predicted scaling strategy determined by the policy prediction device 102 may be: when the average performance value (such as average CPU utilization) of the instances in the resource group reaches a certain performance value, adjust the number of instances in the resource group according to the scaling step size.
  • the predictive scaling strategy can be further subdivided by setting different scaling steps for different performance values. Specifically, when the average performance value of the instances in the resource group reaches the first performance value, the number of instances in the resource group is adjusted according to the first scaling step; when the average performance value reaches the second performance value, the number of instances is adjusted according to the second scaling step. The second performance value is greater than the first performance value, and the second scaling step is greater than the first scaling step.
  • average CPU utilization greater than 0.6 can be divided into average CPU utilization greater than 0.6 and less than or equal to 0.7, and average CPU utilization greater than 0.7.
  • different scaling steps can be defined in the prediction scaling policy.
  • when the average CPU utilization is greater than 0.6 and less than or equal to 0.7, the scaling step can be 6 (meaning adding 6 instances at a time); when the average CPU utilization is greater than 0.7, the scaling step can be 11.
  • the average CPU utilization can be divided into average CPU utilization greater than or equal to 0.3 and less than 0.4, average CPU utilization greater than or equal to 0.15 and less than 0.3, average CPU utilization greater than or equal to 0 and less than 0.15.
  • the scaling steps corresponding to the above scenarios can be set to -3 (meaning reducing 3 instances at a time), -5, and -9 in sequence.
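The tiered rules above map directly to a lookup function. The sketch below reproduces the example tiers and step sizes from the text; treating utilization between 0.4 and 0.6 as "no change" is an assumption, since the text leaves that band unspecified.

```python
def scaling_step(avg_cpu: float) -> int:
    """Instance-count delta for the example utilization tiers in the text."""
    if avg_cpu > 0.7:
        return 11        # strongest scale-out
    if avg_cpu > 0.6:
        return 6         # moderate scale-out
    if 0.3 <= avg_cpu < 0.4:
        return -3        # mild scale-in
    if 0.15 <= avg_cpu < 0.3:
        return -5
    if 0.0 <= avg_cpu < 0.15:
        return -9        # strongest scale-in
    return 0             # assumed: near-target band, no adjustment
```

Larger deviations from the target utilization thus trigger larger scaling steps, matching the first/second performance-value scheme described above.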
  • the predictive scaling strategy may also include only one of multiple types of scaling rules (such as scale-out rules or scale-in rules).
  • the predicted scaling strategy can include scale-out rules.
  • the predicted scaling strategy may include scale-in rules. In this way, jitter in business traffic is not mistakenly identified as an opposite change trend, which would otherwise affect the accuracy of the predicted scaling strategy.
  • the policy prediction device 102 can also determine the change rate of the application's business traffic in the next cycle and the change rate of its business traffic in the current cycle.
  • the change rate may be a growth rate or a decrease rate.
  • the scaling step size may be increased.
  • the scaling step size can be increased from 5 to 10.
  • the scale-in step size can be decreased, for example, from -2 to -5.
  • the policy prediction device 102 can set different alarm thresholds for different periods.
  • the policy prediction device 102 can obtain the alarm threshold of the next period based on the business traffic applied in the current period.
  • the number of instances in the resource group is adjusted when the scaling conditions are triggered, for example when the average performance value of the instances in the application's resource group, such as the average CPU utilization, reaches the alarm threshold for the next period.
  • the alarm threshold of the current period can be 0.6. When, based on the business traffic of the current period, the business traffic of the next period is predicted to have a sudden growth trend, a lower alarm threshold for the next period can be obtained, for example 0.5. Correspondingly, the predicted scaling strategy can be to adjust the number of instances in the resource group when the average CPU utilization of the instances in the resource group reaches 0.5 in the next cycle.
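The per-period alarm-threshold adjustment can be sketched as below. The 0.6 base threshold and the lowered 0.5 threshold come from the example above; the "more than 50% predicted growth" surge test is an illustrative assumption, since the text does not define how a "sudden growth trend" is detected.

```python
def alarm_threshold(predicted_next: float, current: float,
                    base: float = 0.6, lowered: float = 0.5) -> float:
    """Lower the next cycle's alarm threshold when a traffic surge is predicted.
    Surge test (>50% growth over the current cycle) is an assumed heuristic."""
    return lowered if predicted_next > current * 1.5 else base
```

With surging traffic (e.g. 100 → 200 requests/unit time predicted), the threshold drops to 0.5 so scale-out triggers earlier; otherwise it stays at 0.6.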
  • the policy execution device 104 obtains the predicted scaling policy for the applied resource group in the next cycle from the policy prediction device 102.
  • the policy execution device 104 may periodically acquire the predicted scaling policy of the applied resource group from the policy prediction device 102 according to the period in which the policy prediction device 102 acquires the predicted scaling policy of the applied resource group. In each cycle, the policy execution device 104 may obtain the predicted scaling policy for the next cycle from the policy prediction device 102 .
  • the policy execution device 104 determines whether the predicted scaling policy of the applied resource group in the next cycle is updated relative to the scaling policy applied in the current cycle. If yes, execute S210; if not, execute S212.
  • the policy execution device 104 can also compare the predicted scaling policy of the applied resource group in the next cycle with the scaling policy of the applied resource group in the current cycle. For example, the policy execution device 104 can compare the trigger conditions and scaling steps to determine whether the predicted scaling policy for the next cycle is updated relative to the scaling policy for the current cycle.
  • when the predicted scaling policy of the applied resource group in the next cycle is updated relative to the scaling policy of the current cycle, the policy execution device 104 executes S210; otherwise, the policy execution device 104 executes S212.
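The S208/S210/S212 decision above amounts to a compare-then-apply step. A minimal sketch, assuming policies are represented as dictionaries of trigger conditions and scaling steps (the representation is not specified in the source):

```python
def maybe_update(current_policy: dict, predicted_policy: dict, apply) -> bool:
    """Apply the predicted policy only when it differs from the current one
    (S210); otherwise keep the current policy in force (S212)."""
    if predicted_policy != current_policy:
        apply(predicted_policy)
        return True
    return False
```

`apply` stands in for whatever mechanism pushes the new policy to the resource platform, such as the configuration-file update described below for container resources.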
  • the policy execution device 104 executes the predictive scaling policy in the next cycle.
  • the policy execution device 104 may execute the predictive scaling policy in different ways.
  • the resources used by the application may include container resources or virtual machine resources. The specific manner in which the policy execution device 104 executes the updated scaling policy when applications use different resources is described in detail below.
  • At least one instance running the application in the resource group may be a container instance.
  • a container instance refers to an instance encapsulated in a container.
  • the resource platform corresponding to the container instance is the container platform.
  • the container platform can manage the container instances. For example, the container platform can increase or decrease the number of container instances to meet business needs.
  • the container platform can be a native container platform, such as Kubernetes (k8s for short).
  • the container platform may also be a non-native container platform, such as a platform modified by developers based on the native container platform.
  • the container platform may be a Cloud Container Engine (CCE).
  • the policy execution device 104 can update the elastic horizontal scaling (Horizontal Pod Autoscaling, HPA) policy of the CCE workload in the CCE platform.
  • the policy execution device 104 can modify the configuration file, for example, modify the parameters related to the elastic horizontal scaling policy in the configuration file of the CCE workload, and then forward the modified configuration file (including the HPA policy) to the CCE.
  • CCE can automatically increase or decrease the number of instances in the resource group according to the HPA policy in the configuration file to achieve elastic scaling of the business cluster.
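The HPA update described above can be illustrated by assembling the manifest that the policy execution device would hand to CCE. This is a hedged sketch: the field names follow the standard Kubernetes `autoscaling/v1` HorizontalPodAutoscaler schema, and the workload name and helper function are hypothetical, not taken from the source.

```python
def build_hpa_manifest(workload: str, min_replicas: int, max_replicas: int,
                       target_cpu_percent: int) -> dict:
    """Assemble an autoscaling/v1 HorizontalPodAutoscaler body for a workload."""
    return {
        "apiVersion": "autoscaling/v1",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{workload}-hpa"},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1",
                               "kind": "Deployment",
                               "name": workload},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "targetCPUUtilizationPercentage": target_cpu_percent,
        },
    }
```

For the running example, `build_hpa_manifest("transcode", 50, 150, 60)` encodes the 50/150 instance bounds and the 60% target utilization in the form CCE consumes.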
  • At least one instance running the application in the resource group may be a virtual machine instance.
  • a virtual machine instance refers to an instance encapsulated in a virtual machine.
  • the resource platform corresponding to the virtual machine instance is the virtual machine platform.
  • the virtual machine platform can manage the virtual machine instances. For example, the virtual machine platform can increase or decrease the number of virtual machine instances to meet business needs.
  • the policy execution device 104 may update the number of changes to the virtual machine instances in the virtual machine management component. For example, when the prediction scaling policy indicates that the scaling step is 11, the number of changes to the virtual machine instances may be 11, indicating that 11 virtual machine instances will be added. For example, when the predictive scaling policy indicates that the scaling step is -3, the number of changes to virtual machine instances can be -3, which means that three virtual machine instances will be reduced.
  • the virtual machine platform adjusts the number of virtual machine instances in the resource group according to the changed number of the virtual machine instances.
  • the virtual machine platform can use the virtual machine creation interface to create a new virtual machine instance according to the number of changes, or use the virtual machine deletion interface to reclaim virtual machine resources and delete existing virtual machine instances.
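The virtual-machine adjustment above can be sketched as follows. `create_vm` and `delete_vm` stand in for the platform's creation and deletion interfaces named in the text; their signatures are assumptions for illustration.

```python
def apply_change(instances: list, change: int, create_vm, delete_vm) -> list:
    """Create instances for a positive change count (e.g. +11), delete them
    for a negative one (e.g. -3), mirroring the examples in the text."""
    if change > 0:
        for _ in range(change):
            instances.append(create_vm())
    else:
        for _ in range(-change):
            delete_vm(instances.pop())  # reclaim the resources of this VM
    return instances
```

A change count of +11 thus results in eleven calls to the creation interface, and -3 in three deletions, matching the scaling steps carried in the predicted scaling policy.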
  • S212: The policy execution device 104 continues to execute the scaling policy of the current period in the next period.
  • the policy execution device 104 may keep the scaling policy unchanged and scale the business cluster according to the scaling policy of the current period.
  • embodiments of this application provide a scaling method for a business cluster.
  • the policy prediction device 102 interfaces with the business cluster 20 to obtain real-time business traffic, predicts resource requirements in the next period based on the real-time business traffic, and thereby periodically outputs the predicted scaling strategy.
  • the policy execution device 104 actively executes the predicted scaling strategy. There is no need to passively wait for a monitoring indicator to reach an alarm threshold, nor to reduce the number of instances in the resource group by a small amount only after a first alarm threshold has been reached in multiple consecutive periods, or to increase the number of instances by the maximum amount after a second alarm threshold is reached in one period; this avoids resource waste and improves resource utilization.
  • the online education application provides audio and video transcoding services to transcode the teaching audio and video into different formats so that they can be played on different types of terminals.
  • the business traffic of audio and video transcoding services can be different.
  • the business traffic can reach the peak value from 7 to 9 pm, and the business traffic can reach the valley value from 2 to 6 am.
  • the application can be deployed to the cloud platform so that the deployment volume can be adjusted based on business traffic.
  • applications can be deployed to the cloud platform using containerized deployment.
  • the policy execution device 104 periodically obtains the predicted scaling strategy for the online education application in the next cycle from the policy prediction device 102.
  • the predicted scaling strategy includes scaling rules (such as scale-out rules or scale-in rules).
  • the scaling step size in the scaling rule can be determined by simulation: the policy prediction device 102 can input multiple candidate step sizes into a simulation service.
  • the curve of the number of instances changing over time can be obtained.
  • the curve of the average CPU utilization changing over time can also be simulated.
  • after the policy execution device 104 obtains the prediction result, that is, the predicted scaling policy applied in the next cycle, it determines whether the predicted scaling policy has been updated relative to the scaling policy of the current cycle. If so, it modifies the HPA policy of the transcoding CCE workload according to the predicted scaling policy and sends the modified HPA policy to CCE, so that CCE can scale the business cluster according to the modified HPA policy.
  • the policy prediction device 102 can also provide the simulation service with the real curve of the number of instances changing over time and the real curve of the average CPU utilization changing over time, so that they can be compared with the simulation results to optimize the simulation effect of the simulation service and provide a more appropriate scaling step size.
  • the embodiment of the present application also provides a scaling management system 10 as described above.
  • the scaling management system 10 provided by the embodiment of the present application will be introduced below with reference to the accompanying drawings.
  • the system 10 includes a policy prediction device 102 and a policy execution device 104 .
  • the strategy prediction device 102 includes a communication module 1022 and a prediction module 1024.
  • the communication module 1022 is used to obtain the business traffic of the application in the current cycle;
  • the prediction module 1024 is used to obtain the predicted scaling strategy of the resource group of the application in the next cycle based on the business traffic of the application in the current cycle.
  • the policy execution device 104 includes a communication module 1042 and a scaling module 1044.
  • the communication module 1042 is used to obtain the predicted scaling strategy of the application's resource group in the next period
  • the scaling module 1044 is used to execute the predicted scaling strategy in the next period when the predicted scaling strategy of the application's resource group in the next period is updated relative to the application's scaling strategy in the current period.
  • FIG. 1A is a schematic division method of the scaling management system 10.
  • the scaling management system 10 may also have other structures.
  • the system 10 includes:
  • Communication module 1062 used to obtain the business traffic of the application in the current cycle
  • the prediction module 1064 is used to obtain, based on the business traffic of the application in the current cycle, the predicted scaling strategy of the resource group of the application in the next cycle;
  • the scaling module 1066 is configured to execute the predicted scaling policy in the next cycle when the predicted scaling policy of the resource group of the application in the next cycle is updated relative to the scaling policy of the application in the current cycle.
  • the at least one instance includes a container instance or a virtual machine instance
  • the scaling module 1044 in Figure 1A is specifically used to:
  • adjust the number of instances in the resource group in the next cycle according to the predicted scaling policy.
  • the scaling module 1044 in Figure 1A (or the scaling module 1066 in Figure 1B) is specifically used to:
  • the target value includes the minimum number of instances and/or the maximum number of instances.
  • the scaling module 1044 in Figure 1A is specifically used to:
  • the number of instances in the resource group is reduced to the maximum number of instances in the next cycle.
  • the minimum number of instances and the maximum number of instances are determined based on the service traffic and the average performance value of instances in the resource group.
  • the scaling module 1044 in Figure 1A (or the scaling module 1066 in Figure 1B) is specifically used to:
  • the number of instances in the resource group is adjusted according to the scaling step in the next cycle.
  • the scaling module 1044 in Figure 1A (or the scaling module 1066 in Figure 1B) is specifically used to:
  • the number of instances in the resource group is adjusted according to the first scaling step in the next cycle
  • the average performance value of at least one instance in the resource group reaches the second performance value
  • the number of instances in the resource group is adjusted according to the second scaling step in the next cycle, and the second performance value is greater than the first performance value
  • the second scaling step is greater than the first scaling step.
  • the scaling module 1044 in Figure 1A (or the scaling module 1066 in Figure 1B) is specifically used to:
  • the number of instances in the resource group is adjusted.
  • the scaling management system 10 may correspond to performing the methods described in the embodiments of the present application, and the above and other operations and/or functions of the modules/units of the scaling management system 10 implement the corresponding processes of the methods in the embodiment shown in Figure 2; for brevity, they are not repeated here.
  • An embodiment of the present application also provides a computer cluster.
  • the computer cluster is specifically used to implement the functions of the scaling management system 10 as shown in Figure 1A or Figure 1B.
  • Figure 5 provides a schematic structural diagram of a computer cluster.
  • the computer cluster 50 includes multiple computers 500.
  • the computers 500 include a bus 501, a processor 502, a communication interface 503 and a memory 504.
  • the processor 502, the memory 504 and the communication interface 503 communicate through the bus 501.
  • the bus 501 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one thick line is used in Figure 5, but it does not mean that there is only one bus or one type of bus.
  • the processor 502 may be any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
  • the communication interface 503 is used for communicating with the outside.
  • the communication interface 503 is used to obtain the service traffic of the application in the current cycle, and so on.
  • Memory 504 may include volatile memory, such as random access memory (RAM). Memory 504 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • Computer readable instructions are stored in the memory 504, and the processor 502 executes the computer readable instructions, so that the computer cluster 50 executes the aforementioned scaling method of the business cluster (or implements the functions of the aforementioned scaling management system 10).
  • the software or program code required to perform the functions of each module in FIG. 1A or FIG. 1B may be stored in at least one memory 504 in the computer cluster 50 .
  • At least one processor 502 executes the program code stored in the memory 504, so that the computer cluster 50 executes the aforementioned scaling method of the business cluster.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be any available medium that the computer cluster 50 can access, or a data storage device such as a data center containing one or more available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, tape), optical media (eg, DVD), or semiconductor media (eg, solid state drive), etc.
  • the computer-readable storage medium includes instructions that instruct the computer cluster 50 to perform the above scaling method of the business cluster.
  • An embodiment of the present application also provides a computer program product.
  • the computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on the computer cluster 50, the processes or functions described in the embodiments of the present application are generated in whole or in part.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, or data center to another website, computer, or data center by wire (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (such as infrared, radio, or microwave).
  • the computer program product may be a software installation package. If it is necessary to use any of the foregoing business cluster scaling methods, the computer program product may be downloaded and executed on the computer cluster 50 .


Abstract

This application provides a scaling method for a business cluster. The business cluster includes at least one resource group, and each resource group includes at least one instance running an application. The method includes: a scaling management system obtains the business traffic of the application in the current cycle and, based on that traffic, obtains the predicted scaling policy of the application's resource group in the next cycle; when the predicted scaling policy of the application's resource group in the next cycle is updated relative to the application's scaling policy in the current cycle, the scaling management system executes the predicted scaling policy in the next cycle. Because the resource demand of the next cycle is predicted from real-time business traffic, the predicted scaling policy is output periodically. When the predicted scaling policy of the next cycle is updated relative to the scaling policy of the current cycle, the predicted scaling policy is actively executed in the next cycle without passively waiting for a monitoring indicator to reach an alarm threshold, which improves resource utilization.

Description

A scaling method for a business cluster and related devices
This application claims priority to the Chinese patent application filed with the China National Intellectual Property Administration on May 27, 2022, with application number 202210589198.6 and entitled "A scaling method for a business cluster and related devices", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of cloud computing technology, and in particular to a scaling method for a business cluster, a scaling management system, a computer cluster, a computer-readable storage medium, and a computer program product.
Background
An application program, or application (app) for short, refers to a program written for a particular user purpose, such as a word processor, spreadsheet, accounting application, media player, flight simulator, command-line game, or image editor.
As business traffic grows, the pressure on an application increases. To cope with high concurrency, clustered deployment can be adopted and the deployment volume of the application adjusted. However, business traffic is not static; when business traffic decreases, over-deploying the application wastes resources, and the deployment volume then needs to be reduced to lower resource consumption.
Currently, the industry mainly uses configured alarm policies to elastically scale a cluster. Specifically, the business cluster includes at least one resource group, and each resource group includes at least one instance running the application. Monitoring indicators are set for the instances in a resource group to monitor their performance; a monitoring indicator may be, for example, central processing unit (CPU) utilization. When the average CPU utilization of the instances in a resource group reaches a set alarm threshold, scaling of the business cluster is triggered. Scaling of the business cluster includes increasing the number of instances in the resource group (that is, increasing the deployment volume of the application, or scale-out) or reducing the number of instances in the resource group (that is, reducing the deployment volume of the application, or scale-in).
However, the above solution scales the business cluster only when a monitoring indicator (such as average CPU utilization) reaches the alarm threshold, which involves a certain delay; in extreme cases, the number of instances in the resource group may not be increased in time, risking an adverse impact on the business. For this reason, the scaling rules in an alarm policy are usually conservative. For example, a rule may be set to remove 2 instances when the average CPU utilization is below 25% for three consecutive cycles, and another rule may be set to add 5 instances when the average CPU utilization is above 55% in one cycle.
As a result, when reducing the number of instances in a resource group, if business traffic drops rapidly, some resources cannot be reclaimed in time, wasting resources; when increasing the number of instances, adding instances in a large step also wastes resources.
Summary
This application provides a scaling method for a business cluster. The method obtains real-time business traffic, predicts the resource demand of the next cycle based on that traffic, and thereby periodically outputs a predicted scaling policy. When the predicted scaling policy of the next cycle is updated relative to the scaling policy of the current cycle, the predicted scaling policy is actively executed in the next cycle without passively waiting for a monitoring indicator to reach an alarm threshold. This ensures the business runs well while avoiding resource waste and improving resource utilization.
In a first aspect, this application provides a scaling method for a business cluster. The method may be executed by a scaling management system. The scaling management system may be a software system deployed in a computer cluster of a cloud platform; the computer cluster runs the software system and thereby executes the scaling method of the embodiments of this application. In some embodiments, the scaling management system may also be a hardware system, which may include one or more computers in the cloud platform; when the hardware system runs, it executes the scaling method of the embodiments of this application.
Specifically, the business cluster includes at least one resource group, and each resource group includes at least one instance running an application. The scaling management system may obtain the business traffic of the application in the current cycle and, based on that traffic, obtain the predicted scaling policy of the application's resource group in the next cycle. When the predicted scaling policy of the application's resource group in the next cycle is updated relative to the application's scaling policy in the current cycle, the scaling management system executes the predicted scaling policy in the next cycle.
In this method, the scaling management system obtains the business traffic of the application in the current cycle and predicts the resource demand of the next cycle based on real-time business traffic, thereby periodically outputting a predicted scaling policy. When the predicted scaling policy of the next cycle is updated relative to the scaling policy of the current cycle, the predicted scaling policy is actively executed in the next cycle. There is no need to passively wait for a monitoring indicator to reach an alarm threshold, let alone to reduce the number of instances in the resource group by a small amount after a first alarm threshold has been reached in multiple consecutive cycles, or to increase the number of instances by the maximum amount after a second alarm threshold is reached in one cycle. This avoids resource waste and improves resource utilization.
In some possible implementations, the at least one instance includes a container instance or a virtual machine instance. Accordingly, the scaling management system may adjust the number of instances in the resource group in the next cycle according to the predicted scaling policy, thereby elastically scaling the business cluster to meet different business needs.
In some possible implementations, when the scaling management system executes the predicted scaling policy in the next cycle, it may directly adjust the number of instances in the resource group to a target value in the next cycle, or adjust the number of instances in the resource group by a scaling step in the next cycle. In some embodiments, the predicted scaling policy may include a scaling condition for the next cycle, and the scaling management system may adjust the number of instances in the resource group when the scaling condition is triggered.
In this method, the scaling management system can provide multiple scaling rules to elastically scale the business cluster, covering multiple scenarios with high availability.
In some possible implementations, the target value includes a minimum number of instances and/or a maximum number of instances. Accordingly, when the current number of instances in the resource group is less than the minimum number of instances, the scaling management system increases the number of instances in the resource group to the minimum number of instances in the next cycle. When the current number of instances in the resource group is greater than the maximum number of instances, the scaling management system reduces the number of instances in the resource group to the maximum number of instances in the next cycle.
In this way, when traffic bursts occur, the number of instances in the resource group can be increased quickly to meet business needs without affecting the business; when traffic drops sharply, the number of instances can be reduced quickly to reclaim resources and improve resource utilization.
In some possible implementations, the minimum number of instances and the maximum number of instances are determined based on the business traffic and the average performance value of the instances in the resource group. Specifically, the scaling management system may determine, from the relationship among business traffic, instance count, and performance value, the number of instances that achieves the optimization goal, that is, the reference number, and then determine the minimum and maximum numbers of instances from the reference number.
In this way, even if the actual traffic of the next cycle deviates somewhat from the prediction, elastic scaling can still be performed within the range between the minimum and maximum numbers of instances, thereby meeting business needs.
In some possible implementations, when the current number of instances in the resource group is not less than the minimum number of instances and not greater than the maximum number of instances, the current number of instances is close to the number that satisfies the optimization goal, and the scaling management system may adjust the number of instances in the resource group by a scaling step in the next cycle, thereby elastically scaling the business cluster through fine-grained adjustments that meet business needs.
In some possible implementations, the scaling management system may also combine passive scaling to achieve elastic scaling of the business cluster. Specifically, the scaling management system may set different scaling steps for different performance values. When the average performance value of the instances in the resource group reaches a first performance value, the scaling management system adjusts the number of instances in the resource group by a first scaling step in the next cycle. When the average performance value of the instances in the resource group reaches a second performance value, the scaling management system adjusts the number of instances in the resource group by a second scaling step in the next cycle. The second performance value is greater than the first performance value, and the second scaling step is greater than the first scaling step.
In this way, the business cluster can be scaled precisely so that the average performance value of the instances in the resource group reaches the optimization goal.
In some possible implementations, the predicted scaling policy of the application's resource group in the next cycle obtained by the scaling management system may include an alarm threshold for the next cycle. In other words, the alarm threshold may differ across cycles, for example changing with business traffic. Accordingly, the scaling management system may adjust the number of instances in the resource group when the average performance value of the instances in the resource group reaches the alarm threshold of the next cycle.
In this way, when traffic bursts occur, the number of instances in the resource group can be adjusted in advance, ensuring normal business operation without wasting resources.
In a second aspect, this application provides a scaling management system. The scaling management system is used to scale a business cluster, the business cluster includes at least one resource group, and each resource group includes at least one instance running an application. The system includes:
a communication module, used to obtain the business traffic of the application in the current cycle;
a prediction module, used to obtain, based on the business traffic of the application in the current cycle, the predicted scaling policy of the application's resource group in the next cycle;
a scaling module, used to execute the predicted scaling policy in the next cycle when the predicted scaling policy of the application's resource group in the next cycle is updated relative to the application's scaling policy in the current cycle.
In some possible implementations, the at least one instance includes a container instance or a virtual machine instance, and the scaling module is specifically used to:
adjust the number of instances in the resource group in the next cycle according to the predicted scaling policy.
In some possible implementations, the scaling module is specifically used to:
adjust the number of instances in the resource group to a target value in the next cycle; or,
adjust the number of instances in the resource group by a scaling step in the next cycle; or,
adjust the number of instances in the resource group when a scaling condition is triggered.
In some possible implementations, the target value includes a minimum number of instances and/or a maximum number of instances, and the scaling module is specifically used to:
when the current number of instances in the resource group is less than the minimum number of instances, increase the number of instances in the resource group to the minimum number of instances in the next cycle; or,
when the current number of instances in the resource group is greater than the maximum number of instances, reduce the number of instances in the resource group to the maximum number of instances in the next cycle.
In some possible implementations, the minimum number of instances and the maximum number of instances are determined based on the business traffic and the average performance value of the instances in the resource group.
In some possible implementations, the scaling module is specifically used to:
when the current number of instances in the resource group is not less than the minimum number of instances and not greater than the maximum number of instances, adjust the number of instances in the resource group by a scaling step in the next cycle.
In some possible implementations, the scaling module is specifically used to:
when the average performance value of at least one instance in the resource group reaches a first performance value, adjust the number of instances in the resource group by a first scaling step in the next cycle;
when the average performance value of at least one instance in the resource group reaches a second performance value, adjust the number of instances in the resource group by a second scaling step in the next cycle, where the second performance value is greater than the first performance value and the second scaling step is greater than the first scaling step.
In some possible implementations, the scaling module is specifically used to:
when the average performance value of the instances in the resource group reaches the alarm threshold of the next cycle, adjust the number of instances in the resource group.
In a third aspect, this application provides a computer cluster. The computer cluster includes at least one computer, and the at least one computer includes at least one processor and at least one memory. The at least one processor and the at least one memory communicate with each other. The at least one processor is used to execute instructions stored in the at least one memory, so that the computer cluster performs the method of the first aspect or any implementation of the first aspect.
In a fourth aspect, this application provides a computer-readable storage medium storing instructions that instruct a computer cluster to perform the method of the first aspect or any implementation of the first aspect.
In a fifth aspect, this application provides a computer program product containing instructions that, when run on a computer cluster, cause the computer cluster to perform the method of the first aspect or any implementation of the first aspect.
Based on the implementations provided in the above aspects, this application may be further combined to provide more implementations.
Brief Description of the Drawings
To describe the technical methods of the embodiments of this application more clearly, the accompanying drawings used in the embodiments are briefly introduced below.
Figure 1A is a schematic architecture diagram of a resource scheduling system provided by an embodiment of this application;
Figure 1B is a schematic architecture diagram of a resource scheduling system provided by an embodiment of this application;
Figure 2 is a flowchart of a resource scheduling method provided by an embodiment of this application;
Figure 3 is a flowchart of a resource scheduling method provided by an embodiment of this application;
Figure 4 is a schematic diagram of a simulation result provided by an embodiment of this application;
Figure 5 is a hardware structure diagram of a computer cluster provided by an embodiment of this application.
Detailed Description
The terms "first" and "second" in the embodiments of this application are used for description only and should not be understood as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more of that feature.
First, some technical terms involved in the embodiments of this application are introduced.
An application refers to a program written for a particular user purpose, such as a word processor, spreadsheet, accounting application, media player, flight simulator, command-line game, or image editor. The program can be deployed in a cloud platform.
A cloud platform refers to a platform that provides computing, storage, or network capabilities to users as cloud services. Specifically, the cloud platform can provide computing, storage, or network capabilities on demand so that applications are deployed in a clustered manner. The cloud platform can build a business cluster; the business cluster includes at least one resource group, and each resource group includes at least one instance running the application. An instance refers to dynamic code produced by running the application; the dynamic code may be called a process or thread, and the process or thread achieves the corresponding application purpose, such as video playback or image editing. The cloud platform can adjust the number of instances in a resource group, thereby elastically adjusting the deployment volume of the application. In this way, the deployment volume can be increased when business traffic is high to cope with high concurrency, and reduced when business traffic is low to lower resource consumption and improve resource utilization.
Currently, the industry mainly uses configured alarm policies to elastically scale a business cluster. Specifically, monitoring indicators are set for the instances in a resource group; a monitoring indicator may be a performance value such as CPU utilization, memory utilization, or input/output (IO) utilization. Considering that a resource group includes one or more instances, the average performance value of the instances in the resource group can also be determined. When a monitoring indicator (such as an average performance value like average CPU utilization) reaches a set alarm threshold, scaling of the business cluster is triggered to adjust the deployment volume of the application.
Considering that scaling the business cluster when a monitoring indicator reaches an alarm threshold involves a certain delay, and in extreme cases untimely scaling risks adversely affecting the business, the scaling rules in an alarm policy are usually conservative. For example, a rule may be set to remove 2 instances when the average CPU utilization is below 25% for three consecutive cycles, and to add 5 instances when the average CPU utilization is above 55% in one cycle.
Under such rules, when business traffic drops rapidly, the system still waits three cycles before reducing the number of instances, and the reduction is small, so some resources cannot be reclaimed in time, wasting resources. When business traffic rises, the number of instances is increased by the maximum amount regardless of whether the growth rate of the traffic (for example, the slope of the traffic curve) has peaked, which can also waste resources.
In view of this, an embodiment of this application provides a scaling method for a business cluster. The method may be executed by a scaling management system. The scaling management system may be a software system deployed in a computer cluster of a cloud platform; the computer cluster runs the software system and thereby executes the scaling method of the embodiments of this application. In some embodiments, the scaling management system may also be a hardware system including one or more computers in the cloud platform; when the hardware system runs, it executes the scaling method of the embodiments of this application.
Specifically, the business cluster includes at least one resource group, and each resource group includes at least one instance running an application. The scaling management system obtains the business traffic of the application in the current cycle and, based on that traffic, obtains the predicted scaling policy of the application's resource group in the next cycle. When the predicted scaling policy of the application's resource group in the next cycle is updated relative to the application's scaling policy in the current cycle, the scaling management system executes the predicted scaling policy in the next cycle.
In this method, the scaling management system obtains the business traffic of the application in the current cycle and predicts the resource demand of the next cycle based on real-time business traffic, thereby periodically outputting a predicted scaling policy. When the predicted scaling policy of the next cycle is updated relative to the scaling policy of the current cycle, the predicted scaling policy is actively executed in the next cycle. There is no need to passively wait for a monitoring indicator to reach an alarm threshold, let alone to reduce the number of instances in the resource group by a small amount after a first alarm threshold has been reached in multiple consecutive cycles, or to increase the number of instances by the maximum amount after a second alarm threshold is reached in one cycle. This avoids resource waste and improves resource utilization.
To make the technical solution of this application clearer and easier to understand, the scaling management system of the embodiments of this application is first introduced below with reference to the accompanying drawings.
Referring to the schematic architecture diagram of the scaling management system shown in Figure 1A, the scaling management system 10 includes a policy prediction device 102 and a policy execution device 104. The policy prediction device 102 and the policy execution device 104 may be implemented in software or in hardware. The policy prediction device 102 connects to the business cluster 20; the business cluster 20 includes at least one resource group, and each resource group includes at least one instance running the application. The policy execution device 104 connects to the resource platform 30. Like the scaling management system 10, the resource platform 30 may be a software platform or a hardware platform. The resource platform 30 may also be divided by resource type into a container resource platform (container platform for short) or a virtual machine resource platform (virtual machine platform for short). The container platform is used to increase or reduce the number of container instances in a resource group, and the virtual machine platform is used to increase or reduce the number of virtual machine instances in a resource group.
Specifically, the policy prediction device 102 is used to obtain, from the business cluster 20, the business traffic of the application in the current cycle and, based on that traffic, obtain the predicted scaling policy of the application in the next cycle. Considering that business traffic can burst upward or drop suddenly, the cycle is usually set to a small value, for example 10 minutes (min), so that changes in the application's business traffic are sensed in time and the scaling policy is adjusted accordingly.
The policy execution device 104 is used to execute the predicted scaling policy in the next cycle when the predicted scaling policy of the application's resource group in the next cycle is updated relative to the application's scaling policy in the current cycle. For example, the policy execution device 104 may execute the predicted scaling policy in the next cycle so that the resource platform 30 increases or reduces the number of instances in the application's resource group.
The interaction between the policy prediction device 102 and the policy execution device 104 has been described in detail above; the structures of the policy prediction device 102 and the policy execution device 104 are described below.
As shown in Figure 1A, the policy prediction device 102 includes a communication module 1022 and a prediction module 1024. The communication module 1022 is used to obtain the business traffic of the application in the current cycle, and the prediction module 1024 is used to obtain, based on that traffic, the predicted scaling policy of the application's resource group in the next cycle. The policy execution device 104 includes a communication module 1042 and a scaling module 1044. The communication module 1042 is used to obtain the predicted scaling policy of the application's resource group in the next cycle, and the scaling module 1044 is used to execute the predicted scaling policy in the next cycle when it is updated relative to the application's scaling policy in the current cycle.
Figure 1A illustrates the policy prediction device 102 and the policy execution device 104 as independent devices. In some possible implementations, the policy prediction device 102 and the policy execution device 104 may be integrated into one device. Specifically, referring to the other schematic architecture diagram of the scaling management system shown in Figure 1B, the scaling management system 10 includes a communication module 1062, a prediction module 1064, and a scaling module 1066. The specific implementations of the prediction module 1064 and the scaling module 1066 may refer to the descriptions of the prediction module 1024 and the scaling module 1044, and the communication module 1062 has the functions of both the communication module 1022 and the communication module 1042.
It should be noted that Figures 1A and 1B are only some schematic ways of dividing the scaling management system 10, and the above devices or modules of the scaling management system 10 are divided from a functional perspective; in other possible implementations of the embodiments of this application, the scaling management system 10 may also be divided into different devices or modules from other perspectives. In addition, when the above devices are software devices and the above modules are software modules, they may be deployed centrally on one computer or distributed across different computers of a computer cluster. When the above devices are hardware devices and the above modules are hardware modules, the multiple devices or modules may correspond to one computer or to different computers in a computer cluster.
Figures 1A and 1B have introduced the scaling management system 10 of the embodiments of this application. The scaling method of the business cluster of the embodiments of this application, as executed by the scaling management system 10 shown in Figure 1A, is introduced below with reference to the accompanying drawings.
参见图2所示的业务集群的伸缩方法的流程图，伸缩管理系统10包括策略预测装置102和策略执行装置104，该方法包括：
S202:策略预测装置102从业务集群20获取应用在当前周期的业务流量。
业务流量是指用户使用应用触发业务请求产生的流量。基于此,业务流量可以通过单位时间内的业务请求数量表征。该单位时间例如可以为秒或分钟。以广播电视媒体、短视频、在线教育等场景为例,业务集群20可以为音视频转码集群,考虑到用户可以使用多种不同类型的终端,例如是个人计算机(personal computer,PC)、智能手机、智能手表,来播放视频,音视频转码集群可以在接收业务请求后,在云端将音视频进行转码,并推送转码后的音视频至相应的终端进行播放。其中,音视频转码集群在单位时间内接收到的业务请求的数量记作业务流量。
业务流量可以是动态变化的。例如,一些应用在特定时间段(如三餐时间段)具有较高的业务流量,在其他时间段具有较低的业务流量。在一些情况下,应用还可以产生突发业务流量,例如应用被推广至其他平台时,可以产生突发的业务流量。
为了及时感知业务流量的变化趋势,策略预测装置102可以设置较小的周期。例如周期可以设置为10min。基于此,策略预测装置102可以每间隔10min从业务集群20获取业务流量。策略预测装置102当前周期获取的业务流量即为应用在当前周期的业务流量。
应用在当前周期的业务流量属于时序数据,该时序数据可以通过数组或者流量曲线进行表征。在一些实施例中,业务流量采用数组表征时,数组中的数值为当前周期内不同时间点的业务流量采样值。在另一些实施例中,业务流量采用流量曲线表征时,该流量曲线可以是通过对当前周期内不同时间点以及各时间点的业务流量采样值进行曲线拟合得到。
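作为示意，下面的Python片段给出了数组表征与曲线拟合表征的一个最小示例（采样数据为虚构，曲线拟合以二次多项式拟合为例，实际实现可以采用其他拟合方式）：

```python
import numpy as np

# 当前周期（10 min）内每分钟采样一次的业务流量，数组表征
timestamps = np.arange(10)                      # 采样时间点（分钟）
samples = np.array([120, 125, 131, 140, 152, 160, 171, 183, 190, 204])

# 流量曲线表征：对各时间点的业务流量采样值做二次多项式拟合
coeffs = np.polyfit(timestamps, samples, deg=2)
curve = np.poly1d(coeffs)

# 拟合得到的流量曲线可以在任意时间点估计业务流量
assert abs(curve(5) - samples[5]) < 10
```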
S204:策略预测装置102根据应用在当前周期的业务流量,获取所述应用的资源组在下一周期的预测伸缩策略。
具体地,策略预测装置102可以根据应用在当前周期的业务流量,预测应用在下一周期的业务流量,进而基于应用在下一周期的业务流量,预测应用的资源组在下一周期的预测伸缩策略。
其中,应用在当前周期的业务流量可以反映流量变化趋势,基于该流量变化趋势可以预测应用在下一周期的业务流量。例如,应用在当前周期的业务流量以一较稳定的增长率增长时,则应用在下一周期的业务流量有较高概率以该增长率进行增长。在一些可能的实现方式中,策略预测装置102可以将应用在当前周期的业务流量输入流量预测模型,以预测应用在下一周期的业务流量。
流量预测模型可以通过历史流量数据训练得到。历史流量数据可以为历史N天的业务流量。其中，N为正整数，例如N可以为15。历史N天的业务流量可以以小时为单位分成多个时间序列，两个相邻的时间序列可以构造为一个样本数据，如此可以获得样本集。样本集还可以进一步分为训练集、验证集和测试集。
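上述样本构造过程可以用如下numpy片段示意（历史数据为随机生成，采样粒度、划分比例均为假设取值）：

```python
import numpy as np

rng = np.random.default_rng(0)
# 假设历史15天、每分钟一个采样点的业务流量
history = rng.integers(50, 500, size=15 * 24 * 60)

# 以小时为单位切分为多个时间序列（每段60个采样点）
hours = history.reshape(-1, 60)

# 两个相邻的时间序列构造为一个样本：前一小时为输入，后一小时为标签
samples = [(hours[i], hours[i + 1]) for i in range(len(hours) - 1)]

# 样本集进一步分为训练集、验证集和测试集（比例为假设取值）
n = len(samples)
train = samples[: int(0.7 * n)]
val = samples[int(0.7 * n): int(0.85 * n)]
test = samples[int(0.85 * n):]

assert len(hours) == 15 * 24
assert len(samples) == 15 * 24 - 1
```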
在训练流量预测模型时，策略预测装置102可以采用基于时间序列的神经网络模型，如长短期记忆（long short term memory，LSTM）网络构造初始流量预测模型，该初始流量预测模型可以通过高斯分布初始化或者随机初始化等方式进行参数初始化，然后将训练集中的样本数据输入该初始流量预测模型进行参数迭代，当训练后的模型满足训练停止条件时，例如训练后的模型的损失值趋于收敛，或者损失值小于预设值时，可以停止训练。
进一步地,策略预测装置102可以将验证集的样本数据输入训练后的模型,根据模型在验证集上的表现(performance)调整超参数,从而对模型进行进一步优化。优化后的模型可以包括多个,这多个模型可以在测试集上进行测试,从而得到多个模型的性能。其中,模型的性能可以通过准确度、推理时间等指标中的一种或多种进行衡量。策略预测装置102可以根据各模型的性能,选择性能较好的模型作为流量预测模型,从而用于周期性地预测应用在下一周期的业务流量。
策略预测装置102可以设定优化目标,然后寻找能够达到上述优化目标的实例数量。考虑到预测的业务流量可以与真实的业务流量存在一定差异,策略预测装置102可以将该实例数量作为参考,确定出最大实例数量、最小实例数量,并在最小实例数量到最大实例数量范围内调整资源组中实例的数量。为了便于描述,达到上述优化目标的实例数量也称作参考数量。
在一些实施例中,优化目标可以为资源组中实例的平均CPU利用率为60%,即目标平均CPU利用率为60%,策略预测装置102可以结合应用在下一周期的业务流量确定参考数量。
具体地,业务流量和实例数量、平均CPU利用率满足如下关系式:
α*实例数量*平均CPU利用率=业务流量         (1)
其中，α为系数。根据单个实例满载时并发处理的业务请求数量可以确定上述系数。假定单个实例满载时并发处理的业务请求数量为150，则可以假定业务流量为150，实例数量为1，平均CPU利用率为100%，将上述取值代入公式(1)可以确定系数α，在该示例中，α可以为150。
策略预测装置102可以将目标平均CPU利用率和应用在下一周期的业务流量(预测结果)输入上述公式(1),从而确定参考数量。其中,应用在下一周期的业务流量通过曲线表征时,可以对该曲线进行积分,或者对曲线中的采样点求平均值,然后输入上述公式(1),确定参考数量。
考虑到突发流量的可能性，策略预测装置102可以根据参考数量，确定最大实例数量和最小实例数量。例如，策略预测装置102可以设置伸缩比例，根据该伸缩比例和参考数量确定最大实例数量和最小实例数量。假设伸缩比例为50%，参考数量为100，则最小实例数量可以为50，最大实例数量可以为150。
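基于公式(1)计算参考数量、再按伸缩比例确定最大/最小实例数量的过程，可以示意如下（α=150、目标平均CPU利用率60%、伸缩比例50%沿用上文示例取值）：

```python
import math

ALPHA = 150  # 单个实例满载时并发处理的业务请求数量（上文示例取值）

def reference_count(predicted_traffic, target_cpu=0.6):
    """根据 α * 实例数量 * 平均CPU利用率 = 业务流量 反解参考数量。"""
    return math.ceil(predicted_traffic / (ALPHA * target_cpu))

def instance_bounds(ref, scale_ratio=0.5):
    """按伸缩比例在参考数量基础上浮动，得到最小/最大实例数量。"""
    low = math.floor(ref * (1 - scale_ratio))
    high = math.ceil(ref * (1 + scale_ratio))
    return low, high

ref = reference_count(9000)        # 假设下一周期预测流量为9000
assert ref == 100
assert instance_bounds(ref) == (50, 150)
```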
策略预测装置102确定的预测伸缩策略可以包括当前实例数量小于最小实例数量时,在下一周期增加资源组中实例的数量至该最小实例数量。该预测伸缩策略也可以包括当前实例数量大于最大实例数量时,在下一周期减少资源组中实例的数量至最大实例数量。换言之,当前实例数量小于最小实例数量或大于最大实例数量时,可以触发主动伸缩,实现快速增加资源组中实例的数量或快速缩减资源组中实例的数量,而无需等待监控指标达到告警阈值。
此外,设置上述最小实例数量还可以防止实例数量过小导致突发流量或者其他情况发生时应用的业务受到影响,难以对外提供服务。设置最大实例数量可以避免过度增加实例的数量导致资源浪费。
进一步地,策略预测装置102确定的预测伸缩策略可以包括:当前实例数量大于或等于上述最小实例数量,且小于或等于最大实例数量时,在下一周期按照伸缩步长调整资源组中实例的数量。
其中,当前实例数量大于或等于上述最小实例数量,且小于或等于最大实例数量时,策略预测装置102确定的预测伸缩策略还可以结合被动伸缩。具体地,策略预测装置102确定的预测伸缩策略可以为:资源组中实例的平均性能值(如平均CPU利用率)达到某一性能值时,按照伸缩步长调整资源组中实例的数量。
进一步地,为了实现较为精准的伸缩,预测伸缩策略还可以进行进一步细分,具体是针对达到不同性能值的情况,分别设置不同的伸缩步长。具体地,资源组中实例的平均性能值达到第一性能值时,按照第一伸缩步长调整资源组中实例的数量,资源组中实例的平均性能值达到第二性能值时,按照第二伸缩步长调整资源组中实例的数量,该第二性能值大于第一性能值,第二伸缩步长大于第一伸缩步长。
例如，平均CPU利用率大于0.6可以分为平均CPU利用率大于0.6且小于等于0.7，以及平均CPU利用率大于0.7，针对上述两种场景，预测伸缩策略中可以定义不同的伸缩步长。平均CPU利用率大于0.6且小于等于0.7时，伸缩步长可以为6（表示一次增加6个实例）；平均CPU利用率大于0.7时，伸缩步长可以为11。又例如，平均CPU利用率可以分为平均CPU利用率大于或等于0.3且小于0.4、平均CPU利用率大于或等于0.15且小于0.3、平均CPU利用率大于或等于0且小于0.15，上述场景对应的伸缩步长可以依次设置为-3（表示一次减少3个实例）、-5、-9。
为了便于理解,下面还提供了预测伸缩策略的一个示例。
（预测伸缩策略的示例在原文中以附图形式给出，此处从略。）
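由于该示例在原文中以附图形式给出，下面按上文描述的分段伸缩步长给出一个等价的策略示意（各区间与步长取值沿用上文描述，函数结构为说明性假设）：

```python
def scaling_step(avg_cpu):
    """根据资源组的平均CPU利用率返回伸缩步长（正数扩容，负数缩容）。"""
    if avg_cpu > 0.7:
        return 11
    if avg_cpu > 0.6:
        return 6
    if avg_cpu < 0.15:
        return -9
    if avg_cpu < 0.3:
        return -5
    if avg_cpu < 0.4:
        return -3
    return 0  # 0.4 <= avg_cpu <= 0.6 时维持现状

assert scaling_step(0.75) == 11
assert scaling_step(0.65) == 6
assert scaling_step(0.2) == -5
```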
需要说明的是,上述以预测伸缩策略包括多种伸缩规则进行示例说明,在一些可能的实现方式中,预测伸缩策略也可以包括多种伸缩规则(如扩容规则或缩容规则)中的一种。例如,应用在当前周期的业务流量呈现上升趋势时,应用在下一周期的业务流量有较高概率呈现上升趋势,相应地,预测伸缩策略可以包括扩容规则。又例如,应用在当前周期的业务流量呈现下降趋势时,应用在下一周期的业务流量有较高概率呈现下降趋势,相应地,预测伸缩策略可以包括缩容规则。如此,可以避免业务流量出现抖动时被误识别为业务流量有相反的变化趋势,进而影响预测伸缩策略的准确性。
以上为策略预测装置102获得预测伸缩策略的一种实现方式,在一些可能的实现方式中,策略预测装置102也可以确定应用在下一周期的业务流量的变化率,以及应用在当前周期的业务流量的变化率。该变化率可以是增长率或下降率,在下一周期的增长率相较于当前周期的增长率的增幅大于第一预设幅度时,可以增加伸缩步长。例如,伸缩步长可以由5增加为10。在下一周期的下降率相较于当前周期的下降率的降幅大于第二预设幅度时,可以减小伸缩步长,例如由-2变更为-5。
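上述根据相邻周期流量变化率的差值调整伸缩步长的逻辑可以示意如下（预设幅度与调整倍数均为假设取值，仅作说明，具体幅度由策略决定）：

```python
def adjust_step(step, cur_rate, next_rate, margin=0.2):
    """根据相邻周期流量变化率的差值调整伸缩步长。

    step: 当前伸缩步长（正为扩容，负为缩容）。
    cur_rate/next_rate: 当前周期与下一周期的流量变化率（正为增长率，负为下降率）。
    margin: 假设的预设幅度阈值。
    """
    if step > 0 and next_rate - cur_rate > margin:
        return step * 2        # 增长率明显加快：增加扩容步长
    if step < 0 and cur_rate - next_rate > margin:
        return step * 2        # 下降率明显加快：加大缩容步长（此处简单取2倍，仅作示意）
    return step

assert adjust_step(5, 0.1, 0.4) == 10     # 增幅超过预设幅度，步长由5增加为10
assert adjust_step(-2, -0.1, -0.4) == -4  # 降幅超过预设幅度，加大缩容步长
```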
在另一些可能的实现方式中,策略预测装置102可以针对不同周期设置不同的告警阈值。策略预测装置102可以根据应用在当前周期的业务流量,获取下一周期的告警阈值。当伸缩条件被触发,例如应用的资源组中实例的平均性能值如平均CPU利用率达到下一周期的告警阈值时,调整资源组中实例的数量。
下面结合一示例进行说明。该示例中，当前周期的告警阈值可以为0.6，基于当前周期的业务流量，预测下一周期的业务流量有突发增长趋势时，可以获得下一周期的告警阈值，例如为0.5。相应地，预测伸缩策略可以为在下一周期，资源组中实例的平均CPU利用率达到0.5时，调整资源组中实例的数量。
S206:策略执行装置104从策略预测装置102获取应用的资源组在下一周期的预测伸缩策略。
具体地,策略执行装置104可以根据策略预测装置102获取应用的资源组的预测伸缩策略的周期,周期性地从策略预测装置102获取应用的资源组的预测伸缩策略。在每一个周期,策略执行装置104可以从策略预测装置102获取下一周期的预测伸缩策略。
S208:策略执行装置104确定应用的资源组在下一周期的预测伸缩策略相对于应用在当前周期的伸缩策略是否有更新。若是,则执行S210,若否,则执行S212。
具体地,策略执行装置104还可以将应用的资源组在下一周期的预测伸缩策略与应用的资源组在当前周期的伸缩策略进行比较,例如策略执行装置104可以比较触发条件、伸缩步长,从而确定下一周期的预测伸缩策略相对于当前周期的伸缩策略是否有更新。
当应用的资源组在下一周期的预测伸缩策略相对于应用的资源组在当前周期的伸缩策略有更新时,策略执行装置104执行S210,当应用的资源组在下一周期的预测伸缩相对于应用的资源组在当前周期的伸缩策略未更新时,策略执行装置104执行S212。
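策略执行装置104比较触发条件与伸缩步长以判断策略是否有更新的逻辑，可以用如下片段示意（以字典表示伸缩策略，字段组织方式为说明性假设）：

```python
def policy_updated(current, predicted):
    """逐条比较触发条件和伸缩步长，任一规则不同即视为有更新。"""
    if set(current) != set(predicted):
        return True          # 触发条件集合不同
    return any(current[cond] != predicted[cond] for cond in current)

# 策略以"触发条件 -> 伸缩步长"的映射表示
current = {"cpu>0.6": 5, "cpu<0.25": -2}
predicted = {"cpu>0.6": 6, "cpu<0.25": -2}

assert policy_updated(current, predicted) is True    # 步长有更新，执行S210
assert policy_updated(current, dict(current)) is False  # 未更新，执行S212
```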
S210:策略执行装置104在下一周期执行预测伸缩策略。
根据资源组中的实例类型不同，策略执行装置104可以采用不同方式执行预测伸缩策略。其中，应用使用的资源可以包括容器资源或虚拟机资源。下面分别对应用使用不同资源的情况下，策略执行装置104执行预测伸缩策略的具体实现方式进行详细说明。
在一些可能的实现方式中,资源组中运行应用的至少一个实例可以为容器实例。容器实例是指以容器封装的实例。与容器实例对应的资源平台为容器平台,容器平台可以对容器实例进行管理,例如容器平台可以增加或缩减容器实例的数量,以满足业务的需求。
其中，容器平台可以是原生容器平台，该原生容器平台包括kubernetes，简称为k8s。在一些实施例中，容器平台也可以是非原生容器平台，如开发者基于原生容器平台改造的平台，例如容器平台可以为云容器引擎（Cloud Container Engine，CCE）。相应地，策略执行装置104可以更新CCE平台中CCE工作负载的弹性水平伸缩（Horizontal Pod Autoscaling，HPA）策略。具体地，策略执行装置104可以修改配置文件，例如将CCE工作负载的配置文件中与弹性水平伸缩策略相关的参数进行修改，然后转发修改后的配置文件（包括HPA策略）至CCE。CCE可以根据配置文件中的HPA策略自动增加或减少资源组中实例的数量，实现业务集群的弹性伸缩。
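以Kubernetes原生的HPA对象为例，修改与弹性水平伸缩策略相关的参数大致相当于更新如下形式的清单（以Python字典表示autoscaling/v2版本的HPA结构；CCE实际使用的配置格式以平台文档为准）：

```python
def build_hpa_spec(workload, min_replicas, max_replicas, target_cpu_percent):
    """构造autoscaling/v2版本的HPA清单，用于提交给容器平台。"""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{workload}-hpa"},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": workload,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization",
                               "averageUtilization": target_cpu_percent},
                },
            }],
        },
    }

# 按预测伸缩策略更新最小/最大实例数量与目标平均CPU利用率
hpa = build_hpa_spec("transcode", 50, 150, 60)
assert hpa["spec"]["minReplicas"] == 50
```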
在另一些可能的实现方式中,资源组中运行应用的至少一个实例可以为虚拟机实例。虚拟机实例是指以虚拟机封装的实例。与虚拟机实例对应的资源平台为虚拟机平台,虚拟机平台可以对虚拟机实例进行管理,例如虚拟机平台可以增加或缩减虚拟机实例的数量,以满足业务的需求。
区别于容器平台，虚拟机平台通常与虚拟机管理组件交互，从而对虚拟机实例进行管理。具体地，策略执行装置104可以在虚拟机管理组件中更新虚拟机实例的变更数量，例如，预测伸缩策略指示伸缩步长为11时，虚拟机实例的变更数量可以为11，表示增加11个虚拟机实例，又例如，预测伸缩策略指示伸缩步长为-3时，虚拟机实例的变更数量可以为-3，表示减少3个虚拟机实例。相应地，虚拟机平台根据所述虚拟机实例的变更数量调整资源组中所述虚拟机实例的数量。其中，虚拟机平台可以根据变更数量，采用虚拟机新建接口创建新的虚拟机实例，或者是采用虚拟机删除接口删除已有的虚拟机实例，以回收虚拟机资源。
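虚拟机平台按变更数量调用新建/删除接口的流程可以示意如下（create_vm、delete_vm为假设的接口名，并非真实平台API）：

```python
def apply_vm_change(change, create_vm, delete_vm):
    """change 为虚拟机实例的变更数量：正数表示新建，负数表示删除。"""
    if change > 0:
        for _ in range(change):
            create_vm()       # 调用虚拟机新建接口创建新的虚拟机实例
    elif change < 0:
        for _ in range(-change):
            delete_vm()       # 调用虚拟机删除接口删除已有实例，回收资源

created, deleted = [], []
apply_vm_change(11, lambda: created.append(1), lambda: deleted.append(1))
apply_vm_change(-3, lambda: created.append(1), lambda: deleted.append(1))
assert (len(created), len(deleted)) == (11, 3)
```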
S212:策略执行装置104在下一周期继续执行当前周期的伸缩策略。
具体地,策略执行装置104可以保持伸缩策略不变,按照当前周期的伸缩策略进行业务集群的伸缩。
需要说明的是，上述S212为本申请实施例的可选步骤，本申请实施例的业务集群的伸缩方法也可以不包括上述S212。
基于上述内容描述,本申请实施例提供了一种业务集群的伸缩方法。该方法中,策略预测装置102对接业务集群20获得实时的业务流量,基于实时的业务流量预测下一周期的资源需求,从而周期性地输出预测伸缩策略。在下一周期的预测伸缩策略相对于当前周期的伸缩策略有更新时,策略执行装置104主动执行预测伸缩策略,无需被动等待监控指标达到告警阈值,更不需要在连续多个周期达到第一告警阈值后以较小幅度缩减资源组中实例的数量,或者在一个周期达到第二告警阈值后以最大幅度增加资源组中实例的数量,如此避免了资源浪费,提高了资源利用率。
接下来,将以在线教育应用中的音视频转码场景为例,对本申请实施例的业务集群的伸缩方法进行介绍。
在该场景中,在线教育应用提供有音视频转码业务,以用于将教学音视频转码为不同格式,从而可以实现在不同类型的终端上播放。考虑到不同时间段,音视频转码业务的业务流量可以是不同的,例如在晚上7点至9点业务流量可以达到峰值,在凌晨2点到6点,业务流量达到谷值,为此,可以将该应用部署到云平台,以便于根据业务流量调整部署量。考虑到部署效率以及可移植性,应用可以采用容器化部署方式部署到云平台。
参见图3所示的业务集群的伸缩方法的流程示意图，策略执行装置104周期性地从策略预测装置102获取该在线教育应用在下一周期的预测伸缩策略。该预测伸缩策略包括伸缩规则（如扩容规则或缩容规则），伸缩规则中的伸缩步长可以由策略预测装置102通过输入多个候选步长进行仿真得到。具体地，参见图4所示的仿真结果示意图，在一天的每个小时，分别输入候选步长，可以得到实例数量随时间变化的曲线，基于该实例数量以及预测的业务流量，可以仿真出平均CPU利用率随时间变化的曲线，为了避免影响转码业务，可以在仿真服务中设置如下约束：平均CPU利用率不超过70%。每一组候选步长视为一个候选策略，策略预测装置102可以在多个满足上述约束的候选策略中选择成本最低的策略，作为预测结果。策略执行装置104在获取到预测结果，即应用在下一周期的预测伸缩策略时，判断该预测伸缩策略相对于当前周期的伸缩策略是否有更新，若是，则根据该预测伸缩策略修改转码CCE工作负载的HPA策略，并向CCE发送修改后的HPA策略，以使CCE按照修改后的HPA策略对业务集群进行伸缩。
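上述仿真选择过程可以用如下片段粗略示意：对每个候选步长仿真实例数量与平均CPU利用率的变化，筛掉违反平均CPU利用率不超过70%约束的候选，再在其中取成本最低者（流量数据与以实例小时数近似的成本模型均为虚构假设）：

```python
ALPHA = 150  # 单实例满载并发请求数，沿用上文假设

def simulate(step, base_instances, predicted_traffic):
    """对一个候选步长做逐小时仿真，返回(各小时平均CPU利用率, 总实例小时成本)。"""
    instances, cpu_curve, cost = base_instances, [], 0
    for traffic in predicted_traffic:
        cpu = traffic / (ALPHA * instances)
        if cpu > 0.6:                     # 超过目标利用率则按候选步长扩容一次
            instances += step
            cpu = traffic / (ALPHA * instances)
        cpu_curve.append(cpu)
        cost += instances                 # 以实例小时数近似成本
    return cpu_curve, cost

predicted = [9000, 13500, 13500]          # 未来几个小时的预测流量（虚构）
feasible = {}
for step in (10, 30, 40, 50):             # 多个候选步长
    cpu_curve, cost = simulate(step, 100, predicted)
    if max(cpu_curve) <= 0.7:             # 约束：平均CPU利用率不超过70%
        feasible[step] = cost

best = min(feasible, key=feasible.get)    # 满足约束的候选中成本最低者
assert best == 30                         # 步长过小违反约束，过大则成本更高
```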
在一些可能的实现方式中,如图4所示,策略预测装置102还可以向仿真服务提供真实的实例数量随时间变化的曲线,以及真实的平均CPU利用率随时间变化的曲线,以便于和仿真结果进行对比,从而优化仿真服务的仿真效果,从而提供更为合适的伸缩步长。
基于本申请实施例提供的业务集群的伸缩方法,本申请实施例还提供了一种如前述的伸缩管理系统10。下面将结合附图对本申请实施例提供的伸缩管理系统10进行介绍。
参见图1A所示的伸缩管理系统10的结构示意图,该系统10包括策略预测装置102和策略执行装置104。其中,策略预测装置102包括通信模块1022和预测模块1024。通信模块1022用于获取所述应用在当前周期的业务流量;预测模块1024用于根据所述应用在当前周期的业务流量,获取所述应用的资源组在下一周期的预测伸缩策略。策略执行装置104包括通信模块1042和伸缩模块1044。其中,通信模块1042用于获取所述应用的资源组在下一周期的预测伸缩策略,伸缩模块1044用于当所述应用的资源组在下一周期的预测伸缩策略相对于所述应用在当前周期的伸缩策略有更新时,在下一周期执行所述预测伸缩策略。
图1A为伸缩管理系统10的一种示意性划分方式,在本申请实施例其他可能的实现方式中,伸缩管理系统10也可以是其他结构。参见图1B所示的伸缩管理系统10的结构示意图,该系统10包括:
通信模块1062,用于获取所述应用在当前周期的业务流量;
预测模块1064，用于根据所述应用在当前周期的业务流量，获取所述应用的资源组在下一周期的预测伸缩策略；
伸缩模块1066,用于当所述应用的资源组在下一周期的预测伸缩策略相对于所述应用在当前周期的伸缩策略有更新时,在下一周期执行所述预测伸缩策略。
在一些可能的实现方式中,所述至少一个实例包括容器实例或虚拟机实例,图1A中的伸缩模块1044(或图1B中的伸缩模块1066)具体用于:
根据所述预测伸缩策略,在下一周期调整所述资源组中实例的数量。
在一些可能的实现方式中,图1A中的伸缩模块1044(或图1B中的伸缩模块1066)具体用于:
在下一周期将所述资源组中实例的数量调整至目标值;或者,
在下一周期按照伸缩步长调整所述资源组中实例的数量;或者
在伸缩条件被触发时,调整所述资源组中实例的数量。
在一些可能的实现方式中,所述目标值包括最小实例数量和/或最大实例数量,图1A中的伸缩模块1044(或图1B中的伸缩模块1066)具体用于:
所述资源组中当前实例数量小于所述最小实例数量时,在下一周期增加所述资源组中实例的数量至所述最小实例数量;或者,
所述资源组中当前实例数量大于所述最大实例数量时,在下一周期减少所述资源组中实例的数量至所述最大实例数量。
在一些可能的实现方式中,所述最小实例数量和所述最大实例数量根据所述业务流量以及所述资源组中实例的平均性能值确定。
在一些可能的实现方式中,图1A中的伸缩模块1044(或图1B中的伸缩模块1066)具体用于:
所述资源组中当前实例数量不小于最小实例数量,且不大于最大实例数量时,在下一周期按照伸缩步长调整所述资源组中实例的数量。
在一些可能的实现方式中,图1A中的伸缩模块1044(或图1B中的伸缩模块1066)具体用于:
所述资源组中至少一个实例的平均性能值到达第一性能值时,在下一周期按照第一伸缩步长调整所述资源组中实例的数量;
所述资源组中至少一个实例的平均性能值到达第二性能值时,在下一周期按照第二伸缩步长调整所述资源组中实例的数量,所述第二性能值大于第一性能值,所述第二伸缩步长大于所述第一伸缩步长。
在一些可能的实现方式中,图1A中的伸缩模块1044(或图1B中的伸缩模块1066)具体用于:
所述资源组中实例的平均性能值达到下一周期的告警阈值时,调整所述资源组中实例的数量。
根据本申请实施例的伸缩管理系统10可对应于执行本申请实施例中描述的方法,并且伸缩管理系统10的各个模块/单元的上述和其它操作和/或功能分别为了实现图2所示实施例中的各个方法中的相应流程,为了简洁,在此不再赘述。
本申请实施例还提供一种计算机集群。该计算机集群具体用于实现如图1A或图1B所示的伸缩管理系统10的功能。
图5提供了一种计算机集群的结构示意图,如图5所示,计算机集群50包括多台计算机500,计算机500包括总线501、处理器502、通信接口503和存储器504。处理器502、存储器504和通信接口503之间通过总线501通信。
总线501可以是外设部件互连标准(peripheral component interconnect,PCI)总线或扩展工业标准结构(extended industry standard architecture,EISA)总线等。总线可以分为地址总线、数据总线、控制总线等。为便于表示,图5中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
处理器502可以为中央处理器(central processing unit,CPU)、图形处理器(graphics processing unit,GPU)、微处理器(micro processor,MP)或者数字信号处理器(digital signal processor,DSP)等处理器中的任意一种或多种。
通信接口503用于与外部通信。例如,通信接口503用于获取应用在当前周期的业务流量等等。
存储器504可以包括易失性存储器(volatile memory),例如随机存取存储器(random access memory,RAM)。存储器504还可以包括非易失性存储器(non-volatile memory),例如只读存储器(read-only memory,ROM),快闪存储器,硬盘驱动器(hard disk drive,HDD)或固态驱动器(solid state drive,SSD)。
存储器504中存储有计算机可读指令,处理器502执行该计算机可读指令,以使得计算机集群50执行前述业务集群的伸缩方法(或实现前述伸缩管理系统10的功能)。
具体地，在实现图1A或图1B所示的伸缩管理系统10的实施例的情况下，且图1A或图1B中所描述的伸缩管理系统10的模块（如预测模块1024或预测模块1064、伸缩模块1044或伸缩模块1066）的功能通过软件实现的情况下，执行图1A或图1B中各模块的功能所需的软件或程序代码可以存储在计算机集群50中的至少一个存储器504中。至少一个处理器502执行存储器504中存储的程序代码，以使得计算机集群50执行前述业务集群的伸缩方法。
本申请实施例还提供了一种计算机可读存储介质。所述计算机可读存储介质可以是计算机集群50能够存储的任何可用介质或者是包含一个或多个可用介质的数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘)等。该计算机可读存储介质包括指令,所述指令指示计算机集群50执行上述业务集群的伸缩方法。
本申请实施例还提供了一种计算机程序产品。所述计算机程序产品包括一个或多个计算机指令。在计算机集群50上加载和执行所述计算机指令时，全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机指令可以存储在计算机可读存储介质中，或者从一个计算机可读存储介质向另一计算机可读存储介质传输，例如，所述计算机指令可以从一个网站站点、计算机或数据中心通过有线（例如同轴电缆、光纤、数字用户线（DSL））或无线（例如红外、无线、微波等）方式向另一个网站站点、计算机或数据中心进行传输。所述计算机程序产品可以为一个软件安装包，在需要使用前述业务集群的伸缩方法的情况下，可以下载该计算机程序产品并在计算机集群50上执行该计算机程序产品。
上述各个附图对应的流程或结构的描述各有侧重,某个流程或结构中没有详述的部分,可以参见其他流程或结构的相关描述。

Claims (19)

  1. 一种业务集群的伸缩方法,其特征在于,所述业务集群包括至少一个资源组,每个资源组包括运行应用的至少一个实例,所述方法包括:
    伸缩管理系统获取所述应用在当前周期的业务流量;
    所述伸缩管理系统根据所述应用在当前周期的业务流量,获取所述应用的资源组在下一周期的预测伸缩策略;
    当所述应用的资源组在下一周期的预测伸缩策略相对于所述应用在当前周期的伸缩策略有更新时,所述伸缩管理系统在下一周期执行所述预测伸缩策略。
  2. 根据权利要求1所述的方法,其特征在于,所述至少一个实例包括容器实例或虚拟机实例,所述伸缩管理系统在下一周期执行所述预测伸缩策略,包括:
    所述伸缩管理系统根据所述预测伸缩策略,在下一周期调整所述资源组中实例的数量。
  3. 根据权利要求1或2所述的方法,其特征在于,所述伸缩管理系统在下一周期执行所述预测伸缩策略,包括:
    所述伸缩管理系统在下一周期将所述资源组中实例的数量调整至目标值;或者,
    所述伸缩管理系统在下一周期按照伸缩步长调整所述资源组中实例的数量;或者
    所述伸缩管理系统在伸缩条件被触发时,调整所述资源组中实例的数量。
  4. 根据权利要求3所述的方法,其特征在于,所述目标值包括最小实例数量和/或最大实例数量,所述伸缩管理系统在下一周期将所述资源组中实例的数量调整至目标值,包括:
    所述资源组中当前实例数量小于所述最小实例数量时,所述伸缩管理系统在下一周期增加所述资源组中实例的数量至所述最小实例数量;或者,
    所述资源组中当前实例数量大于所述最大实例数量时,所述伸缩管理系统在下一周期减少所述资源组中实例的数量至所述最大实例数量。
  5. 根据权利要求4所述的方法,其特征在于,所述最小实例数量和所述最大实例数量根据所述业务流量以及所述资源组中实例的平均性能值确定。
  6. 根据权利要求3所述的方法,其特征在于,所述伸缩管理系统在下一周期按照伸缩步长调整所述资源组中实例的数量,包括:
    所述资源组中当前实例数量不小于最小实例数量,且不大于最大实例数量时,所述伸缩管理系统在下一周期按照伸缩步长调整所述资源组中实例的数量。
  7. 根据权利要求6所述的方法,其特征在于,所述伸缩管理系统在下一周期按照伸缩步长调整所述资源组中实例的数量,包括:
    所述资源组中实例的平均性能值到达第一性能值时,所述伸缩管理系统在下一周期按照第一伸缩步长调整所述资源组中实例的数量;
    所述资源组中实例的平均性能值到达第二性能值时,所述伸缩管理系统在下一周期按照第二伸缩步长调整所述资源组中实例的数量,所述第二性能值大于第一性能值,所述第二伸缩步长大于所述第一伸缩步长。
  8. 根据权利要求3所述的方法,其特征在于,所述伸缩管理系统在伸缩条件被触发时,调整所述资源组中实例的数量,包括:
    所述伸缩管理系统在所述资源组中实例的平均性能值达到下一周期的告警阈值时，调整所述资源组中实例的数量。
  9. 一种伸缩管理系统,其特征在于,所述伸缩管理系统用于对业务集群进行伸缩,所述业务集群包括至少一个资源组,每个资源组包括运行应用的至少一个实例,所述系统包括:
    通信模块,用于获取所述应用在当前周期的业务流量;
    预测模块,用于根据所述应用在当前周期的业务流量,获取所述应用的资源组在下一周期的预测伸缩策略;
    伸缩模块,用于当所述应用的资源组在下一周期的预测伸缩策略相对于所述应用在当前周期的伸缩策略有更新时,在下一周期执行所述预测伸缩策略。
  10. 根据权利要求9所述的系统,其特征在于,所述至少一个实例包括容器实例或虚拟机实例,所述伸缩模块具体用于:
    根据所述预测伸缩策略,在下一周期调整所述资源组中实例的数量。
  11. 根据权利要求9或10所述的系统,其特征在于,所述伸缩模块具体用于:
    在下一周期将所述资源组中实例的数量调整至目标值;或者,
    在下一周期按照伸缩步长调整所述资源组中实例的数量;或者
    在伸缩条件被触发时,调整所述资源组中实例的数量。
  12. 根据权利要求11所述的系统,其特征在于,所述目标值包括最小实例数量和/或最大实例数量,所述伸缩模块具体用于:
    所述资源组中当前实例数量小于所述最小实例数量时,在下一周期增加所述资源组中实例的数量至所述最小实例数量;或者,
    所述资源组中当前实例数量大于所述最大实例数量时,在下一周期减少所述资源组中实例的数量至所述最大实例数量。
  13. 根据权利要求12所述的系统,其特征在于,所述最小实例数量和所述最大实例数量根据所述业务流量以及所述资源组中实例的平均性能值确定。
  14. 根据权利要求11所述的系统,其特征在于,所述伸缩模块具体用于:
    所述资源组中当前实例数量不小于最小实例数量,且不大于最大实例数量时,在下一周期按照伸缩步长调整所述资源组中实例的数量。
  15. 根据权利要求14所述的系统,其特征在于,所述伸缩模块具体用于:
    所述资源组中至少一个实例的平均性能值到达第一性能值时,在下一周期按照第一伸缩步长调整所述资源组中实例的数量;
    所述资源组中至少一个实例的平均性能值到达第二性能值时,在下一周期按照第二伸缩步长调整所述资源组中实例的数量,所述第二性能值大于第一性能值,所述第二伸缩步长大于所述第一伸缩步长。
  16. 根据权利要求11所述的系统,其特征在于,所述伸缩模块具体用于:
    所述资源组中实例的平均性能值达到下一周期的告警阈值时,调整所述资源组中实例的数量。
  17. 一种计算机集群，其特征在于，所述计算机集群包括至少一台计算机，所述至少一台计算机包括至少一个处理器和至少一个存储器，所述至少一个存储器中存储有计算机可读指令；所述至少一个处理器执行所述计算机可读指令，以执行如权利要求1至8中任一项所述的方法。
  18. 一种计算机可读存储介质,其特征在于,包括计算机可读指令;所述计算机可读指令用于实现权利要求1至8任一项所述的方法。
  19. 一种计算机程序产品,其特征在于,包括计算机可读指令;所述计算机可读指令用于实现权利要求1至8任一项所述的方法。
PCT/CN2022/130526 2022-05-27 2022-11-08 一种业务集群的伸缩方法及相关设备 WO2023226312A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210589198.6A CN117170855A (zh) 2022-05-27 2022-05-27 一种业务集群的伸缩方法及相关设备
CN202210589198.6 2022-05-27

Publications (1)

Publication Number Publication Date
WO2023226312A1 true WO2023226312A1 (zh) 2023-11-30

Family

ID=88918300

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/130526 WO2023226312A1 (zh) 2022-05-27 2022-11-08 一种业务集群的伸缩方法及相关设备

Country Status (2)

Country Link
CN (1) CN117170855A (zh)
WO (1) WO2023226312A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104580524A (zh) * 2015-01-30 2015-04-29 华为技术有限公司 一种云平台上的资源伸缩方法和一种云平台
CN112000459A (zh) * 2020-03-31 2020-11-27 华为技术有限公司 一种用于服务的扩缩容的方法及相关设备
US20210255901A1 (en) * 2020-02-13 2021-08-19 International Business Machines Corporation Enhanced healing and scalability of cloud environment app instances through continuous instance regeneration
CN113886010A (zh) * 2021-09-27 2022-01-04 阿里巴巴(中国)有限公司 容器资源的控制方法、设备及计算机存储介质


Also Published As

Publication number Publication date
CN117170855A (zh) 2023-12-05


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22943499

Country of ref document: EP

Kind code of ref document: A1