WO2021197364A1 - Method for service scaling and related device - Google Patents


Info

Publication number
WO2021197364A1
Authority
WO
WIPO (PCT)
Prior art keywords
service
state
workload
state prediction
instance
Application number
PCT/CN2021/084242
Other languages
English (en)
French (fr)
Inventor
张书博
余阳
潘茂林
张超盟
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2021197364A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5072 Grid computing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present invention relates to the technical field of cloud computing, and in particular to a method for service scaling and a related device.
  • Cloud computing has received extensive attention from both academia and industry.
  • An important goal in cloud computing is for cloud service providers to save resource consumption as much as possible while guaranteeing the service level agreement (SLA), allocating only the resources that users actually need.
  • However, the needs of users are not static, and resources need to be allocated to users flexibly and dynamically. Therefore, automatic scaling strategies have become one of the core research topics in cloud computing.
  • a variety of automatic scaling strategies for cloud platforms have been proposed, such as scaling strategies based on thresholds, scaling strategies based on cybernetics and queuing theory, and scaling strategies based on time series analysis methods and reinforcement learning methods.
  • However, these strategies respond slowly in practice, and the scaling strategies they determine are not accurate enough.
  • For example, the slow convergence of the Q-learning algorithm leads to untimely resource scheduling, which cannot guarantee that the SLA is met and degrades the user experience.
  • The embodiments of the invention disclose a method for service scaling and a related device, which can determine the instance scaling strategy in time, ensure the accuracy of the determined strategy, and ensure that SLA requirements are met.
  • In a first aspect, the present application provides a method for service scaling.
  • The method includes: a computing device obtains indicator information and workload data of the service in the current cycle, where the indicator information indicates the current state of the service; the computing device inputs the workload data into a workload prediction model to obtain a workload prediction result, where the workload prediction model is used to predict the workload value received by the service, and the workload prediction result includes the average user request rate received by the service in the next cycle; the computing device inputs the indicator information and the workload prediction result into a state prediction model to obtain a state prediction result, where the state prediction model is used to predict the state of the service, and the state prediction result includes the state of the service in the next cycle; the computing device determines the instance scaling strategy corresponding to the service according to the state prediction result, and scales the instances corresponding to the service according to that strategy.
  • In this way, the computing device uses the workload prediction model to obtain the workload prediction result, and then uses the indicator information together with the predicted workload value in the state prediction model to predict the state of the service in the next cycle. Even when the load changes suddenly, the state of the service in the next cycle can be obtained more accurately, ensuring that the instance scaling strategy is determined in a timely and accurate manner and that SLA requirements are met.
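The per-cycle flow described above (collect metrics, predict the next-cycle workload, predict the next-cycle state, then scale) can be sketched as a minimal control loop. This is a hedged illustration only: every model object and the `scale` callback below are hypothetical placeholders, not the patent's implementation.

```python
# Hypothetical sketch of one scaling cycle. The models and policy are toy
# stand-ins that only show the data flow between the claimed steps.

def scaling_cycle(metrics, workload, workload_model, state_model, policy, scale):
    """Run one cycle: predict next-cycle workload and state, then act."""
    # 1. Predict the average user request rate for the next cycle.
    predicted_load = workload_model(workload)
    # 2. Predict the service state for the next cycle from the current
    #    indicator information plus the workload prediction.
    next_state = state_model(metrics, predicted_load)
    # 3. Map the predicted state to an instance scaling action and execute it.
    action = policy(next_state)
    scale(action)
    return next_state, action

# Toy stand-ins (placeholders, not fitted models):
workload_model = lambda history: sum(history[-3:]) / 3           # naive forecast
state_model = lambda m, load: (int(m["cpu"] * 5), int(load > 100))
policy = lambda state: +1 if state[0] >= 4 else 0                # scale out when busy
actions = []
state, action = scaling_cycle({"cpu": 0.9, "rt_ms": 300}, [80, 120, 130],
                              workload_model, state_model, policy, actions.append)
```

In a real system the workload model would be the fitted ARIMA model and the state model the trained neural network described later in the text.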
  • The computing device determines an initial state prediction model, which adopts a neural network model; the computing device obtains a plurality of training samples, each containing historical indicator information and historical workload data corresponding to the service; the computing device then trains the initial state prediction model with these samples to obtain the state prediction model.
  • After extracting the training samples and training the initial model, the trained state prediction model has the ability to predict the state of the service in the next cycle, so that given the indicator information and workload of the current cycle as input, it can accurately output the state of the service in the next cycle.
  • The indicator information includes CPU utilization and response time, which are used to determine the state corresponding to the service.
  • The computing device collects the CPU utilization and response time of the service at runtime, so that the state of the service can be described more accurately.
  • The computing device determines m*n states corresponding to the service from m CPU utilization intervals and n response time intervals, and the state prediction result is one of these m*n states. The m CPU utilization intervals are obtained by the computing device dividing the CPU utilization, whose range is 0-1, according to a preset threshold, where m is a positive integer greater than 1; the n response time intervals are obtained by the computing device dividing the response time according to a preset duration, where n is a positive integer greater than 1.
  • Both CPU utilization and response time are continuous indicators. If they were used directly to determine the state of the service, the state space would explode: the service would have infinitely many states, its state could not be determined accurately, and a great deal of computing and storage resources would be wasted. Dividing CPU utilization and response time into intervals discretizes them, guaranteeing that there are only a finite number of states, so the state corresponding to the service can be determined accurately.
  • the computing device uses the ⁇ -greedy strategy to determine the instance scaling strategy corresponding to the service according to the state prediction result, and the ⁇ -greedy strategy uses To select the action with the largest Q value corresponding to the state prediction result, the Q value is used to indicate the maximum future reward expectation under a given corresponding state and corresponding action.
  • the computing device after the computing device predicts the state of the service in the next cycle, it can quickly and accurately select the action with the largest Q value corresponding to the service in this state, thereby determining the instance expansion strategy.
  • An instance scaling system is provided, which includes: an acquisition unit for acquiring indicator information and workload data of the service in the current cycle, the indicator information indicating the current state of the service; a workload prediction unit for inputting the workload data into a workload prediction model to obtain a workload prediction result, where the workload prediction model predicts the workload value received by the service and the workload prediction result includes the average user request rate received by the service in the next cycle; a state prediction unit for inputting the indicator information and the workload prediction result into a state prediction model to obtain a state prediction result, where the state prediction model predicts the state of the service and the state prediction result includes the state of the service in the next cycle; and an instance scheduling unit configured to determine the instance scaling strategy corresponding to the service according to the state prediction result and to scale the instances corresponding to the service according to that strategy.
  • the acquisition unit is further configured to acquire training samples, where the training samples include historical indicator information and historical workload data corresponding to the service;
  • the state prediction unit is also used to determine an initial state prediction model, the initial state prediction model adopts a neural network model; the initial state prediction model is trained using the training samples to obtain the state prediction model.
  • the indicator information includes a CPU utilization rate and a response time, and the CPU utilization rate and the response time are used to determine a state corresponding to the service.
  • The state prediction unit is further configured to: divide the CPU utilization into intervals according to a preset threshold to obtain m intervals, where the CPU utilization ranges from 0 to 1 and m is a positive integer greater than 1; divide the response time into intervals according to a preset duration to obtain n intervals, where n is a positive integer greater than 1; and determine m*n states corresponding to the service from the m CPU utilization intervals and the n response time intervals, the state prediction result being one of the m*n states.
  • The instance scheduling unit is specifically configured to determine the instance scaling strategy corresponding to the service using an ε-greedy strategy according to the state prediction result; the ε-greedy strategy is used to select the action with the largest Q value corresponding to the state prediction result, and the Q value indicates the maximum expected future reward for a given state and action.
  • In a third aspect, a computing device includes a processor and a memory; the memory is used to store program code, and the processor is used to execute the program code in the memory to implement the method provided in the first aspect or any implementation of the first aspect.
  • A computer-readable storage medium stores a computer program; when the computer program is executed by a processor, the processor performs the method for service scaling provided in the first aspect or any implementation of the first aspect.
  • A computer program product includes instructions; when the computer program product is executed by a computer, the computer can perform the method for service scaling provided in the first aspect or any implementation of the first aspect.
  • FIG. 1 is a schematic flowchart of applying the Q-learning algorithm for instance scaling according to an embodiment of the present application
  • FIG. 2 is a schematic flowchart of applying the SARSA algorithm for instance scaling according to an embodiment of the present application
  • FIG. 3 is a schematic diagram of a system architecture provided by an embodiment of the present application
  • FIG. 4 is a schematic structural diagram of an instance scaling system provided by an embodiment of the present application
  • FIG. 5 is a schematic flowchart of a method for service scaling provided by an embodiment of the present application
  • FIG. 6 is a schematic structural diagram of a state prediction model provided by an embodiment of the present application
  • FIG. 7 is a schematic flowchart of algorithm operation provided by an embodiment of the present application
  • FIG. 8 is a schematic structural diagram of a computing device provided by an embodiment of the present application
  • Cloud computing is a service related to information technology, software, and the Internet. It combines multiple computing resources into a shared pool of computing resources, also called the "cloud", which is managed automatically through software; users can obtain resources on the cloud on demand at any time, and in theory the resources on the cloud can be expanded indefinitely.
  • SLA is an agreement signed by a service provider and a customer, which includes items such as service type, service quality, and service performance, which can meet user needs to the greatest extent and ensure user satisfaction.
  • Microservices are an emerging software architecture whose purpose is to split a large monolithic application into dozens of supporting microservices. Each microservice in the system can be deployed independently, the microservices are loosely coupled, and each microservice focuses on completing one task, with each task representing a small business capability. The microservice strategy makes work easier: a single component can be scaled rather than the entire application stack in order to meet the SLA.
  • A service mesh is the infrastructure layer for communication between services; it focuses on inter-service communication, making communication between service instances smoother, more reliable, and faster. It also provides functions such as service discovery, load balancing, encryption, authentication, authorization, and circuit-breaker support.
  • A virtual machine (VM) is a complete computer system that is emulated in software, has complete hardware system functions, and runs in a completely isolated environment. Work that can be done on a physical computer can be done in a VM.
  • Part of the hard disk and memory capacity of the physical computer is used as the hard disk and memory of the VM.
  • Each VM has its own hard disk, operating system, and so on, and can be operated like a physical machine.
  • A container is a virtualization technology in computer operating systems that lets processes run in relatively independent, isolated environments (with independent file systems, namespaces, resource views, etc.), thereby simplifying software deployment, enhancing the portability and security of software, and improving system resource utilization.
  • Container technology is widely used in service-oriented scenarios in the field of cloud computing.
  • An instance is the result obtained after a microservice of an application is instantiated.
  • An instance contains one or more containers, which are used to perform the functions of the service.
  • Auto-scaling is a concept in cloud computing, which means that the system deployed on the cloud platform can dynamically determine the appropriate amount of resources based on the workload of the application, and then automatically apply and release resources.
  • Horizontal scaling is a concept in cloud computing, which refers to scaling in units of instances, which can directly increase or decrease the number of instances.
  • Reinforcement learning is one of the paradigms and methodologies of machine learning. It describes and solves the problem of an agent learning a strategy through interaction with the environment so as to maximize return or achieve a specific goal.
  • Reinforcement learning includes a variety of typical algorithms, such as the Q-learning algorithm and the state-action-reward-state-action (SARSA) algorithm.
  • the common model is the standard Markov decision process.
  • Reinforcement learning can be divided into model-based and model-free reinforcement learning, as well as active and passive reinforcement learning.
  • Neural network is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed and parallel information processing. This kind of network relies on the complexity of the system and achieves the purpose of processing information by adjusting the interconnection between a large number of internal nodes.
  • An ARIMA (autoregressive integrated moving average) model has three parameters: p, the number of autoregressive terms; d, the number of differences (the order) needed to turn a non-stationary series into a stationary one; and q, the number of moving-average terms.
  • Scaling strategies include reactive strategies and predictive strategies.
  • A predictive strategy can make instance scaling decisions in advance, effectively reducing response time and better meeting the SLA, so it is more widely used.
  • Reinforcement learning requires no prior knowledge in application and is adaptive and robust. Therefore, predictive scaling strategies can be formulated based on reinforcement learning methods to keep the application's resource utilization relatively stable when the workload changes dynamically.
  • Refer to Fig. 1, which is a schematic diagram of the process of applying the Q-learning algorithm for instance scaling.
  • the rows of the Q table represent the status of the service, and the columns of the Q table represent the actions corresponding to the status.
  • The actions can specifically increase or decrease the number of instances running the service, such as adding 2 instances or removing 2 instances.
  • The value in the Q table represents the value obtained by performing an action in a certain state.
  • At initialization, all values in the Q table can be set to 0; of course, they can also be set to other values (such as 1 or 2).
  • First, determine the state S of the service in the current cycle according to the monitored service indicators of the current cycle.
  • the period length can be set as needed, for example, it can be set to 5 seconds.
  • the service indicators are collected by the monitoring application deployed in the instance.
  • the instance can be a virtual machine, a container, etc.
  • The service indicators can be CPU utilization and response time.
  • The response time is the period from when a request arrives at the service until the service returns a result.
  • Next, use the greedy strategy (ε-greedy) to select action A in state S according to the Q table.
  • The meaning of ε-greedy is: with probability ε, select the action with the largest Q value corresponding to the current state S from the Q table; with probability 1-ε, select an action from the Q table at random.
  • The value of ε is between 0 and 1.
  • α is the learning rate, which characterizes how much of the previous training effect is retained: the larger α is, the less of the previous training effect is retained.
  • R represents the return value (the reward obtained from the environment), and γ is the discount coefficient.
  • Formula 1, the Q-learning update, is: Q(S, A) ← Q(S, A) + α[R + γ·max_a Q(S1, a) − Q(S, A)]. The essence of formula 1 is to use the maximum Q value corresponding to the state S1 of the service in the next cycle to update the Q value of the current state S when performing action A. It should be understood that A1 is the action to be executed in the next cycle that maximizes the Q value; max_a Q(S1, a) means that executing action a maximizes the Q value, that is, a is action A1.
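A single Q-learning update step consistent with this description of formula 1 might look like the following sketch (the Q table is again a dictionary keyed by state-action pairs; all values are illustrative):

```python
# One Q-learning update: move Q(S, A) toward R + γ·max_a Q(S1, a).

def q_learning_update(Q, s, a, r, s1, actions, alpha, gamma):
    # Maximum Q value over all actions in the next state S1.
    best_next = max(Q.get((s1, a1), 0.0) for a1 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]

actions = [-1, 0, +1]
Q = {("S1", +1): 2.0}       # next state already has one known good action
new_q = q_learning_update(Q, "S", 0, r=1.0, s1="S1",
                          actions=actions, alpha=0.5, gamma=0.9)
# new_q = 0 + 0.5 * (1.0 + 0.9 * 2.0 - 0) = 1.4
```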
  • Instance scaling can be achieved by running the Q-learning algorithm, but the algorithm itself requires many iterations to update the Q table; it converges slowly, so the instance scaling strategy cannot be determined in time, resulting in untimely resource scheduling and failure to meet SLA requirements.
  • Refer to FIG. 2, which is a schematic diagram of the process of applying the SARSA algorithm for instance scaling.
  • The Q table is initialized first. In the first cycle of the algorithm, the state S of the service in the current cycle is determined according to the monitored service indicators; the ε-greedy strategy is then used to select the action A corresponding to state S, which is executed, and the return value R and the state S1 of the service in the next cycle are calculated. For state S1, the ε-greedy strategy is used to select the action A1 to be performed in the next cycle from the Q table; finally, the Q table is updated using the dynamic-programming equation, and A1 is determined as the action to perform in the next cycle.
  • Formula 2, the SARSA update, is: Q(S, A) ← Q(S, A) + α[R + γ·Q(S1, A1) − Q(S, A)].
  • Instance scaling can also be achieved by running the SARSA algorithm, but SARSA only considers that the state of the service is affected by the actions performed; it does not consider that the state is also affected by the instance itself and by the workload. As a result, the action determined in the current cycle may not suit the next cycle, so the determined instance scaling strategy is not accurate enough to meet SLA requirements.
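For comparison with the Q-learning step, one SARSA update per formula 2 can be sketched as follows. Note the key difference: SARSA uses the Q value of the action A1 actually selected for the next state, not the maximum over all actions. Values are illustrative only.

```python
# One SARSA update: move Q(S, A) toward R + γ·Q(S1, A1),
# where A1 is the action chosen (e.g. via ε-greedy) for the next state.

def sarsa_update(Q, s, a, r, s1, a1, alpha, gamma):
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * Q.get((s1, a1), 0.0) - old)
    return Q[(s, a)]

Q = {("S1", 0): 1.0, ("S1", +1): 2.0}
# Even though +1 has the larger Q value in S1, SARSA updates with the chosen a1=0:
new_q = sarsa_update(Q, "S", +1, r=1.0, s1="S1", a1=0, alpha=0.5, gamma=0.9)
# new_q = 0 + 0.5 * (1.0 + 0.9 * 1.0 - 0) = 0.95
```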
  • In addition, the action space is fixed: each state has 5 actions (-2, -1, 0, +1, +2), where +2 means adding 2 instances, +1 means adding 1 instance, 0 means the number of instances remains unchanged, -1 means removing 1 instance, and -2 means removing 2 instances.
  • To address the above problems, this application provides a method for service scaling and a related device, so that the scaling strategy can be determined in a timely and efficient manner and its accuracy can be guaranteed, meeting SLA requirements.
  • The instance scaling system can be deployed in any computing device that involves instance scaling.
  • it may be deployed in one or more computing devices in a cloud environment (for example, a central server), or in one or more computing devices in an edge environment (for example, a server).
  • The cloud environment refers to the cluster of central computing devices owned by the cloud service provider and used to provide computing, storage, and communication resources; it has large storage and computing resources.
  • The edge environment refers to clusters of edge computing devices that are geographically close to the terminal devices and are likewise used to provide computing, storage, and communication resources.
  • The instance scaling system is used to collect various indicators and workloads of the service and to predict the workload and service state of the next cycle, so as to accurately determine the instance scaling strategy and perform scaling in time.
  • The internal units of the instance scaling system can be divided in multiple ways, which is not limited in this application.
  • Fig. 4 shows an exemplary division. As shown in Fig. 4, the illustrated instance scaling system 400 includes multiple functional units, whose functions are briefly described below.
  • the collection unit 410 is used to collect various indicators and workload values during service operation from a container cloud cluster or a virtual machine cluster, such as service response time and CPU utilization.
  • the container cloud cluster provides a platform for building, publishing, and running containerized services, and allows developers or administrators to manage and maintain containers.
  • The workload prediction unit 420 is used to dynamically fit the workload values collected by the collection unit 410 and predict the workload value of the next cycle; the state prediction unit 430 is used to predict the state of the service based on the indicators collected by the collection unit 410 and the workload value predicted by the workload prediction unit 420, obtaining the state of the service in the next cycle; the instance scheduling unit 440 is configured to determine an instance scaling strategy according to the state predicted by the state prediction unit 430 and to execute that strategy to complete instance scaling.
  • The instance scaling system 400 may be a software system, and the parts and functional units included in it are deployed flexibly on hardware devices.
  • FIG. 5 is a schematic flowchart of a method for service scaling according to an embodiment of the application. As shown in Figure 5, the method includes but is not limited to the following steps:
  • S501: The computing device obtains indicator information and workload data of the service running in the current cycle.
  • The computing device is deployed with the instance scaling system 400 shown in FIG. 4 above.
  • the computing device can collect indicator information from the container cloud cluster.
  • Each application includes one or more services, which run in different containers.
  • Multiple containers can be deployed on one physical machine.
  • the collection unit 410 in the computing device can collect indicator information and workload data in real time or periodically, and the collection period can be set as needed, for example, it can be set to 5 seconds.
  • The collected indicator information includes the current cycle's CPU utilization, memory utilization, response time, number of instances (containers), and so on, and the collected workload is a stream data set.
  • CPU utilization truly and effectively reflects the resource usage of the service, and response time intuitively reflects the user's experience and is an important basis for judging whether the service meets the SLA. Therefore, this application uses CPU utilization and response time to determine the state of the service. It should be understood that CPU utilization and response time are continuous indicators; if they were used directly to construct the state of the service, infinitely many states would be constructed, wasting a great deal of resources on state management and subsequent prediction and potentially overwhelming the system. Therefore, CPU utilization and response time must be discretized, so that the constructed state space contains a determinate, finite number of service states.
  • Specifically, the CPU utilization is divided into intervals according to a preset threshold to obtain m intervals, where the CPU utilization ranges from 0 to 1 and m is a positive integer greater than 1; the response time is divided into intervals according to a preset duration to obtain n intervals, where n is a positive integer greater than 1. The m CPU utilization intervals and the n response time intervals determine the m*n states corresponding to the service.
  • When dividing the CPU utilization into intervals, equal intervals can be used.
  • For example, the CPU utilization can be divided into [0, 0.2], [0.2, 0.4], [0.4, 0.6], [0.6, 0.8], and [0.8, 1]; that is, m is 5 and each interval has size 0.2.
  • The interval size can also be set to other values, which is not limited in this application.
  • The response time can be divided into [0, 100ms], [100ms, 250ms], [250ms, 500ms], [500ms, 1000ms], and [1000ms, ∞], where ms denotes milliseconds and ∞ denotes infinity; the response time is thus divided into 5 intervals, that is, n is 5.
  • it can also be divided in other ways, which is not limited in this application.
  • In this way, the continuous CPU utilization and response time are divided into different intervals, so that a finite number of states is obtained and state-space explosion is avoided.
  • For example, if the collection unit 410 observes that the current CPU utilization is 0.772 and the response time is 291 ms, the state of the service in the current cycle is the state determined by the interval [0.6, 0.8] and the interval [250ms, 500ms].
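The discretization above, using the example boundaries from the text (five CPU-utilization intervals of width 0.2 and five response-time intervals, so m = n = 5 and m*n = 25 states), can be sketched as a small lookup. The interval-index encoding of a state is an assumption made for illustration.

```python
import bisect

# Upper edges of the first four intervals; values beyond the last edge fall
# into the final interval ([0.8, 1] and [1000ms, ∞] respectively).
CPU_BOUNDS = [0.2, 0.4, 0.6, 0.8]
RT_BOUNDS_MS = [100, 250, 500, 1000]

def service_state(cpu_util, response_ms):
    """Map continuous indicators to one of the m*n discrete states,
    encoded here as a pair of interval indices (0..4, 0..4)."""
    i = bisect.bisect_left(CPU_BOUNDS, cpu_util)      # CPU interval index
    j = bisect.bisect_left(RT_BOUNDS_MS, response_ms) # response-time interval index
    return i, j

# The example from the text: CPU 0.772 and response time 291 ms fall into
# interval [0.6, 0.8] (index 3) and interval [250ms, 500ms] (index 2).
state = service_state(0.772, 291)
```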
  • the computing device inputs the workload data to the workload prediction model to obtain a workload prediction result.
  • Before the workload prediction model is used to predict from current workload data, it needs to fit historical workload data so that it has the ability to predict the workload value of the next cycle. It should be understood that in practical applications the workload (user request rate) is complex and changeable and affected by many factors; therefore, the workload prediction model should be a model for non-stationary time-series prediction, such as an ARIMA model.
  • The ARIMA model is a statistical model for time-series forecasting. It uses a fixed-size queue of time-series values for dynamic fitting; the length of the queue can be set as needed, for example to 50, which is not limited in this application.
  • the historical workload data (the average user request rate in each cycle) collected by the collection unit 410 is used to fit the model. It should be understood that the historical workload data (time series) collected by the collection unit 410 may not be stable. At this time, model fitting cannot be performed directly, and further processing is required to make it a stable sequence to satisfy model fitting. If a sequence is stationary, the mean, variance and covariance of the sequence will not change significantly.
  • Unit root means that the sequence is not stationary, and difference processing is needed to make the sequence stationary, thereby satisfying the requirements of model fitting; if there is a unit root, it means that the sequence is stationary, and the model can be directly simulated. combine.
  • ADF augmented dickey-fuller
  • In general, the smaller the detected ADF statistic, the stronger the evidence that the series has no unit root and the more stationary the series is.
  • When difference processing is performed on the series, the minimum number of differencing operations needed to turn the series from non-stationary to stationary is defined as the parameter d of the ARIMA model.
  • After d is obtained, the partial autocorrelation function (PACF) is used to determine the autoregressive order p of the model. The PACF describes the linear correlation between a time series observation and a past observation conditioned on the intermediate observations; p is the number of lags of the series itself used in the model.
  • The autocorrelation function (ACF) is used to determine the moving average order q. The ACF describes the linear correlation between a time series observation and a past observation; q is the number of lags of the prediction error used in the model.
  • After p and q are determined, the differenced stationary data are fitted with an autoregressive moving average (ARMA) model, giving formula 3, in which Y_t is the load prediction value. Y_t is affected by its own past values, giving the polynomial in the observations in formula 3, and ε_t is the error term, which is dependent across periods, giving the polynomial in the errors in formula 3. Since the ARMA model is fitted to the differenced historical workload data, after Y_t is obtained from formula 3 an inverse differencing operation must be applied to Y_t to finally obtain the workload prediction value for the next cycle.
  • When this application fits the ARIMA model, the fitting can be performed dynamically in real time: when the collection unit 410 collects the workload value of the current cycle (that is, the average user request rate), it is added to the time series used for model fitting, the earliest collected historical workload value is found and discarded from the series in chronological order, and the updated series is used to fit the model, ensuring the accuracy of prediction.
  • Alternatively, after a preset time period, the workload values of multiple cycles collected by the collection unit 410 (for example, the workload values of the last 5 cycles) are added to the above time series, the corresponding number of the earliest historical workload values are discarded in chronological order, and the model is then fitted. This reduces the computational pressure on the computing device and improves its resource utilization efficiency.
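The sliding-window fitting flow above can be sketched without external dependencies as follows. The queue length (50) follows the example in the text; the "model" here is deliberately simplified to a first-order difference plus a least-squares AR(1) coefficient, standing in for a full ARIMA(p, d, q) fit (which a real implementation would obtain from a statistics library), so that the queue update, differencing and inverse-differencing steps are visible.

```python
# Simplified stand-in for the dynamically fitted ARIMA workload predictor.
from collections import deque

class WorkloadPredictor:
    def __init__(self, window: int = 50):
        self.history = deque(maxlen=window)   # fixed-size queue (text example: 50)

    def observe(self, request_rate: float) -> None:
        # Appending to a full deque discards the oldest value automatically,
        # matching the "add newest, drop earliest" update in the text.
        self.history.append(request_rate)

    def predict_next(self) -> float:
        y = list(self.history)
        if len(y) < 3:
            return y[-1] if y else 0.0
        # d = 1: difference once so the series is (assumed) stationary.
        d1 = [b - a for a, b in zip(y, y[1:])]
        # Least-squares AR(1) coefficient on the differenced series.
        num = sum(a * b for a, b in zip(d1, d1[1:]))
        den = sum(a * a for a in d1[:-1]) or 1.0
        next_diff = (num / den) * d1[-1]
        # Inverse differencing recovers the workload value of the next cycle.
        return y[-1] + next_diff
```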
  • S503 The computing device inputs the indicator information and the workload prediction result to the state prediction model to obtain the state prediction result.
  • The state prediction model in this application is a trained neural network model, for example a recurrent neural network (RCNN), a recursive neural network (RNN), or a convolutional neural network (CNN).
  • The state prediction model needs to be trained before it is used to predict the state of the service, so that it has the ability to predict that state. Special training data are used for training.
  • The sample data include the CPU utilization, memory utilization, response time, number of instances and workload value pre-collected by the collection unit for each historical cycle, together with the workload prediction value predicted by the workload prediction model for each historical cycle.
  • The structure of the initial state prediction model 600 of the present application mainly includes three parts, namely the input layer 610, the hidden layer 620 and the output layer 630. The parameters of the initial state prediction model 600 are first initialized, and the sample data are then input to the input layer 610.
  • The input layer 610 processes the sample data and transmits them to the hidden layer 620. The feature extraction unit 621 in the hidden layer 620 performs feature extraction and recognition on the input sample data and passes the result to the prediction unit 622, which predicts the CPU utilization and response time of the next cycle. The loss function calculation unit 623 calculates the loss function from the result predicted by the prediction unit 622 and, with the loss function as the objective function, uses the backpropagation algorithm to update and adjust the parameters of the model.
  • The output layer 630 outputs the response time prediction value and the CPU utilization prediction value predicted by the hidden layer 620. Different training samples are input in sequence and the above training process is iterated until the loss function value converges, that is, when each newly calculated loss function value fluctuates around a certain value, training is stopped. At this point the state prediction model has been trained, that is, it has the ability to predict the state of the service in the next cycle.
  • The state of the service is determined by the response time and CPU utilization; therefore the output of the state prediction model is the response time and CPU utilization of the service in the next cycle.
  • The state prediction model provided in this application also supports dynamic training during use, so that the model fits the actual situation and prediction accuracy improves. For example, if the load in an actual application scenario has a tendency (the load is always high, or always low, and so on), continuing dynamic training on the previously trained model makes the model better suited to the current scene.
  • In practical applications, the number of neurons in the input layer 610 of the state prediction model is 7: the CPU utilization, memory utilization, response time, number of instances and workload value of the current cycle, the workload prediction value of the next cycle, and a bias term (which adds a translation capability to the network classification and makes the model fit better). The number of neurons in the output layer 630 is 2: the response time prediction value and the CPU utilization prediction value. The number of layers of the hidden layer 620 and the number of neurons in each layer can be set flexibly; based on extensive experiments and comparisons, this application prefers 12 neurons in the hidden layer.
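The 7-12-2 geometry described above can be sketched as a minimal forward pass. The weights here are random placeholders (in the described system they would be learned with backpropagation against the loss on the predicted response time and CPU utilization), and the feature ordering is an assumption.

```python
# Minimal forward pass with the stated layer sizes: 7 inputs (including the
# bias term), 12 hidden neurons, 2 outputs.
import math, random

random.seed(0)
W1 = [[random.uniform(-0.5, 0.5) for _ in range(7)] for _ in range(12)]
W2 = [[random.uniform(-0.5, 0.5) for _ in range(12)] for _ in range(2)]

def forward(features: list) -> list:
    """features: [cpu, mem, resp_time, instances, load, load_pred, 1.0 (bias)]"""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features))) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

# Example input for one cycle; out[0] ~ predicted response time,
# out[1] ~ predicted CPU utilization.
out = forward([0.77, 0.45, 291.0, 4.0, 120.0, 135.0, 1.0])
```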
  • S504 The computing device determines an instance scaling strategy corresponding to the service according to the state prediction result.
  • After the response time prediction value and the CPU utilization prediction value are obtained from the state prediction model, the state of the service in the next cycle can be determined. According to that state, the ε-greedy strategy is used to select the corresponding action in this state from the Q table; this action is the determined instance scaling strategy, such as adding 5 instances or removing 2 instances.
  • The above Q table is the result of executing the improved SARSA algorithm. Before the algorithm converges, it must be executed iteratively to update the values and the action space in the Q table; once the algorithm converges, the Q table is stable and can be used directly to determine the instance scaling strategy.
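The ε-greedy selection from the Q table can be sketched as follows. States are the (CPU-interval, response-time-interval) pairs and actions are instance deltas such as -2 .. +2; representing the Q table as a dict keyed by (state, action) is an illustrative choice.

```python
# Epsilon-greedy selection: with probability epsilon pick the action with the
# largest Q value in the given state (ties broken at random); otherwise pick
# a random action.
import random

def select_action(q_table, state, actions, epsilon=0.9, rng=random):
    if rng.random() < epsilon:
        best = max(q_table.get((state, a), 0.0) for a in actions)
        candidates = [a for a in actions
                      if q_table.get((state, a), 0.0) == best]
        return rng.choice(candidates)
    return rng.choice(list(actions))
```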
  • S701: Determine the state S of the service in the first cycle of the algorithm, and select the action to be performed from the Q table.
  • Specifically, the collection unit 410 collects the response time and CPU utilization of the current cycle to determine the state S of the service, and the action corresponding to the largest Q value in state S is then selected; if several Q values tie for the largest, an action corresponding to one of them is selected at random.
  • S702: Execute the action A selected in the previous cycle, calculate the reward value R after the action is executed, and determine the state S1 of the current cycle.
  • To avoid SLA violations while raising the CPU utilization as much as possible, the reward function must take into account the response time, the response time threshold specified by the SLA, and the CPU utilization. The reward value can be calculated by formula 4, where R is the reward value, ρ is the CPU utilization rate, p is a set constant used to control the influence of the response time on the reward value (the larger p is, the greater that influence; it is generally set to 2), a is the response time, and b is the response time specified by the SLA.
  • It can be seen that when the response time is greater than the response time specified by the SLA, the reward value must be negative, and when the response time is less than the response time specified by the SLA, the reward value must be positive.
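The reward calculation can be sketched as follows. The closed form of formula 4 is not reproduced in this text extraction, so the expression below is an assumed reconstruction: it satisfies the stated properties (the reward scales with the CPU utilization ρ, is negative whenever the response time a exceeds the SLA response time b, positive when a < b, and p amplifies the influence of the response time), but the patent's exact formula may differ.

```python
# Assumed reconstruction of the formula-4 reward; see lead-in for caveats.
def reward(rho: float, a_ms: float, b_ms: float, p: float = 2.0) -> float:
    return rho * (1.0 - (a_ms / b_ms) ** p)
```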
  • In addition, after the action determined in the previous cycle has been executed, the state S1 of the current cycle can be obtained by measuring the response time and CPU utilization of the current cycle.
  • S703: Update the value of Q(S, A). Specifically, after the state S1 of the current cycle is determined, the ε-greedy strategy is used to select action A1, and the calculated reward value is then substituted into formula 1 above to complete the update of Q(S, A).
  • S704: Determine, according to the state S1 of the current cycle, whether to increase the action space of the service in state S.
  • Specifically, if the action of the previous cycle (action A) was the largest available instance increase, for example adding 2 instances, but the response time of the current cycle still exceeds the response time specified by the SLA, the number of instances added in the previous cycle was insufficient. The action space of the service in state S therefore needs to be enlarged; for example, two actions can be added, so that the action space of the service in state S becomes (-3, -2, -1, 0, +1, +2, +3), and the Q values corresponding to the newly added actions can be initialized to 0.
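The action-space expansion in S704 can be sketched as follows: when the largest scale-up action was taken and the SLA is still violated, one larger action is appended at each end and the new Q entries start at 0. The data shapes (list of deltas, dict Q table) are illustrative.

```python
# Enlarge the action space of a state, e.g. (-2..+2) -> (-3..+3).
def maybe_expand_actions(actions, q_table, state,
                         last_action, response_ms, sla_ms):
    if last_action == max(actions) and response_ms > sla_ms:
        lo, hi = min(actions), max(actions)
        new = sorted(actions + [hi + 1, lo - 1])
        for a in (hi + 1, lo - 1):
            q_table[(state, a)] = 0.0     # initialize new Q entries to 0
        return new
    return actions
```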
  • S705: Use the load prediction model to predict the load of the next cycle to obtain the workload prediction value, use the state prediction model to predict the state of the service in the next cycle from the workload prediction value and the indicators of the current cycle, and select action A2 with the ε-greedy strategy.
  • S706: Judge whether the algorithm has converged; if it has, stop updating the Q table, otherwise repeat steps S702-S706.
  • Specifically, before the algorithm converges, the items in the Q table are updated continuously and the algorithm consumes considerable resources while it runs; to improve the resource utilization of the whole system, a termination condition must be set, and when it is met the algorithm is judged to have converged and can be stopped.
  • Optionally, when the number of cycles the algorithm has run exceeds a preset number of cycles, the algorithm is judged to have converged; the preset number can be set as required, for example to 500. Alternatively, when every item in the Q table has been updated N or more times, the algorithm is judged to have converged; the value of N can be 3.
  • S505: The computing device scales the instances corresponding to the service according to the determined instance scaling strategy.
  • Specifically, after the computing device determines the number of instances by which the service needs to be scaled, it calls an external interface exposed by the container cloud cluster, such as an application programming interface (API), to pass that number to the replication controller, and the replication controller increases or decreases the number of instances of the service.
  • It should be noted that the method described in FIG. 5 performs instance scaling for each individual service; to scale an application, the above steps S501-S505 must be performed for each service that composes the application, thereby completing instance scaling for the entire application.
  • The present application also provides an instance scaling system, which is used to execute the aforementioned method for service scaling. This application does not limit the division of functional units in the instance scaling system, and the units in the system can be added, reduced or merged as needed. FIG. 4 exemplarily provides one division of functional units:
  • The instance scaling system 400 includes a collection unit 410, a workload prediction unit 420, a state prediction unit 430 and an instance scheduling unit 440.
  • The collection unit 410 is configured to perform the foregoing step S501, and optionally perform the optional methods in the foregoing steps.
  • The workload prediction unit 420 is configured to perform the foregoing step S502, and optionally perform the optional methods in the foregoing steps.
  • The state prediction unit 430 is configured to perform the foregoing step S503, and optionally perform the optional methods in the foregoing steps.
  • The instance scheduling unit 440 is configured to perform the foregoing steps S504 and S505, and optionally perform the optional methods in the foregoing steps.
  • The above units can transmit data to one another through communication paths. It should be understood that each unit included in the instance scaling system 400 can be a software unit, a hardware unit, or partly a software unit and partly a hardware unit.
  • FIG. 8 is a schematic structural diagram of a computing device provided by an embodiment of the present application. As shown in FIG. 8, the computing device 800 includes a processor 810, a communication interface 820 and a memory 830, which are connected to one another through an internal bus 840. It should be understood that the computing device 800 may be a computing device in cloud computing or a computing device in an edge environment.
  • The processor 810 may be composed of one or more general-purpose processors, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip. The above-mentioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • The bus 840 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus 840 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 8, but this does not mean that there is only one bus or only one type of bus.
  • The memory 830 may include a volatile memory, such as a random access memory (RAM); the memory 830 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 830 may also include a combination of the above types.
  • The memory 830 of the computing device 800 stores the code corresponding to each unit of the instance scaling system 400, and the processor 810 executes this code to realize the functions of those units, that is, to perform the method of S501-S505.
  • The present application also provides a computer-readable storage medium that stores a computer program; when the computer program is executed by a processor, part or all of the steps of any method described in the above method embodiments can be implemented.
  • An embodiment of the present invention also provides a computer program; the computer program includes instructions, and when it is executed by a computer, the computer can execute part or all of the steps of any method for service scaling.
  • In the embodiments provided in this application, it should be understood that the disclosed device may be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the above-mentioned units is only a logical function division, and there may be other divisions in actual implementation: multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • The mutual coupling, direct coupling or communication connection displayed or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical or take other forms.
  • The units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware or of a software functional unit.


Abstract

This application provides a method for scaling a service and a related device. The method includes: obtaining indicator information and the workload of a service in the current cycle, and using a workload prediction model to obtain a workload prediction value for the next cycle from the workload; predicting the state of the service in the next cycle with a state prediction model according to the workload prediction value and the indicator information; and, according to the predicted state, determining an instance scaling strategy corresponding to the service and scaling the instances corresponding to the service. The above method can determine the instance scaling strategy in a timely manner and improves the accuracy of the determined strategy.

Description

A method for scaling a service and a related device. Technical field
The present invention relates to the field of cloud computing technologies, and in particular to a method for scaling a service and a related device.
Background
As an emerging industry in recent years, cloud computing has received wide attention from both the research community and industry. An important goal in cloud computing is for a cloud service provider to save resource consumption as much as possible while guaranteeing the service level agreement (SLA), allocating to users only the resources they actually need. Since user demand is not constant, resources must be allocated to users flexibly and dynamically; auto-scaling strategies have therefore become one of the core research topics in cloud computing.
A variety of auto-scaling strategies for cloud platforms have been proposed, for example threshold-based scaling strategies, scaling strategies based on control theory or queueing theory, and scaling strategies based on time series analysis and reinforcement learning. In practice, however, these strategies suffer from slow response and insufficiently accurate scaling decisions. For example, when an auto-scaling strategy is built on the Q-learning algorithm (a reinforcement learning algorithm), the slow convergence of Q-learning delays resource scheduling, so the SLA cannot be guaranteed and user experience suffers.
Therefore, how to determine an instance scaling strategy in a timely and accurate manner, guarantee the SLA and save system resource overhead is a problem urgently to be solved.
Summary
Embodiments of the present invention disclose a method for scaling a service and a related device, which can determine an instance scaling strategy in a timely manner, ensure the accuracy of the determined strategy, and ensure that SLA requirements are met.
According to a first aspect, this application provides a method for scaling a service, the method including: a computing device obtains indicator information and workload data of a service in the current cycle, the indicator information being used to indicate the current state of the service; the computing device inputs the workload data into a workload prediction model to obtain a workload prediction result, where the workload prediction model is used to predict the workload values received by the service, and the workload prediction result includes the average user request rate received by the service in the next cycle; the computing device inputs the indicator information and the workload prediction result into a state prediction model to obtain a state prediction result, where the state prediction model is used to predict the state of the service, and the prediction result includes the state of the service in the next cycle; the computing device determines, according to the state prediction result, an instance scaling strategy corresponding to the service, and scales the instances corresponding to the service according to the instance scaling strategy.
In the solution provided by this application, the computing device uses the workload prediction model to obtain a workload prediction result, and further uses the state prediction model to predict the state of the service in the next cycle from the indicator information and the workload prediction value. In this way, even when the load changes abruptly, the state of the service in the next cycle can be obtained fairly accurately, so that the instance scaling strategy can be determined in a timely and accurate manner and SLA requirements can be met.
With reference to the first aspect, in a possible implementation of the first aspect, the computing device determines an initial state prediction model, the initial state prediction model adopting a neural network model; the computing device obtains multiple training samples, the training samples including historical indicator information and historical workload data corresponding to the service; and the computing device trains the initial state prediction model with the training samples to obtain the state prediction model.
In the solution provided by this application, the computing device obtains multiple training samples containing the historical indicator information and historical workload data corresponding to the service, and then trains the initial state prediction model with these samples, so that the trained state prediction model has the ability to predict the state of the service in the next cycle. State prediction can then be performed on the input indicator information and workload of the current cycle, so that the state of the service in the next cycle can be output accurately.
With reference to the first aspect, in a possible implementation of the first aspect, the indicator information includes a CPU utilization rate and a response time, and the CPU utilization rate and the response time are used to determine the state corresponding to the service.
In the solution provided by this application, because the CPU utilization rate most truly and effectively reflects how the service uses resources and the response time most intuitively reflects the user experience, the computing device collects the CPU utilization rate and response time of the service at run time, so that the state corresponding to the service can be described fairly accurately.
With reference to the first aspect, in a possible implementation of the first aspect, the computing device determines m*n states corresponding to the service according to m CPU utilization intervals and n response time intervals, the state prediction result being one of the m*n states, where the m CPU utilization intervals are obtained by the computing device dividing the CPU utilization rate into intervals according to preset thresholds, the range of the CPU utilization rate is 0-1 and m is a positive integer greater than 1; and the n response time intervals are obtained by the computing device dividing the response time into intervals according to preset durations, n being a positive integer greater than 1.
In the solution provided by this application, the CPU utilization rate and the response time are both continuous indicators. If the state of the service were determined directly from them, the state space of the service would explode, that is, there would be infinitely many states, the state of the service could not be determined accurately, and large amounts of computing and storage resources would be wasted. By dividing the CPU utilization rate and the response time into intervals and thus discretizing them, only a finite number of states exist, so that the corresponding state of the service can be determined accurately.
With reference to the first aspect, in a possible implementation of the first aspect, the computing device determines, according to the state prediction result, the instance scaling strategy corresponding to the service with an ε-greedy strategy, the ε-greedy strategy being used to select the action with the largest Q value corresponding to the state prediction result, where the Q value indicates the maximum expected future reward given the corresponding state and action.
In the solution provided by this application, after predicting the state of the service in the next cycle, the computing device can quickly and accurately select the action with the largest Q value corresponding to the service in that state, thereby determining the instance scaling strategy.
According to a second aspect, an instance scaling system is provided, including: a collection unit configured to obtain indicator information and workload data of a service in the current cycle, the indicator information being used to indicate the current state of the service; a workload prediction unit configured to input the workload data into a workload prediction model to obtain a workload prediction result, where the workload prediction model is used to predict the workload values received by the service, and the workload prediction result includes the average user request rate received by the service in the next cycle; a state prediction unit configured to input the indicator information and the workload prediction result into a state prediction model to obtain a state prediction result, where the state prediction model is used to predict the state of the service, and the prediction result includes the state of the service in the next cycle; and an instance scheduling unit configured to determine, according to the state prediction result, an instance scaling strategy corresponding to the service, and to scale the instances corresponding to the service according to the instance scaling strategy.
With reference to the second aspect, in a possible implementation of the second aspect, the collection unit is further configured to obtain training samples, the training samples including historical indicator information and historical workload data corresponding to the service; and the state prediction unit is further configured to determine an initial state prediction model, the initial state prediction model adopting a neural network model, and to train the initial state prediction model with the training samples to obtain the state prediction model.
With reference to the second aspect, in a possible implementation of the second aspect, the indicator information includes a CPU utilization rate and a response time, and the CPU utilization rate and the response time are used to determine the state corresponding to the service.
With reference to the second aspect, in a possible implementation of the second aspect, the state prediction unit is further configured to divide the CPU utilization rate into m intervals according to preset thresholds, the range of the CPU utilization rate being 0-1 and m being a positive integer greater than 1; to divide the response time into n intervals according to preset durations, n being a positive integer greater than 1; and to determine m*n states corresponding to the service according to the m CPU utilization intervals and the n response time intervals, the state prediction result being one of the m*n states.
With reference to the second aspect, in a possible implementation of the second aspect, the instance scheduling unit is specifically configured to determine, according to the state prediction result, the instance scaling strategy corresponding to the service with an ε-greedy strategy, the ε-greedy strategy being used to select the action with the largest Q value corresponding to the state prediction result, where the Q value indicates the maximum expected future reward given the corresponding state and action.
According to a third aspect, a computing device is provided, including a processor and a memory, the memory being configured to store program code and the processor being configured to execute the program code in the memory to implement the first aspect and the method of any implementation of the first aspect.
According to a fourth aspect, a computer-readable storage medium is provided, storing a computer program; when the computer program is executed by a processor, the processor performs the method for scaling a service provided by the first aspect and any implementation of the first aspect.
According to a fifth aspect, a computer program product is provided, including instructions; when the computer program product is executed by a computer, the computer can execute the flow of the method for scaling a service provided by the first aspect and any implementation of the first aspect.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of instance scaling using the Q-learning algorithm according to an embodiment of this application;
FIG. 2 is a schematic flowchart of instance scaling using the SARSA algorithm according to an embodiment of this application;
FIG. 3 is a schematic diagram of a system architecture according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of an instance scaling system according to an embodiment of this application;
FIG. 5 is a schematic flowchart of a method for scaling a service according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of a state prediction model according to an embodiment of this application;
FIG. 7 is a schematic flowchart of the operation of an algorithm according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a computing device according to an embodiment of this application.
Detailed description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
Reference to an "embodiment" herein means that a particular feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to independent or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
First, some terms and related technologies involved in this application are explained with reference to the drawings to facilitate understanding by those skilled in the art.
Cloud computing is a service related to information technology, software and the Internet. Cloud computing pools multiple computing resources to form a shared pool of computing resources, also called the "cloud", managed automatically by software. Users can obtain resources on the "cloud" at any time as needed; in theory, the resources on the "cloud" can be expanded without limit.
An SLA is an agreement signed jointly by a service provider and a customer, containing items such as service type, service quality and service performance, which can satisfy user needs to the greatest extent and guarantee user satisfaction.
Microservices are an emerging software architecture whose purpose is to split a large single application and service into dozens of supporting microservices. Each microservice in the system can be deployed independently, the microservices are loosely coupled, and each microservice focuses on completing one task, each task representing a small business capability. The microservice strategy makes work simpler; it can scale a single component instead of the entire application stack, thereby satisfying the SLA.
A service mesh is an infrastructure layer for inter-service communication; it focuses on communication between services, making communication between service instances smoother, more reliable and faster. It also provides functions such as service discovery, load balancing, encryption, authentication, authorization and support for the circuit breaker pattern.
A virtual machine (VM) is a complete computer system with complete hardware system functions, simulated by software and running in a completely isolated environment. Everything that can be done on a physical computer can be done in a VM. When creating a VM, part of the hard disk and memory capacity of the physical computer is used as the hard disk and memory of the VM; each VM has its own hard disk, operating system and so on, and can be operated just like a physical machine.
A container is a virtualization technology in computer operating systems that lets processes run in relatively independent and isolated environments (with independent file systems, namespaces, resource views and so on), thereby simplifying software deployment, enhancing software portability and security, and improving the utilization of system resources. Container technology is widely used in service-oriented scenarios in the cloud computing field.
An instance is the result obtained after a microservice of an application is instantiated. Generally an instance contains one or more containers for executing the functions the containers provide.
Auto-scaling is a concept in cloud computing: the system on which an application is deployed on a cloud platform can dynamically determine an appropriate amount of resources according to the application's workload, and then automatically request and release resources.
Horizontal scaling is a concept in cloud computing: scaling in units of instances, directly increasing or decreasing the number of instances.
Reinforcement learning, also called evaluative learning, is one of the paradigms and methodologies of machine learning, used to describe and solve the problem of an agent learning a policy during its interaction with the environment so as to maximize return or achieve a specific goal. Reinforcement learning includes several typical algorithms, for example the Q-learning algorithm and the state action reward state action (SARSA) algorithm; its common model is the standard Markov decision process. Depending on the given conditions, reinforcement learning can be divided into model-based and model-free reinforcement learning, and into active and passive reinforcement learning.
A neural network is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed parallel information processing. Depending on the complexity of the system, such a network processes information by adjusting the interconnections among a large number of internal nodes.
The autoregressive integrated moving average (ARIMA) model is one of the time series prediction and analysis methods. An ARIMA model has three main parameters p, d and q: p is the number of autoregressive terms, d is the number of differencing operations (the order) performed to turn a non-stationary series into a stationary one, and q is the number of moving average terms.
In instance scaling scenarios there are multiple scaling strategies, for example reactive strategies and predictive strategies. Because predictive strategies can make scaling decisions in advance, they can effectively reduce response time and better satisfy the SLA, and are therefore more widely used. In addition, reinforcement learning needs no prior knowledge in application and is adaptive and robust, so a predictive scaling strategy can be formulated based on reinforcement learning to keep the application's resource utilization relatively stable while the workload changes dynamically.
FIG. 1 is a schematic flowchart of instance scaling using the Q-learning algorithm. First the Q table is initialized. The rows of the Q table represent the states of the service and the columns represent the actions in those states; an action can be increasing or decreasing the number of instances running the service, for example adding 2 instances or removing 2 instances. The values in the Q table represent the value obtained by executing a certain action in a certain state; at initialization all values in the Q table can be set to 0, or of course to other values (such as 1 or 2). Then the state S of the service in the current cycle is determined from the monitored indicators of the current cycle. The cycle length can be set as needed, for example to 5 seconds. The indicators of the service are collected by a monitoring application deployed in the instances; an instance can specifically be a virtual machine, a container and so on, and the indicators can specifically be the CPU utilization rate and the response time, the response time being the time from when a request reaches the service until the service returns a result. Then the ε-greedy strategy is used to select action A in state S from the Q table: with probability ε the action with the largest Q value in the current state S is chosen from the Q table, and with probability 1-ε an action is chosen at random. The value of ε lies between 0 and 1 and can be set as needed, for example to 0.9. Action A is then executed, and the reward value R and the state S1 of the service in the next cycle are calculated. After the calculation, the Q table is updated with a dynamic programming equation (for example the Bellman equation), specifically with formula 1 below:
Q(S,A) ← (1-α)·Q(S,A) + α[R + γ·max_a Q(S1,a)]    (formula 1)
Here α is the learning rate, representing how much of the previous training effect is retained; the larger α is, the less of the previous training effect is kept. R is the reward value (the reward obtained from the environment) and γ is the discount factor. The essence of formula 1 is to use the largest Q value corresponding to the service's state S1 in the next cycle to update the Q value corresponding to executing action A in the current state S. It should be understood that A1 is the action executed in the next cycle, the one that maximizes the Q value; max_a Q(S1,a) means that executing action a maximizes the Q value, that is, a is action A1.
Instance scaling can be realized by running the Q-learning algorithm, but the Q-learning algorithm itself requires many iterations to update the Q table and converges slowly, so the instance scaling strategy cannot be determined in time; as a result resource scheduling is not timely and SLA requirements cannot be met.
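The formula-1 update can be sketched as follows. The Q table is represented as a dict keyed by (state, action); states and actions are illustrative placeholders.

```python
# Temporal-difference update of formula 1 (Q-learning):
# Q(S,A) <- (1-alpha)*Q(S,A) + alpha*(R + gamma * max_a Q(S1,a))
def q_learning_update(q, s, a, reward, s1, actions, alpha=0.5, gamma=0.9):
    best_next = max(q.get((s1, a1), 0.0) for a1 in actions)
    q[(s, a)] = (1 - alpha) * q.get((s, a), 0.0) + alpha * (reward + gamma * best_next)
```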
FIG. 2 is a schematic flowchart of instance scaling using the SARSA algorithm. Similarly to the Q-learning algorithm, the Q table is first initialized; then, in the first cycle of the algorithm, the state S of the service in the current cycle is determined from the monitored indicators of the current cycle. The ε-greedy strategy is then used to select and execute the action A in the current state S, and the reward value R and the state S1 of the service in the next cycle are calculated. Then, according to state S1, the ε-greedy strategy is used to select from the Q table the action A1 to be executed in the next cycle; finally the Q table is updated with the dynamic programming equation and action A1 is determined as the action to be executed in the next cycle, specifically with formula 2 below:
Q(S,A) ← (1-α)·Q(S,A) + α[R + γ·Q(S1,A1)]    (formula 2)
Here α, R and γ have the same meanings as in formula 1. The essence of formula 2 is to use the Q value obtained by executing action A1 in the service state S1 of the next cycle to update the Q value corresponding to executing action A in the current state S. It should be understood that the action to be executed in the next cycle is determined in the current cycle.
Instance scaling can also be realized by running the SARSA algorithm, but the SARSA algorithm only considers that the state of the service is affected by the executed action; it does not consider that the state is also affected by the instances themselves and by the workload. As a result, the action determined in the current cycle may not suit the next cycle, so the determined instance scaling strategy is not accurate enough and SLA requirements cannot be met.
In addition, whether the Q-learning algorithm or the SARSA algorithm is applied for instance scaling as above, the action space is fixed, that is, every state corresponds to the same 5 actions (-2, -1, 0, +1, +2), where +2 means adding 2 instances, +1 means adding 1 instance, 0 means no change, -2 means removing 2 instances and -1 means removing 1 instance. However, when the service encounters a sharply increasing load, adding only 2 instances may not reduce the response time effectively; the action of adding 2 instances must be executed many times before the response time is reduced enough to meet the requirements. This makes resource scheduling untimely and SLA requirements cannot be met.
Based on the above, this application provides a method for scaling a service and a related device, which can, through load prediction and state prediction, determine a scaling strategy in a timely and efficient manner while ensuring the accuracy of the determined strategy and meeting SLA requirements.
The technical solutions of the embodiments of this application can be applied to various scenarios requiring instance scaling, including but not limited to applications deployed with containers (such as the Bookinfo application and the Hipster-Shop application), applications deployed on virtual machines, applications deployed on physical machines, and so on.
In a specific embodiment, the instance scaling system can be deployed on any computing device involved in instance scaling. For example, as shown in FIG. 3, it can be deployed on one or more computing devices in a cloud environment (such as a central server) or on one or more computing devices in an edge environment (such as a server). The cloud environment is a cluster of central computing devices owned by a cloud service provider for providing computing, storage and communication resources, and has large storage and computing resources; the edge environment is a cluster of edge computing devices geographically close to the terminal devices for providing computing, storage and communication resources.
The instance scaling system collects the indicators and workload of the service and predicts the workload and the state of the service in the next cycle, so as to determine the instance scaling strategy in a timely and accurate manner and carry out the scaling. The units inside the instance scaling system can be divided in multiple ways, which is not limited in this application. FIG. 4 shows an exemplary division; the function of each functional unit is briefly described below.
The instance scaling system 400 shown includes multiple functional units. The collection unit 410 is used to collect, from the container cloud cluster or virtual machine cluster, the various indicators and workload values of the service at run time, such as the service response time, CPU utilization rate and memory usage; the container cloud cluster provides a platform for building, publishing and running containerized services and allows developers or administrators to manage and maintain containers. The workload prediction unit 420 is used to dynamically fit the workload values collected by the collection unit 410 and to predict the workload value of the next cycle. The state prediction unit 430 is used to predict the state of the service from the indicators collected by the collection unit 410 and the workload value predicted by the workload prediction unit 420, obtaining the state of the service in the next cycle. The instance scheduling unit 440 is used to determine the instance scaling strategy according to the state predicted by the state prediction unit 430 and to execute the strategy to complete the instance scaling.
In this application, the instance scaling system 400 can be a software system, and the parts and functional units it includes can be deployed on hardware devices in flexible forms.
The method for scaling a service provided by the embodiments of this application and the related devices are described below. Referring to FIG. 5, FIG. 5 is a schematic flowchart of a method for scaling a service according to an embodiment of this application. As shown in FIG. 5, the method includes but is not limited to the following steps:
S501: The computing device obtains the indicator information and workload data of the service at run time in the current cycle.
Specifically, the computing device is deployed with the instance scaling system 400 shown in FIG. 4. The computing device can collect indicator information from the container cloud cluster, in which multiple applications run; each application includes one or more services, each running in a different container, and multiple containers can be deployed on one physical machine. The collection unit 410 in the computing device can collect indicator information and workload data in real time or periodically; the collection cycle can be set as needed, for example to 5 seconds. The collected indicator information includes the CPU utilization rate, memory utilization rate, response time and number of instances (containers) of the current cycle, and the collected workload is a traffic data set.
It is worth noting that the CPU utilization rate truly and effectively reflects the resource usage of the service, and the response time intuitively reflects the user experience and is an important basis for judging whether the service meets the SLA; therefore this application uses the CPU utilization rate and the response time to determine the state of the service. It should be understood that the CPU utilization rate and the response time are both continuous indicators; if they were used directly to construct the state of the service, infinitely many states would be constructed, consuming large amounts of resources to manage and to predict the subsequent state of the service, and causing the system to collapse. Therefore, the CPU utilization rate and the response time must be discretized so that the states of the service contained in the constructed state space are determinate and finite.
In a possible implementation, the CPU utilization rate is divided into m intervals according to preset thresholds, where the value range of the CPU utilization rate is 0-1 and m is a positive integer greater than 1; the response time is divided into n intervals according to preset durations, where n is a positive integer greater than 1; and the m CPU utilization intervals and the n response time intervals determine m*n states corresponding to the service.
Optionally, the CPU utilization rate can be divided at equal intervals, for example into [0,0.2], [0.2,0.4], [0.4,0.6], [0.6,0.8] and [0.8,1], that is, m is 5 and each interval has size 0.2; of course the interval size can also be set to other values, which is not limited in this application. When the response time is divided into intervals, because the response time can reach infinity (when the application hangs or the system crashes), it cannot simply be divided at equal intervals; a division strategy is needed. For example, the response time can be divided into [0,100ms], [100ms,250ms], [250ms,500ms], [500ms,1000ms] and [1000ms,∞], where 100ms means 100 milliseconds and ∞ means infinity, so the response time is divided into 5 intervals, that is, n is 5; of course it can also be divided in other ways, which is not limited in this application.
It can be seen that, with the above division method, the continuous CPU utilization rate and response time are divided into different intervals, so that a finite number of states is obtained and state-space explosion is avoided. For example, when the collection unit 410 collects a current CPU utilization rate of 0.772 and a response time of 291ms, the state of the service in the current cycle is determined to be the state given by the interval [0.6,0.8] and the interval [250ms,500ms].
S502: The computing device inputs the workload data into the workload prediction model to obtain a workload prediction result.
Specifically, before the workload prediction model is used to predict from the current workload data, it needs to be fitted to historical workload data so that it has the ability to predict the workload value of the next cycle. It should be understood that in practical applications the workload (user request rate) is complex, changeable and affected by many factors, so the workload prediction model should be a model for non-stationary time series prediction, such as an ARIMA model.
The ARIMA model is a statistical model for time series forecasting; it is dynamically fitted on a fixed-size, queue-like time series, the length of which can be set as needed, for example to 50, which is not limited in this application. In this application, the historical workload data collected by the collection unit 410 (the average user request rate in each cycle) is used to fit the model. It should be understood that the historical workload data (a time series) collected by the collection unit 410 may not be stationary; in that case model fitting cannot be performed directly, and further processing is required to turn it into a stationary series that satisfies the fitting requirements. If a series is stationary, its mean, variance and covariance do not change significantly.
Specifically, after the historical workload data is obtained, it is first tested for stationarity; the augmented Dickey-Fuller (ADF) test can be used to test whether a unit root exists. If a unit root exists, the series is not stationary and difference processing is needed to make the series stationary and satisfy the requirements of model fitting; if no unit root exists, the series is stationary and the model can be fitted directly. Generally, the smaller the detected ADF statistic, the stronger the evidence that the series has no unit root and the more stationary it is. When difference processing is performed on the series, the minimum number of differencing operations needed to turn the series from non-stationary to stationary is defined as the parameter d of the ARIMA model. After d is obtained, the partial autocorrelation function (PACF) is used to determine the autoregressive order p of the model; the PACF describes the linear correlation between a time series observation and a past observation conditioned on the intermediate observations, and p is the number of lags of the series itself used in the model. The autocorrelation function (ACF) is used to determine the moving average order q; the ACF describes the linear correlation between a time series observation and a past observation, and q is the number of lags of the prediction error used in the model.
After p and q are determined, the differenced stationary data are substituted into an autoregressive moving average (ARMA) model for fitting, giving formula 3 below:
Y_t = β_0 + β_1·Y_{t-1} + … + β_p·Y_{t-p} + ε_t + α_1·ε_{t-1} + … + α_q·ε_{t-q}    (formula 3)
Here Y_t is the load prediction value, which is affected by its own past values; regression analysis gives the polynomial in the observations in formula 3. ε_t is the error, which is dependent across periods, corresponding to the polynomial in the errors in formula 3. Since the ARMA model is fitted after differencing the historical workload data, after Y_t is obtained from formula 3 an inverse differencing operation must be performed on Y_t to finally obtain the workload prediction value of the next cycle.
Optionally, when this application fits the ARIMA model, the fitting can be dynamic and in real time. For example, when the collection unit 410 collects the workload value of the current cycle (that is, the average user request rate), it is added to the time series used for model fitting, the earliest collected historical workload value is found and discarded from that series in chronological order, and the updated series is used for model fitting, ensuring prediction accuracy. Alternatively, after a preset time period, the workload values of multiple cycles collected by the collection unit 410 (for example, those of the last 5 cycles) are added to the above time series, the corresponding number of earliest historical workload values are discarded in chronological order, and the model is then fitted; this relieves the computational pressure on the computing device and improves its resource utilization efficiency.
S503: The computing device inputs the indicator information and the workload prediction result into the state prediction model to obtain a state prediction result.
Specifically, the state prediction model in this application is a trained neural network model, for example a recurrent neural network (RCNN), a recursive neural network (RNN) or a convolutional neural network (CNN). The state prediction model needs to be trained before it is used to predict the state of the service, so that it has the ability to do so. Training requires special training data; analysis from the model's capability requirements shows that the historical sample data collected by the collection unit 410 should be used. The sample data include the CPU utilization rate, memory utilization rate, response time, number of instances and workload value of the service in each historical cycle, collected in advance by the collection unit, and the workload prediction value predicted by the workload prediction model for each historical cycle.
In addition, when selecting training samples, their comprehensiveness must be ensured, that is, training samples from all scenarios must be obtained evenly, for example a large workload with many instances, a large workload with few instances, a small workload with many instances, a small workload with few instances, and so on. It is easy to understand that training with fairly comprehensive sample data prevents the trained state prediction model from being biased and losing generality.
After the sample data are obtained, the initial state prediction model is first determined to be a neural network model. As shown in FIG. 6, the structure of the initial state prediction model 600 of this application mainly includes three parts, namely the input layer 610, the hidden layer 620 and the output layer 630. The parameters of the initial state prediction model 600 are then initialized, and the sample data are input to the input layer 610; the input layer 610 processes the sample data and transmits them to the hidden layer 620. The feature extraction unit 621 in the hidden layer 620 performs feature extraction and recognition on the input sample data and passes them to the prediction unit 622, which predicts the CPU utilization rate and response time of the next cycle. The loss function calculation unit 623 calculates the loss function from the result predicted by the prediction unit 622 and, with the loss function as the objective function, uses the backpropagation algorithm to update and adjust the parameters of the model; the output layer 630 outputs the response time prediction value and CPU utilization prediction value predicted by the hidden layer 620. Different training samples are input in sequence and the above training process is iterated until the loss function value converges, that is, when each newly calculated loss function value fluctuates around a certain value, training is stopped. At this point the state prediction model has been trained, that is, it has the function of predicting the state of the service in the next cycle.
It should be understood that the state of the service is determined by the response time and the CPU utilization rate; therefore the state prediction model outputs the response time and CPU utilization rate of the service in the next cycle. It should also be noted that the state prediction model provided in this application supports dynamic training during use, so that the model fits the actual situation and prediction accuracy improves; for example, if the load in an actual application scenario has a tendency (the load is always high, or always low, and so on), continuing dynamic training of the previously trained model makes it better suited to the current scene.
In practical applications, the number of neurons in the input layer 610 of the state prediction model is 7, namely the CPU utilization rate, memory utilization rate, response time, number of instances and workload value of the current cycle, the workload prediction value of the next cycle, and a bias term (which gives the network classification a translation capability and makes the model fit better); the number of neurons in the output layer 630 is 2, namely the response time prediction value and the CPU utilization prediction value; the number of layers in the hidden layer 620 and the number of neurons in each layer can be set flexibly, and through extensive experiments and comparisons this application prefers 12 hidden-layer neurons.
S504: The computing device determines the instance scaling strategy corresponding to the service according to the state prediction result.
Specifically, after the response time prediction value and CPU utilization prediction value are obtained with the state prediction model, the state corresponding to the service in the next cycle can be determined. Then, according to the determined state, the ε-greedy strategy is used to select the corresponding action in this state from the Q table; this action is the determined instance scaling strategy, for example adding 5 instances or removing 2 instances.
It should be understood that the above Q table is the result of executing the improved SARSA algorithm. Before the algorithm converges, the above algorithm must be executed iteratively to update the values and the action space in the Q table; once the algorithm converges, the Q table is stable and can be used directly to determine the instance scaling strategy.
For example, before the algorithm runs, the Q table is initialized: the value of each item in the Q table is set to 0, the rows of the Q table represent the different states of the service, the columns represent the different actions, each state corresponds to the 5 actions (-2, -1, 0, +1, +2), and Q(S,A) denotes the value obtained by executing action A in state S. The specific flow of the algorithm, shown in FIG. 7, includes the following steps:
S701: Determine the state S of the service in the first cycle of the algorithm, and select the action to be executed from the Q table.
Specifically, the collection unit 410 collects the response time and CPU utilization rate of the current cycle to determine the state S of the service, and the action corresponding to the largest Q value in state S is then selected; if several Q values tie for the largest, an action corresponding to one of them is chosen at random.
S702: Execute the action A selected in the previous cycle, calculate the reward value R after the action is executed, and determine the state S1 of the current cycle.
Specifically, to avoid SLA violations and raise the CPU utilization rate as much as possible, the reward function must take into account the response time, the response time threshold specified by the SLA and the CPU utilization rate at the same time. The reward value can be calculated with formula 4 below:
R = ρ·(1 − (a/b)^p)    (formula 4)
Here R is the reward value, ρ is the CPU utilization rate, p is a set constant used to control the influence of the response time on the reward value (the larger p is, the greater the influence; it is generally set to 2), a is the response time, and b is the response time specified by the SLA.
It can be seen that when the response time is greater than the response time specified by the SLA, the reward value must be negative, and when the response time is less than the response time specified by the SLA, the reward value must be positive.
In addition, after the action determined in the previous cycle has been executed, the state S1 of the current cycle can be obtained by measuring the response time and CPU utilization rate of the current cycle.
S703: Update the value of Q(S, A).
Specifically, after the state S1 of the current cycle is determined, the ε-greedy strategy is used to select action A1, and the calculated reward value is then substituted into formula 1 above to complete the update of Q(S, A).
S704: According to the state S1 of the current cycle, determine whether to enlarge the action space of the service in state S.

Specifically, if the action of the previous cycle (i.e., action A) added the maximum number of instances — for example, action A added 2 instances — but the response time of the current cycle still exceeds the SLA-specified response time, the number of instances added in the previous cycle was insufficient. Therefore, the action space of the service in state S needs to be enlarged; for example, 2 actions can be added, so that the action space of the service in state S becomes (-3, -2, -1, 0, +1, +2, +3), and the Q values corresponding to the newly added actions can be initialized to 0.

It can be understood that enlarging the action space avoids repeated rescheduling, saves system resource overhead, and ensures that resources are scheduled in a more timely manner.
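The expansion rule of S704 can be sketched as below. The check and the step of adding one action on each side follow the text's example; the function name and Q-table layout are illustrative:

```python
def maybe_expand_actions(q_table, state, last_action, resp_time, sla_bound):
    """If the largest scale-out action was just taken and the SLA-specified
    response time is still exceeded, widen the action space of `state` by one
    step on each side; new Q values start at 0, as in the text's example."""
    actions = q_table[state]
    if last_action == max(actions) and resp_time > sla_bound:
        actions[max(actions) + 1] = 0.0
        actions[min(actions) - 1] = 0.0

q = {"S": {a: 0.0 for a in (-2, -1, 0, 1, 2)}}
maybe_expand_actions(q, "S", last_action=2, resp_time=180, sla_bound=100)
assert sorted(q["S"]) == [-3, -2, -1, 0, 1, 2, 3]
```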
S705: Use the workload prediction model to predict the workload of the next cycle to obtain a workload prediction value; predict the state of the service in the next cycle with the state prediction model, based on the workload prediction value and the metrics of the current cycle; and select the action A2 using the ε-greedy policy.

S706: Determine whether the algorithm has converged. If it has converged, stop updating the Q table; if it has not, repeat steps S702-S706.
Specifically, before the algorithm converges, the entries in the Q table are continually updated and the algorithm consumes a considerable amount of resources while running. To improve the resource utilization of the whole system, a termination condition must be set; when the termination condition is met, the algorithm is deemed to have converged and can be stopped.

Optionally, the algorithm is deemed to have converged when the number of cycles it has run exceeds a preset number, which can be set as required, for example 500. Alternatively, the algorithm is deemed to have converged when every entry in the Q table has been updated N or more times, where N may be 3. Or, the update status of the Q table is checked every fixed number of cycles (for example, every 50 cycles): for each entry that was updated — if an entry was updated several times, its value after the last update is used — the difference between its absolute value and its absolute value before the check period (50 cycles earlier) is calculated. If all of these differences are less than one percent of the absolute value of the corresponding entry in the previous Q table, and no entry in the Q table has remained un-updated from the start of the algorithm up to the current cycle, the algorithm is deemed to have converged.

It should be understood that the convergence condition of the algorithm can also be set in other ways, which is not limited in the present application.
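The third convergence heuristic — the periodic check of how much each Q-table entry has moved — can be sketched as follows. The snapshot and update-count bookkeeping structures are assumptions about how the check would be implemented:

```python
def q_table_converged(current, snapshot, update_counts):
    """Periodic convergence check per the text: converged when every entry
    has been updated at least once since the algorithm started, and each
    entry's absolute value moved by less than 1% of its absolute value in
    `snapshot` (the table as it was at the previous check)."""
    for key, old in snapshot.items():
        if update_counts.get(key, 0) == 0:
            return False                                   # never updated
        if abs(abs(current[key]) - abs(old)) >= 0.01 * abs(old):
            return False                                   # still moving
    return True

snap = {("S", 0): 10.0, ("S", 1): -5.0}
cur = {("S", 0): 10.05, ("S", 1): -5.01}
counts = {("S", 0): 4, ("S", 1): 2}
assert q_table_converged(cur, snap, counts) is True
cur[("S", 0)] = 11.0   # moved by 10% of the old value -> not converged
assert q_table_converged(cur, snap, counts) is False
```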
S505: The computing device scales the instances corresponding to the service according to the determined instance scaling policy.

Specifically, after determining the number of instances by which the service needs to be scaled, the computing device calls an external interface exposed by the container cloud cluster, for example an application programming interface (API), to pass that number to a replication controller, which then increases or decreases the number of instances of the service.

It should be noted that the method described in FIG. 5 scales instances for a single service. To scale instances for an application, the above steps S501-S505 must be performed separately for each service that makes up the application, thereby completing the instance scaling for the entire application.
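The hand-off in S505 can be sketched as below. The text only says that an API exposed by the container cloud cluster passes the instance delta to a replication controller, so everything here — the function names, the injected `set_replicas` callable standing in for that API, and the clamp to at least one instance — is an assumption for illustration:

```python
def apply_scaling(current_replicas, delta, set_replicas):
    """Pass the scaling decision to the cluster. `set_replicas` stands in for
    the container cloud cluster's exposed API (the call that updates the
    replication controller); its name and the clamp to >= 1 instance are
    illustrative assumptions, not taken from the text."""
    target = max(1, current_replicas + delta)   # never scale below one instance
    set_replicas(target)
    return target

calls = []
assert apply_scaling(3, -2, calls.append) == 1
assert apply_scaling(3, 2, calls.append) == 5
assert calls == [1, 5]
```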
The method of the embodiments of the present application has been described in detail above. To facilitate better implementation of the above solution, related devices for implementing it are correspondingly provided below.

As shown in FIG. 4, the present application further provides an instance scaling system configured to execute the foregoing method for scaling a service. The present application does not limit how the functional units of the instance scaling system are divided; units may be added, removed, or merged as required. FIG. 4 exemplarily provides one division of functional units:

The instance scaling system 400 includes a collection unit 410, a workload prediction unit 420, a state prediction unit 430, and an instance scheduling unit 440.
Specifically, the collection unit 410 is configured to perform the foregoing step S501, and optionally the optional methods in the foregoing steps.

The workload prediction unit 420 is configured to perform the foregoing step S502, and optionally the optional methods in the foregoing steps.

The state prediction unit 430 is configured to perform the foregoing step S503, and optionally the optional methods in the foregoing steps.

The instance scheduling unit 440 is configured to perform the foregoing steps S504 and S505, and optionally the optional methods in the foregoing steps.

Data can be transferred between the above four units through communication paths. It should be understood that the units included in the instance scaling system 400 may be software units, hardware units, or partly software units and partly hardware units.
Referring to FIG. 8, FIG. 8 is a schematic structural diagram of a computing device provided by an embodiment of the present application. As shown in FIG. 8, the computing device 800 includes a processor 810, a communication interface 820, and a memory 830, which are interconnected through an internal bus 840. It should be understood that the computing device 800 may be a computing device in cloud computing or a computing device in an edge environment.

The processor 810 may consist of one or more general-purpose processors, for example a central processing unit (CPU), or a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

The bus 840 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus 840 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 8, but this does not mean that there is only one bus or only one type of bus.

The memory 830 may include volatile memory, for example random access memory (RAM); the memory 830 may also include non-volatile memory, for example read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 830 may also include a combination of the above types.

It should be noted that the memory 830 of the computing device 800 stores the code corresponding to each unit of the instance scaling system 400, and the processor 810 executes this code to implement the functions of each unit of the instance scaling system 400, i.e., to perform the method of S501-S505.
The present application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, some or all of the steps of any one of the methods described in the above method embodiments can be implemented.

An embodiment of the present invention further provides a computer program comprising instructions which, when the computer program is executed by a computer, cause the computer to perform some or all of the steps of any method for scaling a service.

In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the related descriptions of other embodiments.

It should be noted that, for brevity of description, the foregoing method embodiments are all expressed as a series of action combinations. However, those skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division into the above units is only a division of logical functions, and there may be other divisions in actual implementation — for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical or take other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

Claims (12)

  1. A method for scaling a service, characterized by comprising:
    obtaining metric information and workload data of a service in a current cycle, the metric information being used to indicate a current state of the service;
    inputting the workload data into a workload prediction model to obtain a workload prediction result, wherein the workload prediction model is used to predict a workload value received by the service, and the workload prediction result includes an average user request rate received by the service in a next cycle;
    inputting the metric information and the workload prediction result into a state prediction model to obtain a state prediction result, wherein the state prediction model is used to predict the state of the service, and the prediction result includes the state of the service in the next cycle;
    determining, according to the state prediction result, an instance scaling policy corresponding to the service, and scaling instances corresponding to the service according to the instance scaling policy.
  2. The method according to claim 1, characterized in that before the state prediction model is used to predict the state of the service, the method further comprises:
    determining an initial state prediction model, the initial state prediction model being a neural network model;
    obtaining training samples, the training samples including historical metric information and historical workload data corresponding to the service;
    training the initial state prediction model with the training samples to obtain the state prediction model.
  3. The method according to claim 1 or 2, characterized in that the metric information includes central processing unit (CPU) utilization and response time, and the CPU utilization and the response time are used to determine the state corresponding to the service.
  4. The method according to claim 3, characterized in that using the CPU utilization and the response time to determine the state corresponding to the service comprises:
    determining m*n states corresponding to the service from m CPU utilization intervals and n response time intervals, the state prediction result being one of the m*n states, wherein
    the m CPU utilization intervals are obtained by dividing the CPU utilization into intervals according to preset thresholds, the CPU utilization ranges from 0 to 1, and m is a positive integer greater than 1;
    the n response time intervals are obtained by dividing the response time into intervals according to preset durations, and n is a positive integer greater than 1.
  5. The method according to any one of claims 1-4, characterized in that determining, according to the state prediction result, the instance scaling policy corresponding to the service comprises:
    determining, according to the state prediction result, the instance scaling policy corresponding to the service with an ε-greedy policy, the ε-greedy policy being used to select the action with the largest Q value corresponding to the state prediction result, the Q value indicating the maximum expected future reward for a given state and a given action.
  6. An instance scaling system, characterized by comprising:
    a collection unit, configured to obtain metric information and workload data of a service in a current cycle, the metric information being used to indicate a current state of the service;
    a workload prediction unit, configured to input the workload data into a workload prediction model to obtain a workload prediction result, wherein the workload prediction model is used to predict a workload value received by the service, and the workload prediction result includes an average user request rate received by the service in a next cycle;
    a state prediction unit, configured to input the metric information and the workload prediction result into a state prediction model to obtain a state prediction result, wherein the state prediction model is used to predict the state of the service, and the prediction result includes the state of the service in the next cycle;
    an instance scheduling unit, configured to determine, according to the state prediction result, an instance scaling policy corresponding to the service, and to scale instances corresponding to the service according to the instance scaling policy.
  7. The instance scaling system according to claim 6, characterized in that
    the collection unit is further configured to obtain training samples, the training samples including historical metric information and historical workload data corresponding to the service;
    the state prediction unit is further configured to determine an initial state prediction model, the initial state prediction model being a neural network model, and to train the initial state prediction model with the training samples to obtain the state prediction model.
  8. The instance scaling system according to claim 6 or 7, characterized in that the metric information includes CPU utilization and response time, and the CPU utilization and the response time are used to determine the state corresponding to the service.
  9. The instance scaling system according to claim 8, characterized in that
    the state prediction unit is further configured to divide the CPU utilization into m intervals according to preset thresholds, the CPU utilization ranging from 0 to 1 and m being a positive integer greater than 1; divide the response time into n intervals according to preset durations, n being a positive integer greater than 1; and determine m*n states corresponding to the service from the m CPU utilization intervals and the n response time intervals, the state prediction result being one of the m*n states.
  10. The instance scaling system according to any one of claims 6-9, characterized in that the instance scheduling unit is specifically configured to:
    determine, according to the state prediction result, the instance scaling policy corresponding to the service with an ε-greedy policy, the ε-greedy policy being used to select the action with the largest Q value corresponding to the state prediction result, the Q value indicating the maximum expected future reward for a given state and a given action.
  11. A computing device, characterized in that the computing device includes a memory and a processor, and the processor executes computer instructions stored in the memory so that the computing device performs the method according to any one of claims 1-5.
  12. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the processor performs the method according to any one of claims 1-5.
PCT/CN2021/084242 2020-03-31 2021-03-31 Method for scaling a service and related device WO2021197364A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010245634.9 2020-03-31
CN202010245634.9A CN112000459B (zh) 2020-03-31 2020-03-31 Method for scaling a service and related device

Publications (1)

Publication Number Publication Date
WO2021197364A1 true WO2021197364A1 (zh) 2021-10-07

Family

ID=73461736

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/084242 WO2021197364A1 (zh) 2020-03-31 2021-03-31 一种用于服务的扩缩容的方法及相关设备

Country Status (2)

Country Link
CN (1) CN112000459B (zh)
WO (1) WO2021197364A1 (zh)



Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120254437A1 (en) * 2011-04-04 2012-10-04 Robert Ari Hirschfeld Information Handling System Application Decentralized Workload Management
CN109800075A (zh) * 2017-11-16 2019-05-24 航天信息股份有限公司 Cluster management method and device
CN109995583A (zh) * 2019-03-15 2019-07-09 清华大学深圳研究生院 Latency-guaranteed dynamic scaling method and system for an NFV cloud platform
CN110149396A (zh) * 2019-05-20 2019-08-20 华南理工大学 Method for building an IoT platform based on a microservice architecture
CN110275758A (zh) * 2019-05-09 2019-09-24 重庆邮电大学 Intelligent migration method for virtual network functions
CN110457287A (zh) * 2019-07-03 2019-11-15 北京百度网讯科技有限公司 Database scaling processing method and device, computer device, and readable medium
CN112000459A (zh) * 2020-03-31 2020-11-27 华为技术有限公司 Method for scaling a service and related device




Also Published As

Publication number Publication date
CN112000459A (zh) 2020-11-27
CN112000459B (zh) 2023-06-27


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21778987; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21778987; Country of ref document: EP; Kind code of ref document: A1)