CN114780244A - Container cloud resource elastic allocation method and device, computer equipment and medium - Google Patents


Info

Publication number
CN114780244A
CN114780244A (application CN202210474031.5A)
Authority
CN
China
Prior art keywords
request
predicted
unit
container cloud
arrival rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210474031.5A
Other languages
Chinese (zh)
Inventor
周起如
耿伟
眭小红
谷国栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Sunwin Intelligent Co Ltd
Original Assignee
Shenzhen Sunwin Intelligent Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Sunwin Intelligent Co Ltd filed Critical Shenzhen Sunwin Intelligent Co Ltd
Priority to CN202210474031.5A priority Critical patent/CN114780244A/en
Publication of CN114780244A publication Critical patent/CN114780244A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061: Partitioning or combining of resources
    • G06F 9/5077: Logical partitioning of resources; Management or configuration of virtualized resources
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5083: Techniques for rebalancing the load in a distributed system
    • G06F 9/44: Arrangements for executing specific programs
    • G06F 9/455: Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533: Hypervisors; Virtual machine monitors
    • G06F 9/45558: Hypervisor-specific management and integration aspects
    • G06F 2009/4557: Distribution of virtual machine instances; Migration and load balancing

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention discloses a container cloud resource elastic allocation method and device, computer equipment, and a medium, wherein the method comprises the following steps: collecting the user request arrival rate of a container cloud processing platform; capturing the QoS indicators of the application related to the user request, wherein the QoS indicators comprise the CPU utilization rate, the memory utilization rate, and the request-response data of the Pod service processing units; predicting the request arrival rate at the next moment by adopting a prediction model according to the request arrival rate; calculating the predicted number of required Pod service units according to the predicted request arrival rate at the next moment; and performing resource allocation according to the predicted number of required Pod service units. By elastically allocating resources according to the predicted number of Pod service units, application resources deployed on the container cloud cluster are allocated in advance before a peak arrives, the response time is reduced, the user access experience and the stability of the platform service are improved, and resource waste is avoided.

Description

Container cloud resource elastic allocation method and device, computer equipment and medium
Technical Field
The invention relates to the field of cloud computing, and in particular to a container cloud resource elastic allocation method and device, computer equipment, and a medium.
Background
The rapid development of urbanization has made disaster safety problems increasingly prominent; the total number of urban disaster accidents and the degree of damage have greatly increased, and improving urban disaster prevention and control capability has become one of the important missions of urban development. Cloud resource elastic allocation is one of the key technologies of a disaster emergency cloud computing center; by establishing a reasonable resource scheduling and allocation method, problems such as application stability and performance under high-concurrency scenarios can be effectively addressed.
With the exponential growth of data scale and user request volume, higher quality-of-service requirements are placed on the disaster emergency cloud computing center. The container cloud is one of the mainstream cloud computing platforms supporting highly concurrent access, offering a more lightweight virtualization solution. Compared with a traditional virtualized cloud, the container cloud has smaller system overhead and faster application deployment and startup. Surveys have shown that 75% of page visitors will not revisit a website whose load time exceeds 4 s.
Kubernetes, the current mainstream container cloud resource management platform, has a built-in reactive HPA elastic scaling strategy that suffers from resource over-provisioning, resource under-provisioning, and response delay. Over-provisioning increases cost and wastes resources; under-provisioning and response delay cause breaches of the Service Level Agreement (SLA), degrade the quality of service (QoS), and harm the user access experience.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a container cloud resource elastic allocation method and device, computer equipment, and a medium, so as to ensure the user access experience and the stability of platform services while avoiding resource waste.
In order to achieve the purpose, the invention adopts the following technical scheme:
in one aspect, a container cloud resource elastic allocation method includes:
acquiring a user request arrival rate of a container cloud processing platform;
capturing the QoS indicators of the application related to the user request, wherein the QoS indicators comprise the CPU utilization rate, the memory utilization rate, and the request-response data of the Pod service processing units;
predicting the request arrival rate at the next moment by adopting a prediction model according to the request arrival rate;
calculating the predicted number of required Pod service units according to the predicted request arrival rate at the next moment;
and performing resource allocation according to the predicted number of required Pod service units.
The further technical scheme is as follows: the capturing of the QoS indicators of the application related to the user request comprises the following steps:
collecting CPU utilization and memory utilization data by using the cAdvisor component within the kubelet component;
acquiring request-response-time data through Prometheus;
converting the request-response-time data collected by Prometheus into a format recognizable by the Kubernetes API, using the Prometheus adapter;
and converting the request-response-time data, once in the Kubernetes-API-recognizable format, into data compatible with the CPU utilization and memory utilization data collected by the cAdvisor component, through the indicator aggregator.
The further technical scheme is as follows: the request arrival rate at the next moment is predicted by adopting a prediction model according to the request arrival rate, wherein the prediction model is a Prophet-TCN hybrid model.
The further technical scheme is as follows: the calculation formula of the Prophet-TCN mixed model is as follows:
Yt=αpt+(1-α)Ntwherein Y istIndicates the final predicted value, PtRepresents the predicted value of the Prophet model, NtAnd expressing the predicted value of the TCN model, wherein alpha is an optimal parameter.
The further technical scheme is as follows: the calculating of the predicted number of required Pod service units according to the predicted request arrival rate at the next moment comprises:
obtaining the predicted request arrival rate value at the next moment;
setting the QoS maximum average request response time, the QoS response time percentage, and the QoS maximum number of service units;
and calculating the predicted number of required Pod service units according to the request arrival rate value at the next moment and the set QoS maximum average request response time, QoS response time percentage, and QoS maximum number of service units.
The further technical scheme is as follows: the resource allocation according to the predicted number of required Pod service units comprises:
if the predicted number of required Pod service units is larger than the current number of Pod service units, performing predictive capacity expansion.
The further technical scheme is as follows: the resource allocation according to the predicted number of required Pod service units further comprises:
if the predicted number of required Pod service units is smaller than the current number of Pod service units, performing reactive capacity reduction.
In a second aspect, a container cloud resource elastic allocation device comprises an acquisition unit, a capture unit, a prediction unit, a calculation unit, and a resource allocation unit;
the acquisition unit is used for acquiring the user request arrival rate of the container cloud processing platform;
the capture unit is used for capturing the QoS indicators of the application related to the user request, wherein the QoS indicators comprise the CPU utilization rate, the memory utilization rate, and the request-response data of the Pod service processing units;
the prediction unit is used for predicting the request arrival rate at the next moment by adopting a prediction model according to the request arrival rate;
the calculation unit is used for calculating the predicted number of required Pod service units according to the predicted request arrival rate at the next moment;
and the resource allocation unit is used for allocating resources according to the predicted number of required Pod service units.
In a third aspect, a computer device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor executes the computer program to implement the container cloud resource flexible allocation method steps as described above.
In a fourth aspect, a computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the container cloud resource elastic allocation method steps as described above.
Compared with the prior art, the invention has the following beneficial effects. The method comprises: collecting the user request arrival rate of a container cloud processing platform; capturing the QoS indicators of the application related to the user request, wherein the QoS indicators comprise the CPU utilization rate, the memory utilization rate, and the request-response data of the Pod service processing units; predicting the request arrival rate at the next moment by adopting a prediction model according to the request arrival rate; calculating the predicted number of required Pod service units according to the predicted request arrival rate at the next moment; and performing resource allocation according to the predicted number of required Pod service units. By elastically allocating resources according to the predicted number of required Pod service units, application resources deployed on the container cloud cluster are allocated in advance before a peak arrives, the response time is shortened, the user access experience and the stability of the platform service are improved, and resource waste is avoided.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood, the present invention can be implemented in accordance with the content of the description, and in order to make the above and other objects, features, and advantages of the present invention more apparent, the following detailed description of the preferred embodiments is given as follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a container cloud resource elastic allocation method according to an embodiment of the present invention;
fig. 2 is a schematic block diagram of a container cloud resource elastic allocation apparatus according to an embodiment of the present invention;
fig. 3 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
Referring to fig. 1, fig. 1 is a flowchart of a container cloud resource flexible allocation method according to an embodiment of the present invention.
As shown in fig. 1, the container cloud resource elastic allocation method includes the following steps: S10-S50.
And S10, collecting the arrival rate of the user request of the container cloud processing platform.
The user request refers to a request of a user to access an application on the container cloud processing platform.
In this embodiment, the request sequence is characterized as a queue of requests waiting for service, and the Pod service units deployed on the Kubernetes platform are characterized as service stations providing service, so that the system is constructed as an M/M/s queuing network. Assuming that request arrivals and request service are mutually independent, that waiting times and service times are mutually independent, that service requests arrive according to a Poisson process, and that the service time of a request at a service unit follows a negative exponential distribution, the container cloud processing platform can be regarded as an M/M/s queuing system with a user request arrival rate of λ, a per-Pod service rate of μ, and s Pod service units. Arriving user requests are distributed to the set of Pod service units by a round-robin load balancing algorithm, and the arrival rate is also persisted to a PostgreSQL database.
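As an illustrative sketch (not part of the patent), the M/M/s characterization above can be expressed as follows; the class name and all numeric values are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class MMsSystem:
    """Hypothetical representation of the platform as an M/M/s queue.

    lam: user request arrival rate (Poisson arrivals), mu: per-Pod
    service rate (exponential service times), s: number of Pod units.
    """
    lam: float
    mu: float
    s: int

    def utilization(self) -> float:
        # Traffic intensity rho = lambda / (s * mu); must be < 1 for stability.
        return self.lam / (self.s * self.mu)

    def is_stable(self) -> bool:
        return self.utilization() < 1.0

system = MMsSystem(lam=90.0, mu=20.0, s=6)   # illustrative numbers
print(round(system.utilization(), 3))        # 90 / (6 * 20) = 0.75
```

A utilization at or above 1.0 means the s Pods cannot keep up with arrivals, which is exactly the condition the elastic scale-out below is meant to prevent.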
And S20, capturing the QoS indicators of the application related to the user request, wherein the QoS indicators comprise the CPU utilization rate, the memory utilization rate, and the request-response data of the Pod service processing units.
The application related to the user request refers to an application installed on the container cloud processing platform; when the user accesses the application, the response time is the SLA indicator that users care about most in the access experience.
In some embodiments, step S20 specifically includes the following steps: S201-S204.
S201, collecting CPU utilization and memory utilization data by adopting the cAdvisor component within the kubelet component.
S202, acquiring request-response-time data through Prometheus.
S203, converting the request-response-time data collected by Prometheus into a format recognizable by the Kubernetes API, using the Prometheus adapter.
S204, converting the request-response-time data, once in the Kubernetes-API-recognizable format, into data compatible with the CPU utilization and memory utilization data collected by the cAdvisor component, through the indicator aggregator.
For steps S201 to S204, in this embodiment, the CPU utilization and memory utilization load information is mainly collected by the cAdvisor component in the kubelet component, and the custom request-response-time indicator obtains the performance data of the microservice interface through Prometheus. Since the two data formats are incompatible, the data collected by Prometheus must be converted by the Prometheus adapter into a format recognizable by the Kubernetes API, and finally converted by the indicator aggregator into the average memory utilization, average CPU utilization, average request response time, and maximum request response time indicators.
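As a hedged illustration of step S202, the sketch below parses a Prometheus instant-query JSON payload into a response-time value. The metric name is hypothetical; only the standard Prometheus HTTP API response envelope (status/data/result) is assumed, not any API of the patent itself:

```python
import json

def parse_prometheus_response_time(payload: str) -> float:
    """Extract the first request-response-time sample (seconds) from a
    Prometheus instant-query JSON payload.

    The metric name and labels are hypothetical; the envelope
    ("status"/"data"/"result") follows the Prometheus HTTP API format.
    """
    doc = json.loads(payload)
    if doc.get("status") != "success":
        raise ValueError("query failed")
    result = doc["data"]["result"]
    if not result:
        raise ValueError("no samples returned")
    # Each result entry carries a [timestamp, value] pair; value is a string.
    _, value = result[0]["value"]
    return float(value)

# Hypothetical payload, shaped like a real instant-query response.
sample = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"__name__": "http_request_duration_seconds"},
         "value": [1650000000, "0.231"]},
    ]},
})
print(parse_prometheus_response_time(sample))
```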
And S30, predicting the request arrival rate at the next moment by adopting a prediction model according to the request arrival rate.
In this embodiment, the prediction model is a Prophet-TCN hybrid model.
In the Prophet-TCN hybrid model, the Prophet model handles the trend, periodicity, and seasonality of a time series in a friendly manner and fits quickly; it is essentially a time-series prediction model based on an additive model. The temporal convolutional network (TCN) is an algorithm for time-series prediction. A weight-allocation strategy is designed according to the characteristics of the cloud-resource time series, assigning different weights to the Prophet model and the TCN model, which yields higher prediction accuracy than either single model alone.
The calculation formula of the Prophet-TCN hybrid model is as follows:
Yt = α·Pt + (1 − α)·Nt, where Yt denotes the final predicted value, Pt the predicted value of the Prophet model, and Nt the predicted value of the TCN model; α is the optimal weight, selected by traversing candidate weights.
The initial value of α is set to 1, so the initial weight of the TCN is 0; α is then decreased iteratively with a step of 0.01, the MAE of the Prophet-TCN model is computed at each step, and finally the α corresponding to the minimum MAE over all traversed values is selected.
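The α-traversal described above can be sketched as follows. This is an illustrative re-implementation, not the patent's code, and the example series are made up:

```python
def mae(pred, actual):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

def best_alpha(prophet_pred, tcn_pred, actual, step=0.01):
    """Traverse alpha from 1.0 down to 0.0 in `step` decrements and keep
    the weight minimising the MAE of Y_t = alpha*P_t + (1-alpha)*N_t."""
    best_a, best_err = 1.0, float("inf")
    a = 1.0
    while a >= -1e-9:                       # tolerance so alpha == 0.0 is included
        combined = [a * p + (1 - a) * n for p, n in zip(prophet_pred, tcn_pred)]
        err = mae(combined, actual)
        if err < best_err:
            best_a, best_err = a, err
        a -= step
    return round(best_a, 2), best_err

# Made-up series: the true values sit midway between the two models,
# so the optimal weight should come out near 0.5.
prophet = [10.0, 12.0, 14.0]
tcn = [14.0, 16.0, 18.0]
actual = [12.0, 14.0, 16.0]
alpha, err = best_alpha(prophet, tcn, actual)
print(alpha, err)
```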
As a neural network, the TCN model requires a long time for training and hyper-parameter tuning. To find the optimal parameter combination, TPOT (Tree-based Pipeline Optimization Tool) is used to optimize the parameter combination of the TCN model, which on one hand reduces model training time and on the other hand finds a globally optimal parameter combination for the XGBoost model. TPOT is a tool for optimizing machine-learning models and automatically selecting model parameters; it is a concrete application of automated machine learning (AutoML) and an extension of grid search. It adjusts each parameter of the prediction-model configuration to its optimal setting and can automatically execute the main steps of a machine-learning problem, thereby saving model-tuning time.
And S40, calculating the predicted number of required Pod service units according to the predicted request arrival rate at the next moment.
In some embodiments, step S40 specifically includes the following steps: S401-S403.
S401, obtaining the predicted request arrival rate value at the next moment.
And S402, setting the QoS maximum average request response time, the QoS response time percentage, and the QoS maximum number of service units.
And S403, calculating the predicted number of required Pod service units according to the request arrival rate value at the next moment and the set QoS maximum average request response time, QoS response time percentage, and QoS maximum number of service units.
For steps S401 to S403, in this embodiment, the QoS maximum number of service units is set to prevent unlimited resource allocation.
The formulas for calculating the predicted number of required Pod service units appear in the original filing as equation images, which are not recoverable here; the quantities they define are described below.
[Equation image] where AvgT is the QoS average request response time and μ is the average service rate of the queue.
[Equation image] where MaxT is the QoS maximum average request response time and k is the coefficient of requests that can be processed within the maximum response time.
[Equation image] with the initial value Sum1 = 0.
From the obtained ρ and ρ*, the probability that a request arrives without waiting can be calculated, specifically:
[Equation image] where j ∈ [0, s − 1].
[Equation image] where Res is the probability that a request arrives without waiting.
From the probability that a request arrives without waiting, the probabilities of 0 to s·k waiting requests can be calculated; if the cumulative probability of 0 to s·k waiting requests exceeds the QoS response time percentage, the requests can be processed within the maximum response time, and the required number of Pod service units is thereby determined.
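Since the filing's own formulas survive only as equation images, the following sketch computes a required Pod count using the standard M/M/s results as a stand-in assumption: the Erlang C waiting probability and the exponential waiting-time tail P(W > t) = C·e^(−(s·μ − λ)·t). The function names, thresholds, and numbers are hypothetical and this is not claimed to be the patent's exact formula:

```python
import math

def erlang_c_wait_prob(lam, mu, s):
    """Probability an arriving request must wait in an M/M/s queue (Erlang C)."""
    a = lam / mu                      # offered load in Erlangs
    rho = a / s
    if rho >= 1.0:
        return 1.0                    # unstable queue: everyone waits
    summ = sum(a**j / math.factorial(j) for j in range(s))
    top = a**s / math.factorial(s) * (1.0 / (1.0 - rho))
    return top / (summ + top)

def pods_needed(lam, mu, max_t, percentile, s_max):
    """Smallest s (capped at s_max) such that P(wait <= max_t) >= percentile,
    using the M/M/s waiting-time tail P(W > t) = C * exp(-(s*mu - lam) * t).
    The cap s_max mirrors the QoS maximum number of service units."""
    for s in range(1, s_max + 1):
        if lam / (s * mu) >= 1.0:
            continue                  # still unstable, keep scaling out
        c = erlang_c_wait_prob(lam, mu, s)
        p_within = 1.0 - c * math.exp(-(s * mu - lam) * max_t)
        if p_within >= percentile:
            return s
    return s_max                      # never allocate beyond the QoS cap

print(pods_needed(lam=90.0, mu=20.0, max_t=0.5, percentile=0.95, s_max=20))
```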
And S50, allocating resources according to the predicted required Pod service unit quantity.
In an embodiment, step S50 specifically includes the following steps: S501-S502.
S501, if the predicted number of required Pod service units is larger than the current number of Pod service units, performing predictive capacity expansion.
S502, if the predicted number of required Pod service units is smaller than the current number of Pod service units, performing reactive capacity reduction.
In this embodiment, if the predicted number of required Pod service units is larger than the current number, predictive capacity expansion is performed; since expansion usually needs to complete before the request peak arrives, to avoid the response delay caused by Pod initialization, the expansion stage advances the scale-out by one time unit so that the application instances have sufficient time to complete initialization. If the predicted number of required Pod service units is smaller than the current number, triggering capacity reduction in advance may leave resources insufficient and degrade the quality of service; therefore, the reduction stage adopts a traditional reactive scaling strategy.
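The asymmetric policy of S501-S502 (predictive scale-out, reactive scale-in) can be sketched as follows; the action names are illustrative, not the patent's terminology:

```python
def scaling_action(predicted_pods: int, current_pods: int):
    """Decide the elasticity action for one scheduling step.

    Scale-out is triggered proactively, one time unit ahead of the
    predicted peak so Pods finish initializing in time; scale-in is
    left to a traditional reactive strategy to avoid shrinking early.
    """
    if predicted_pods > current_pods:
        return ("proactive-scale-out", predicted_pods - current_pods)
    if predicted_pods < current_pods:
        return ("reactive-scale-in", current_pods - predicted_pods)
    return ("hold", 0)

print(scaling_action(8, 5))   # ('proactive-scale-out', 3)
```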
According to the invention, resources are elastically allocated according to the predicted number of required Pod service units, so that application resources deployed on the container cloud cluster are allocated in advance before the peak arrives, the response time is reduced, the user access experience and the stability of the platform service are improved, and resource waste is avoided.
Fig. 2 is a schematic block diagram of a container cloud resource elastic allocation apparatus according to an embodiment of the present invention; corresponding to the above container cloud resource elastic allocation method, the embodiment of the present invention further provides a container cloud resource elastic allocation apparatus 100.
As shown in fig. 2, the container cloud resource elastic allocation apparatus 100 includes an acquisition unit 110, a capture unit 120, a prediction unit 130, a calculation unit 140, and a resource allocation unit 150.
The acquisition unit 110 is configured to acquire a user request arrival rate of the container cloud processing platform.
The user request refers to a request for a user to access an application on the container cloud processing platform.
In this embodiment, the request sequence is characterized as a queue of requests waiting for service, and the Pod service units deployed on the Kubernetes platform are characterized as service stations providing service, so that the system is constructed as an M/M/s queuing network. Assuming that request arrivals and request service are mutually independent, that waiting times and service times are mutually independent, that service requests arrive according to a Poisson process, and that the service time of a request at a service unit follows a negative exponential distribution, the container cloud processing platform can be regarded as an M/M/s queuing system with a user request arrival rate of λ, a per-Pod service rate of μ, and s Pod service units. Arriving user requests are distributed to the set of Pod service units by a round-robin load balancing algorithm, and the arrival rate is also persisted to a PostgreSQL database.
The capture unit 120 is configured to capture the QoS indicators of the application related to the user request, where the QoS indicators include the CPU utilization rate, the memory utilization rate, and the request-response data of the Pod service processing units.
The application related to the user request refers to an application installed on the container cloud processing platform; when the user accesses the application, the response time is the SLA indicator that users care about most in the access experience.
In some embodiments, the capture unit 120 includes a collection module, an acquisition module, a conversion module, and an aggregation module.
The collection module is used for collecting CPU utilization and memory utilization data by adopting the cAdvisor component within the kubelet component.
The acquisition module is used for acquiring request-response-time data through Prometheus.
The conversion module is used for converting the request-response-time data collected by Prometheus into a format recognizable by the Kubernetes API, using the Prometheus adapter.
The aggregation module is used for converting the request-response-time data, once in the Kubernetes-API-recognizable format, into data compatible with the CPU utilization and memory utilization data collected by the cAdvisor component, through the indicator aggregator.
For the collection, acquisition, conversion, and aggregation modules, in this embodiment, the CPU utilization and memory utilization load information is mainly collected by the cAdvisor component in the kubelet component, and the custom request-response-time indicator obtains the performance data of the microservice interface through Prometheus.
And the prediction unit 130 is used for predicting the request arrival rate at the next moment by adopting a prediction model according to the request arrival rate.
In this embodiment, the prediction model is a Prophet-TCN hybrid model.
In the Prophet-TCN hybrid model, the Prophet model handles the trend, periodicity, and seasonality of a time series in a friendly manner and fits quickly; it is essentially a time-series prediction model based on an additive model. The temporal convolutional network (TCN) is an algorithm for time-series prediction. A weight-allocation strategy is designed according to the characteristics of the cloud-resource time series, assigning different weights to the Prophet model and the TCN model, which yields higher prediction accuracy than either single model alone.
The calculation formula of the Prophet-TCN mixed model is as follows:
Y_t = αP_t + (1-α)N_t, where Y_t represents the final predicted value, P_t represents the predicted value of the Prophet model, N_t represents the predicted value of the TCN model, and α is the optimal weight; the optimal α is selected by traversing the candidate weights.
The initial value of α is set to 1, so the initial weight of the TCN is 0; α is then decreased iteratively with a step of 0.01, the MAE of the Prophet-TCN model is calculated at each step, and the weight α corresponding to the minimum MAE over all traversed values is finally selected.
As a neural network, the TCN model needs a long time for training and parameter tuning. To find the optimal hyperparameter combination, TPOT (Tree-based Pipeline Optimization Tool) is used to optimize the parameter combination of the TCN model, which on the one hand reduces the training time of the model and on the other hand finds a globally better parameter combination. TPOT is a tool for optimizing machine learning models and automatically selecting model parameters; it is a concrete application of automated machine learning (AutoML) and an extension of grid search. It adjusts each parameter in the prediction model configuration toward its optimal setting and can automatically execute the main steps of a machine learning problem, thereby saving model tuning time.
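The weight traversal described above can be sketched directly in Python. The function below is a minimal illustration (names are hypothetical): it decrements α from 1.0 in steps of 0.01 and keeps the weight whose blended forecast Y_t = αP_t + (1-α)N_t minimises the mean absolute error against the observed series:

```python
def blend_weight(prophet_pred, tcn_pred, actual, step=0.01):
    """Traverse alpha from 1.0 down to 0.0 and return (best_alpha, best_mae)
    for the blended forecast alpha*P_t + (1-alpha)*N_t."""
    best_alpha, best_mae = 1.0, float("inf")
    alpha = 1.0
    while alpha >= -1e-9:  # small tolerance so 0.0 is included despite float drift
        errors = [abs(alpha * p + (1 - alpha) * n - y)
                  for p, n, y in zip(prophet_pred, tcn_pred, actual)]
        mae = sum(errors) / len(errors)
        if mae < best_mae:
            best_alpha, best_mae = alpha, mae
        alpha -= step
    return round(best_alpha, 2), best_mae

# If the TCN alone matches the observations, the search drives alpha to 0;
# if Prophet alone matches, it stays at 1.
```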
And the calculating unit 140 is configured to calculate the predicted number of Pod service units required according to the predicted request arrival rate at the next time.
In some embodiments, the calculation unit 140 includes an acquisition module, a setting module, and a calculation module.
And the obtaining module is used for obtaining the predicted request arrival rate value at the next moment.
And the setting module is used for setting the Qos maximum average request response time, the Qos response time percentage and the Qos maximum service unit number.
And the calculating module is used for calculating the quantity of the Pod service units required by prediction according to the request arrival rate value at the next moment, the set Qos maximum average request response time, the Qos response time percentage and the Qos maximum service unit quantity.
For the obtaining module, the setting module and the calculating module: in this embodiment, the Qos maximum number of service units is set in order to prevent unlimited allocation of resources.
The formula for calculating the number of Pod service units needed by prediction is as follows:
Figure BDA0003624452530000111
where AvgT is the Qos average request response time and μ is the average service rate of the queue.
Figure BDA0003624452530000112
wherein MaxT is the Qos maximum average request response time, and k is the coefficient of requests that can be processed within the maximum response time.
Figure BDA0003624452530000113
Sum1=0。
From the obtained ρ and ρ*, the probability that a request arrives without waiting can be calculated, specifically:
Figure BDA0003624452530000121
wherein j ∈ [0, s-1].
Figure BDA0003624452530000122
where Res is the probability that a request arrives without waiting.
From the probability that a request arrives without waiting, the probabilities that a request encounters 0 to s·k waits can be calculated. If the cumulative probability over 0 to s·k waits exceeds the Qos response time percentage, the request can be processed within the maximum response time, and the required number of Pod service units is thereby determined.
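The equations themselves appear only as images in this publication, but the surrounding definitions (arrival rate, service rate μ, s servers, j ∈ [0, s-1], the no-wait probability Res) match a standard M/M/s (Erlang C) analysis. The following Python sketch is therefore an assumption-labelled reconstruction, not the patent's exact formulas; in particular it simplifies the "0 to s·k waits" condition to the plain no-wait probability:

```python
import math

def no_wait_probability(lam, mu, s):
    """Probability that an arriving request finds a free server in an M/M/s
    queue (the complement of the Erlang C waiting probability).
    lam: predicted request arrival rate; mu: per-Pod service rate;
    s: number of Pod service units. Requires lam < s*mu for stability."""
    a = lam / mu                       # offered load (rho* in the text)
    rho = a / s                        # per-server utilisation (rho)
    if rho >= 1:
        return 0.0                     # unstable queue: every request waits
    head = sum(a**j / math.factorial(j) for j in range(s))  # j in [0, s-1]
    tail = (a**s / math.factorial(s)) / (1 - rho)
    erlang_c = tail / (head + tail)    # probability an arrival must wait
    return 1.0 - erlang_c              # Res: served without waiting

def pods_needed(lam, mu, qos_percent, s_max):
    """Smallest s (capped at the Qos maximum service unit number, which
    prevents unlimited allocation) whose no-wait probability meets the
    Qos response-time percentage."""
    for s in range(1, s_max + 1):
        if no_wait_probability(lam, mu, s) >= qos_percent:
            return s
    return s_max
```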
And a resource allocation unit 150, configured to perform resource allocation according to the predicted number of Pod service units required.
In this embodiment, if the predicted number of required Pod service units is greater than the current number of Pod service units, predictive scale-out is performed. Since scale-out usually needs to complete before the request peak arrives, to avoid the response delay caused by Pod initialization, the scale-out stage adopts a strategy of expanding one time unit in advance so that the application instances have sufficient time to finish initializing. If the predicted number of required Pod service units is smaller than the current number of Pod service units, triggering scale-in in advance could leave resources insufficient and degrade service quality, so the scale-in stage adopts a conventional reactive scaling strategy.
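The two-sided policy of this paragraph can be sketched as a small decision function; the function name and the returned tuple shape are illustrative only:

```python
def scaling_action(predicted_pods, current_pods):
    """Hybrid scaling policy: proactive scale-out scheduled one time unit
    before the predicted peak (so new Pods finish initialising in time),
    but only conventional reactive scale-in, to avoid starving live traffic."""
    if predicted_pods > current_pods:
        # schedule the expansion now, one interval ahead of the peak
        return ("scale_out_ahead", predicted_pods - current_pods)
    if predicted_pods < current_pods:
        # shrink reactively rather than predictively
        return ("reactive_scale_in", current_pods - predicted_pods)
    return ("hold", 0)

print(scaling_action(5, 3))  # ('scale_out_ahead', 2)
```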
According to the invention, resources are elastically allocated according to the predicted number of required Pod service units, so that the application resources deployed on the container cloud cluster are allocated in advance, before the peak arrives. This reduces response time, improves the user access experience and the stability of the platform service, and avoids resource waste.
The container cloud resource elastic allocation apparatus may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in fig. 3.
Referring to fig. 3, fig. 3 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a server, where the server may be an independent server or a server cluster composed of a plurality of servers.
As shown in fig. 3, the computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the container cloud resource flexible allocation method are implemented.
The computer device 700 may be a terminal or a server. The computer device 700 includes a processor 720, memory, and a network interface 750, which are connected by a system bus 710, where the memory may include non-volatile storage media 730 and internal memory 740.
The non-volatile storage medium 730 may store an operating system 731 and computer programs 732. The computer programs 732, when executed, enable the processor 720 to perform any of the container cloud resource flexible allocation methods.
The processor 720 is used to provide computing and control capabilities, supporting the operation of the overall computer device 700.
The internal memory 740 provides an environment for the operation of the computer program 732 in the non-volatile storage medium 730, and when the computer program 732 is executed by the processor 720, the processor 720 may be enabled to execute any one of the container cloud resource flexible allocation methods.
The network interface 750 is used for network communication, such as sending assigned tasks. Those skilled in the art will appreciate that the configuration shown in fig. 3 is a block diagram of only part of the configuration relevant to the present application and does not limit the computer device 700 to which the present application is applied; a particular computer device 700 may include more or fewer components than shown, combine certain components, or arrange components differently. The processor 720 is configured to execute the program code stored in the memory to perform the following steps:
acquiring a user request arrival rate of a container cloud processing platform;
capturing Qos indexes of applications related to user requests, wherein the Qos indexes comprise CPU utilization rate, memory utilization rate and request response data of a Pod service processing unit;
predicting the request arrival rate at the next moment by adopting a prediction model according to the request arrival rate;
calculating the quantity of Pod service units required by prediction according to the predicted request arrival rate at the next moment;
and performing resource allocation according to the predicted required Pod service unit quantity.
In one embodiment: the Qos index for grabbing the application related to the user request comprises the following steps:
collecting CPU utilization rate and memory utilization rate data by using the cAdvisor component in the kubelet component;
acquiring request response time data through Prometheus;
converting the request response time data collected by Prometheus into a format recognizable by the kubernetes API interface by adopting a prometheus-adapter;
and converting the request response time data, once in the kubernetes-API-recognizable format, into data compatible with the CPU utilization rate and memory utilization rate data collected by the cAdvisor component, through the index aggregator.
In one embodiment: and predicting the request arrival rate at the next moment by adopting a prediction model according to the request arrival rate, wherein the prediction model is a Prophet-TCN mixed model.
In one embodiment: the calculation formula of the Prophet-TCN mixed model is as follows:
Y_t = αP_t + (1-α)N_t, where Y_t represents the final predicted value, P_t represents the predicted value of the Prophet model, N_t represents the predicted value of the TCN model, and α is the optimal weight.
In one embodiment: the calculating the predicted number of Pod service units according to the predicted request arrival rate at the next moment comprises:
acquiring a predicted request arrival rate value at the next moment;
setting Qos maximum average request response time, Qos response time percentage and Qos maximum service unit number;
and calculating the predicted required Pod service unit number according to the request arrival rate value at the next moment, the set Qos maximum average request response time, the Qos response time percentage and the Qos maximum service unit number.
In one embodiment: the resource allocation according to the predicted required Pod service unit quantity comprises the following steps:
and if the number of the Pod service units required by prediction is larger than the number of the current Pod service units, performing prediction and expansion.
In one embodiment: the resource allocation according to the predicted required Pod service unit quantity further comprises:
and if the number of the Pod service units required by prediction is less than the number of the current Pod service units, performing responsive capacity reduction.
It should be understood that, in the embodiment of the present application, the processor 720 may be a Central Processing Unit (CPU), and the processor 720 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Those skilled in the art will appreciate that the configuration of computer device 700 depicted in FIG. 3 is not intended to be limiting of computer device 700 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
In another embodiment of the present invention, a computer-readable storage medium is provided. The computer readable storage medium may be a non-volatile computer readable storage medium. The computer readable storage medium stores a computer program, wherein the computer program, when executed by a processor, implements the container cloud resource elastic allocation method disclosed by the embodiment of the invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. Those of ordinary skill in the art will appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components and steps of the various examples have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only a logical division, and there may be other divisions in actual implementation, or units with the same function may be grouped into one unit, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. The elastic container cloud resource allocation method is characterized by comprising the following steps:
collecting a user request arrival rate of a container cloud processing platform;
capturing Qos indexes of applications related to user requests, wherein the Qos indexes comprise CPU utilization rate, memory utilization rate and request response data of a Pod service processing unit;
predicting the request arrival rate at the next moment by adopting a prediction model according to the request arrival rate;
calculating the quantity of Pod service units required by prediction according to the predicted request arrival rate at the next moment;
and performing resource allocation according to the predicted required Pod service unit quantity.
2. The elastic container cloud resource allocation method according to claim 1, wherein the crawling of Qos metrics of applications related to user requests comprises:
collecting CPU utilization rate and memory utilization rate data by using the cAdvisor component in the kubelet component;
acquiring request response time data through prometheus;
converting the request response time data collected by Prometheus into a format recognizable by the kubernetes API (application programming interface) by adopting a prometheus-adapter;
and converting the request response time data, once in the kubernetes-API-recognizable format, into data compatible with the CPU utilization rate and memory utilization rate data collected by the cAdvisor component, through the index aggregator.
3. The elastic container cloud resource allocation method according to claim 1, wherein the request arrival rate at the next time is predicted by using a prediction model according to the request arrival rate, and the prediction model is a Prophet-TCN hybrid model.
4. The elastic container cloud resource allocation method according to claim 3, wherein the calculation formula of the Prophet-TCN hybrid model is as follows:
Y_t = αP_t + (1-α)N_t, where Y_t represents the final predicted value, P_t represents the predicted value of the Prophet model, N_t represents the predicted value of the TCN model, and α is the optimal weight.
5. The method for flexibly allocating container cloud resources according to claim 1, wherein the calculating the predicted number of Pod service units required according to the predicted request arrival rate at the next time comprises:
acquiring a predicted request arrival rate value at the next moment;
setting Qos maximum average request response time, Qos response time percentage and Qos maximum service unit number;
and calculating the predicted required Pod service unit number according to the request arrival rate value at the next moment, the set Qos maximum average request response time, the Qos response time percentage and the Qos maximum service unit number.
6. The method for flexibly allocating container cloud resources according to claim 1, wherein the allocating resources according to the predicted required Pod service unit number comprises:
and if the number of the Pod service units required by prediction is larger than the number of the current Pod service units, performing prediction and expansion.
7. The method for flexibly allocating container cloud resources according to claim 6, wherein the allocating resources according to the predicted required Pod service unit number further comprises:
and if the number of the Pod service units required by prediction is smaller than the number of the current Pod service units, performing responsive capacity reduction.
8. The elastic container cloud resource distribution device is characterized by comprising a collection unit, a grabbing unit, a prediction unit, a calculation unit and a resource distribution unit;
the acquisition unit is used for acquiring the user request arrival rate of the container cloud processing platform;
the capturing unit is used for capturing a Qos index of an application related to a user request, wherein the Qos index comprises a CPU (central processing unit) utilization rate, a memory utilization rate and request response data of a Pod service processing unit;
the prediction unit is used for predicting the request arrival rate at the next moment by adopting a prediction model according to the request arrival rate;
the calculating unit is used for calculating the number of Pod service units required by prediction according to the predicted request arrival rate at the next moment;
and the resource allocation unit is used for allocating resources according to the predicted required Pod service unit quantity.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method steps of the container cloud resource flexible allocation method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method steps of the container cloud resource elastic allocation method according to any one of claims 1 to 7.
CN202210474031.5A 2022-04-29 2022-04-29 Container cloud resource elastic allocation method and device, computer equipment and medium Pending CN114780244A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210474031.5A CN114780244A (en) 2022-04-29 2022-04-29 Container cloud resource elastic allocation method and device, computer equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210474031.5A CN114780244A (en) 2022-04-29 2022-04-29 Container cloud resource elastic allocation method and device, computer equipment and medium

Publications (1)

Publication Number Publication Date
CN114780244A true CN114780244A (en) 2022-07-22

Family

ID=82435733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210474031.5A Pending CN114780244A (en) 2022-04-29 2022-04-29 Container cloud resource elastic allocation method and device, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN114780244A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115373764A (en) * 2022-10-27 2022-11-22 中诚华隆计算机技术有限公司 Automatic container loading method and device
CN117093330A (en) * 2023-10-16 2023-11-21 南京奕起嗨信息科技有限公司 Container management method and device in serverless computing
CN117093330B (en) * 2023-10-16 2023-12-22 南京奕起嗨信息科技有限公司 Container management method and device in serverless computing
CN117076142A (en) * 2023-10-17 2023-11-17 阿里云计算有限公司 Multi-tenant resource pool configuration method and multi-tenant service system
CN117076142B (en) * 2023-10-17 2024-01-30 阿里云计算有限公司 Multi-tenant resource pool configuration method and multi-tenant service system

Similar Documents

Publication Publication Date Title
WO2021179462A1 (en) Improved quantum ant colony algorithm-based spark platform task scheduling method
CN114780244A (en) Container cloud resource elastic allocation method and device, computer equipment and medium
CN108776934B (en) Distributed data calculation method and device, computer equipment and readable storage medium
US9652150B2 (en) Global memory sharing method and apparatus, and communications system
CN109981744B (en) Data distribution method and device, storage medium and electronic equipment
CN103699433B (en) One kind dynamically adjusts number of tasks purpose method and system in Hadoop platform
CN105022668B (en) Job scheduling method and system
CN110618867A (en) Method and device for predicting resource usage amount
CN115629865B (en) Deep learning inference task scheduling method based on edge calculation
CN112860974A (en) Computing resource scheduling method and device, electronic equipment and storage medium
CN110198267B (en) Traffic scheduling method, system and server
CN114327811A (en) Task scheduling method, device and equipment and readable storage medium
CN114490078A (en) Dynamic capacity reduction and expansion method, device and equipment for micro-service
CN103729417A (en) Method and device for data scanning
CN110347477B (en) Service self-adaptive deployment method and device in cloud environment
CN111966480A (en) Task execution method and related device
CN115562841B (en) Cloud video service self-adaptive resource scheduling system and method
CN115952054A (en) Simulation task resource management method, device, equipment and medium
CN114035906A (en) Virtual machine migration method and device, electronic equipment and storage medium
CN114090256A (en) Application delivery load management method and system based on cloud computing
CN109918366B (en) Data security processing method based on big data
CN112003900A (en) Method and system for realizing high service availability under high-load scene in distributed system
CN114598705B (en) Message load balancing method, device, equipment and medium
CN116909758B (en) Processing method and device of calculation task and electronic equipment
CN110677463B (en) Parallel data transmission method, device, medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination