CN115118602A - Container resource dynamic scheduling method and system based on usage prediction - Google Patents


Info

Publication number
CN115118602A
Authority
CN
China
Prior art keywords
container
resource
usage
data
copy number
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210701215.0A
Other languages
Chinese (zh)
Other versions
CN115118602B (en)
Inventor
朱大鹏
刘彩云
姜厚禄
侍守创
胡昌平
胡翔宇
徐雷
左刚
单文金
杨庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Shipbuilding Digital Information Technology Co ltd
Jiangsu Jierui Information Technology Co ltd
716th Research Institute of CSIC
Original Assignee
Jiangsu Jierui Information Technology Co ltd
716th Research Institute of CSIC
CSIC Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Jierui Information Technology Co ltd, 716th Research Institute of CSIC, CSIC Information Technology Co Ltd filed Critical Jiangsu Jierui Information Technology Co ltd
Priority to CN202210701215.0A
Publication of CN115118602A
Application granted
Publication of CN115118602B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0896 Bandwidth or capacity management, i.e. automatically increasing or decreasing capacities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0813 Configuration setting characterised by the conditions triggering a change of settings
    • H04L41/0816 Configuration setting characterised by the conditions triggering a change of settings the condition being an adaptation, e.g. in response to network events
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H04L43/0888 Throughput
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a container resource dynamic scheduling method and system based on usage prediction. A Transformer-based container resource usage prediction model is used to predict application resource usage: an application places different demands on CPU utilization, memory usage, network usage and disk usage at different time points, so resource usage is collected as a time series, normalized, arranged into time-series feature sequences over a suitable time window, and fed into the prediction model to forecast container resource usage. The expected replica count for the next period is then computed from the prediction result and compared with the replica count currently computed by a responsive scaling algorithm, and container resources are dynamically scheduled according to an HPA policy, thereby improving system resource utilization and reducing the risk of system downtime while ensuring reliability.

Description

Container resource dynamic scheduling method and system based on usage prediction
Technical Field
The invention relates to the technical field of container resource scheduling, in particular to a container resource dynamic scheduling method and system based on usage prediction.
Background
Currently, more and more applications are deployed in containers. Typically, sufficient resources are allocated to a hosted application so that it can run stably. In most cases, however, the hosted application does not run at peak load, and resources such as CPU and memory are not all at peak load at the same time, so the pre-allocated resources sit idle most of the time and are wasted. Conversely, when the application is under high load, the pre-allocated resources are not necessarily sufficient. At present, applications in containers rely on the host operating system for resource allocation, limits and weight settings. The problem with this approach is that the resource allocation weights and limits of each container are fixed from the start and cannot be adjusted dynamically, which easily leads to both resource waste and resource shortage.
To address these problems, cloud computing systems usually use a dedicated prediction algorithm to forecast the resource demands of virtual machines and applications and optimize resource allocation in advance, improving resource utilization and quality of service, helping to pre-provision resources, and optimizing Docker resource management. In the existing Kubernetes system, resource scheduling is mainly handled by the Scheduler component. When an application is scheduled for the first time, the Scheduler selects the most suitable Node among all Nodes in the cluster for deployment according to the application's resource configuration, i.e. a static scheduling policy. First, this mechanism can only configure resources when the application is initially deployed and cannot dynamically adjust the allocated resources as needed while the application runs, which may leave host resources under-utilized. Second, because resource usage is not predicted, the alerting mechanism cannot raise an alarm before a resource metric is violated, and the Scheduler cannot reschedule resources or automatically scale instances before a resource consumption bottleneck occurs. Moreover, the Scheduler does not consider an application's sensitivity to particular resources, which easily creates a bottleneck on a single resource of a Node. A resource management mechanism and scheduling policy must therefore be formulated from the viewpoints of maximizing resource utilization, application sensitivity to resources and so on, so that dynamic scheduling and automatic scaling of instances can be triggered before the application hits a bottleneck, improving system resource utilization and increasing scheduling flexibility.
Disclosure of Invention
The invention aims to provide a container resource dynamic scheduling method and system based on usage prediction that improve system resource utilization, increase scheduling flexibility, and enable containers to respond in advance to the resource demands of the applications deployed on them.
The technical solution for realizing the purpose of the invention is as follows:
a dynamic scheduling method of container resources based on usage prediction comprises the following steps:
setting scale-up and scale-down thresholds, a scale-down decision count threshold, and a monitoring period;
monitoring the resource usage of the containers and of each node in the cluster, monitoring the resources and containers on each node's machine in real time, collecting performance data, and obtaining the resource metrics and utilization rates of all replicas;
aggregating the system state, system resource metrics and application performance metrics based on the resource metrics and utilization rates, and saving the aggregated data to corresponding files;
computing the application's historical load sequence data, building a container resource usage prediction model with a deep learning algorithm, and predicting the container resource usage at future moments;
and computing the expected replica count for the next period from the predicted load data, computing the currently expected replica count with a responsive scaling algorithm, comparing the two to determine the final replica count as the input of an HPA policy, and dynamically scaling the container resources.
Further, the performance data includes CPU usage, memory usage, network throughput, and file system usage.
Further, monitoring the resource usage of the containers and of each node in the cluster, monitoring the resources and containers on each node's machine in real time, collecting performance data, and obtaining the resource metrics and utilization rates of all replicas comprises the steps of:
step 2-1: Heapster obtains the list of all node information in the system from the Master node via the Kubernetes API;
step 2-2: on each node, the Kubelet uses cAdvisor to collect resource utilization information for the containers deployed on the node and for the physical node as a whole;
step 2-3: after obtaining the resource utilization information of each node, Heapster stores the data in a database: a database named "k8s" is created in InfluxDB, the data in the "k8s" database are queried through Grafana, and the query results are displayed on a graphical interface.
Further, aggregating the system state, system resource metrics and application performance metrics based on the resource metrics and utilization rates, and saving the aggregated data to corresponding files specifically comprises the steps of:
step 3-1: obtaining information from the system with the Kubernetes CLI kubectl, and retrieving information on nodes, namespaces, Pods and services from the whole system; for each namespace, comparing the labels of each Pod and each service, and then recording the mapping between Pods and services;
step 3-2: querying the resource metric values stored in the database through the InfluxDB HTTP API using InfluxQL statements;
step 3-3: sorting the queried data and saving them into a JSON file.
Further, the application's historical load sequence data is computed as follows: the load of each application is computed with dynamic weighting; all resource metrics are considered together and weighted according to their utilization rates to obtain the application's integrated load sequence data.
Further, computing the application's historical load sequence data specifically comprises the steps of:
assume that at a given moment the collected data contain n Pod replicas of an application, denoted P = [p_1, p_2, …, p_n]; the resource request vector of each Pod replica is R = [r_1, r_2, …, r_d] and its resource usage vector is Q = [q_1, q_2, …, q_d], where d is the number of resource dimensions of each Pod replica;
the resource usage rate of each application is expressed as U = [u_1, u_2, …, u_d], where each component is the ratio of aggregate usage to aggregate request over the n Pod replicas in that dimension:
u_i = (Σ_{j=1}^{n} q_i^(j)) / (Σ_{j=1}^{n} r_i^(j)), i = 1, 2, …, d
the weight of each resource dimension is:
w_i = u_i / Σ_{j=1}^{d} u_j
and the integrated load is then:
L = Σ_{i=1}^{d} w_i · u_i
the load computed by the above formula at each moment is appended to the load queue; once the number of entries in the queue reaches the preset size, the queue is updated by deleting the oldest load value and appending the newly computed one.
Furthermore, the inputs of the container resource usage prediction model are CPU utilization, memory usage, network usage and disk usage, and the output is the container load; the model comprises an encoder and a decoder; the encoder is a stack of 6 identical network layers, each containing a multi-head attention sublayer and a position-wise feed-forward network; the decoder mirrors the encoder in structure but uses a masked multi-head attention layer when extracting features of the input sequence, and the whole model applies residual connections and normalization to the output of each layer for better optimization.
Further, the container resource usage prediction model preprocesses the model input data to obtain an input X = [x_1, x_2, …, x_n]^T ∈ R^{n×d}, where n denotes the time window length, d denotes the input dimension, and x_i denotes the system performance data at the i-th time point, i = 1, 2, …, n; the position vector of the system performance data at each time point is computed as:
PE(pos, 2i) = sin(pos / 10000^{2i/d})
PE(pos, 2i+1) = cos(pos / 10000^{2i/d})
where pos denotes the position of each time-series datum in the input sequence X and i denotes the dimension;
the vectorization of each system performance datum is:
re_i = we_i + pe_i
where we_i denotes the value of the i-th datum in the input X and pe_i denotes the position vector of the i-th datum in X;
the multi-head attention sublayer is:
MultiHead(Q, K, V) = Concat(head_1, head_2, …, head_h) W^O
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
Attention(Q, K, V) = softmax(Q K^T / √d_k) V
where Q is the query matrix, K the key matrix and V the value matrix, obtained by multiplying the input matrix X with three different weight matrices W^Q, W^K, W^V:
Q = X · W^Q
K = X · W^K
V = X · W^V
the feed-forward network consists of two linear transformations, FFN(x) = max(0, x W_1 + b_1) W_2 + b_2, where W_1 and W_2 are weight matrices and b_1 and b_2 are bias parameters, with a ReLU activation between the two linear transformations;
the encoder outputs the sequence feature vectors Z_a = (z_{a1}, z_{a2}, …, z_{an});
when decoding, the decoder first uses the masked multi-head attention layer to obtain the continuous representation vectors Z_b = (z_{b1}, z_{b2}, …, z_{bn}), then performs multi-head attention and translation alignment using Queries and Keys obtained from the encoder's output sequence feature vectors and Values from the continuous representation vectors of the masked multi-head attention layer, associating the source sequence with the high-level features of the target sequence.
Further, comparing the two replica counts and taking the finally determined replica count as the input of the HPA policy to dynamically scale the container resources is specifically: if the currently expected replica count is greater than the currently applied replica count, the larger of the next-period expected replica count and the currently expected replica count is taken as the input of the HPA policy; otherwise, if the next-period expected replica count is greater than the currently applied replica count, the next-period expected replica count is taken as the input of the HPA policy; and if the next-period expected replica count and the currently expected replica count are both smaller than the currently applied replica count for multiple consecutive times, the next-period expected replica count is finally taken as the input of the HPA policy and the container resources are scaled in.
A container resource dynamic scheduling system based on usage prediction comprises a container resource usage monitoring module, a data aggregation module, a container resource usage prediction module and an auto-scaling module, wherein:
the container resource usage monitoring module monitors the containers in the cluster and the resource usage of each node, performs real-time monitoring and performance-data collection for the resources and containers on each node's machine, and obtains the resource metrics and utilization rates of all replicas;
the data aggregation module aggregates the system state, system resource metrics and application performance metrics based on the resource metrics and utilization rates, and saves the aggregated data to corresponding files;
the container resource usage prediction module computes the application's historical load sequence data and predicts the container resource usage at future moments through a container resource usage prediction model built with a deep learning algorithm;
and the auto-scaling module computes the expected replica count for the next period from the predicted load data, computes the currently expected replica count with a responsive scaling algorithm, compares the two to determine the final replica count as the input of an HPA policy, and dynamically scales the container resources.
Compared with the prior art, the invention has the following beneficial technical effects:
the method adopts a Transformer-based model to predict the usage amount of the platform application resources, and constructs a dynamic scheduling and balancing model of the container resources by taking the load balance of the container and the container distance as the target of container scheduling according to the prediction result, thereby realizing the rapid allocation of the container resources. The technology can improve the utilization rate of system resources, increase the scheduling flexibility, realize that the container can respond to the resource demand of the application program deployed on the container in advance, and can improve the utilization rate of the resources to the greatest extent compared with the traditional static resource allocation mode based on the priori knowledge.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and not to be construed as limiting the invention in any way, and in which:
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a diagram of the overall architecture for dynamic resource scheduling of the present invention.
Fig. 3 is a load queue diagram of the present invention.
FIG. 4 is a diagram of a Transformer model of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An application's demand for container resources is not constant but changes dynamically. On this basis, a Transformer-based container resource usage prediction model is used to predict application resource usage: an application places different demands on CPU utilization, memory usage, network usage and disk usage at different time points, so resource usage is collected as a time series, normalized, arranged into time-series feature sequences over a suitable time window, and fed into the prediction model to forecast container resource usage. The expected replica count for the next period is then computed from the prediction result and compared with the replica count currently computed by a responsive scaling algorithm, and container resources are dynamically scheduled according to the HPA policy, improving system resource utilization and reducing the risk of system downtime while ensuring reliability.
This embodiment provides a container resource dynamic scheduling system based on usage prediction, as shown in fig. 2, comprising a container resource usage monitoring module, a data aggregation module, a container resource usage prediction module and an auto-scaling module, specified as follows:
(1) Container resource usage monitoring module
The monitoring module uses Google's cAdvisor tool to monitor the containers in the cluster and the resource usage of each Node, performing real-time monitoring and performance-data collection for the resources and containers on each Node's machine, covering CPU usage, memory usage, network throughput and file system usage. System resource metrics are monitored as follows:
Step 1-1: Heapster uses the Kubernetes API to obtain the list of all node information in the system from the Master node.
Step 1-2: On each node, the Kubelet uses cAdvisor to collect resource utilization information, including resource metrics and utilization rates, for the containers deployed on the node and for the physical node as a whole.
Step 1-3: After receiving the resource utilization information of each node, Heapster stores the data in a database; a database named "k8s" is created in InfluxDB. The Grafana instance deployed in our system queries the data in the "k8s" database and presents them on a graphical interface.
(2) Data aggregation module
The data aggregation module integrates the system state (including nodes and Pods), system resource metrics and application performance metrics, and saves the aggregated data into corresponding files. The specific process is as follows:
Step 2-1: Obtain information from the system with the Kubernetes CLI kubectl. Information about nodes, namespaces, Pods and services is retrieved from the whole system. For each namespace, the labels of each Pod and each service are compared, and the mapping between Pods and services is recorded.
Step 2-2: Query the resource metric values stored in the database through the InfluxDB HTTP API using InfluxQL statements.
Step 2-3: Sort the data queried in step 2-2 and save them into a JSON file for use by the following modules.
(3) Container resource usage prediction module
To address the fact that a single resource cannot adequately measure an application's load, the load of each application is computed with dynamic weighting: the dynamic weighting algorithm considers all resource metrics together, weights them according to their utilization rates, and finally obtains the application's integrated load.
Assume that at a given moment the collected data contain n Pod replicas of an application, denoted P = [p_1, p_2, …, p_n]. The resource request vector of each Pod replica is R = [r_1, r_2, …, r_d] and its resource usage vector is Q = [q_1, q_2, …, q_d], where d is the number of resource dimensions of each Pod replica.
The resource usage rate of each application can be expressed as U = [u_1, u_2, …, u_d], where each component is the ratio of aggregate usage to aggregate request over the n Pod replicas in that dimension:
u_i = (Σ_{j=1}^{n} q_i^(j)) / (Σ_{j=1}^{n} r_i^(j)), i = 1, 2, …, d
The weight of each resource dimension is computed as:
w_i = u_i / Σ_{j=1}^{d} u_j
The integrated load of the application is finally computed as:
L = Σ_{i=1}^{d} w_i · u_i
The load at each moment computed by the above formula is appended to the load queue shown in fig. 3; once the number of entries in the queue reaches the preset size, the queue is updated by deleting the oldest load value and appending the newly computed one, and the load in the cluster is then predicted with the Transformer-based container resource usage prediction model.
As shown in fig. 4, the Transformer model consists of an encoder and a decoder. The encoder is a stack of 6 identical network layers, each containing a multi-head attention sublayer (Multi-Head Attention) and a position-wise feed-forward network (Feed Forward); the decoder broadly mirrors the encoder but uses a masked multi-head attention layer for feature extraction on the input sequence, and the whole model applies residual connections and normalization to each layer's output to better optimize the network.
The inputs of the prediction model are CPU utilization, memory usage, network usage and disk usage, and the output is the container load. The model input data are preprocessed to obtain an input X = [x_1, x_2, …, x_n]^T ∈ R^{n×d}, where n denotes the time window length, d denotes the input dimension, and x_i denotes the system performance data at the i-th time point, i = 1, 2, …, n. The position vector of the system performance data at each time point is computed as:
PE(pos, 2i) = sin(pos / 10000^{2i/d})
PE(pos, 2i+1) = cos(pos / 10000^{2i/d})
where pos denotes the position of each time-series datum in the input sequence X and i denotes the dimension.
Finally, the vectorization of each datum is represented as:
re_i = we_i + pe_i
where we_i denotes the value of the i-th datum in the input X and pe_i denotes the position vector of the i-th datum in X.
The self-attention mechanism, which computes the degree of correlation between data, is generally described by three matrices: a query matrix (Q), a key matrix (K) and a value matrix (V), obtained by multiplying the input matrix X with three different weight matrices W^Q, W^K, W^V:
Q = X · W^Q
K = X · W^K
V = X · W^V
To obtain the self-attention information, the Q vectors first query all candidate positions, each of which holds a pair of K and V vectors; the query is a dot-product operation between the Q vectors and the K vectors of all candidate positions, the dot-product results are passed through a Softmax function and used to weight the respective V vectors, and the weighted vectors are summed to give the final self-attention result:
Attention(Q, K, V) = softmax(Q K^T / √d_k) V
The multi-head self-attention mechanism is equivalent to integrating h parallel self-attention layers, computed as:
MultiHead(Q, K, V) = Concat(head_1, head_2, …, head_h) W^O
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
then, the processed signal is processed by a Feed-Forward neural Network (Feed-Forward Network) based on position, wherein the Feed-Forward neural Network consists of two linear transformations, and FFN (x) is max (0, xW) 1 +b 1 )W 2 +b 2 Wherein W is 1 And W 2 Is a state matrix, b 1 And b 2 To compensate the parameter; there is an activation function for the ReLU between the two. Encoder output sequence feature vector Z ═ (Z) a1 ,z a2 ,…,z an )。
The Decoder first uses masked multi-head attention to obtain the continuous representation vectors Z_b = (z_{b1}, z_{b2}, …, z_{bn}). After multi-head attention and residual normalization, z_b and the encoder-generated feature vectors z_a undergo a further multi-head attention computation; in this step, Queries and Keys derived from the encoder output vectors z_a and Values derived from the decoder vectors z_b are used for multi-head attention computation and translation alignment, i.e. the source sequence is associated with the high-level features of the target sequence. Multi-head self-attention is used in both the encoder and the decoder to learn representations of the sequence, after which the same residual connections, normalization and position-wise feed-forward processing, followed by a linear layer and Softmax, finally produce the probability output of the load.
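The architecture described above matches the standard Transformer closely, so it can be sketched with PyTorch's built-in modules: a sinusoidal positional encoding, torch.nn.Transformer with 6 encoder and 6 decoder layers, and a linear head mapping the decoder output to a load value. All hyperparameters here (d_model=64, nhead=4, a window of 30 time points) are illustrative assumptions, not values fixed by this disclosure.

```python
# Sketch: Transformer-based container load predictor (sizes are assumptions).
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=512):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2).float()
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)  # PE(pos, 2i)
        pe[:, 1::2] = torch.cos(pos * div)  # PE(pos, 2i+1)
        self.register_buffer("pe", pe)

    def forward(self, x):                   # x: (batch, seq_len, d_model)
        return x + self.pe[: x.size(1)]     # re_i = we_i + pe_i

class LoadPredictor(nn.Module):
    def __init__(self, n_features=4, d_model=64, nhead=4):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)  # CPU, memory, network, disk
        self.pos = PositionalEncoding(d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=6, num_decoder_layers=6,  # 6 stacked layers each
            batch_first=True)
        self.head = nn.Linear(d_model, 1)    # decoder output -> container load

    def forward(self, src, tgt):
        src = self.pos(self.embed(src))
        tgt = self.pos(self.embed(tgt))
        # Causal mask: this is what gives the decoder its masked attention.
        sz = tgt.size(1)
        mask = torch.triu(torch.full((sz, sz), float("-inf")), diagonal=1)
        return self.head(self.transformer(src, tgt, tgt_mask=mask))

model = LoadPredictor()
window = torch.randn(1, 30, 4)           # 30 time points x 4 normalized metrics
pred = model(window, window[:, -1:, :])  # one-step decoding; shape (1, 1, 1)
```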
(4) Auto-scaling module
The expected replica count for the next period is computed from the load data predicted by the container resource usage prediction module, and the currently expected replica count B is computed with a responsive scaling algorithm. If the responsively computed replica count B is greater than the replica count C currently applied in the system, the larger of the predicted replica count A and the responsively computed replica count B is taken as the input of the HPA policy; otherwise, if the predicted replica count A is greater than the current replica count C in the system, the predicted replica count is taken as the input of the HPA policy; and only when the predicted replica count A and the responsively computed replica count B have both been smaller than the current replica count C for several consecutive periods (preferably 5) is the finally predicted replica count A taken as the input of the HPA policy to scale the container resources in. This avoids premature scale-in caused by jitter in the application load, which would otherwise degrade the quality of service. A sketch of this decision logic follows.
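The following is a minimal sketch of that decision rule, assuming the predicted count A, the responsive count B and the current count C are supplied each period, with a scale-in threshold of 5 consecutive periods as in the preferred embodiment; None means no scaling is triggered this period.

```python
# Sketch: choose the replica count handed to the HPA policy each period.
class ReplicaDecider:
    def __init__(self, shrink_threshold=5):  # 5 periods per the preferred embodiment
        self.shrink_threshold = shrink_threshold
        self.below = 0   # consecutive periods with A and B both not above C

    def decide(self, predicted_a, responsive_b, current_c):
        if responsive_b > current_c:           # responsive algorithm wants to scale up
            self.below = 0
            return max(predicted_a, responsive_b)
        if predicted_a > current_c:            # prediction alone wants to scale up
            self.below = 0
            return predicted_a
        self.below += 1                        # both counts at or below current
        if self.below >= self.shrink_threshold:
            self.below = 0
            return predicted_a                 # finally allow the scale-in
        return None                            # damp load jitter: do not shrink yet
```

For example, decide(3, 5, 4) returns 5 (scale up immediately), while repeated calls of decide(2, 3, 4) return None until the fifth consecutive call returns 2 and the scale-in proceeds.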
Based on this system, as shown in fig. 1, a method for dynamically scheduling container resources based on usage prediction comprises the following steps:
Step 1: Set the scale-up and scale-down thresholds, the scale-down decision count threshold th, and the monitoring period;
Step 2: In each monitoring period, obtain the utilization rates of the k resource metrics of all replicas through the monitoring module;
Step 3: Compute the application's historical load sequence data from the application resource utilization collected in step 2;
Step 4: Predict the application's load in the next monitoring period with the Transformer prediction model and compute the expected replica count PredictPod from the predicted load; compute the expected replica count ResponsePod from the application's current load; if ResponsePod is greater than the system's current replica count CurrentPod, take max(PredictPod, ResponsePod) as the input of the HPA (elastic scaling) policy and jump to step 7; otherwise go to step 5;
Step 5: If PredictPod > CurrentPod or n ≥ th, take PredictPod as the input of the HPA policy, reset the scale-down prediction counter, and jump to step 7; otherwise go to step 6;
Step 6: Increment the scale-down prediction counter n by one;
Step 7: Trigger dynamic scaling according to the input replica count.

Claims (10)

1. A method for dynamically scheduling container resources based on usage prediction, characterized by comprising the following steps:
setting scale-up and scale-down thresholds, a scale-down decision count threshold, and a monitoring period;
monitoring the resource usage of the containers and of each node in the cluster, monitoring the resources and containers on each node's machine in real time, collecting performance data, and obtaining the resource metrics and utilization rates of all replicas;
aggregating the system state, system resource metrics and application performance metrics based on the resource metrics and utilization rates, and saving the aggregated data to corresponding files;
computing the application's historical load sequence data, building a container resource usage prediction model with a deep learning algorithm, and predicting the container resource usage at future moments;
and computing the expected replica count for the next period from the predicted load data, computing the currently expected replica count with a responsive scaling algorithm, comparing the two to determine the final replica count as the input of an HPA policy, and dynamically scaling the container resources.
2. The method for dynamically scheduling container resources based on usage prediction of claim 1, wherein the performance data include CPU usage, memory usage, network throughput, and file system usage.
3. The method for dynamically scheduling container resources based on usage prediction of claim 1, wherein monitoring the resource usage of the containers and of each node in the cluster, monitoring the resources and containers on each node's machine in real time, collecting performance data, and obtaining the resource metrics and utilization rates of all replicas comprises the steps of:
step 2-1: Heapster obtains the list of all node information in the system from the Master node via the Kubernetes API;
step 2-2: on each node, the Kubelet uses cAdvisor to collect resource utilization information for the containers deployed on the node and for the physical node as a whole;
step 2-3: after obtaining the resource utilization information of each node, Heapster stores the data in a database: a database named "k8s" is created in InfluxDB, the data in the "k8s" database are queried through Grafana, and the query results are displayed on a graphical interface.
4. The method for dynamically scheduling container resources based on usage prediction of claim 1, wherein aggregating the system state, system resource metrics and application performance metrics based on the resource metrics and utilization rates, and saving the aggregated data to corresponding files specifically comprises the steps of:
step 3-1: obtaining information from the system with the Kubernetes CLI kubectl, and retrieving information on nodes, namespaces, Pods and services from the whole system; for each namespace, comparing the labels of each Pod and each service, and then recording the mapping between Pods and services;
step 3-2: querying the resource metric values stored in the database through the InfluxDB HTTP API using InfluxQL statements;
step 3-3: sorting the queried data and saving them into a JSON file.
5. The method for dynamically scheduling container resources based on usage prediction of claim 1, wherein the application's historical load sequence data is computed as follows: the load of each application is computed with dynamic weighting; all resource metrics are considered together and weighted according to their utilization rates to obtain the application's integrated load sequence data.
6. The method for dynamically scheduling container resources based on usage prediction of claim 5, wherein computing the application's historical load sequence data specifically comprises the steps of:
assume that at a given moment the collected data contain n Pod replicas of an application, denoted P = [p_1, p_2, …, p_n]; the resource request vector of each Pod replica is R = [r_1, r_2, …, r_d] and its resource usage vector is Q = [q_1, q_2, …, q_d], where d is the number of resource dimensions of each Pod replica;
the resource usage rate of each application is expressed as U = [u_1, u_2, …, u_d], where each component is the ratio of aggregate usage to aggregate request over the n Pod replicas in that dimension:
u_i = (Σ_{j=1}^{n} q_i^(j)) / (Σ_{j=1}^{n} r_i^(j)), i = 1, 2, …, d
the weight of each resource dimension is:
w_i = u_i / Σ_{j=1}^{d} u_j
and the integrated load is then:
L = Σ_{i=1}^{d} w_i · u_i
the load computed by the above formula at each moment is appended to the load queue; once the number of entries in the queue reaches the preset size, the queue is updated by deleting the oldest load value and appending the newly computed one.
7. The method for dynamically scheduling container resources based on usage prediction of claim 1, wherein the inputs of the container resource usage prediction model are CPU utilization, memory usage, network usage and disk usage, and the output is the container load; the model comprises an encoder and a decoder; the encoder is a stack of 6 identical network layers, each containing a multi-head attention sublayer and a position-wise feed-forward network; the decoder mirrors the encoder in structure but uses a masked multi-head attention layer when extracting features of the input sequence, and the whole model applies residual connections and normalization to the output of each layer for better optimization.
8. The method for dynamically scheduling container resources based on usage prediction of claim 7, wherein the container resource usage prediction model preprocesses the model input data to obtain an input X = [x_1, x_2, …, x_n]^T ∈ R^{n×d}, where n denotes the time window length, d denotes the input dimension, and x_i denotes the system performance data at the i-th time point, i = 1, 2, …, n; the position vector of the system performance data at each time point is computed as:
PE(pos, 2i) = sin(pos / 10000^{2i/d})
PE(pos, 2i+1) = cos(pos / 10000^{2i/d})
where pos denotes the position of each time-series datum in the input sequence X and i denotes the dimension;
the vectorization of each system performance datum is:
re_i = we_i + pe_i
where we_i denotes the value of the i-th datum in the input X and pe_i denotes the position vector of the i-th datum in X;
the multi-head attention sublayer is:
MultiHead(Q, K, V) = Concat(head_1, head_2, …, head_h) W^O
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
Attention(Q, K, V) = softmax(Q K^T / √d_k) V
where Q is the query matrix, K the key matrix and V the value matrix, obtained by multiplying the input matrix X with three different weight matrices W^Q, W^K, W^V:
Q = X · W^Q
K = X · W^K
V = X · W^V
the feed-forward network consists of two linear transformations, FFN(x) = max(0, x W_1 + b_1) W_2 + b_2, where W_1 and W_2 are weight matrices and b_1 and b_2 are bias parameters, with a ReLU activation between the two linear transformations;
the encoder outputs the sequence feature vectors Z_a = (z_{a1}, z_{a2}, …, z_{an});
when decoding, the decoder first uses the masked multi-head attention layer to obtain the continuous representation vectors Z_b = (z_{b1}, z_{b2}, …, z_{bn}), then performs multi-head attention and translation alignment using Queries and Keys obtained from the encoder's output sequence feature vectors and Values from the continuous representation vectors of the masked multi-head attention layer, associating the source sequence with the high-level features of the target sequence.
9. The method for dynamically scheduling container resources based on usage prediction of claim 1, wherein comparing the two replica counts and taking the finally determined replica count as the input of the HPA policy to dynamically scale the container resources is specifically: if the currently expected replica count is greater than the currently applied replica count, the larger of the next-period expected replica count and the currently expected replica count is taken as the input of the HPA policy; otherwise, if the next-period expected replica count is greater than the currently applied replica count, the next-period expected replica count is taken as the input of the HPA policy; and if the next-period expected replica count and the currently expected replica count are both smaller than the currently applied replica count for multiple consecutive times, the next-period expected replica count is finally taken as the input of the HPA policy and the container resources are scaled in.
10. A container resource dynamic scheduling system based on usage prediction, characterized by comprising a container resource usage monitoring module, a data aggregation module, a container resource usage prediction module and an auto-scaling module, wherein:
the container resource usage monitoring module monitors the containers in the cluster and the resource usage of each node, performs real-time monitoring and performance-data collection for the resources and containers on each node's machine, and obtains the resource metrics and utilization rates of all replicas;
the data aggregation module aggregates the system state, system resource metrics and application performance metrics based on the resource metrics and utilization rates, and saves the aggregated data to corresponding files;
the container resource usage prediction module computes the application's historical load sequence data and predicts the container resource usage at future moments through a container resource usage prediction model built with a deep learning algorithm;
and the auto-scaling module computes the expected replica count for the next period from the predicted load data, computes the currently expected replica count with a responsive scaling algorithm, compares the two to determine the final replica count as the input of an HPA policy, and dynamically scales the container resources.
CN202210701215.0A 2022-06-21 2022-06-21 Container resource dynamic scheduling method and system based on usage prediction Active CN115118602B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210701215.0A CN115118602B (en) 2022-06-21 2022-06-21 Container resource dynamic scheduling method and system based on usage prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210701215.0A CN115118602B (en) 2022-06-21 2022-06-21 Container resource dynamic scheduling method and system based on usage prediction

Publications (2)

Publication Number Publication Date
CN115118602A true CN115118602A (en) 2022-09-27
CN115118602B CN115118602B (en) 2024-05-07

Family

ID=83329252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210701215.0A Active CN115118602B (en) 2022-06-21 2022-06-21 Container resource dynamic scheduling method and system based on usage prediction

Country Status (1)

Country Link
CN (1) CN115118602B (en)



Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107734052A (en) * 2017-11-02 2018-02-23 华南理工大学 The load balancing container dispatching method that facing assembly relies on
CN110990159A (en) * 2019-12-25 2020-04-10 浙江大学 Historical data analysis-based container cloud platform resource quota prediction method
CN111638958A (en) * 2020-06-02 2020-09-08 中国联合网络通信集团有限公司 Cloud host load processing method and device, control equipment and storage medium
CN113010260A (en) * 2020-09-29 2021-06-22 证通股份有限公司 Elastic expansion method and system for container quantity
CN113505879A (en) * 2021-07-12 2021-10-15 中国科学技术大学 Prediction method and device based on multi-attention feature memory model
CN114462664A (en) * 2021-12-09 2022-05-10 武汉长江通信智联技术有限公司 Short-range branch flight scheduling method integrating deep reinforcement learning and genetic algorithm
CN114238054A (en) * 2021-12-17 2022-03-25 山东省计算中心(国家超级计算济南中心) Cloud server resource utilization quantity prediction method based on improved TFT
CN114296872A (en) * 2021-12-23 2022-04-08 中国电信股份有限公司 Scheduling method and device for container cluster management system
CN114489944A (en) * 2022-01-24 2022-05-13 合肥工业大学 Kubernetes-based prediction type elastic expansion method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
杨忠: "Research on dynamic-load cluster scaling for Docker containers" (面向Docker容器的动态负载集群伸缩研究), Ship Electronic Engineering (舰船电子工程), vol. 38, no. 8, pages 109-115 *
苗立尧; 陈莉君: "A cluster segmented scaling method based on Docker containers" (一种基于Docker容器的集群分段伸缩方法), Computer Applications and Software (计算机应用与软件), no. 01 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115562841A (en) * 2022-11-10 2023-01-03 军事科学院系统工程研究院网络信息研究所 Cloud video service self-adaptive resource scheduling system and method
CN116048734A (en) * 2023-03-29 2023-05-02 贵州大学 Method, device, medium and equipment for realizing AI (advanced technology attachment) service
CN116048734B (en) * 2023-03-29 2023-06-02 贵州大学 Method, device, medium and equipment for realizing AI (advanced technology attachment) service
CN116643844A (en) * 2023-05-24 2023-08-25 方心科技股份有限公司 Intelligent management system and method for automatic expansion of power super-computing cloud resources
CN116643844B (en) * 2023-05-24 2024-02-06 方心科技股份有限公司 Intelligent management system and method for automatic expansion of power super-computing cloud resources
CN116662010A (en) * 2023-06-14 2023-08-29 肇庆学院 Dynamic resource allocation method and system based on distributed system environment
CN116662010B (en) * 2023-06-14 2024-05-07 肇庆学院 Dynamic resource allocation method and system based on distributed system environment

Also Published As

Publication number Publication date
CN115118602B (en) 2024-05-07

Similar Documents

Publication Publication Date Title
CN115118602B (en) Container resource dynamic scheduling method and system based on usage prediction
US10311044B2 (en) Distributed data variable analysis and hierarchical grouping system
CA3088899C (en) Systems and methods for preparing data for use by machine learning algorithms
Park et al. EvoGraph: An effective and efficient graph upscaling method for preserving graph properties
CN116089883B (en) Training method for improving classification degree of new and old categories in existing category increment learning
CN117573328B (en) Parallel task rapid processing method and system based on multi-model driving
CN108829846B (en) Service recommendation platform data clustering optimization system and method based on user characteristics
CN109976974B (en) System monitoring method under cloud computing environment aiming at operation state judgment
Yang et al. Trust-based scheduling strategy for cloud workflow applications
CN115860856A (en) Data processing method and device, electronic equipment and storage medium
CN113010774B (en) Click rate prediction method based on dynamic deep attention model
CN113535527A (en) Load shedding method and system for real-time flow data predictive analysis
CN116737607B (en) Sample data caching method, system, computer device and storage medium
US20240119295A1 (en) Generalized Bags for Learning from Label Proportions
CN113313313B (en) City perception-oriented mobile node task planning method
Fu et al. Federated Transfer Learning for Soalr Flare Forecasting
CN116471197A (en) Task fault prediction method based on Transformer under cloud data center environment
Li et al. Scheduling queue resource occupation prediction based on hybrid neural network
Li et al. Software Group Rejuvenation Based on Matrix Completion and Cerebellar Model Articulation Controller
CN118138638A (en) Service push method, apparatus, computer device, readable storage medium, and program product
Guo et al. A context-aware data processing model in power communication networks
CN117805637A (en) Battery safety monitoring method and system
CN117112163A (en) Data processing process scheduling method and system based on improved jellyfish search algorithm
CN114615131A (en) Self-adaptive fault diagnosis algorithm for multi-stage cloud computing system
CN117592011A (en) Job resource prediction method based on feature similarity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 409-48, 4th Floor, Building 1, No. 38 Yongda Road, Daxing Biomedical Industry Base, Zhongguancun Science and Technology Park, Daxing District, Beijing 100190

Applicant after: CSIC Information Technology Co.,Ltd.

Applicant after: The 716th Research Institute of China Shipbuilding Corp.

Applicant after: Jiangsu Jierui Information Technology Co.,Ltd.

Address before: Room 409-48, 4th Floor, Building 1, No. 38 Yongda Road, Daxing Biomedical Industry Base, Zhongguancun Science and Technology Park, Daxing District, Beijing 100190

Applicant before: CSIC Information Technology Co.,Ltd.

Applicant before: 716TH RESEARCH INSTITUTE OF CHINA SHIPBUILDING INDUSTRY Corp.

Applicant before: Jiangsu Jierui Information Technology Co.,Ltd.

GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 409-48, 4th Floor, Building 1, No. 38 Yongda Road, Daxing Biomedical Industry Base, Zhongguancun Science and Technology Park, Daxing District, Beijing 100190

Patentee after: China Shipbuilding Digital Information Technology Co.,Ltd.

Country or region after: China

Patentee after: The 716th Research Institute of China Shipbuilding Corp.

Patentee after: Jiangsu Jierui Information Technology Co.,Ltd.

Address before: Room 409-48, 4th Floor, Building 1, No. 38 Yongda Road, Daxing Biomedical Industry Base, Zhongguancun Science and Technology Park, Daxing District, Beijing 100190

Patentee before: CSIC Information Technology Co.,Ltd.

Country or region before: China

Patentee before: The 716th Research Institute of China Shipbuilding Corp.

Patentee before: Jiangsu Jierui Information Technology Co.,Ltd.