CN115118602A

CN115118602A - Container resource dynamic scheduling method and system based on usage prediction

Info

Publication number: CN115118602A
Application number: CN202210701215.0A
Authority: CN
Inventors: 朱大鹏; 刘彩云; 姜厚禄; 侍守创; 胡昌平; 胡翔宇; 徐雷; 左刚; 单文金; 杨庆
Original assignee: Jiangsu Jierui Information Technology Co ltd; 716th Research Institute of CSIC; CSIC Information Technology Co Ltd
Current assignee: China Shipbuilding Digital Information Technology Co ltd; Jiangsu Jierui Information Technology Co ltd; 716th Research Institute of CSIC
Priority date: 2022-06-21
Filing date: 2022-06-21
Publication date: 2022-09-27
Anticipated expiration: 2042-06-21
Also published as: CN115118602B

Abstract

The invention discloses a container resource dynamic scheduling method and a system based on usage prediction.A container resource usage prediction model based on a Transformer is adopted to predict the usage of application resources, the program is applied to different time nodes with different requirements on the usage rate of a CPU, the memory usage, the network usage and the disk usage, the usage of the resources is collected according to a time sequence, the resources are normalized and then are arranged into a time sequence characteristic sequence according to a proper time window and are input into a prediction model row to predict the usage of the container resources; and then calculating the expected copy number of the next period on the basis of the prediction result, comparing the expected copy number with the currently calculated copy number calculated by a response type scaling algorithm, and dynamically scheduling container resources according to an HPA strategy, thereby achieving the purposes of improving the resource utilization rate of the system and reducing the downtime risk of the system while ensuring the reliability.

Description

Container resource dynamic scheduling method and system based on usage prediction

Technical Field

The invention relates to the technical field of container resource scheduling, in particular to a container resource dynamic scheduling method and system based on usage prediction.

Background

Currently, more and more applications are deployed in containers. Typically, we allocate sufficient resources for hosted applications in order to be able to run stably. However, in most cases, the hosted application program does not run in the highest load state, and various resources such as a CPU and a memory are not simultaneously in the highest load state, so that the pre-allocated resources are in an idle state in most of the time, thereby causing waste of resources. Furthermore, when the application is in a state of high load, the pre-allocated resources are not necessarily sufficient. At present, the application in the container depends on the operating system of the host machine to perform resource allocation, limitation and weight setting. The problems that arise in doing so are: the resource allocation weight and the limit of each container are fixed from the beginning, and cannot be dynamically adjusted, which easily causes resource waste and resource shortage.

In order to solve the problems, a specific prediction algorithm is usually used in cloud computing to predict resource requirements of a virtual machine and an application, and resource allocation optimization is made in advance to improve the utilization rate and service quality of resources, help to make resource allocation in advance, and optimize resource management of Docker. In the existing Kubernetes system, the task of resource scheduling is mainly handled by the Scheduler component. When the application is scheduled for the first time, the Scheduler selects a most suitable Node from all Node nodes in the cluster to deploy according to the resource configuration condition of the application, namely a static scheduling strategy. First, the mechanism can only perform resource configuration when the application is initially deployed, and cannot dynamically adjust allocated resources as needed when the application runs, which may result in low utilization rate of host resources. Secondly, due to the lack of prediction on the resource use condition, an alarm mechanism cannot give an alarm before the resource index violates, and the Scheduler cannot perform resource scheduling or automatic scaling of the instances before the resource consumption bottleneck occurs. Moreover, the Scheduler does not consider the sensitivity of the application to the resources, and the bottleneck to a single resource on the Node is easily caused. Therefore, a corresponding resource management mechanism and a scheduling strategy must be formulated from the viewpoints of resource utilization maximization, application sensitivity to resources and the like, so that dynamic scheduling and automatic scaling of instances can be triggered in advance before the application has a bottleneck, the utilization rate of system resources is improved, and the scheduling flexibility is increased.

Disclosure of Invention

The invention aims to provide a container resource dynamic scheduling method and system based on usage prediction, which can improve the utilization rate of system resources, increase the scheduling flexibility and realize that a container can respond to the resource demand of an application program deployed on the container in advance.

The technical solution for realizing the purpose of the invention is as follows:

a dynamic scheduling method of container resources based on usage prediction comprises the following steps:

setting a capacity expansion or capacity reduction threshold, and setting a capacity reduction judgment frequency threshold and a monitoring period;

monitoring the use condition of the container and each node in the cluster, monitoring the resources and the container on the machine of each node in real time and acquiring performance data, and acquiring resource indexes and use rates of all the copies;

aggregating system state, system resource index and application program performance index based on resource index and utilization rate, and storing the aggregated data into corresponding files;

calculating load sequence data of application history, building a container resource usage prediction model through a deep learning algorithm, and predicting the container resource usage at a future moment;

and calculating the expected copy number of the next period according to the predicted load data, calculating the current expected copy number by adopting a response type scaling algorithm, comparing the copy number with the current expected copy number, determining the final copy number as the input of an HPA strategy, and performing dynamic shrinkage on container resources.

Further, the performance data includes CPU usage, memory usage, network throughput, and file system usage.

Further, the steps of monitoring the use condition of the container in the cluster and the resource of each node, monitoring the resource and the container on the machine of each node in real time and collecting performance data, and acquiring the resource index and the use ratio of all the copies are as follows:

step 2-1: the Heapster acquires all node information lists in the system from the Master node by adopting Kubernetes;

step 2-2: in each node, the Kubelet collects the resource utilization information of a container and the whole physical node deployed on the node by using the cAdvisor;

step 2-3: after acquiring the resource utilization information of each node, the Heapster stores the data in the databases, newly establishes a database named as 'k 8 s' in InfluxDB, queries the data in the 'k 8 s' database through Grafana, and displays the queried data on a graphical interface.

Further, the aggregating the system state, the system resource index and the application performance index based on the resource index and the usage rate, and saving the aggregated data to the corresponding file specifically includes the steps of:

step 3-1: obtaining information in the system by using Kubernets CLI kubecect, and retrieving information of a joint point, a name space, a Pod and a service from the whole system; for each namespace, comparing the tags of each Pod and each service, and then recording the mapping of the pods and the services;

step 3-2: inquiring a resource index value stored in a database from the database through an HTTP interface in the InfluxDB and an InfluxQL statement of an InfluxDB database;

step 3-3: and sorting and storing the inquired data into a JSON file.

Further, the load sequence data of the application history is calculated as: and calculating the load of each application by adopting dynamic weighting, comprehensively considering each resource index, and performing weighting calculation according to the utilization rate of each resource index to obtain comprehensive load sequence data of the application.

Further, the calculating of the load sequence data of the application history specifically includes the steps of:

assume that there are n copies of Pod, denoted P ═ P, applied to one application at a time in the collected data ₁ ,p ₂ ,…,p _d ]The resource request amount of each Pod copy is R ═ R ₁ ,r ₂ ,…,r _d ]The resource usage is Q ═ Q ₁ ,q ₂ ,…,q _d ]And D ═ 1,2,3, …, D]Representing a resource dimension for each Pod copy;

the resource usage rate of each application is expressed as U ═ U ₁ ,u ₂ ,…,u _d ]The calculation formula is as follows:

the weight of each dimension of resource is:

the integrated load is then:

and adding the load calculated by the formula at a certain moment into the load queue, updating the load queue until the number of the sequences in the load queue reaches the preset size, deleting the oldest load value from the load queue, and adding the newly calculated load value into the load queue.

Furthermore, the input of the container resource usage prediction model is the CPU usage, memory usage, network usage and disk usage, and the output is a container load, which includes an encoder and a decoder; the encoder is formed by stacking 6 isomorphic network layers, and each network layer comprises a multi-head attention sublayer and a position-based feedforward neural network; the decoder and the encoder are consistent in structure, Masked multi-head attention layers are adopted when the features of the input sequence are extracted, and the whole model is optimized for outputting of each layer by using residual connection and normalization processing.

Further, the container resource usage prediction model preprocesses model input data to obtain an input X ═ X ₁ ，x ₂ ，...，x _n ] ^T ∈R ^n×d Where n denotes the time window length, d denotes the input quantity dimension, x _i Representing the system performance data at the ith time point, i is 1, 2.

Where pos represents a specific position of each time-series data in the input sequence X, and i represents a dimension;

vectorization of each system performance data is:

re _i ＝we _i +pe _i

therein, we _i Value, pe, representing the ith data in input X _i A position vector representing the ith data in X;

the multi-head attention sublayer is:

MultiHead(Q，K，V)＝Concat(head ₁ ，head ₂ ，...，head _h )W ^o

wherein Q is a query vector, K is a key vector, V is a value vector, Q, K and V are multiplied by 3 different weight matrices W through an input vector matrix X ^Q ，W ^K ，W ^V Obtaining:

Q＝X·W ^Q

K＝X.W ^K

V＝X.W ^V

the feedforward neural network consists of two linear transformations, namely ffn (x) max (0, xW) ₁ +b ₁ )W ₂ +b ₂ Wherein, W ₁ And W ₂ Is a state matrix, b ₁ And b ₂ To compensate the parameter; there is an activation function of ReLU between two linear transformations;

the encoder outputs a sequence feature vector Z ═ (Z ═ Z) _a1 ，z _a2 ，...，z _an )；

FFN(x)＝max(0，xW ₁ +b ₁ )W ₂ +b ₂

When the decoder decodes, firstly, Masked multi-head attention layer is used to obtain continuous characterization vector Z ═ Z (Z is) _b1 ，z _b2 ，...，z _bn ) And performing multi-head attention calculation and translation alignment by using Values of continuous characterization vectors of Queries, Keys and Masked multi-head attention layers obtained by outputting sequence feature vectors by an encoder, and associating the source sequence with the high-level features of the target sequence.

Further, the step of performing dynamic contraction on container resources by using the finally predicted copy number as an input of the HPA policy after comparing the two is specifically as follows: if the number of the current expected copies is larger than the number of the currently applied copies, taking the largest one of the number of the copies expected in the next period and the number of the current expected copies as the input of an HPA strategy; otherwise, if the number of copies expected in the next period is larger than the number of copies currently applied, taking the number of copies expected in the next period as the input of the HPA strategy; and if the copy number expected in the next period and the copy number expected at present are less than the copy number applied currently for multiple times, taking the copy number expected in the next period at last as the input of the HPA strategy, and performing dynamic contraction on the container resources.

A container resource dynamic scheduling system based on usage prediction comprises a container resource usage monitoring module, a data aggregation module, a container resource usage prediction module and an automatic scaling module, wherein:

the container resource usage monitoring module is used for monitoring the containers in the cluster and the resource usage condition of each node, performing real-time monitoring and performance data acquisition on the resources and the containers on the machines of each node, and acquiring resource indexes and usage rates of all the copies;

the data aggregation module aggregates the system state, the system resource index and the application program performance index based on the resource index and the utilization rate, and stores the aggregated data into a corresponding file;

the container resource usage prediction module is used for calculating load sequence data of application history and predicting the container resource usage at a future moment through a container resource usage prediction model established by a deep learning algorithm;

and the automatic expansion module calculates the number of copies expected in the next period according to the predicted load data, calculates the number of currently expected copies by adopting a response type expansion algorithm, determines the final number of copies as the input of an HPA strategy after comparing the number of the currently expected copies with the number of the currently expected copies, and performs dynamic expansion of container resources.

Compared with the prior art, the invention has the following beneficial technical effects:

the method adopts a Transformer-based model to predict the usage amount of the platform application resources, and constructs a dynamic scheduling and balancing model of the container resources by taking the load balance of the container and the container distance as the target of container scheduling according to the prediction result, thereby realizing the rapid allocation of the container resources. The technology can improve the utilization rate of system resources, increase the scheduling flexibility, realize that the container can respond to the resource demand of the application program deployed on the container in advance, and can improve the utilization rate of the resources to the greatest extent compared with the traditional static resource allocation mode based on the priori knowledge.

Drawings

The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and not to be construed as limiting the invention in any way, and in which:

FIG. 1 is a flow chart of the present invention.

Fig. 2 is a diagram of the overall architecture for dynamic resource scheduling of the present invention.

Fig. 3 is a load queue diagram of the present invention.

FIG. 4 is a diagram of a Transformer model of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The application does not have a constant requirement on the container resources but dynamically changes, on the basis, a Transformer-based container resource usage prediction model is adopted to predict the usage of the application resources, the requirements of program application on the CPU usage rate, the memory usage amount, the network usage amount and the disk usage amount of different time nodes are different, the usage conditions of the resources are collected according to a time sequence, the resources are normalized and then are arranged into a time sequence characteristic sequence according to a proper time window, and the time sequence characteristic sequence is input into a prediction model row to predict the usage of the container resources; and then calculating the expected copy number of the next period on the basis of the prediction result, comparing the expected copy number with the currently calculated copy number calculated by a response type scaling algorithm, and dynamically scheduling container resources according to an HPA strategy, thereby achieving the purposes of improving the resource utilization rate of the system and reducing the downtime risk of the system while ensuring the reliability.

The embodiment provides a container resource dynamic scheduling system based on usage prediction, as shown in fig. 2, including a container resource usage monitoring module, a data aggregation module, a container resource usage prediction module, and an automatic scaling module, where the following modules are specifically:

(1) container resource usage monitoring module

The monitoring module monitors containers in the cluster and the resource use condition of each Node by using a cAdvison tool of Google, and carries out real-time monitoring and performance data acquisition on the resources and the containers on the machine of each Node, wherein the real-time monitoring and performance data acquisition comprise the CPU use condition, the memory use condition, the network throughput and the use condition of a file system. System resource indicator monitoring will use the following procedure:

step 1-1: the Heapster will use the Kubernetes API to obtain a list of all node information in the system from the Master node.

Step 1-2: in each node, the Kubelet collects resource utilization information, including resource indexes and utilization, of the container and the whole physical node deployed on the node by using the cAdvisor.

Step 1-3: after receiving the resource utilization information about each node, the Heapster stores the data in a database, and a database named as 'k 8 s' is newly established in the InfluxDB. Grafana deployed in our system will query the data in the "k 8 s" database and present the data on a graphical interface.

(2) Data aggregation module

The data aggregation module is used for integrating system state (including nodes and Pod), system resource index and application performance index. And inputting and saving the aggregated data into a corresponding file. The specific process is as follows:

step 2-1: information was obtained in the system using kubernets CLI kubecect. Information about the nodes, namespaces, Pod and services is retrieved from the entire system. For each namespace, the tags for each Pod and each service are compared, and then the mapping of pods and services is recorded.

Step 2-2: and inquiring the resource index value stored in the database from the database through an HTTPAPI interface in the InfluxDB and an InfluxQL statement of the InfluxDB database.

Step 2-3: and (3) sorting and storing the data queried in the step (2) into a JSON file for use by a following module.

(3) Container resource usage prediction module

In order to solve the problem that a single resource cannot well measure the load of the application, the load of each application is calculated by dynamic weighting, and when the load of the application is calculated by a dynamic weighting algorithm, each resource index is comprehensively considered, weighting calculation is carried out according to the utilization rate of each resource index, and finally the comprehensive load of the application is obtained.

Assume that there are n Pod copies of an application in the collected data at a time, which may be denoted as P ═ P ₁ ,p ₂ ,…,p _d ]The resource request amount of each Pod copy is R ═ R ₁ ,r ₂ ,…,r _d ]The resource usage is Q ═ Q ₁ ,q ₂ ,…,q _d ]Wherein D ═ 1,2,3, …, D]The resource dimension of each Pod copy is represented.

The resource usage rate of each application may be expressed as U ═ U ₁ ,u ₂ ,…,u _d ]The calculation formula is as follows:

the weight calculation formula of each dimension resource is as follows:

the overall load of the application can eventually be calculated according to the formula:

the load at a certain time calculated by the above equation may be added to the load queue shown in fig. 3, until the number of sequences in the load queue reaches a preset size, the load queue is updated, the oldest load value is deleted from the load queue, the newly calculated load value is added to the load queue, and then the load in the cluster is predicted by using a Transformer-based container resource usage prediction model.

As shown in fig. 4, the transform model is composed of an encoder and a decoder. The encoder is formed by stacking 6 isomorphic network layers, and each network layer comprises a Multi-Head Attention sub-layer (Multi-Head Attention) and a position-based Feed-Forward neural network (Feed Forward); the decoder and the encoder are roughly consistent, but a Masked multi-head attention layer is adopted when the input sequence is subjected to feature extraction, and the whole model uses residual connection and normalization on the output of each layer to better optimize the network.

The input of the prediction model is CPU utilization rate, memory usage, network usage and disk usage, and the output is container load. Preprocessing the model input data to obtain an input X ═ X ₁ ,x ₂ ,…,x _n ] ^T ∈R ^n×d . Where n denotes the time window length, d denotes the input quantity dimension, x _i The system performance data at the ith time point is shown, i ═ 1,2, …, n. The calculation of the position vector of the system performance data for each time point is as follows:

where pos represents the specific location of each time series data in the input sequence X and i represents a dimension.

Finally, the vectorization of each data is represented as follows:

re _i ＝we _i +pe _i

therein, we _i Value, pe, representing the ith data in input X _i Representing the position vector of the ith datum in X.

The self-attention mechanism for calculating the degree of correlation between data can be generally described by three vectors, a query vector (Q), a key vector (K), and a value vector (V), wherein Q, K and V are obtained by multiplying an input vector matrix X by 3 different weight matrices W ^Q ，W ^K ，W ^V Obtained as follows:

Q＝X·W ^Q

K＝X·W ^k

v＝X·W ^V

when the self-attention information is obtained, firstly, Q vectors are used for inquiring all candidate positions, each candidate position has a pair of K and V vectors, the inquiring process is a process of performing dot product operation on the Q vectors and the K vectors of all the candidate positions, dot product results are weighted to the respective V vectors after passing through a Softmax function, and final self-attention results are obtained by summation, and the calculation is as shown below.

The multi-head self-attention mechanism is equivalent to the integration of h parallel self-attention layers, and the calculation method is as follows:

MultiHead(Q，K，V)＝Concat(head ₁ ，head ₂ ，…，head _h )W ^o

then, the processed signal is processed by a Feed-Forward neural Network (Feed-Forward Network) based on position, wherein the Feed-Forward neural Network consists of two linear transformations, and FFN (x) is max (0, xW) ₁ +b ₁ )W ₂ +b ₂ Wherein W is ₁ And W ₂ Is a state matrix, b ₁ And b ₂ To compensate the parameter; there is an activation function for the ReLU between the two. Encoder output sequence feature vector Z ═ (Z) _a1 ,z _a2 ,…,z _an )。

The Decoder (Decoder) first calculates the continuous token vector Z ═ using Masked multi-head attention (Z ═ Z) to obtain _b1 ,z _b2 ,…,z _bn ). Obtaining z after multi-head attention and residual error standardization processing _b And the encoder generated feature vector z _a Performing multi-head attention calculation; using the encoder output vector z in this step _a Derived Queries, Keys and decoder derived z _b Multiple-headed notation for vector ValuesThe idea computation is aligned with translation, namely, the source sequence is associated with the high-level characteristics of the target sequence, the representation of the sequence is learned by using multi-head self-attention in an encoder and a decoder, and then the probability output of the load is finally realized through the same residual error, normalization processing and position-based feed-forward network processing as well as linear optimization and Softmax series processing.

(4) Automatic telescopic module

And calculating the expected copy number of the next period according to the load data predicted by the container resource usage prediction module, and calculating the current expected copy number B by adopting a response type scaling algorithm. If the copy number B calculated by the adopted response formula is larger than the copy number C currently applied in the system, taking the largest one of the predicted copy number A and the copy number B calculated by the response formula as the input of an HPA strategy; otherwise, if the predicted copy number A is larger than the current copy number C in the system, the predicted copy number is used as the input of the HPA strategy; and when the predicted copy number A and the calculated copy number B of the response equation are less than the current copy number C in the system for a plurality of times (preferably 5 times), taking the finally predicted copy number A as the input of the HPA strategy to perform dynamic contraction of the container resources. Therefore, the phenomenon that the applied load is jittered to cause premature capacity shrinkage and further influence the quality of service can be avoided.

Based on the system, as shown in fig. 1, a method for dynamically scheduling container resources based on usage prediction includes the following steps:

step 1: setting a threshold value of capacity expansion or capacity reduction, and setting a threshold value th of capacity reduction judgment times and a monitoring period;

step 2: acquiring k resource indexes t utilization rates of all the copies in each monitoring period through a monitoring module;

and step 3: calculating load sequence data of application history according to the application resource utilization rate acquired in the step 2;

and 4, step 4: predicting the load of the application in the next monitoring period by using a Transformer prediction model, calculating an expected copy number predicted Pod according to the predicted load, calculating an expected copy number recovery Pod according to the Current load of the application, if the recovery Pod is greater than the Current copy number Current Pod of the system, taking max (predicted Pod, recovery Pod) as the input of an HPA (elastic expansion) strategy, and skipping to the step 7, otherwise, turning to the step 5;

and 5: if the Presect Pod Current Pod or n-th, taking the Presect Pod as the input of the HPA strategy and resetting the number of times of load prediction reduction, skipping to step 7, otherwise, turning to step 6;

step 6: the number n of load prediction reduction is increased by one;

and 7: and triggering dynamic scaling according to the input application copy number.

Claims

1. A method for dynamically scheduling container resources based on usage prediction is characterized by comprising the following steps:

2. The method of claim 1, wherein the method for dynamically scheduling the container resources based on the usage prediction comprises: the performance data includes CPU usage, memory usage, network throughput, and file system usage.

3. The method of claim 1, wherein the method for dynamically scheduling the container resources based on the usage prediction comprises: the method comprises the following steps of monitoring the use condition of the container and each node in the cluster, monitoring the resources and the container on the machine of each node in real time, acquiring performance data, and acquiring resource indexes and use rates of all the copies:

4. The method of claim 1, wherein the method for dynamically scheduling the container resources based on the usage prediction comprises: the aggregating the system state, the system resource index and the application program performance index based on the resource index and the utilization rate and storing the aggregated data into the corresponding file specifically comprises the following steps:

step 3-3: and sorting and storing the inquired data into a JSON file.

5. The method of claim 1, wherein the method for dynamically scheduling the container resources based on the usage prediction comprises: the load sequence data of the application history is calculated as follows: and calculating the load of each application by adopting dynamic weighting, comprehensively considering each resource index, and performing weighting calculation according to the utilization rate of each resource index to obtain comprehensive load sequence data of the application.

6. The method of claim 5, wherein the container resource dynamic scheduling based on usage prediction comprises: the calculating of the load sequence data of the application history specifically comprises the steps of:

assume that there are n Pod copies in the collected data, denoted P ═ P, applied one at a time ₁ ,p ₂ ,…,p _d ]The resource request amount of each Pod copy is R ═ R ₁ ,r ₂ ,…,r _d ]The resource usage is Q ═ Q ₁ ,q ₂ ,…,q _d ]And D ═ 1,2,3, …, D]Representing a resource dimension for each Pod copy;

the weight of each dimension resource is:

the integrated load is then:

7. The method of claim 1, wherein the method for dynamically scheduling the container resources based on the usage prediction comprises: the input of the container resource usage prediction model is CPU usage rate, memory usage, network usage and disk usage, and the output is container load, which comprises an encoder and a decoder; the encoder is formed by stacking 6 isomorphic network layers, and each network layer comprises a multi-head attention sublayer and a position-based feedforward neural network; the decoder and the encoder are consistent in structure, Masked multi-head attention layers are adopted when the features of the input sequence are extracted, and the whole model is optimized for outputting of each layer by using residual connection and normalization processing.

8. The method of claim 7, wherein the container resource dynamic scheduling based on usage prediction comprises: the container resource usage prediction model preprocesses model input data to obtain input X ═ X ₁ ,x ₂ ,…,x _n ] ^T ∈R ^n×d Where n denotes the time window length, d denotes the input quantity dimension, x _i Representing the system performance data at the ith time point, i is 1, 2.

Where pos represents the specific location of each time-series data in the input sequence X, and i represents a dimension;

the vectorization of each system performance data is:

re _i ＝we _i +pe _i

the multi-head attention sublayer is:

MultiHead(Q,K,V)＝Concat(head ₁ ,head ₂ ,…,head _h )W ^o

Q＝X·W ^Q

K＝X·W ^K

V＝X·W ^V

the encoder outputs a sequence feature vector Z ═ (Z ═ Z) _a1 ,z _a2 ,…,z _an )；

FFN(x)＝max(0,xW ₁ +b ₁ )W ₂ +b ₂

When the decoder decodes, firstly, Masked multi-head attention layer is used to obtain continuous characterization vector Z ═ Z (Z is) _b1 ,z _b2 ,…,z _bn ) Q obtained by using the encoder to output the sequence feature vectorValues of continuous characterization vectors of multiple attention layers of the facilities, Keys and Masked perform multiple attention calculation and translation alignment, and associate the source sequence with the high-level features of the target sequence.

9. The method of claim 1, wherein the method for dynamically scheduling the container resources based on the usage prediction comprises: after comparing the two, taking the finally predicted copy number as the input of the HPA strategy, and performing dynamic contraction on the container resources specifically as follows: if the number of the currently expected copies is larger than the number of the currently applied copies, taking the largest one of the number of the copies expected in the next period and the number of the currently expected copies as the input of an HPA strategy; otherwise, if the number of copies expected in the next period is larger than the number of copies currently applied, taking the number of copies expected in the next period as the input of the HPA strategy; and if the copy number expected in the next period and the copy number expected at present are less than the copy number applied currently for multiple times, taking the copy number expected in the next period at last as the input of the HPA strategy, and performing dynamic contraction on the container resources.

10. The utility model provides a container resource dynamic scheduling system based on use amount prediction which characterized in that, includes container resource use amount monitoring module, data aggregation module, container resource use amount prediction module and automatic flexible module, wherein:

and the automatic expansion module calculates the expected copy number of the next period according to the predicted load data, calculates the current expected copy number by adopting a response type expansion algorithm, determines the final copy number after comparing the copy number and the current expected copy number as the input of an HPA strategy, and performs dynamic contraction on container resources.