CN110222840B - Cluster resource prediction method and device based on attention mechanism - Google Patents


Info

Publication number
CN110222840B
Authority
CN
China
Prior art keywords
attention
time
weight
unit
hidden layer
Prior art date
Legal status
Active
Application number
CN201910413227.1A
Other languages
Chinese (zh)
Other versions
CN110222840A (en
Inventor
窦耀勇
唐家伟
吴维刚
Current Assignee
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201910413227.1A priority Critical patent/CN110222840B/en
Publication of CN110222840A publication Critical patent/CN110222840A/en
Application granted granted Critical
Publication of CN110222840B publication Critical patent/CN110222840B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a cluster resource prediction method and device based on an attention mechanism. An improved attention mechanism is integrated into an LSTM so that the correlations among multiple time series can be mined, providing a scheme for predicting the resource demand in a cluster from multiple time series. The prediction accuracy is effectively improved, resource planning can be effectively assisted, the resource utilization of the cluster is raised, and the operation and maintenance cost of the data center is effectively reduced.

Description

Cluster resource prediction method and device based on attention mechanism
Technical Field
The invention relates to the technical field of cluster resource management, in particular to a cluster resource prediction method and device based on an attention mechanism.
Background
Data centers keep growing in scale. Effective resource management of the clusters in a data center improves the utilization of hardware resources, reduces operation and maintenance costs, and increases operating profit. One effective way to improve resource utilization is to predict the future resource demand of the cluster, so that resource planning can be carried out in advance and waste of resources reduced.
Currently, cluster resource demand prediction mainly uses the time series data of cluster resources. Common time series prediction models include ARIMA (autoregressive integrated moving average), VAR (vector autoregression), GBRT (gradient boosting regression tree), LSTM (long short-term memory network), and the like, which can be used directly to predict the resource demand in a cluster.
However, current cluster resource prediction methods have two main problems. First, most methods use a single time series as the prediction feature (for example ARIMA), and few use multiple time series for resource demand prediction; prediction accuracy then depends on whether the historical values of that single series contain a clear pattern. Second, although general multi-time-series prediction models exist (for example VAR), these models do not consider the characteristics of clusters in a data center, and in particular do not consider the correlation and mutual interference between application loads in the cluster. Both problems can lead to inaccurate cluster resource prediction results.
Therefore, how to accurately predict cluster resources is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for predicting cluster resources based on an attention mechanism. Multiple resource time series are used to predict future resource demand, and an improved deep-learning attention mechanism is adopted to mine the correlations between the resource demand time series according to the characteristics of how application loads in the cluster use resources, so that the accuracy of resource prediction is effectively improved.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a cluster resource prediction method based on an attention mechanism comprises the following steps:
s1: taking the first hidden layer state at the last moment, all time series data belonging to one deployment unit with the target instance, all time series data belonging to one host unit with the target instance and the target time sequence at the historical moment as input of an input attention layer to obtain a first input vector;
s2: inputting the first input vector to an LSTM coder to obtain a current first hidden layer state;
s3: inputting the current first hidden layer state and the second hidden layer state at the previous moment into a time correlation attention layer to obtain a context vector;
S4: inputting the context vector, the second hidden layer state at the last moment and the target time sequence at the historical moment to an LSTM decoder to obtain the current second hidden layer state;
s5: and linearly transforming the current second hidden layer state and the context vector to obtain a predicted value.
Preferably, step S1 specifically includes:
s11: taking the state of the first hidden layer at the last moment and all time series data belonging to the same deployment unit with the target instance as the input of the attention layer of the deployment unit to obtain the output vector of the attention layer of the deployment unit;
s12: taking the state of the first hidden layer at the last moment and all time series data belonging to the same host unit as the target instance as the input of the attention layer of the host unit to obtain an attention output vector of the host unit;
s13: taking the state of the first hidden layer at the last moment and the target time sequence at the historical moment as the input of the autocorrelation attention layer to obtain an autocorrelation attention layer output vector;
s14: the deployment unit attention layer output vector, the host unit attention output vector, and the autocorrelation attention layer output vector are combined as a first input vector.
Preferably, step S11 specifically includes:
Calculating a first attention weight based on the first hidden layer state at the previous time and all time series data belonging to the same deployment unit as the target instance;
calculating a normalized deployment unit attention weight using a softmax function based on the first attention weight;
calculating a deployment unit attention layer output vector based on the first hidden layer state at the previous time, all time series data belonging to one deployment unit with the target instance and the normalized deployment unit attention weight;
the step S12 specifically includes:
calculating a first-order time sequence correlation coefficient of each time sequence data belonging to one host unit with the target instance relative to the historical target time sequence, and obtaining static time correlation weights of all time sequences and the historical target time sequence in the corresponding host unit;
calculating a second attention weight based on the first hidden layer state and all time-series data of the target instance belonging to one host unit at the previous time;
obtaining the attention weight of the host unit based on the static time correlation weight and the second attention weight, and normalizing to obtain the normalized attention weight of the host unit;
calculating a host unit attention output vector based on the first hidden layer state at the previous time, all time series data belonging to one host unit with the target instance, the target time sequence at the historical time and the normalized host unit attention weight;
The step S13 specifically includes:
calculating correlation coefficients between historical moment target time sequences in different time windows, and obtaining corresponding autocorrelation weights;
calculating a third attention weight based on the first hidden layer state at the previous time and the target time sequence at the different historical time;
obtaining the attention weight of the autocorrelation unit based on the autocorrelation weight and the third attention weight, and normalizing to obtain the attention weight of the normalized autocorrelation unit;
an autocorrelation attention layer output vector is calculated based on the first hidden layer state at the last time, the target timing at the historical time, and the normalized autocorrelation unit attention weights.
Preferably, step S3 specifically includes:
calculating the time attention layer weight based on the state of the second hidden layer at the previous moment, and normalizing to obtain the normalized time attention layer weight;
a context vector is calculated based on the current first hidden layer state and the normalized temporal attention layer weight.
A cluster resource prediction apparatus based on an attention mechanism, comprising:
the first input vector calculation module is used for taking the first hidden layer state at the last moment, all time series data belonging to one deployment unit with the target instance, all time series data belonging to one host unit with the target instance and the target time sequence at the historical moment as input of the input attention layer to obtain a first input vector;
The first hidden layer state calculation module is used for inputting the first input vector to an LSTM coder to obtain a current first hidden layer state;
the context vector calculation module is used for inputting the current first hidden layer state and the second hidden layer state at the previous moment into the time correlation attention layer to obtain a context vector;
the second hidden layer state calculation module is used for inputting the context vector, the second hidden layer state at the last moment and the target time sequence at the historical moment to the LSTM decoder to obtain the current second hidden layer state;
and the linear transformation module is used for carrying out linear transformation on the current second hidden layer state and the context vector to obtain a predicted value.
Preferably, the first input vector calculation module specifically includes:
the first computing unit is used for taking the state of the first hidden layer at the previous moment and all time series data which belong to the same deployment unit as the target instance as the input of the attention layer of the deployment unit to obtain the output vector of the attention layer of the deployment unit;
the second calculating unit is used for taking the state of the first hidden layer at the last moment and all time series data which belong to the same host unit as the target instance as the input of the attention layer of the host unit to obtain an attention output vector of the host unit;
The third calculation unit is used for taking the first hidden layer state at the last moment and the target time sequence at the historical moment as the input of the autocorrelation attention layer to obtain an autocorrelation attention layer output vector;
and the merging unit is used for merging the deployment unit attention layer output vector, the host unit attention output vector and the autocorrelation attention layer output vector as a first input vector.
Preferably, the first computing unit specifically includes:
a first attention weight calculation subunit for calculating a first attention weight based on the first hidden layer state at the previous time and all time-series data belonging to the same deployment unit as the target instance;
a first normalized weight calculation subunit for calculating a normalized deployment unit attention weight using a softmax function based on the first attention weight;
the first attention layer output vector calculation unit is used for calculating an attention layer output vector of the deployment unit based on the first hidden layer state at the last moment, all time series data belonging to the same deployment unit with the target instance and the normalized attention weight of the deployment unit;
the second computing unit specifically includes:
The static time correlation weight calculation subunit is used for calculating a first-order time sequence correlation coefficient of each time sequence data which belongs to one host unit together with the target instance relative to the historical target time sequence, and obtaining static time correlation weights of all time sequences and the historical target time sequence in the corresponding host unit;
a second attention weight calculation subunit for calculating a second attention weight based on the first hidden layer state and all time-series data of the target instance belonging to one host unit at the previous time;
the second normalization weight subunit is used for obtaining the attention weight of the host unit based on the static time correlation weight and the second attention weight, and normalizing the attention weight to obtain the attention weight of the normalized host unit;
the second attention layer output vector calculation unit is used for calculating the attention layer output vector of the host unit based on the state of the first hidden layer at the last moment, all time sequence data belonging to one host unit with the target instance, the target time sequence of the historical moment and the attention weight of the normalized host unit;
the third computing unit specifically includes:
the self-correlation weight calculation subunit is used for calculating correlation coefficients among the historical moment target time sequences in different time windows and obtaining corresponding self-correlation weights;
A third attention weight calculation subunit, configured to calculate a third attention weight based on the first hidden layer state at the previous time and the target time sequence at the different historical time;
the third normalization weight subunit is configured to obtain an attention weight of the autocorrelation unit based on the autocorrelation weight and the third attention weight, and normalize the attention weight of the autocorrelation unit to obtain a normalized attention weight of the autocorrelation unit;
the third attention layer output vector subunit calculates an autocorrelation attention layer output vector based on the first hidden layer state at the last time, the target timing at the historical time, and the normalized autocorrelation unit attention weight.
Preferably, the context vector calculation module specifically includes:
the fourth normalization weight subunit is used for calculating the time attention layer weight based on the state of the second hidden layer at the previous moment, and normalizing the time attention layer weight to obtain the normalized time attention layer weight;
a context vector calculation subunit for calculating a context vector based on the current first hidden layer state and the normalized temporal attention layer weight.
Compared with the prior art, the invention discloses a cluster resource prediction method and device based on an attention mechanism. An improved attention mechanism is integrated into an LSTM so that the correlations among multiple time series can be mined, providing a scheme for predicting the resource demand in a cluster from multiple time series. The prediction accuracy is effectively improved, resource planning can be effectively assisted, the resource utilization of the cluster is raised, and the operation and maintenance cost of the data center is effectively reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a cluster resource prediction method based on an attention mechanism provided by the invention;
FIG. 2 is a flowchart showing the calculation of a first input vector according to the present invention;
FIG. 3 is a schematic diagram of a cluster resource prediction device based on an attention mechanism according to the present invention;
FIG. 4 is a schematic diagram illustrating a first input vector calculation module according to the present invention;
FIG. 5 is a schematic diagram of a time-series acquisition architecture according to the present invention;
fig. 6 is a schematic diagram of a prediction model based on an attention mechanism according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the embodiment of the invention discloses a cluster resource prediction method based on an attention mechanism, which comprises the following steps:
s1: taking the first hidden layer state at the last moment, all time series data belonging to one deployment unit with the target instance, all time series data belonging to one host unit with the target instance and the target time sequence at the historical moment as input of an input attention layer to obtain a first input vector;
s2: inputting the first input vector to an LSTM coder to obtain a current first hidden layer state;
s3: inputting the current first hidden layer state and the second hidden layer state at the previous moment into a time correlation attention layer to obtain a context vector;
s4: inputting the context vector, the second hidden layer state at the last moment and the target time sequence at the historical moment to an LSTM decoder to obtain the current second hidden layer state;
s5: and linearly transforming the current second hidden layer state and the context vector to obtain a predicted value.
In the invention, resource prediction uses multiple resource time series rather than a single time series. In addition, an improved deep-learning attention mechanism is adopted to mine the correlations among the multiple resource time series according to the characteristics of how application loads in the cluster use resources, which ultimately improves the accuracy of resource prediction.
Referring to fig. 2, in order to further optimize the above technical solution, the embodiment of the present invention further discloses that step S1 specifically includes:
s11: taking the state of the first hidden layer at the last moment and all time series data belonging to the same deployment unit with the target instance as the input of the attention layer of the deployment unit to obtain the output vector of the attention layer of the deployment unit;
s12: taking the state of the first hidden layer at the last moment and all time series data belonging to the same host unit as the target instance as the input of the attention layer of the host unit to obtain an attention output vector of the host unit;
s13: taking the state of the first hidden layer at the last moment and the target time sequence at the historical moment as the input of the autocorrelation attention layer to obtain an autocorrelation attention layer output vector;
s14: the deployment unit attention layer output vector, the host unit attention output vector, and the autocorrelation attention layer output vector are combined as a first input vector.
In order to further optimize the above technical solution, the embodiment of the present invention further discloses that step S11 specifically includes:
calculating a first attention weight based on the first hidden layer state at the previous time and all time series data belonging to the same deployment unit as the target instance;
Calculating a normalized deployment unit attention weight using a softmax function based on the first attention weight;
calculating a deployment unit attention layer output vector based on the first hidden layer state at the previous time, all time series data belonging to one deployment unit with the target instance and the normalized deployment unit attention weight;
the step S12 specifically includes:
calculating a first-order time sequence correlation coefficient of each time sequence data belonging to one host unit with the target instance relative to the historical target time sequence, and obtaining static time correlation weights of all time sequences and the historical target time sequence in the corresponding host unit;
calculating a second attention weight based on the first hidden layer state and all time-series data of the target instance belonging to one host unit at the previous time;
obtaining the attention weight of the host unit based on the static time correlation weight and the second attention weight, and normalizing to obtain the normalized attention weight of the host unit;
calculating a host unit attention output vector based on the first hidden layer state at the previous time, all time series data belonging to one host unit with the target instance, the target time sequence at the historical time and the normalized host unit attention weight;
The step S13 specifically includes:
calculating correlation coefficients between historical moment target time sequences in different time windows, and obtaining corresponding autocorrelation weights;
calculating a third attention weight based on the first hidden layer state at the previous time and the target time sequence at the different historical time;
obtaining the attention weight of the autocorrelation unit based on the autocorrelation weight and the third attention weight, and normalizing to obtain the attention weight of the normalized autocorrelation unit;
an autocorrelation attention layer output vector is calculated based on the first hidden layer state at the last time, the target timing at the historical time, and the normalized autocorrelation unit attention weights.
In order to further optimize the above technical solution, the embodiment of the present invention further discloses that step S3 specifically includes:
calculating the time attention layer weight based on the state of the second hidden layer at the previous moment, and normalizing to obtain the normalized time attention layer weight;
a context vector is calculated based on the current first hidden layer state and the normalized temporal attention layer weight.
Referring to fig. 3, the embodiment of the invention also discloses a cluster resource prediction device based on an attention mechanism, which comprises:
The first input vector calculation module is used for taking the first hidden layer state at the last moment, all time series data belonging to one deployment unit with the target instance, all time series data belonging to one host unit with the target instance and the target time sequence at the historical moment as input of the input attention layer to obtain a first input vector;
the first hidden layer state calculation module is used for inputting a first input vector to the LSTM coder to obtain a current first hidden layer state;
the context vector calculation module is used for inputting the current first hidden layer state and the second hidden layer state at the previous moment into the time correlation attention layer to obtain a context vector;
the second hidden layer state calculation module is used for inputting the context vector, the second hidden layer state at the last moment and the target time sequence at the historical moment to the LSTM decoder to obtain the current second hidden layer state;
and the linear transformation module is used for carrying out linear transformation on the current second hidden layer state and the context vector to obtain a predicted value.
Referring to fig. 4, in order to further optimize the above technical solution, an embodiment of the present invention further discloses that the first input vector calculation module specifically includes:
The first computing unit is used for taking the state of the first hidden layer at the previous moment and all time series data which belong to the same deployment unit as the target instance as the input of the attention layer of the deployment unit to obtain the output vector of the attention layer of the deployment unit;
the second calculating unit is used for taking the state of the first hidden layer at the last moment and all time series data which belong to the same host unit as the target instance as the input of the attention layer of the host unit to obtain an attention output vector of the host unit;
the third calculation unit is used for taking the first hidden layer state at the last moment and the target time sequence at the historical moment as the input of the autocorrelation attention layer to obtain an autocorrelation attention layer output vector;
and the merging unit is used for merging the deployment unit attention layer output vector, the host unit attention output vector and the autocorrelation attention layer output vector as a first input vector.
In order to further optimize the above technical solution, the embodiment of the present invention further discloses that the first computing unit specifically includes:
a first attention weight calculation subunit for calculating a first attention weight based on the first hidden layer state at the previous time and all time-series data belonging to the same deployment unit as the target instance;
A first normalized weight calculation subunit for calculating a normalized deployment unit attention weight using a softmax function based on the first attention weight;
the first attention layer output vector calculation unit is used for calculating an attention layer output vector of the deployment unit based on the first hidden layer state at the last moment, all time series data belonging to the same deployment unit with the target instance and the normalized attention weight of the deployment unit;
the second calculation unit specifically includes:
the static time correlation weight calculation subunit is used for calculating a first-order time sequence correlation coefficient of each time sequence data which belongs to one host unit together with the target instance relative to the historical target time sequence, and obtaining static time correlation weights of all time sequences and the historical target time sequence in the corresponding host unit;
a second attention weight calculation subunit for calculating a second attention weight based on the first hidden layer state and all time-series data of the target instance belonging to one host unit at the previous time;
the second normalization weight subunit is used for obtaining the attention weight of the host unit based on the static time correlation weight and the second attention weight, and normalizing the attention weight to obtain the attention weight of the normalized host unit;
The second attention layer output vector calculation unit is used for calculating the attention layer output vector of the host unit based on the state of the first hidden layer at the last moment, all time sequence data belonging to one host unit with the target instance, the target time sequence of the historical moment and the attention weight of the normalized host unit;
the third calculation unit specifically includes:
the self-correlation weight calculation subunit is used for calculating correlation coefficients among the historical moment target time sequences in different time windows and obtaining corresponding self-correlation weights;
a third attention weight calculation subunit, configured to calculate a third attention weight based on the first hidden layer state at the previous time and the target time sequence at the different historical time;
the third normalization weight subunit is configured to obtain an attention weight of the autocorrelation unit based on the autocorrelation weight and the third attention weight, and normalize the attention weight of the autocorrelation unit to obtain a normalized attention weight of the autocorrelation unit;
the third attention layer output vector subunit calculates an autocorrelation attention layer output vector based on the first hidden layer state at the last time, the target timing at the historical time, and the normalized autocorrelation unit attention weight.
In order to further optimize the above technical solution, the embodiment of the present invention further discloses a context vector calculation module specifically including:
the fourth normalization weight subunit is used for calculating the time attention layer weight based on the state of the second hidden layer at the previous moment, and normalizing the time attention layer weight to obtain the normalized time attention layer weight;
a context vector calculation subunit for calculating a context vector based on the current first hidden layer state and the normalized temporal attention layer weight.
The prediction method adopted by the invention uses an attention mechanism to mine the correlations among multiple correlated time series, expresses the relevance of each series to the target series as a weight, and uses these correlations to predict the target series, thereby effectively improving prediction accuracy. Applying the proposed model to cluster resource prediction makes it possible to predict the future resource demand of the cluster more accurately and to assist resource planning more effectively, improving the resource utilization of the cluster and reducing the operation and maintenance cost of the data center.
The technical scheme provided by the invention is further described in detail below in combination with a specific implementation method.
In modern clusters, an application is typically made up of multiple application instances, which together form a deployment unit. These instances are usually distributed across different physical hosts, so each physical host may carry several different application instances; the application instances residing on one physical host form a host unit. The application instances within one deployment unit, and those within one host unit, are very likely to be correlated. Therefore, when collecting the time series data of a target instance to be predicted, the time series data of the other application instances in the deployment unit and in the host unit where the target instance is located can be collected at the same time, and all of these time series can be used for predicting the target instance.
Before describing the method of the invention in detail, the mathematical symbols used to describe the inputs and outputs of the model are listed as follows:
TABLE 1
(The symbol table is reproduced as an image in the original document and is not recoverable here.)
First, as shown in fig. 5, a time series data acquisition architecture is designed: a local time series database is deployed on each host, and the local time series databases of all hosts upload their data to a global time series database. For a target instance whose resources are to be predicted, its own time series data (the target sequence) and all time series data X_o of the instances in the same host unit as the target instance can be obtained locally; all time series data X_i of the instances belonging to the same deployment unit as the target instance are then queried and obtained from the global time series database.
According to these data, the invention designs a prediction model based on the attention mechanism, named MLA-LSTM (Multi-Level Attention LSTM, a long short-term memory network with multi-level attention).
The model uses a time window of size T; each time series provides T values within the window, and the value of the target instance at the next time point, i.e. at time point T+1, is predicted. This process can be abstracted as
ŷ_{T+1} = F(Y_T, X_i, X_o),
where F is the model to be trained.
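As an illustration of this windowing scheme, the following sketch (an illustrative assumption, not part of the patent) shows how fixed-size windows of the target series and the related series could be paired with the next-point label that F has to predict:

```python
# Minimal sketch of building length-T training windows; the function name and
# array layout are illustrative assumptions, not taken from the patent.
import numpy as np

def build_windows(target, related, T):
    """target: (N,) target-instance series; related: (N, d) other series.
    Returns length-T windows and the next target value (the label at T+1)."""
    X_windows, Y_windows, labels = [], [], []
    for end in range(T, len(target)):
        X_windows.append(related[end - T:end])   # T x d values of the related series
        Y_windows.append(target[end - T:end])    # T historical target values Y_T
        labels.append(target[end])               # value to predict at time T+1
    return np.array(X_windows), np.array(Y_windows), np.array(labels)

# toy usage: one target series plus 5 related series, window size T = 25
rng = np.random.default_rng(0)
Xw, Yw, y_next = build_windows(rng.random(200), rng.random((200, 5)), T=25)
print(Xw.shape, Yw.shape, y_next.shape)   # (175, 25, 5) (175, 25) (175,)
```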
The model contains two LSTMs: the first LSTM serves as the encoder, processing the multiple input time series and outputting the hidden state h_t; the second LSTM serves as the decoder, processing the hidden state h_t output by the first LSTM and finally outputting the predicted value. A schematic diagram of this model is shown in fig. 6.
1. LSTM encoder
The LSTM encoder updates its hidden state h_t from the previous hidden state h_{t-1} and the current input (the full LSTM update equations are given as images in the original document). Here h_t is the hidden state vector of the LSTM at time point t, and its length is set to m. The encoder input at each step is obtained from three attention layers, whose calculation is described in detail below. The LSTM encoder unrolled along the time dimension is shown in fig. 6.
For the LSTM encoder, three attention layers are combined into one input attention layer to mine the correlations between time series. The three attention layers are:
(1) A layer that mines the multiple time series X_i within the deployment unit using a common attention mechanism, referred to as the deployment unit attention layer.
(2) A layer that mines the multiple time series X_o within the host unit using an improved attention mechanism, referred to as the host unit attention layer.
(3) A layer that mines the autocorrelation of the target instance's own time series using an improved attention mechanism, referred to as the autocorrelation attention layer.
1. Deployment unit attention layer. An attention weight is scored for each time series in the deployment unit from the previous encoder hidden state, normalized with a softmax function, and used to weight the series values; the detailed formulas, the descriptions of their parameters, and the summary of the layer's inputs and outputs are given as equation images in the original document and are not reproduced here.
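Since the layer's formulas are not reproduced above, the following NumPy sketch illustrates a common "input attention" formulation that matches the description (score each deployment-unit series against the previous encoder hidden and cell states, normalize with softmax, re-weight the series values at time t). The parameter names W_d, U_d, v_d and the exact scoring form are assumptions, not the patent's own equations.

```python
# Hedged sketch of a deployment-unit attention step; parameter shapes are assumed:
# W_d: (p, 2m), U_d: (p, T), v_d: (p,) for encoder hidden size m and window size T.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def deployment_unit_attention(h_prev, s_prev, X_i, t, W_d, U_d, v_d):
    """h_prev, s_prev: previous encoder hidden/cell states (m,);
    X_i: (T, n_i) series of the deployment unit; t: current time step."""
    T, n_i = X_i.shape
    scores = np.array([
        v_d @ np.tanh(W_d @ np.concatenate([h_prev, s_prev]) + U_d @ X_i[:, k])
        for k in range(n_i)])                    # one attention score per series
    alpha = softmax(scores)                      # normalized deployment-unit weights
    return alpha * X_i[t], alpha                 # weighted series values at time t

# toy usage with random parameters (shapes only)
rng = np.random.default_rng(0)
T, n_i, m, p = 25, 33, 64, 32
out, alpha = deployment_unit_attention(
    rng.random(m), rng.random(m), rng.random((T, n_i)), t=T - 1,
    W_d=rng.random((p, 2 * m)), U_d=rng.random((p, T)), v_d=rng.random(p))
```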
2. Host unit attention layer. Its calculation proceeds as follows:
(1) First, the first-order temporal correlation coefficient CORT (the first order temporal correlation coefficient) of each time series with respect to the target sequence is calculated.
Take the l-th time series x_{o,l} in the host unit as an example. To calculate its first-order temporal correlation coefficient with the target time series Y_T, the two sequences must first be clipped: the last value of x_{o,l} is removed, and the first value of the lagged target sequence Y_T at time T is removed. The absolute value of the CORT between the two clipped sequences, C_{o,l}, is then calculated and taken as the static temporal correlation weight of the series x_{o,l} with respect to the target sequence. The CORT of two time sequences S_1 and S_2 of length q, where S_{1,t} and S_{2,t} are their values at time t, is calculated as
CORT(S_1, S_2) = Σ_{t=1}^{q-1} (S_{1,t+1} - S_{1,t})(S_{2,t+1} - S_{2,t}) / ( sqrt(Σ_{t=1}^{q-1} (S_{1,t+1} - S_{1,t})²) · sqrt(Σ_{t=1}^{q-1} (S_{2,t+1} - S_{2,t})²) ).
Finally, the static time correlation weights of all the time series in the host unit with respect to the target sequence are obtained and combined into a vector C_out.
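A small sketch of the CORT computation and of the static host-unit weights under the clipping described above (last value of each host-unit series dropped, first value of the lagged target dropped); this follows the standard CORT definition and is offered as an illustration, not as the patent's exact formulation.

```python
# CORT as the cosine similarity of first differences; static weights are its
# absolute value per host-unit series, matching the clipping described above.
import numpy as np

def cort(s1, s2):
    """First-order temporal correlation coefficient of two equal-length series."""
    d1, d2 = np.diff(s1), np.diff(s2)
    denom = np.linalg.norm(d1) * np.linalg.norm(d2)
    return float(d1 @ d2 / denom) if denom > 0 else 0.0

def static_host_weights(X_o, y_hist):
    """X_o: (T, n_o) host-unit series; y_hist: (T,) target history Y_T.
    Returns the vector C_out of absolute CORT values against the lagged target."""
    y_lagged = y_hist[1:]                                    # drop first value of Y_T
    return np.array([abs(cort(X_o[:-1, k], y_lagged))        # drop last value of x_{o,l}
                     for k in range(X_o.shape[1])])

# toy usage
rng = np.random.default_rng(0)
C_out = static_host_weights(rng.random((25, 23)), rng.random(25))   # (23,) static weights
```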
(2) The attention weight is calculated using a common attention mechanism.
Again taking the l-th time series x_{o,l} as an example, its attention weight at time t is calculated from the previous encoder hidden state (the scoring formulas are given as equation images in the original document). The attention weights of all time series in the host unit at time t form an attention weight vector g_t.
(3) The static time correlation weight vector C_out obtained in step (1) and the attention weight vector g_t obtained in step (2) are combined. The combination is accomplished by a linear transformation, which yields a new weight vector θ_t. To normalize all elements of this vector, a softmax function is applied, giving the normalized weight of the l-th time series of the host unit at time t.
(4) The weighted value vector of the host unit time series at time t is obtained: each normalized weight from the previous step is multiplied by the value of the corresponding host unit time series at time t, and the weighted values of all host unit series at time t form a vector.
In summary, the input of the host unit attention layer is the previous encoder hidden state, all time series of the host unit and the historical target sequence, and its output is the weighted value vector (the detailed formulas are given as equation images in the original document).
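The following sketch puts steps (2) to (4) together: dynamic attention scores are merged with the static CORT weight vector C_out by a learned linear map, normalized with softmax, and used to re-weight the host-unit values at time t. The parameters W_h, U_h, v_h, W_c, b_c and the exact merge form are illustrative assumptions, not the patent's equations.

```python
# Hedged sketch of the host-unit attention layer; C_out comes from the static
# CORT weights sketched above, and all learned parameters have assumed shapes.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def host_unit_attention(h_prev, s_prev, X_o, C_out, t, W_h, U_h, v_h, W_c, b_c):
    """X_o: (T, n_o) host-unit series; C_out: (n_o,) static CORT weights;
    h_prev, s_prev: previous encoder hidden/cell states (m,); t: current step."""
    T, n_o = X_o.shape
    # (2) dynamic attention weights g_t scored against the previous encoder state
    g_t = np.array([
        v_h @ np.tanh(W_h @ np.concatenate([h_prev, s_prev]) + U_h @ X_o[:, k])
        for k in range(n_o)])
    # (3) merge static and dynamic weights with a linear transformation, then softmax
    theta_t = W_c @ np.concatenate([C_out, g_t]) + b_c
    beta = softmax(theta_t)
    # (4) weighted host-unit values at time t
    return beta * X_o[t], beta

# toy usage with random parameters (shapes only)
rng = np.random.default_rng(0)
T, n_o, m, p = 25, 23, 64, 32
out, beta = host_unit_attention(
    rng.random(m), rng.random(m), rng.random((T, n_o)), rng.random(n_o), t=T - 1,
    W_h=rng.random((p, 2 * m)), U_h=rng.random((p, T)), v_h=rng.random(p),
    W_c=rng.random((n_o, 2 * n_o)), b_c=rng.random(n_o))
```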
3. Autocorrelation attention layer. Its calculation proceeds as follows:
(1) Similar to the host unit attention layer, correlation coefficients between the target time series over different time windows are calculated first. The CORT coefficient C_{a,r} between the target time series Y_r ending at time r and the target time series Y_T ending at time T is calculated as C_{a,r} = |CORT(Y_T, Y_r)|. The CORT coefficients of the target time series ending at each instant within the time window T then form an autocorrelation vector C_auto of length T.
(2) The attention weight is calculated using a common attention mechanism, giving an attention weight vector μ_t (the scoring formulas are given as equation images in the original document).
(3) The two weight vectors obtained in the previous steps, C_auto and μ_t, are converted into a single weight vector φ_t by a linear transformation and then normalized with a softmax function.
(4) The weighted target time series vector is obtained: each normalized weight from the previous step describes, within the time window T, the degree of influence of the target sequence value at time r on the value at time T, i.e. the correlation of the target sequence with itself at different times. The values of the target sequence at the times r within the window are weighted with these weights to obtain the output vector at time t.
Finally, the output vectors of the three attention layers are merged as the input vector of the encoder LSTM at time t (the merge formula is given as an equation image in the original document).
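A short PyTorch sketch (an assumed implementation detail, not the patent's code) of how the three attention outputs could be concatenated into the encoder input and fed to an LSTM cell to update the first hidden state; the output sizes of the three layers are example values.

```python
import torch
import torch.nn as nn

m = 64                                    # encoder hidden size (one of {32, 64, 128})
n_i, n_o, T = 33, 23, 25                  # example sizes of the three attention outputs
encoder_cell = nn.LSTMCell(n_i + n_o + T, m)

def encoder_step(x_deploy, x_host, x_auto, h_prev, s_prev):
    """x_deploy: (1, n_i), x_host: (1, n_o), x_auto: (1, T) attention-layer outputs."""
    x_merged = torch.cat([x_deploy, x_host, x_auto], dim=-1)   # merged encoder input
    return encoder_cell(x_merged, (h_prev, s_prev))            # new hidden/cell state

# toy usage
h, s = torch.zeros(1, m), torch.zeros(1, m)
h, s = encoder_step(torch.randn(1, n_i), torch.randn(1, n_o), torch.randn(1, T), h, s)
```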
2. Decoder LSTM
The decoder LSTM is defined analogously to the encoder (its update equation is given as an image in the original document). Let h'_t be the hidden state vector of the decoder LSTM at time t, with n elements; note that it is distinct from the hidden state vector h_t of the encoder LSTM. The decoder LSTM unrolled along the time dimension is shown in fig. 6.
A time-dependent attention layer is integrated into this LSTM. Its weights are calculated from the previous decoder state against each encoder hidden state and normalized (the scoring formulas are given as equation images in the original document). The normalized weights obtained at time t are then used in a weighted summation over the encoder hidden states h_p to obtain the context vector c_t.
The context vector c_t at time t and the target time series value y_t are merged, and the decoder input at time t is obtained by a linear transformation. This input is fed into the decoder LSTM together with the hidden state h'_t of the current time and is used to update the decoder hidden state h'_{t+1} at the next time.
The above update is cycled until time T, yielding the hidden state vector h'_T and the corresponding cell state vector at time T. Finally, the predicted value at time T+1 output by the decoder LSTM is calculated by a linear transformation of h'_T and the context vector.
In summary, the input of the time-dependent attention layer is the encoder hidden states and the previous decoder state, and its output is the context vector (the detailed formulas are given as equation images in the original document).
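The decoder side can be sketched as follows (PyTorch, illustrative assumptions throughout): a temporal attention layer scores every encoder hidden state against the previous decoder state, the softmax-weighted sum gives the context vector, the context is merged with the current target value to form the decoder input, and the final prediction is a linear map of the last decoder hidden state and the context vector.

```python
# Hedged sketch of the decoder with a time-dependent attention layer; module and
# parameter choices are assumptions, not the patent's exact equations.
import torch
import torch.nn as nn

class TemporalAttentionDecoder(nn.Module):
    def __init__(self, m, n):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(2 * n + m, m), nn.Tanh(), nn.Linear(m, 1))
        self.input_proj = nn.Linear(m + 1, 1)      # merge context vector and y_t
        self.cell = nn.LSTMCell(1, n)              # decoder LSTM cell
        self.out = nn.Linear(n + m, 1)             # final linear transformation

    def forward(self, enc_h, y_hist):
        """enc_h: (T, m) encoder hidden states; y_hist: (T,) historical target values."""
        T, m = enc_h.shape
        h = torch.zeros(1, self.cell.hidden_size)
        s = torch.zeros(1, self.cell.hidden_size)
        for t in range(T):
            query = torch.cat([h, s], dim=-1).repeat(T, 1)         # previous decoder state
            scores = self.attn(torch.cat([query, enc_h], dim=-1))  # (T, 1) attention scores
            beta = torch.softmax(scores, dim=0)                    # temporal attention weights
            c_t = (beta * enc_h).sum(dim=0, keepdim=True)          # context vector (1, m)
            y_tilde = self.input_proj(torch.cat([c_t, y_hist[t].view(1, 1)], dim=-1))
            h, s = self.cell(y_tilde, (h, s))                      # update decoder state
        return self.out(torch.cat([h, c_t], dim=-1))               # prediction for time T+1

# toy usage with window size T = 25 and hidden sizes m = n = 64
decoder = TemporalAttentionDecoder(m=64, n=64)
prediction = decoder(torch.randn(25, 64), torch.randn(25))
```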
Finally, MSE (mean squared error) is used as the training criterion for the model:
MSE = (1/N) Σ_{j=1}^{N} (ŷ_{T+1}^{(j)} - y_{T+1}^{(j)})²,
where N is the number of training samples.
a gradient descent algorithm is used to train this model to determine the specific values of the weight coefficient matrix/vector/bias of the neural network.
The technical scheme of the invention is further described below with reference to specific examples.
The example uses the cluster-trace dataset published by Alibaba in 2018. One container (id c_66550) is randomly selected as the target instance, and the CPU utilization time series of this container is taken as the resource time series of the target instance. The instances belonging to the same deployment unit and the same host unit as this target instance are found, their time series data are extracted, and all series are finally processed into time series with an interval of 300 seconds.
In the end, 33 time series from the same deployment unit and 23 time series from the same host are obtained, plus the 1 time series of the target instance. These time series are time-aligned and divided into three data sets: a training set with 10141 time points, a validation set with 563 time points, and a test set with 564 time points. Each data set contains the same number of time series.
The model has several hyper-parameters: the window size T ∈ {25, 35, 45, 60}, and the hidden state and cell state vector sizes of the encoder and decoder LSTM m = n ∈ {32, 64, 128}. MSE and MAE (mean absolute error) are used as error criteria, and a mini-batch stochastic gradient descent algorithm is used to optimize the model during training, with a learning rate of 5e-4.
Finally, the models are trained using a grid search, and the hyper-parameters that achieve the best result on the validation set are taken as the optimal parameters of each model. Prediction is then performed on the test set, and the errors on the test set are accumulated using MSE.
In experiments, to distinguish LSTM predicted using single sequences from multiple sequences, the designation LSTM-Un was used for single sequences and LSTM-Mul was used for multiple sequences.
The experimental results comparing the proposed model with the baseline models are summarized in a table in the original document (reproduced there as images and not recoverable here).
The experimental results show that the error of the model proposed by the invention is much smaller than that of the three single-sequence models and the two multi-sequence models: under MSE it is 98.26% better than the best VAR model, and under MAE it is 74.40% better than the best VAR model. The model therefore has very high prediction accuracy, which demonstrates the effectiveness of the attention-based multi-time-series prediction model.
In the present specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the others; for identical or similar parts, the embodiments may refer to one another. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant details can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A method for predicting cluster resources based on an attention mechanism, comprising:
s1: taking the first hidden layer state at the last moment, all time series data belonging to one deployment unit with the target instance, all time series data belonging to one host unit with the target instance and the target time sequence at the historical moment as input of an input attention layer to obtain a first input vector;
the step S1 specifically comprises the following steps:
s11: taking the state of the first hidden layer at the last moment and all time series data belonging to the same deployment unit with the target instance as the input of the attention layer of the deployment unit to obtain the output vector of the attention layer of the deployment unit;
s12: taking the state of the first hidden layer at the last moment and all time series data belonging to the same host unit as the target instance as the input of the attention layer of the host unit to obtain an attention output vector of the host unit;
s13: taking the state of the first hidden layer at the last moment and the target time sequence at the historical moment as the input of the autocorrelation attention layer to obtain an autocorrelation attention layer output vector;
s14: merging the deployment unit attention layer output vector, the host unit attention output vector, and the autocorrelation attention layer output vector as a first input vector;
S2: inputting the first input vector to an LSTM coder to obtain a current first hidden layer state;
s3: inputting the current first hidden layer state and the second hidden layer state at the previous moment into a time correlation attention layer to obtain a context vector;
s4: inputting the context vector, the second hidden layer state at the last moment and the target time sequence at the historical moment to an LSTM decoder to obtain the current second hidden layer state;
s5: and linearly transforming the current second hidden layer state and the context vector to obtain a predicted value.
2. The method for predicting cluster resources based on an attention mechanism according to claim 1, wherein step S11 specifically includes:
calculating a first attention weight based on the first hidden layer state at the previous time and all time series data belonging to the same deployment unit as the target instance;
calculating a normalized deployment unit attention weight using a softmax function based on the first attention weight;
calculating a deployment unit attention layer output vector based on the first hidden layer state at the previous time, all time series data belonging to one deployment unit with the target instance and the normalized deployment unit attention weight;
The step S12 specifically includes:
calculating a first-order time sequence correlation coefficient of each time sequence data belonging to one host unit with the target instance relative to the historical target time sequence, and obtaining static time correlation weights of all time sequences and the historical target time sequence in the corresponding host unit;
calculating a second attention weight based on the first hidden layer state and all time-series data of the target instance belonging to one host unit at the previous time;
obtaining the attention weight of the host unit based on the static time correlation weight and the second attention weight, and normalizing to obtain the normalized attention weight of the host unit;
calculating a host unit attention output vector based on the first hidden layer state at the previous time, all time series data belonging to one host unit with the target instance, the target time sequence at the historical time and the normalized host unit attention weight;
the step S13 specifically includes:
calculating correlation coefficients between historical moment target time sequences in different time windows, and obtaining corresponding autocorrelation weights;
calculating a third attention weight based on the first hidden layer state at the previous time and the target time sequence at the different historical time;
Obtaining the attention weight of the autocorrelation unit based on the autocorrelation weight and the third attention weight, and normalizing to obtain the attention weight of the normalized autocorrelation unit;
an autocorrelation attention layer output vector is calculated based on the first hidden layer state at the last time, the target timing at the historical time, and the normalized autocorrelation unit attention weights.
3. The method for predicting cluster resources based on an attention mechanism according to any one of claims 1 to 2, wherein step S3 specifically includes:
calculating the time attention layer weight based on the state of the second hidden layer at the previous moment, and normalizing to obtain the normalized time attention layer weight;
a context vector is calculated based on the current first hidden layer state and the normalized temporal attention layer weight.
4. A cluster resource prediction apparatus based on an attention mechanism, comprising:
the first input vector calculation module is used for taking the first hidden layer state at the last moment, all time series data belonging to one deployment unit with the target instance, all time series data belonging to one host unit with the target instance and the target time sequence at the historical moment as input of the input attention layer to obtain a first input vector;
The first input vector calculation module specifically includes:
the first computing unit is used for taking the state of the first hidden layer at the previous moment and all time series data which belong to the same deployment unit as the target instance as the input of the attention layer of the deployment unit to obtain the output vector of the attention layer of the deployment unit;
the second calculating unit is used for taking the state of the first hidden layer at the last moment and all time series data which belong to the same host unit as the target instance as the input of the attention layer of the host unit to obtain an attention output vector of the host unit;
the third calculation unit is used for taking the first hidden layer state at the last moment and the target time sequence at the historical moment as the input of the autocorrelation attention layer to obtain an autocorrelation attention layer output vector;
a merging unit configured to merge the deployment unit attention layer output vector, the host unit attention output vector, and the autocorrelation attention layer output vector as a first input vector;
the first hidden layer state calculation module is used for inputting the first input vector to an LSTM coder to obtain a current first hidden layer state;
the context vector calculation module is used for inputting the current first hidden layer state and the second hidden layer state at the previous moment into the time correlation attention layer to obtain a context vector;
The second hidden layer state calculation module is used for inputting the context vector, the second hidden layer state at the last moment and the target time sequence at the historical moment to the LSTM decoder to obtain the current second hidden layer state;
and the linear transformation module is used for carrying out linear transformation on the current second hidden layer state and the context vector to obtain a predicted value.
5. The attention-mechanism-based cluster resource prediction device of claim 4, wherein the first calculation unit specifically comprises:
a first attention weight calculation subunit, configured to calculate a first attention weight based on the first hidden layer state at the previous time and all time series data belonging to the same deployment unit as the target instance;
a first normalized weight calculation subunit, configured to calculate a normalized deployment unit attention weight from the first attention weight using a softmax function;
and a first attention layer output vector calculation subunit, configured to calculate the deployment unit attention layer output vector based on the first hidden layer state at the previous time, all time series data belonging to the same deployment unit as the target instance, and the normalized deployment unit attention weight;
the second calculation unit specifically comprises:
a static time correlation weight calculation subunit, configured to calculate a first-order time series correlation coefficient between each time series belonging to the same host unit as the target instance and the historical target time series, so as to obtain static time correlation weights between all time series in the corresponding host unit and the historical target time series;
a second attention weight calculation subunit, configured to calculate a second attention weight based on the first hidden layer state at the previous time and all time series data belonging to the same host unit as the target instance;
a second normalized weight calculation subunit, configured to obtain a host unit attention weight based on the static time correlation weights and the second attention weight, and to normalize the host unit attention weight to obtain a normalized host unit attention weight;
and a second attention layer output vector calculation subunit, configured to calculate the host unit attention layer output vector based on the first hidden layer state at the previous time, all time series data belonging to the same host unit as the target instance, the historical target time series, and the normalized host unit attention weight;
the third calculation unit specifically comprises:
an autocorrelation weight calculation subunit, configured to calculate correlation coefficients among the historical target time series over different time windows, so as to obtain corresponding autocorrelation weights;
a third attention weight calculation subunit, configured to calculate a third attention weight based on the first hidden layer state at the previous time and the target time series at different historical times;
a third normalized weight calculation subunit, configured to obtain an autocorrelation unit attention weight based on the autocorrelation weights and the third attention weight, and to normalize the autocorrelation unit attention weight to obtain a normalized autocorrelation unit attention weight;
and a third attention layer output vector calculation subunit, configured to calculate the autocorrelation attention layer output vector based on the first hidden layer state at the previous time, the historical target time series, and the normalized autocorrelation unit attention weight.
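The three calculation units of claim 5 differ mainly in how the raw attention scores are reweighted before normalization. The sketch below shows one plausible reading in the same Python/NumPy style: an additive (Bahdanau-style) score function, which the claim does not fix, and the Pearson coefficient standing in for the "first-order time series correlation coefficient"; the function names, the window length `win`, and the parameters `v` and `W` are all hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attn_scores(h_prev, candidates, v, W):
    """Additive score of each candidate series (or window) against the encoder state."""
    return np.array([v @ np.tanh(W @ np.concatenate([h_prev, c])) for c in candidates])

def deployment_unit_weights(h_prev, deploy_series, v, W):
    """First calculation unit: plain softmax over the attention scores."""
    return softmax(attn_scores(h_prev, deploy_series, v, W))

def host_unit_weights(h_prev, host_series, target_hist, v, W):
    """Second calculation unit: scores modulated by a static correlation between
    each co-hosted series and the historical target series, then normalized."""
    static_w = np.array([np.corrcoef(s, target_hist)[0, 1] for s in host_series])
    return softmax(attn_scores(h_prev, host_series, v, W) * static_w)

def autocorrelation_weights(h_prev, target_hist, win, v, W):
    """Third calculation unit: correlations between windows of the target's own
    history reweight the attention scores computed over those windows."""
    windows = [target_hist[i:i + win] for i in range(len(target_hist) - win + 1)]
    auto_w = np.array([np.corrcoef(w, windows[-1])[0, 1] for w in windows])
    return softmax(attn_scores(h_prev, windows, v, W) * auto_w)
```

Each weight vector would then form the corresponding attention layer output vector as a weighted sum over the series (or history windows) it was computed from, e.g. `w @ np.stack(host_series)` for the host unit.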
6. The attention-mechanism-based cluster resource prediction device according to any one of claims 4 to 5, wherein the context vector calculation module specifically comprises:
a fourth normalized weight calculation subunit, configured to calculate a temporal attention layer weight based on the second hidden layer state at the previous time, and to normalize the temporal attention layer weight to obtain a normalized temporal attention layer weight;
and a context vector calculation subunit, configured to calculate the context vector based on the current first hidden layer state and the normalized temporal attention layer weight.
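Claim 6 amounts to a standard temporal-attention read-out over encoder hidden states. A minimal sketch under the same assumptions as above (additive score function; the name `temporal_context` and the parameters `v` and `W` are illustrative, not from the patent):

```python
import numpy as np

def temporal_context(s_prev, encoder_states, v, W):
    """Score each encoder hidden state against the previous decoder state,
    normalize with softmax, and return the weighted sum as the context vector."""
    scores = np.array([v @ np.tanh(W @ np.concatenate([s_prev, h]))
                       for h in encoder_states])
    e = np.exp(scores - scores.max())
    weights = e / e.sum()                      # normalized temporal attention weights
    return weights @ np.stack(encoder_states)  # context vector
```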
CN201910413227.1A 2019-05-17 2019-05-17 Cluster resource prediction method and device based on attention mechanism Active CN110222840B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910413227.1A CN110222840B (en) 2019-05-17 2019-05-17 Cluster resource prediction method and device based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910413227.1A CN110222840B (en) 2019-05-17 2019-05-17 Cluster resource prediction method and device based on attention mechanism

Publications (2)

Publication Number Publication Date
CN110222840A CN110222840A (en) 2019-09-10
CN110222840B true CN110222840B (en) 2023-05-05

Family

ID=67821396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910413227.1A Active CN110222840B (en) 2019-05-17 2019-05-17 Cluster resource prediction method and device based on attention mechanism

Country Status (1)

Country Link
CN (1) CN110222840B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909046B (en) * 2019-12-02 2023-08-11 上海舵敏智能科技有限公司 Time-series abnormality detection method and device, electronic equipment and storage medium
CN111638958B (en) * 2020-06-02 2024-04-05 中国联合网络通信集团有限公司 Cloud host load processing method and device, control equipment and storage medium
CN112863695B (en) * 2021-02-22 2024-08-02 西京学院 Quantum attention mechanism-based two-way long-short-term memory prediction model and extraction method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017097693A (en) * 2015-11-26 2017-06-01 Kddi株式会社 Data prediction device, information terminal, program, and method performing learning with data of different periodic layer
CN107730087A (en) * 2017-09-20 2018-02-23 平安科技(深圳)有限公司 Forecast model training method, data monitoring method, device, equipment and medium
CN108182260A (en) * 2018-01-03 2018-06-19 华南理工大学 A kind of Multivariate Time Series sorting technique based on semantic selection
CN109685252A (en) * 2018-11-30 2019-04-26 西安工程大学 Building energy consumption prediction technique based on Recognition with Recurrent Neural Network and multi-task learning model
CN109740419A (en) * 2018-11-22 2019-05-10 东南大学 A kind of video behavior recognition methods based on Attention-LSTM network

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10565305B2 (en) * 2016-11-18 2020-02-18 Salesforce.Com, Inc. Adaptive attention model for image captioning
CN108304846B (en) * 2017-09-11 2021-10-22 腾讯科技(深圳)有限公司 Image recognition method, device and storage medium
CN108154435A (en) * 2017-12-26 2018-06-12 浙江工业大学 A kind of stock index price expectation method based on Recognition with Recurrent Neural Network
CN108090558B (en) * 2018-01-03 2021-06-08 华南理工大学 Automatic filling method for missing value of time sequence based on long-term and short-term memory network
CN108491680A (en) * 2018-03-07 2018-09-04 安庆师范大学 Drug relationship abstracting method based on residual error network and attention mechanism
CN108804495B (en) * 2018-04-02 2021-10-22 华南理工大学 Automatic text summarization method based on enhanced semantics
CN109697304A (en) * 2018-11-19 2019-04-30 天津大学 A kind of construction method of refrigeration unit multi-sensor data prediction model

Also Published As

Publication number Publication date
CN110222840A (en) 2019-09-10

Similar Documents

Publication Publication Date Title
CN109587713B (en) Network index prediction method and device based on ARIMA model and storage medium
CN110222840B (en) Cluster resource prediction method and device based on attention mechanism
CN110941928B (en) Rolling bearing residual life prediction method based on dropout-SAE and Bi-LSTM
CN110309603B (en) Short-term wind speed prediction method and system based on wind speed characteristics
CN101620045B (en) Method for evaluating reliability of stepping stress quickened degradation experiment based on time sequence
CN109886464B (en) Low-information-loss short-term wind speed prediction method based on optimized singular value decomposition generated feature set
CN112434848B (en) Nonlinear weighted combination wind power prediction method based on deep belief network
CN105071983A (en) Abnormal load detection method for cloud calculation on-line business
US10161269B2 (en) Output efficiency optimization in production systems
CN112417028A (en) Wind speed time sequence characteristic mining method and short-term wind power prediction method
CN104199870A (en) Method for building LS-SVM prediction model based on chaotic search
CN112766603A (en) Traffic flow prediction method, system, computer device and storage medium
CN116169670A (en) Short-term non-resident load prediction method and system based on improved neural network
CN113837434A (en) Solar photovoltaic power generation prediction method and device, electronic equipment and storage medium
CN115829157A (en) Chemical water quality index prediction method based on variational modal decomposition and auto former model
CN114415488A (en) Atomic clock error data anomaly detection and correction method and system
CN112612781A (en) Data correction method, device, equipment and medium
CN110766215A (en) Wind power climbing event prediction method based on feature adaptive selection and WDNN
CN110222386A (en) A kind of planetary gear degenerate state recognition methods
Zhu et al. Wind Speed Short-Term Prediction Based on Empirical Wavelet Transform, Recurrent Neural Network and Error Correction
CN116192665B (en) Data processing method, device, computer equipment and storage medium
CN116992757A (en) Wellhead pressure prediction method and device based on deep learning and rolling optimization
CN116011655A (en) Load ultra-short-term prediction method and system based on two-stage intelligent feature engineering
CN114971062A (en) Photovoltaic power prediction method and device
CN115048856A (en) Method for predicting residual life of rolling bearing based on MS-ALSTM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant