CN108170529A - A cloud data center load prediction method based on a long short-term memory network - Google Patents

A cloud data center load prediction method based on a long short-term memory network

Info

Publication number
CN108170529A
CN108170529A (application CN201711433325.9A)
Authority
CN
China
Prior art keywords
sample
training
long short-term memory
LSTM
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711433325.9A
Other languages
Chinese (zh)
Inventor
毕敬
许伯睿
乔俊飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201711433325.9A
Publication of CN108170529A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention discloses a cloud data center load prediction method based on a long short-term memory (LSTM) network, aimed at solving the problem that the limited computing resources of a cloud data center cannot be utilized optimally. The method builds training samples and test samples from the massive historical records of the cloud data center, and constructs a neural network formed by chaining LSTM units together; training samples are fed in batch after batch to obtain output values. The network is optimized with the recent adaptive moment estimation (Adam) method, which through iterative training continually updates the parameters in each unit until a global optimum is reached. After training, feeding a test sample into the network yields the next predicted value of that sample sequence; and if the input sequence is continually updated with the predicted values, a sequence of predictions for a coming period can also be obtained.

Description

A cloud data center load prediction method based on a long short-term memory network
Technical field
The present invention relates to the field of cloud computing technology, and more particularly to a cloud data center load prediction method based on a long short-term memory network.
Background technology
Cloud computing is a model for the addition, use and delivery of Internet-based services, usually involving dynamically scalable and often virtualized resources provided over the Internet; it can supply on-demand computing resources and computation results over the network to massive numbers of users of different priority levels. Resources in a cloud data center are generally provided to users as services under a dynamic, pay-per-use model.
According to the definition of the National Institute of Standards and Technology (NIST), this model offers available, convenient, on-demand network access to a configurable shared pool of computing resources (including networks, servers, storage, application software and services) that can be rapidly provisioned. Yet these resources are still far from sufficient, because the information society's demand for large-scale data computation is enormous and keeps growing, so situations where large volumes of task requests flood in at once arise frequently. In such situations a cloud data center often fails to allocate its computing resources optimally, so computation slows down, working efficiency drops, large batches of requests are left waiting for long periods, energy is wasted, and other abnormal conditions occur. This not only reduces the profit of the computing service and lowers the input-output ratio, but also damages prestige and reputation. To keep a cloud data center working normally, efficiently and with guaranteed quality at all times, task scheduling is indispensable. The premise of effective scheduling is to estimate in advance the task load of each priority level and the average resource application per unit task over a coming period; accurate prediction of these two indices is a powerful guarantee for allocating computing resources correctly and thereby achieving global optimization.
The long short-term memory (LSTM) network is a special form of recurrent neural network (RNN). Unlike a feedforward network, an RNN continually feeds its own output back as input; as the number of recurrences grows, the judgment the network makes at time step t-1 influences its judgment at time step t. This feedback loop is consistent with the real-life reasoning pattern of "inferring the unknown from the known, then inferring further from that", so the method possesses memory.
For " distance learning energy force difference " existing for traditional RNN (due to there are gradient disappearance, when learning information and prediction bits Interval widen, it is apparent that RNN infers that ability declines) problem, LSTM improves each neural unit, devised cellular State C and increase, " door " structure of removal information to cell state ability, internal structure are as shown in Figure 2.One door is by one A sigmoid layers of h and pointwise multiplication operation composition, the purpose is to screen letter with the weight of sigmod layers of generation Breath, control data flowing, determines whether information passes through.Door there are three being gathered around in one LSTM unit:It is that " forgetting door " (determines respectively Abandon which of C information), " input gate " (determining what information is inserted into C) and " out gate " (determines what is exported Value).By Fig. 2 it is evident that:In t moment, LSTM units have sample Xt, upper unit output ht-1, a upper unit it is cellular State Ct-1Three inputs, these inputs form new cell state C by the processing of doortH is exported with new unittAnd it flows to Next unit, several such units, which join end to end, is formed shot and long term memory network.As training sample continually enters, net Network constantly learns, and extracts its rule and feature, and the weight of all doors also can constantly be adjusted by optimization method, is finally reached Global optimum realizes Accurate Prediction.
With the continuous and booming development of artificial intelligence, more and more development frameworks have appeared to make things convenient for learners and developers. TensorFlow, released by Google, is an open-source software library for numerical computation; an algorithm designed by the user is described as a dataflow graph. Multidimensional data flows through the graph as "tensors" and can be adjusted dynamically, which makes the framework especially suitable for building, training and applying neural networks for tasks such as classification and inference. TensorFlow provides a rich set of calling methods and performs automatic differentiation, so users can build computation models without writing out complicated calculation details. When a program starts executing, TensorFlow automatically assigns the nodes of the model to devices such as CPUs and GPUs, optimizing the process and exploiting the computing potential of the hardware to the greatest extent. It is also portable: a trained model can be moved to phones, servers or other clusters without code changes. With its huge user base, it is one of the most popular artificial intelligence development platforms today.
Summing up the introductions and analyses of the related technologies above, the load prediction problem of a cloud data center is essentially the prediction of time series (the task request volume series and the per-request resource application series). Given that the resources of cloud data centers currently cannot be allocated optimally, a method is needed that takes an LSTM neural network as the model, trains it on the task request volume series and per-request resource application series of each priority level assembled from large amounts of historical data, completes the optimization, and accurately predicts these two indices over a coming period.
Invention content
The purpose of the present invention is to provide a prediction method based on a long short-term memory network that runs in a cloud data center and supplies sufficient and accurate information for the scheduling of computing resources, so that the scheduling module can analyze trends in advance, plan the scheduling method it will adopt, make preparations early, and guarantee the smooth and effective operation of the cloud data center. The indices to be predicted fall into two classes: total task requests (a count) and per-request resource application (dimensionless). Since each class is further divided into three kinds, corresponding to the three priority levels of task requests (low, medium and high), there are 2*3 = 6 indices in total.
According to one aspect of the invention, a data set construction method and a method of building an LSTM neural network model are provided, including: reading data from files; processing these data, "reshaping" them into a data model that meets TensorFlow's requirements, and producing a data set composed of time series; initializing the weight matrices and bias matrices of the LSTM model's input and output layers; and building the LSTM model from the hyperparameters (sample batch size, number of network layers, learning rate, number of time steps, etc.) and the weight and bias matrices, defining how the model acquires, processes and outputs data samples.
According to another aspect of the invention, a training method and an iterative rolling prediction method are provided, in which data samples are fed into the LSTM network model so that it learns the regularities of the sequences and adjusts each gate's weights toward a global optimum. The method includes: computing the loss function from the input training samples (called "tensors" in the TensorFlow framework, a term used frequently below); performing global optimization with the adaptive moment estimation (Adam) algorithm, taking the loss function as the objective; iterating these two steps during training and periodically saving the adjusted model; and then taking test samples as input and iteratively outputting, with the "final" network model, the predicted values of the indices for the next n periods.
In conclusion a kind of cloud data center load predicting method based on shot and long term memory network, includes the following steps:
S1, with storage data creating historical time sequence hereof and data set;
S2, structure shot and long term neural network model;
S3, training LSTM networks:Training sample, counting loss letter are iteratively imported into shot and long term neural network model Number, and global optimization is carried out based on this, the feature of training sample and the relationship of numerical value and sequential are constantly extracted, until repeatedly In generation, terminates;
S4, after the training stage, to shot and long term neural network model;Middle importing test sample, the following number of iteration output The numerical value of a time step index to be predicted forms predicted value time series.
Preferably, in step 1 the data set is divided into two parts:
The first part is the "tensor subset", hereafter the "X subset". Each tensor is a time series, the smallest unit fed into the LSTM network during training or testing; its length is num_step, and it is constructed by iteratively taking consecutive data out of the matrix, converting them to lists and filling them in.
The second part is the "label subset", hereafter the "Y subset", composed of the label corresponding to each tensor; the sequence value at each position of the Y subset is the successor of the sequence value at the corresponding position of the X subset.
Preferably, step 2 is specifically:
Step 2.1: each batch of samples is fed into the LSTM network; after screening and processing a sample, each unit in the network produces two values and passes them to the next unit, one called the "cell state", abbreviated C, the other the "unit output", abbreviated h. The LSTM network has num_unit units and each batch contains batch_size sample sequences, so one batch successively produces num_unit*batch_size C values and h values.
Step 2.2: discard all the C value sequences and keep all the h value sequences, generating a new matrix;
Step 2.3: multiply this new matrix by the output-layer weight matrix w_out, then add the output-layer bias matrix b_out, obtaining pred.
Preferably, step 3 is specifically:
1) take the first batch_size samples from the X and Y subsets and name them x and y respectively;
2) feed x into the LSTM model and obtain the pred matrix;
3) compute the loss tensor loss;
4) with the loss loss and the learning rate lr as parameters, initialize an adaptive moment estimation (Adam) optimizer, compute the gradient of loss, then apply the gradient to the variables, updating the gate weights in all units of the LSTM network, and return a tensor containing the output of the training operation;
5) take the next batch_size samples from the X and Y subsets, updating x and y;
6) if x and y are not empty, repeat steps 2) to 5); otherwise start the next iteration.
Preferably, in step 3, during the LSTM network training stage, after the learning rate lr is given and the loss loss is computed, the adaptive moment estimation (Adam) method is used for adjustment, updating the weights of the forget gates, input gates and output gates in the LSTM network model. The present invention uses a recent neural-network optimization method, the Adam algorithm, in place of traditional stochastic gradient descent. By computing first-order and second-order moment estimates of the gradient, Adam designs an independent adaptive learning rate for each parameter, making it suitable for problems with large-scale data and parameters and demanding learning-speed requirements. Compared with other optimization algorithms such as stochastic gradient descent (SGD), the momentum method and AdaGrad, it converges faster and learns more effectively, and it also counters problems such as the learning rate vanishing, getting trapped in local optima, and large fluctuations of the loss function. Because the prediction of H places high demands on both speed and accuracy, the Adam algorithm is the preferred choice.
The Adam algorithm involves the following constants: α is the step-size factor; β₁ is the decay rate of the first-moment estimate; β₂ is the decay rate of the second-moment estimate; ε is a very small number close to zero. In the TensorFlow framework the default settings of these four constants are 0.001, 0.9, 0.999 and 10⁻⁸ respectively.
Let the stochastic objective function be f(θ); let m_t be the first-moment vector of the parameter θ at time t and v_t its second-moment vector, both initialized to zero. The algorithm is iterative, and each iteration is accompanied by an update of θ. While θ has not converged, the following operations are performed in a loop:
1) advance the time step: $t \leftarrow t + 1$;
2) obtain the gradient of the objective function at time t with respect to θ: $g_t = \nabla_\theta f_t(\theta_{t-1})$;
3) update the first- and second-moment estimates: $m_t \leftarrow \beta_1 m_{t-1} + (1-\beta_1)\,g_t$; $v_t \leftarrow \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2$;
4) compute the bias-corrected first- and second-moment estimates: $\hat{m}_t = m_t/(1-\beta_1^t)$; $\hat{v}_t = v_t/(1-\beta_2^t)$;
5) update the parameter vector: $\theta_t = \theta_{t-1} - \alpha\,\hat{m}_t/(\sqrt{\hat{v}_t} + \epsilon)$.
This iteration runs until θ_t converges. Here m_t and v_t can be regarded, respectively, as estimates of the expectation of g_t and of g_t². In TensorFlow, once the learning rate is known, the optimizer is built with the Adam algorithm and trains the optimal weights of each gate structure.
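As an illustration only (not the TensorFlow implementation), a minimal NumPy sketch of this update rule on a toy objective f(θ) = θ², whose gradient is 2θ:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update, following steps 1)-5) above."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 20001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(theta)                                  # converges toward 0
```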
The beneficial effects of the present invention are as follows:
The technical scheme of the present invention solves the problems that the irregular arrival of requests of all categories and the variability of per-request resource applications pose for request scheduling in a cloud data center. The cloud data center can "provide for a rainy day", estimating the coming trend of change well in advance and then deciding the resource allocation mode and scheduling algorithm, so that it always operates in an optimal state and uses its resources in the most efficient way. Compared laterally with other similar schemes, this scheme requires less training time and achieves higher learning efficiency, and it also avoids common defects such as local optima, vanishing gradients and slow convergence, all of which matter greatly to a cloud data center where efficiency is paramount.
Description of the drawings
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings:
Fig. 1 is the flow chart of the cloud data center load prediction method based on a long short-term memory network;
Fig. 2 is the internal structure diagram of a basic unit of the long short-term memory network;
Fig. 3 is the complete training flow of the LSTM network;
Fig. 4 is the flow of obtaining the predicted-value sequence from test samples.
Specific embodiment
The implementation flow and points of attention of the present invention are elaborated further below. As noted above, there are six indices to be predicted in a cloud data center, but most of the algorithm applies to predicting all six; wherever a step treats different types of predicted quantity differently, this is noted specially. The algorithm is written in the Python language and imports TensorFlow, the data-analysis package pandas, the numerical-computation package numpy, and matplotlib.pyplot for drawing figures. In this part, "H" always refers to one particular index to be predicted; the prediction methods for the remaining five indices are essentially the same.
S1, creating a historical time series and a data set from the data stored in files.
Historical data are frequently stored in csv files. To predict H, the first step is to read H's historical data from the file and form a time series. The complete data of the six indices are obtained from the csv files by calling pandas methods; the historical time series of H is then created with numpy, taking all of H's historical data as the argument.
Next comes the construction of the data set. Because these massive historical data can exhibit polarized values, differing dimensions and even huge gaps, all of which make the learning process converge slowly or not at all, the raw data must be normalized so that they become "flatter" and can be analyzed on the same scale. The values of the first class of indices (task request volume) are distributed rather randomly, with many influencing factors and no guarantee of an approximately Gaussian distribution, so logarithmic normalization is applied to them: sample datum = base-10 logarithm of the raw datum. The values of the second class (average resource application per unit task) are approximately Gaussian, so standard-deviation normalization suits them best: sample datum = (raw datum − grand mean of the data) / overall standard deviation of the data. Since subsequent steps of the algorithm use matrix computation frequently, the next step is to add a dimension to H's sample sequence, reshaping it into a two-dimensional matrix of shape [n, 1]:
[t₀ t₁ … t_{n−1}]^T
To achieve the goal of "optimizing while training", the data set needs to be divided into two parts. The first part is the "tensor subset" (the "X subset"): in this algorithm a tensor is in essence also a time series, the smallest unit fed into the LSTM network during training or testing; its length is num_step, and it is constructed by iteratively taking consecutive data out of the matrix, converting them to lists and filling them in.
The second part is the "label subset" (the "Y subset"), composed of the label corresponding to each tensor; the sequence value at each position of the Y subset is the successor of the sequence value at the corresponding position of the X subset. A sketch of this step follows.
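A minimal sketch of S1 under the description above (the csv path, column name and num_step value are illustrative assumptions, not the patent's exact code):

```python
import numpy as np
import pandas as pd

num_step = 20  # tensor length (hyperparameter; value assumed)

# Read the historical data of H via pandas (path and column name assumed).
raw = pd.read_csv("history.csv")["H"].values.astype(np.float64)

# First-class index (task request volume): logarithmic normalization.
series = np.log10(raw)
# A second-class index would instead use: (raw - raw.mean()) / raw.std()

# Add a dimension: reshape into an [n, 1] matrix for later matrix computation.
matrix = series.reshape(-1, 1)

# X subset: windows of num_step consecutive values; Y subset: the same
# windows shifted one step ahead (each label is the successor value).
X, Y = [], []
for i in range(len(matrix) - num_step):
    X.append(matrix[i:i + num_step])
    Y.append(matrix[i + 1:i + num_step + 1])
X, Y = np.array(X), np.array(Y)   # shapes [n - num_step, num_step, 1]
```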
S2, building the LSTM neural network model, so that it can process the input and compute the output correctly and effectively.
Some parameters of the LSTM model need to be set in advance; they are called hyperparameters. This algorithm involves eight: the learning rate lr, the input-layer dimension input_size, the output-layer dimension output_size, the number of training samples per batch batch_size, the number of LSTM units per layer num_unit, the number of time steps num_step, the number of training iterations epochs, and the number of test samples n_train. Since predicting future values of H is single-factor prediction, the LSTM network has one layer, and input_size and output_size are both 1. The training set is large, often containing thousands of tensors. Feeding too many samples into the network at once reduces the number of iterations but slows convergence and easily gets stuck in local optima; feeding too few speeds up convergence and improves precision, but a too-small batch_size cannot exploit the advantage of parallel computation and makes training unstable. So batch_size should be neither too large nor too small: small batches of samples are optimal.
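For concreteness, the later sketches assume the following values for the remaining hyperparameters (illustrative assumptions; num_step was set in the sketch above):

```python
lr = 0.001         # learning rate
input_size = 1     # single-factor prediction
output_size = 1
batch_size = 60    # small-batch training (value assumed)
num_unit = 64      # LSTM units in the single layer (value assumed)
epochs = 100       # number of training iterations (value assumed)
n_train = 1        # number of test samples (value assumed)
```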
As is customary, the input-layer and output-layer weights of each basic LSTM unit are initialized to random numbers following a Gaussian distribution, and the bias is set to a constant such as 0.1. The input-layer weight matrix w_input, the output-layer weight matrix w_output, the input-layer bias matrix b_input and the output-layer bias matrix b_output are built accordingly.
The construction of the model is encapsulated in a function whose purpose is to describe the abstract dataflow graph of an LSTM network with TensorFlow, providing "raw material" for the subsequent steps. Since the number of samples per batch differs between the training stage and the test stage, the function's parameter is batch: batch = batch_size indicates a network used for training; batch = 1 indicates a network used for testing (one tensor sample is input per test).
Implementing this algorithm only requires the basic LSTM model, so each network unit is a basic BasicLSTMCell; there are num_unit such cells in total, without advanced variants such as clipping or peep-holes. The network has a single layer, so the DropoutWrapper commonly used in multi-layer LSTMs to prevent overfitting is not introduced.
Following the TensorFlow manual, when processing a batch of samples the network model first "stretches" the matrix formed by batch_size consecutive sample sequences taken from the X subset into one "vertical strip" (of shape [batch_size*num_step, 1]), composing the training sample matrix train_matrix.
The training sample matrix cannot enter the LSTM network directly as input; it must first be multiplied by the input-layer weight matrix w_input and then added to the input-layer bias matrix b_input, converting it into a matrix real_input of shape [batch_size, num_step, num_unit], which can then serve as the real network input and be passed as a parameter to TensorFlow's tf.nn.dynamic_rnn method (the purpose of calling tf.nn.dynamic_rnn is to train an RNN-class network with the given input and initial model and return the training result):
real_input = train_matrix · [w₀ w₁ … w_{num_unit−1}] + [b₀ b₁ … b_{num_unit−1}]^T
The final output of S2 is a matrix of shape [batch_size, 1] (referred to below as "pred"), composed of the "next-moment predicted values" that the LSTM network computes for all the input sample sequences in the batch. Because each batch contains batch_size sequences in the training state, pred has batch_size elements. The computation steps of pred are as follows:
1) run tf.nn.dynamic_rnn and take out its second returned item, a matrix of type [batch_size, 2*num_unit] composed of the c values and h values that all LSTM units generate as each sample sequence of the batch passes through the network;
2) discard all the c value sequences and keep all the h value sequences, generating a new matrix;
3) multiply this new matrix by the output-layer weight matrix w_out, then add the output-layer bias matrix b_out, obtaining pred. A sketch of this model function follows.
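A minimal TensorFlow 1.x sketch of this model function, consistent with the description above (shapes and initialization details are assumptions; it reuses the hyperparameters set earlier):

```python
import tensorflow as tf

def lstm_model(x, batch):
    """x: [batch, num_step, 1] input tensor; returns pred of shape [batch, 1]."""
    w_input = tf.Variable(tf.random_normal([input_size, num_unit]))
    b_input = tf.Variable(tf.constant(0.1, shape=[num_unit]))
    w_out = tf.Variable(tf.random_normal([num_unit, output_size]))
    b_out = tf.Variable(tf.constant(0.1, shape=[output_size]))

    # "Stretch" the batch into one vertical strip, apply the input layer,
    # then reshape into the real network input.
    train_matrix = tf.reshape(x, [-1, input_size])            # [batch*num_step, 1]
    real_input = tf.matmul(train_matrix, w_input) + b_input   # [batch*num_step, num_unit]
    real_input = tf.reshape(real_input, [batch, num_step, num_unit])

    cell = tf.nn.rnn_cell.BasicLSTMCell(num_unit, state_is_tuple=False)
    outputs, final_state = tf.nn.dynamic_rnn(cell, real_input, dtype=tf.float32)

    # With state_is_tuple=False the final state is [batch, 2*num_unit]:
    # c values first, h values second; keep h, discard c.
    h = final_state[:, num_unit:]
    return tf.matmul(h, w_out) + b_out                        # pred: [batch, 1]
```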
Training the LSTM network is a cyclic process: each cycle traverses the whole training sample set, computing the loss for every batch of samples and then performing one optimization with the Adam algorithm. The number of cycles is represented by the variable epoch, whose value is adjustable; once epoch cycles have been executed, the network model is trained.
S3, training the LSTM network: iteratively feeding training samples into this model, computing the loss function and performing global optimization on that basis, continually extracting the features of the training samples and the relations between values and time order, until the iterations end.
Before training, the network model function designed in S2 is called first. The training of the LSTM network is iterative, epoch times in total, and is essentially the following loop:
1) take the first batch_size samples from the X and Y subsets and name them x and y respectively;
2) feed x into the LSTM model and obtain the pred matrix;
3) compute the loss tensor (referred to below as "loss"). The prediction problem is essentially a regression problem, so the mean squared error (MSE) is used: $loss = \frac{1}{batch\_size}\sum_i (pred_i - tag_i)^2$, where pred_i is the "predicted value" of the next moment of the i-th sample sequence of the batch, and tag_i is the "label" sequence in the "label subset" corresponding to that sample sequence (in the TensorFlow framework, matrices of different sizes can be added and subtracted);
4) with the loss loss and the learning rate lr as parameters, initialize an adaptive moment estimation (Adam) optimizer, compute the gradient of loss, then apply the gradient to the variables, updating the gate weights in all units of the LSTM network, and return a tensor containing the output of the training operation;
5) take the next batch_size samples from the X and Y subsets, updating x and y;
6) if x and y are not empty, repeat steps 2) to 5); otherwise start the next iteration.
Throughout the training process, intermediate results are periodically saved as checkpoint files (binary files that map variable names to the corresponding tensor values, with extension .ckpt) under a specified path. The complete training flow of the LSTM network is shown in Figure 3; a sketch of this training loop follows.
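A minimal sketch of the training loop, reusing lstm_model, X, Y and the hyperparameters from the sketches above (the checkpoint path is assumed; the loss here compares pred with the final label of each window, i.e. the value that follows the input window, which is one reading of step 3):

```python
x_ph = tf.placeholder(tf.float32, [batch_size, num_step, 1])
y_ph = tf.placeholder(tf.float32, [batch_size, num_step, 1])

pred = lstm_model(x_ph, batch_size)
loss = tf.reduce_mean(tf.square(pred - y_ph[:, -1, :]))   # MSE loss
train_op = tf.train.AdamOptimizer(lr).minimize(loss)      # Adam optimization
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(epochs):
        for start in range(0, len(X) - batch_size + 1, batch_size):
            x = X[start:start + batch_size]
            y = Y[start:start + batch_size]
            sess.run(train_op, feed_dict={x_ph: x, y_ph: y})
        saver.save(sess, "model/lstm.ckpt")                # periodic checkpoint
```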
S4, after the training stage, feeding test samples into the model and iteratively outputting the values of the index to be predicted for several future time steps, forming a predicted-value time series.
In the prediction stage, the input to the LSTM network consists of test samples, which differ from training samples in that each batch contains only one sequence. There is therefore only one h value matrix, which after "processing" by w_output and b_output is exactly the predicted value of the next moment of that sequence. To obtain the predicted values of the next n moments from a test sequence, a "step-by-step" approach should be taken: each time, the latest predicted value is appended to the end of the sequence, the leading element is removed to form a new sequence, and this is fed into the LSTM network again to obtain the next predicted value, and so on in a loop. If the accuracy of the predicted values is not high, the test is unsuccessful; hyperparameters such as the learning rate, the number of LSTM units and the number of iterations can then be adjusted repeatedly and steps S1-S4 repeated until the final prediction result is satisfactory.
After repeated training and adjustment, the LSTM network has been tuned to its best and can perform time-series prediction. First, the most recently saved model is read from the newest .ckpt file under the path, batch_size is set to 1, and the first test datum is input. The length of the predicted-value sequence is assumed to also be num_step, which both provides a sufficiently "long-range" forecast for resource scheduling and makes it convenient to compare predicted values with actual values. The prediction is rolling and stepwise; the flow is shown in Figure 4, and a sketch follows.
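A minimal sketch of the rolling, step-by-step prediction (assumed to run in a fresh graph after training; lstm_model, matrix and the checkpoint path follow the earlier sketches):

```python
x1 = tf.placeholder(tf.float32, [1, num_step, 1])
pred1 = lstm_model(x1, 1)                       # prediction graph, batch = 1
saver = tf.train.Saver()

with tf.Session() as sess:
    saver.restore(sess, "model/lstm.ckpt")      # most recently saved model
    window = matrix[-num_step:].reshape(1, num_step, 1)
    preds = []
    for _ in range(num_step):                   # predict num_step future steps
        nxt = sess.run(pred1, feed_dict={x1: window})[0, 0]
        preds.append(nxt)
        # Append the prediction, drop the leading element, feed in again.
        window = np.append(window[:, 1:, :], [[[nxt]]], axis=1)
```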
How precise the predicted values are can be judged intuitively from how well the two sequence curves fit each other in a plane coordinate system. If the precision is not high enough, the hyperparameters are adjusted continually and S1-S4 repeated until the fit of the two curves is satisfactory.
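A small matplotlib sketch of this comparison, assuming the last num_step true values were held out so that they align with preds (de-normalization shown for the first-class index):

```python
import matplotlib.pyplot as plt

actual = 10 ** series[-num_step:]        # held-out true values, de-normalized
predicted = 10 ** np.array(preds)        # rolling predictions, de-normalized
plt.plot(actual, label="actual")
plt.plot(predicted, label="predicted")
plt.legend()
plt.title("Fit of predicted vs. actual values")
plt.show()
```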
Obviously, the implementation flow described above is only an example given to illustrate the present invention clearly and does not limit its embodiments. For those of ordinary skill in the art, other variations or changes in different forms can be made on the basis of the above description. The embodiments cannot all be exhaustively listed here; every obvious change or variation extended from the technical scheme of the present invention remains within the scope of protection of the present invention.

Claims (5)

1. A cloud data center load prediction method based on a long short-term memory network, characterized in that the method comprises the following steps:
S1, creating a historical time series and a data set from the data stored in files;
S2, building the LSTM neural network model;
S3, training the LSTM network: iteratively feeding training samples into the LSTM neural network model, computing the loss function and performing global optimization on that basis, continually extracting the features of the training samples and the relations between values and time order, until the iterations end;
S4, after the training stage, feeding test samples into the LSTM neural network model and iteratively outputting the values of the index to be predicted for several future time steps, forming a predicted-value time series.
2. The cloud data center load prediction method based on a long short-term memory network of claim 1, characterized in that in step 1 the data set is divided into two parts:
the first part is the "tensor subset", hereafter the "X subset": each tensor is a time series, the smallest unit fed into the LSTM network during training or testing; its length is num_step, and it is constructed by iteratively taking consecutive data out of the matrix, converting them to lists and filling them in;
the second part is the "label subset", hereafter the "Y subset", composed of the label corresponding to each tensor; the sequence value at each position of the Y subset is the successor of the sequence value at the corresponding position of the X subset.
3. The cloud data center load prediction method based on a long short-term memory network of claim 2, characterized in that step 2 is specifically:
step 2.1: each batch of samples is fed into the LSTM network; after screening and processing a sample, each unit in the network produces two values and passes them to the next unit, one called the "cell state", abbreviated C, the other the "unit output", abbreviated h; the LSTM network has num_unit units and each batch contains batch_size sample sequences, so one batch successively produces num_unit*batch_size C values and h values;
step 2.2: discard all the C value sequences and keep all the h value sequences, generating a new matrix;
step 2.3: multiply this new matrix by the output-layer weight matrix w_out, then add the output-layer bias matrix b_out, obtaining pred.
4. The cloud data center load prediction method based on a long short-term memory network of claim 3, characterized in that step 3 is specifically:
1) take the first batch_size samples from the X and Y subsets and name them x and y respectively;
2) feed x into the LSTM model and obtain the pred matrix;
3) compute the loss tensor loss;
4) with the loss loss and the learning rate lr as parameters, initialize an adaptive moment estimation (Adam) optimizer, compute the gradient of loss, then apply the gradient to the variables, updating the gate weights in all units of the LSTM network, and return a tensor containing the output of the training operation;
5) take the next batch_size samples from the X and Y subsets, updating x and y;
6) if x and y are not empty, repeat steps 2) to 5); otherwise start the next iteration.
5. The cloud data center load prediction method based on a long short-term memory network of claim 1, characterized in that in step 3, during the LSTM network training stage, after the learning rate lr is given and the loss loss is computed, the adaptive moment estimation (Adam) method is used for adjustment, updating the weights of the forget gates, input gates and output gates in the LSTM network model.
CN201711433325.9A 2017-12-26 2017-12-26 A cloud data center load prediction method based on a long short-term memory network Pending CN108170529A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711433325.9A 2017-12-26 2017-12-26 A cloud data center load prediction method based on a long short-term memory network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711433325.9A 2017-12-26 2017-12-26 A cloud data center load prediction method based on a long short-term memory network

Publications (1)

Publication Number Publication Date
CN108170529A true CN108170529A (en) 2018-06-15

Family

ID=62521116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711433325.9A 2017-12-26 2017-12-26 A cloud data center load prediction method based on a long short-term memory network Pending

Country Status (1)

Country Link
CN (1) CN108170529A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170330558A1 (en) * 2013-12-17 2017-11-16 Google Inc. Generating representations of acoustic sequences
US20150356075A1 (en) * 2014-06-06 2015-12-10 Google Inc. Generating representations of input sequences using neural networks
CN107239825A (en) * 2016-08-22 2017-10-10 北京深鉴智能科技有限公司 Consider the deep neural network compression method of load balancing
CN106502799A (en) * 2016-12-30 2017-03-15 南京大学 A kind of host load prediction method based on long memory network in short-term
CN107481048A (en) * 2017-08-08 2017-12-15 哈尔滨工业大学深圳研究生院 A kind of financial kind price expectation method and system based on mixed model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ADIT DESHPANDE: "Perform sentiment analysis with LSTMs, using TensorFlow", 《HTTPS://WWW.OREILLY.COM/CONTENT/PERFORM-SENTIMENT-ANALYSIS-WITH-LSTMS-USING-TENSORFLOW/》 *

Cited By (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063416B (en) * 2018-07-23 2019-08-27 太原理工大学 Gene expression prediction technique based on LSTM Recognition with Recurrent Neural Network
CN109063416A (en) * 2018-07-23 2018-12-21 太原理工大学 Gene expression prediction technique based on LSTM Recognition with Recurrent Neural Network
CN109104620A (en) * 2018-07-26 2018-12-28 腾讯科技(深圳)有限公司 A kind of short video recommendation method, device and readable medium
CN109104620B (en) * 2018-07-26 2020-05-19 腾讯科技(深圳)有限公司 Short video recommendation method and device and readable medium
CN108989098A (en) * 2018-08-24 2018-12-11 福建师范大学 A kind of mixing scientific workflow data layout method of the cloud environment towards time delay optimization
CN108989098B (en) * 2018-08-24 2021-06-01 福建师范大学 Time delay optimization-oriented scientific workflow data layout method in hybrid cloud environment
CN109299321A (en) * 2018-08-31 2019-02-01 出门问问信息科技有限公司 A kind of song recommended method and device
CN109299321B (en) * 2018-08-31 2021-07-09 出门问问信息科技有限公司 Method and device for recommending songs
CN109272046A (en) * 2018-09-26 2019-01-25 北京科技大学 Deep learning method based on L2 again regularization Adam switching simulated tempering SGD
CN109471698A (en) * 2018-10-19 2019-03-15 中国电子科技集团公司第二十八研究所 System and method for detecting abnormal behavior of virtual machine in cloud environment
CN109522117A (en) * 2018-10-25 2019-03-26 深圳市圆世科技有限责任公司 Data dispatch system on a kind of chain towards under isomerous environment
CN111143050B (en) * 2018-11-02 2023-09-19 中移(杭州)信息技术有限公司 Method and equipment for dispatching container clusters
CN111143050A (en) * 2018-11-02 2020-05-12 中移(杭州)信息技术有限公司 Container cluster scheduling method and device
CN109359624A (en) * 2018-11-14 2019-02-19 浙江农林大学 The prediction technique and prediction meanss of laying hen foraging behaviour neural network based
CN109542585B (en) * 2018-11-14 2020-06-16 山东大学 Virtual machine workload prediction method supporting irregular time intervals
CN109542585A (en) * 2018-11-14 2019-03-29 山东大学 A kind of Virtual Machine Worker load predicting method for supporting irregular time interval
CN109359624B (en) * 2018-11-14 2021-10-08 浙江农林大学 Neural network-based prediction method and prediction device for feeding behavior of laying hens
CN113272825B (en) * 2018-11-21 2024-02-02 亚马逊技术有限公司 Reinforcement learning model training by simulation
CN113272825A (en) * 2018-11-21 2021-08-17 亚马逊技术有限公司 Reinforcement learning model training by simulation
CN111224806A (en) * 2018-11-27 2020-06-02 华为技术有限公司 Resource allocation method and server
CN109857459B (en) * 2018-12-27 2022-03-08 中国海洋大学 E-level super-calculation ocean mode automatic transplanting optimization method and system
CN109857459A (en) * 2018-12-27 2019-06-07 中国海洋大学 A kind of E grades of supercomputer ocean model transplants optimization method and system automatically
CN109816008A (en) * 2019-01-20 2019-05-28 北京工业大学 A kind of astronomical big data light curve predicting abnormality method based on shot and long term memory network
CN109782392A (en) * 2019-02-27 2019-05-21 中国科学院光电技术研究所 A kind of fiber-optic coupling method based on modified random paralleling gradient descent algorithm
CN110031214A (en) * 2019-04-09 2019-07-19 重庆大学 Gear hobbing quality online evaluation method based on shot and long term memory network
CN110031214B (en) * 2019-04-09 2020-09-22 重庆大学 Hobbing quality online evaluation method based on long-term and short-term memory network
CN110096349A (en) * 2019-04-10 2019-08-06 山东科技大学 A kind of job scheduling method based on the prediction of clustered node load condition
CN110231976A (en) * 2019-05-20 2019-09-13 西安交通大学 A kind of edge calculations platform container dispositions method and system based on load estimation
CN110231976B (en) * 2019-05-20 2021-04-20 西安交通大学 Load prediction-based edge computing platform container deployment method and system
CN110389820A (en) * 2019-06-28 2019-10-29 浙江大学 A kind of private clound method for scheduling task carrying out resources based on v-TGRU model
CN110502432B (en) * 2019-07-23 2023-11-28 平安科技(深圳)有限公司 Intelligent test method, device, equipment and readable storage medium
CN110502432A (en) * 2019-07-23 2019-11-26 平安科技(深圳)有限公司 Intelligent test method, device, equipment and readable storage medium storing program for executing
CN110297186A (en) * 2019-08-14 2019-10-01 莆田市烛火信息技术有限公司 A kind of new energy car battery parameter detection method
CN110782016A (en) * 2019-10-25 2020-02-11 北京百度网讯科技有限公司 Method and apparatus for optimizing neural network architecture search
CN111027591B (en) * 2019-11-13 2022-07-12 西安交通大学 Node fault prediction method for large-scale cluster system
CN111027591A (en) * 2019-11-13 2020-04-17 西安交通大学 Node fault prediction method for large-scale cluster system
CN111049903A (en) * 2019-12-12 2020-04-21 大连理工大学 Edge network load distribution algorithm based on application perception prediction
CN111049903B (en) * 2019-12-12 2021-04-20 大连理工大学 Edge network load distribution algorithm based on application perception prediction
CN111179910A (en) * 2019-12-17 2020-05-19 深圳追一科技有限公司 Speed of speech recognition method and apparatus, server, computer readable storage medium
CN111008674B (en) * 2019-12-24 2022-05-03 哈尔滨工程大学 Underwater target detection method based on rapid cycle unit
CN111008674A (en) * 2019-12-24 2020-04-14 哈尔滨工程大学 Underwater target detection method based on rapid cycle unit
CN111415270A (en) * 2020-03-03 2020-07-14 浙江万胜智能科技股份有限公司 Power load intelligent identification method based on LSTM learning
CN111491006A (en) * 2020-03-03 2020-08-04 天津大学 Load-aware cloud computing resource elastic distribution system and method
CN111537888A (en) * 2020-05-09 2020-08-14 国网福建省电力有限公司莆田供电公司 Data-driven echelon battery SOC prediction method
CN111787109A (en) * 2020-07-02 2020-10-16 哈尔滨工程大学 Data center load balancing method based on time series prediction
CN111985162B (en) * 2020-08-28 2024-04-26 华中科技大学 Deep learning-based replacement flow shop control method and system
CN111985162A (en) * 2020-08-28 2020-11-24 华中科技大学 Replacement flow shop control method and system based on deep learning
CN112163668B (en) * 2020-09-29 2023-05-05 上海交通大学 Method for reducing time series data transmission quantity based on prediction and cloud edge cooperation
CN112163668A (en) * 2020-09-29 2021-01-01 上海交通大学 Method for reducing time series data transmission based on prediction and cloud edge cooperation
CN112532717A (en) * 2020-11-25 2021-03-19 四川易诚智讯科技有限公司 Production process safety monitoring method based on STM32 single chip microcomputer and long-short time memory network
CN112416596A (en) * 2020-12-01 2021-02-26 新华三人工智能科技有限公司 Node scheduling method, device and equipment
CN113051130B (en) * 2021-03-19 2023-05-02 南京航空航天大学 Mobile cloud load prediction method and system of LSTM network combined with attention mechanism
CN113051130A (en) * 2021-03-19 2021-06-29 南京航空航天大学 Mobile cloud load prediction method and system of LSTM network combined with attention mechanism
CN113220450B (en) * 2021-04-29 2022-10-21 南京邮电大学 Load prediction method, resource scheduling method and device for cloud-side multi-data center
CN113220450A (en) * 2021-04-29 2021-08-06 南京邮电大学 Load prediction method, resource scheduling method and device for cloud-side multi-data center
CN113220466A (en) * 2021-06-02 2021-08-06 神州数码系统集成服务有限公司 Cloud service load universal prediction method based on long-term and short-term memory model
CN115102674B (en) * 2022-06-17 2023-08-22 西安电子科技大学 Bi-LSTM network-based high-speed link eye diagram prediction method
CN115102674A (en) * 2022-06-17 2022-09-23 西安电子科技大学 Bi-LSTM network-based high-speed link eye pattern prediction method

Similar Documents

Publication Publication Date Title
CN108170529A (en) A cloud data center load prediction method based on a long short-term memory network
Mindermann et al. Prioritized training on points that are learnable, worth learning, and not yet learnt
Hewamalage et al. Recurrent neural networks for time series forecasting: Current status and future directions
Tripathy et al. Deep UQ: Learning deep neural network surrogate models for high dimensional uncertainty quantification
Li et al. Prediction for tourism flow based on LSTM neural network
CN110909926A (en) TCN-LSTM-based solar photovoltaic power generation prediction method
Soares et al. An adaptive ensemble of on-line extreme learning machines with variable forgetting factor for dynamic system prediction
Shah et al. Pareto frontier learning with expensive correlated objectives
Sim et al. An expert neural network system for dynamic job shop scheduling
CN111148118A (en) Flow prediction and carrier turn-off method and system based on time sequence
US11366806B2 (en) Automated feature generation for machine learning application
Barman et al. Transfer learning for small dataset
CN110457369A (en) A kind of training method and relevant device of model
Okewu et al. Parameter tuning using adaptive moment estimation in deep learning neural networks
Lu et al. Variance reduced training with stratified sampling for forecasting models
CN111950810A (en) Multivariable time sequence prediction method and device based on self-evolution pre-training
CN115018193A (en) Time series wind energy data prediction method based on LSTM-GA model
Ren et al. A novel solution to jsps based on long short-term memory and policy gradient algorithm
CN108876038B (en) Big data, artificial intelligence and super calculation synergetic material performance prediction method
Ariafar et al. Faster & More Reliable Tuning of Neural Networks: Bayesian Optimization with Importance Sampling.
Hassim et al. Optimizing functional link neural network learning using modified bee colony on multi-class classifications
CN117439053A (en) Method, device and storage medium for predicting electric quantity of Stacking integrated model
CN107038244A (en) A kind of data digging method and device, a kind of computer-readable recording medium and storage control
CN116542701A (en) Carbon price prediction method and system based on CNN-LSTM combination model
Hu et al. A variable batch size strategy for large scale distributed dnn training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180615