CN104021045A

CN104021045A - CPU load multi-step prediction method based on mode fusion

Info

Publication number: CN104021045A
Application number: CN201410183205.8A
Authority: CN
Inventors: 曹健; 杨定裕; 顾骅; 沈琪骏; 王烺
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2014-05-04
Filing date: 2014-05-04
Publication date: 2014-09-03

Abstract

A CPU load multi-step prediction method based on pattern fusion comprises the following steps. First, a time series is divided into a set of data patterns, and the number of occurrences of each pattern is counted. Second, a filtering factor α is applied to the obtained patterns and their counts to filter out infrequent patterns. Third, patterns that differ only slightly are merged into general trend patterns, and matching is performed against these general trend patterns; during matching, the Hamming distance measures the directional distance between patterns and the Euclidean distance measures their actual distance. Fourth, once several approximate patterns have been found, multi-step prediction is performed from the values that follow them, using either an average-rule strategy or a uniform decline strategy. Finally, several pattern lengths are used to guide prediction, the predictions obtained for the individual pattern lengths are fused with the AdaBoost algorithm, and the final result is obtained. The method offers high accuracy and high reliability.

Description

CPU load multi-step prediction method based on pattern fusion

Technical field

The present invention relates to the field of server CPU load prediction, and in particular to a CPU load multi-step prediction method based on pattern fusion.

Background art

In a distributed system, the available resources change over time, and the scheduling system must adapt accordingly. Because resources are geographically distributed, the data obtained by monitoring and collecting resource information arrive with a delay, which makes it difficult to know the currently available resources in real time. Performance prediction can therefore facilitate resource management and scheduling.

Monitoring and predicting CPU load are necessary conditions for the successful operation of applications, since CPU load reflects the performance state of a machine. If the CPU load is too high, server performance degrades severely, and an abnormal server state caused by heavy CPU load can even crash the system. Effective monitoring and prediction of CPU load helps administrators take appropriate countermeasures, such as shutting down or restarting applications, or upgrading hardware.

Previous studies have shown that CPU load can be recorded as time-series data and predicted with time-series forecasting algorithms, but these models cannot effectively support multi-step prediction. Multi-step prediction is more challenging than single-step prediction, but also more useful: it reveals the CPU load trend over a longer future horizon and gives administrators and the scheduling system enough time to handle anomalous events.

Existing multi-step prediction methods are based on iterated single-step prediction. This approach works well when the number of steps is small, but as the number of steps grows, the predictions drift further and further from the true values: if the first few predicted steps contain errors, the later steps can hardly be estimated accurately.
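To make the drift concrete, here is a minimal Python sketch of iterated single-step forecasting. The one-step model `mean_of_last_three` is a hypothetical stand-in chosen only for illustration; the point is that every predicted value is appended to the input window, so an error made at step k is carried into steps k+1, k+2, and so on.

```python
from typing import Callable, List, Sequence


def iterated_forecast(history: Sequence[float],
                      predict_one: Callable[[List[float]], float],
                      steps: int) -> List[float]:
    """Multi-step forecast built by repeatedly applying a one-step model."""
    window = list(history)
    forecast = []
    for _ in range(steps):
        nxt = predict_one(window)
        forecast.append(nxt)
        # The prediction is fed back as if it had been observed, so any
        # error in it becomes input to every later step.
        window.append(nxt)
    return forecast


def mean_of_last_three(window: List[float]) -> float:
    """Hypothetical one-step model, used here only for illustration."""
    return sum(window[-3:]) / 3


forecast = iterated_forecast([10.0, 20.0, 30.0], mean_of_last_three, 3)
```

In this run the third forecast already depends on two values that were never observed.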

In recent years, pattern-matching approaches have also become popular in prediction. They measure the similarity between data by computing the Euclidean distance between patterns. This matching approach relies on a single measure and cannot capture the direction or tendency of a pattern; it therefore introduces matching errors and cannot estimate the prediction accurately.
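A small Python illustration of this limitation, using invented values: candidate `a` is closer to the rising query than candidate `b` in Euclidean distance, although `a` moves against the query at two of its three steps while `b` has exactly the query's shape, shifted upward. Treating flat steps as falls in `directions` is an assumption made only for this sketch.

```python
import math
from typing import List, Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def directions(x: Sequence[float]) -> List[int]:
    # +1 for a rise, -1 otherwise (flat steps counted as -1, an assumption)
    return [1 if b > a else -1 for a, b in zip(x, x[1:])]


query = [10, 12, 14, 16]   # steadily rising
a = [11, 10, 15, 14]       # close in value, but zig-zags
b = [20, 22, 24, 26]       # same shape as the query, shifted up by 10

print(euclidean(query, a), directions(a))  # ~3.16 [-1, 1, -1]
print(euclidean(query, b), directions(b))  # 20.0 [1, 1, 1]
```

Ranking by Euclidean distance alone would prefer `a`, whose direction disagrees with the query.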

Summary of the invention

The object of the present invention is to provide a CPU load multi-step prediction method based on pattern fusion, that is, a method for long-term prediction of server CPU load with high prediction precision, high accuracy and high reliability.

To achieve the above object, the present invention provides a CPU load multi-step prediction method based on pattern fusion, comprising the following steps:

Step 1: pattern extraction

A time series is cut into a set of data patterns, and the number of occurrences of each data pattern is counted;

Step 2: pattern filtering

From all patterns and their counts obtained in step 1, infrequent patterns are filtered out: the patterns are sorted by count in descending order, and a filtering factor α is chosen so that the patterns retained after filtering cover most of the pattern occurrences;

Step 3: pattern merging and matching

Patterns that differ only slightly are merged into general trend patterns, and matching is performed against these general trend patterns. During matching, the Hamming distance measures the directional distance between patterns, and the Euclidean distance then measures their actual distance;

Step 4: pattern-weighted prediction

After step 3 has found several approximate patterns, multi-step prediction is performed from the values that follow these patterns, using either the average-rule strategy or the uniform decline strategy;

Step 5: fusion of prediction results

Several pattern lengths are used to guide prediction; the predicted values obtained for the individual pattern lengths are fused with the AdaBoost machine-learning algorithm to obtain the final result.

In the CPU load multi-step prediction method based on pattern fusion according to a preferred embodiment of the present invention, in step 1:

A time series is a sequence of data x_1, x_2, x_3, ..., x_n in which the data are ordered;

A data pattern is a subsequence C_p = x_p, x_{p+1}, ..., x_{p+w-1} of a given time series that occurs frequently in the historical data.

In the CPU load multi-step prediction method based on pattern fusion according to a preferred embodiment of the present invention, in step 2, the filtering factor α satisfies the following condition:

\frac{\mathrm{number}(\mathrm{filter}(Q_i))}{\mathrm{number}(Q_i)} \leq 1 - \alpha

Wherein Q_i is the set of patterns of length i;

number(Q_i) is the number of patterns in Q_i;

filter(Q_i) is the set of patterns filtered out.

In the CPU load multi-step prediction method based on pattern fusion according to a preferred embodiment of the present invention, in step 3, pattern matching comprises the following steps:

Step 31: trend matching

The trend distance between patterns is measured with the Hamming distance and assessed with a parameter λ:

\lambda \geq 1 - \frac{\lVert X_i' - m_j' \rVert}{\mathrm{length}(X_i')}

Wherein ||X_i' - m_j'|| is the Hamming distance between the two patterns;

λ is the matching parameter;

X_i' and m_j' are the trend directions of the i-th and j-th patterns;

Step 32: Euclidean distance

The trend matching of step 31 yields several similar patterns, for which the Euclidean distance is computed:

\mathrm{dist}(i, j) = \lvert X_i - m_j \rvert

Wherein i and j are the sequence numbers of the patterns;

X_i is the i-th pattern;

m_j is the j-th pattern;

|X_i - m_j| is the Euclidean distance between the two patterns;

Step 33: the Euclidean distances between patterns computed in step 32 are sorted, and the K patterns with the smallest Euclidean distances are selected as the closest patterns.

In the CPU load multi-step prediction method based on pattern fusion according to a preferred embodiment of the present invention, in step 4, the average-rule strategy is as follows: after several approximate patterns have been found in the history, the values that follow these approximate patterns are directly averaged, and the average is taken as the predicted value:

\begin{cases} \tilde{X}_{n+1} = \frac{1}{d} \sum_{i=1}^{d} CP_i[n+1] \\ \tilde{X}_{n+2} = \frac{1}{d} \sum_{i=1}^{d} CP_i[n+2] \\ \quad \vdots \\ \tilde{X}_{n+h} = \frac{1}{d} \sum_{i=1}^{d} CP_i[n+h] \end{cases}

Wherein h is the number of prediction steps;

d is the number of approximate patterns found in the historical data;

n is the length of the given data;

CP_i is the i-th candidate approximate pattern found.

In the CPU load multi-step prediction method based on pattern fusion according to a preferred embodiment of the present invention, in step 4, the uniform decline strategy is as follows: analysis shows that the correlation of the data changes with elapsed time, and the most recent patterns give more reliable predictions. Each pattern is therefore assigned a weight, and the weighted result is taken as the predicted value:

\begin{cases} \tilde{X}_{n+1} = \sum_{i=1}^{d} \omega_i' \, CP_i[n+1] \\ \tilde{X}_{n+2} = \sum_{i=1}^{d} \omega_i' \, CP_i[n+2] \\ \quad \vdots \\ \tilde{X}_{n+h} = \sum_{i=1}^{d} \omega_i' \, CP_i[n+h] \end{cases}

Wherein,

\omega_i = \frac{1}{l_i}, \quad \omega_i' = \frac{\omega_i}{\sum_{k=1}^{d} \omega_k}, \quad \sum_{i=1}^{d} \omega_i' = 1;

d is the number of approximate patterns found in the historical data;

n is the length of the given data;

CP_i is the i-th candidate approximate pattern found;

h is the prediction length;

i is the sequence number of the pattern;

l_i is the time span between the i-th pattern and the current pattern.

In the CPU load multi-step prediction method based on pattern fusion according to a preferred embodiment of the present invention, the final result obtained in step 5 is:

\begin{cases} \tilde{X}_{n+1} = \sum_{i=1}^{m} \alpha_i' \, \tilde{X}_{n+1}^{i} \\ \tilde{X}_{n+2} = \sum_{i=1}^{m} \alpha_i' \, \tilde{X}_{n+2}^{i} \\ \quad \vdots \\ \tilde{X}_{n+h} = \sum_{i=1}^{m} \alpha_i' \, \tilde{X}_{n+h}^{i} \end{cases}

Wherein h is the prediction length;

m is the number of selected pattern lengths;

α_i' is the weight of the prediction result for the i-th pattern length;

n is the length of the given data.

Beneficial effects of the present invention: targeting the characteristics of server CPU load, the invention performs multi-step prediction of CPU load. Through pattern merging, similar patterns are fused into general patterns, against which matching is performed; matching combines the trend distance of patterns with the Euclidean distance to find relevant approximate patterns, which then guide multi-step prediction. A weight-based synthesis computes the multi-step prediction result. The invention thus achieves accurate CPU load prediction and improves the accuracy and reliability of server resource scheduling.

Whereas existing approaches rely on Euclidean distance matching alone, the present invention finds similar patterns in historical data by combining trend matching with Euclidean distance matching, which solves the problem of time-series matching relying on a single measure. In addition, the weight-based synthesis proposed by the invention for computing multi-step prediction results addresses the inaccuracy of long-term, multi-step prediction. Compared with the prior art, the invention therefore offers high prediction precision, high accuracy and high reliability.

Brief description of the drawings

Fig. 1 is a schematic diagram of the principle of the CPU load multi-step prediction method based on pattern fusion of the present invention;

Fig. 2 is a schematic diagram of the principle of pattern merging of the present invention;

Fig. 3 is a schematic diagram of the principle of pattern matching of the present invention.

Embodiment

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the embodiments described here are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

To facilitate understanding of the embodiments of the present invention, a specific embodiment is further explained below with reference to the drawings; the embodiment does not limit the embodiments of the present invention.

Referring to Fig. 1 to Fig. 3, a CPU load multi-step prediction method based on pattern fusion merges similar patterns into general patterns and matches against these general patterns. Matching combines the trend distance of patterns with the Euclidean distance to find relevant approximate patterns, which guide multi-step prediction, and a weight-based synthesis computes the multi-step prediction result. The method specifically comprises the following steps:

S1, pattern extraction

A time series is cut into a set of patterns, and the number of occurrences of each pattern is counted.

Wherein:

A time series is a sequence of data x_1, x_2, x_3, ..., x_n in which the data are ordered.

A data pattern is a subsequence C_p = x_p, x_{p+1}, ..., x_{p+w-1} of a given time series that occurs frequently in the historical data.
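A minimal Python sketch of S1, assuming overlapping sliding windows of a fixed length `w` and assuming the load values are already discretized (for example, integer percentages) so that identical windows can be counted exactly; the patent does not state how windows are compared when counting, and `extract_patterns` is a hypothetical name.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Pattern = Tuple[float, ...]


def extract_patterns(series: Sequence[float], w: int) -> Tuple[List[Pattern], Counter]:
    """Cut a series into all overlapping windows C_p = x_p, ..., x_{p+w-1}
    and count how often each distinct window occurs."""
    if w <= 0 or w > len(series):
        raise ValueError("window length w must be in 1..len(series)")
    windows = [tuple(series[p:p + w]) for p in range(len(series) - w + 1)]
    return windows, Counter(windows)


load = [20, 30, 20, 30, 20, 40]
windows, counts = extract_patterns(load, 2)
```

Running this for several values of `w` yields the sets Q_i used in the filtering step.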

S2, pattern filtering

From all patterns and their counts obtained in S1, infrequent patterns are filtered out. The patterns are sorted by count in descending order, and a filtering factor α is chosen so that the patterns retained after filtering cover most of the pattern occurrences:

\frac{\mathrm{number}(\mathrm{filter}(Q_i))}{\mathrm{number}(Q_i)} \leq 1 - \alpha

Wherein Q_i is the set of patterns of length i;

number(Q_i) is the number of patterns in Q_i;

filter(Q_i) is the set of patterns filtered out.
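A minimal Python sketch of S2. It assumes that `number` counts pattern occurrences and reads filter(Q_i) as the patterns removed, so the retained patterns cover at least a fraction α of all occurrences; both readings are interpretations rather than details stated in the text, and `filter_patterns` is a hypothetical name.

```python
from collections import Counter
from collections.abc import Hashable
from typing import List


def filter_patterns(counts: Counter, alpha: float) -> List[Hashable]:
    """Keep the most frequent patterns until they cover at least a fraction
    alpha of all occurrences; the removed share is then at most 1 - alpha."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    total = sum(counts.values())
    kept, covered = [], 0
    for pattern, n in counts.most_common():  # sorted by count, descending
        if covered >= alpha * total:
            break
        kept.append(pattern)
        covered += n
    return kept


counts = Counter({"A": 50, "B": 30, "C": 15, "D": 5})
kept = filter_patterns(counts, 0.9)
```

With α = 0.9, the three most frequent patterns already cover 95 of 100 occurrences, so pattern D (5% of occurrences) is filtered out.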

S3, pattern merging and matching

Because the direction of each step in a pattern is either a rise or a fall, represented as 1 and -1, a trend can be expressed as a combination of 1s and -1s. Some patterns therefore differ only slightly, and such patterns can be merged into general trend patterns. For example, consider the two patterns A = [3, 5, 8, 10, 11, 14, 18, 21, 22] and B = [2, 5, 6, 9, 8, 12, 15, 17, 20]. As Fig. 2 shows, the two patterns are very similar: their trend directions, [↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑] and [↑ ↑ ↑ ↓ ↑ ↑ ↑ ↑], are very close. They are therefore merged into a common pattern [↑ ↑ ↑ * ↑ ↑ ↑ ↑], where * marks the step at which the two trends disagree.
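The merging example above can be reproduced with a short Python sketch. Encoding flat steps as falls, using 0 for `*`, and the `max_diff` threshold that decides when two trends are close enough to merge are all assumptions of this sketch; the patent does not state a merging threshold.

```python
from typing import List, Optional, Sequence

WILDCARD = 0  # stands for "*": the merged patterns disagree at this step


def trend(pattern: Sequence[float]) -> List[int]:
    """Encode each step as +1 (rise) or -1 (fall); flat steps become -1 here."""
    return [1 if b > a else -1 for a, b in zip(pattern, pattern[1:])]


def merge_trends(t1: Sequence[int], t2: Sequence[int],
                 max_diff: int = 1) -> Optional[List[int]]:
    """Merge two equal-length trend codes into one general trend pattern,
    replacing disagreeing steps with WILDCARD. Returns None when they differ
    in more than max_diff steps (a hypothetical threshold)."""
    if len(t1) != len(t2):
        raise ValueError("trend codes must have equal length")
    diff = sum(1 for a, b in zip(t1, t2) if a != b)
    if diff > max_diff:
        return None
    return [a if a == b else WILDCARD for a, b in zip(t1, t2)]


A = [3, 5, 8, 10, 11, 14, 18, 21, 22]
B = [2, 5, 6, 9, 8, 12, 15, 17, 20]
merged = merge_trends(trend(A), trend(B))
arrows = "".join({1: "↑", -1: "↓", WILDCARD: "*"}[s] for s in merged)
print(arrows)  # ↑↑↑*↑↑↑↑
```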

Pattern matching specifically comprises the following steps:

S31, trend matching: the trend distance between patterns is measured with the Hamming distance; to better assess similarity, a parameter λ is used:

\lambda \geq 1 - \frac{\lVert X_i' - m_j' \rVert}{\mathrm{length}(X_i')}

λ is the matching parameter;

X_i' and m_j' are the trend directions of the i-th and j-th patterns;
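A Python sketch of S31. Two points are assumptions rather than statements of the patent: a `*` position of a merged trend pattern (encoded as 0) is treated as matching either direction, and λ is used as the minimum required trend similarity 1 - ||X_i' - m_j'|| / length(X_i'), so a candidate is kept when that similarity is at least λ, whereas the inequality above is printed with λ on the larger side.

```python
from typing import List, Sequence

WILDCARD = 0  # "*" position of a merged trend pattern


def trend_hamming(x: Sequence[int], m: Sequence[int]) -> int:
    """Hamming distance between two trend codes; a WILDCARD in either code
    counts as a match (an assumption of this sketch)."""
    if len(x) != len(m):
        raise ValueError("trend codes must have equal length")
    return sum(1 for a, b in zip(x, m)
               if a != b and a != WILDCARD and b != WILDCARD)


def trend_similarity(x: Sequence[int], m: Sequence[int]) -> float:
    """1 - ||X' - m'|| / length(X'): the share of steps whose directions agree."""
    return 1.0 - trend_hamming(x, m) / len(x)


def trend_match(query: Sequence[int], candidates: Sequence[Sequence[int]],
                lam: float) -> List[int]:
    """Indices of candidates whose trend similarity to the query is >= lam."""
    return [j for j, m in enumerate(candidates)
            if trend_similarity(query, m) >= lam]
```

For example, with the query trend [1, 1, 1, -1] and λ = 0.75, a candidate [1, 1, *, 1] disagrees at one of four steps and is kept, while [-1, -1, 1, 1] disagrees at three and is rejected.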

S32, Euclidean distance: the trend matching of step S31 yields several similar patterns, for which the Euclidean distance is then computed:

\mathrm{dist}(i, j) = \lvert X_i - m_j \rvert

Wherein i and j are the sequence numbers of the patterns;

X_i is the i-th pattern;

m_j is the j-th pattern;

|X_i - m_j| is the Euclidean distance between the two patterns;

S33, select the K patterns with the smallest distances as the closest patterns.

The Euclidean distances between patterns computed in step S32 are sorted, and the K patterns with the smallest Euclidean distances are selected as the closest patterns. K is a user-defined parameter chosen according to the data; it is generally set to the value that makes the prediction most accurate.
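A Python sketch of S32 and S33, assuming the candidates passed in have already survived trend matching; `heapq.nsmallest` returns the K (distance, index) pairs with the smallest distances, which is equivalent to sorting all distances and taking the first K. The function names are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def euclidean(x: Sequence[float], m: Sequence[float]) -> float:
    """dist(i, j) = |X_i - m_j| for two equal-length patterns."""
    if len(x) != len(m):
        raise ValueError("patterns must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, m)))


def top_k_closest(query: Sequence[float],
                  candidates: Sequence[Sequence[float]],
                  k: int) -> List[Tuple[float, int]]:
    """(distance, index) of the K candidates closest to the query."""
    scored = ((euclidean(query, m), j) for j, m in enumerate(candidates))
    return heapq.nsmallest(k, scored)


res = top_k_closest([10, 12, 14], [[11, 13, 15], [20, 22, 24], [10, 12, 13]], 2)
```

Here the third candidate (distance 1) and the first (distance √3) are selected, and the shifted second candidate is dropped.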

S4, pattern-weighted prediction

After several approximate patterns have been found in step S3, multi-step prediction is performed from the values that follow these patterns.

When there are many approximate patterns, they must be combined by some strategy, as shown in Fig. 3. The present invention uses either the average-rule strategy or the uniform decline strategy to perform multi-step prediction from the values that follow these patterns. Wherein,

The average-rule strategy is as follows: after several approximate patterns have been found in the history, the values that follow these approximate patterns are directly averaged, and the average is taken as the predicted value:

\begin{cases} \tilde{X}_{n+1} = \frac{1}{d} \sum_{i=1}^{d} CP_i[n+1] \\ \tilde{X}_{n+2} = \frac{1}{d} \sum_{i=1}^{d} CP_i[n+2] \\ \quad \vdots \\ \tilde{X}_{n+h} = \frac{1}{d} \sum_{i=1}^{d} CP_i[n+h] \end{cases}

Wherein h is the number of prediction steps;

d is the number of approximate patterns found in the historical data;

n is the length of the given data;

CP_i is the i-th candidate approximate pattern found.
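A Python sketch of the average-rule strategy. It assumes each approximate pattern is identified by its start index `p` in the history and its length `w`, so that CP_i[n+1], ..., CP_i[n+h] are the h values immediately following the match; matches too close to the end of the history to supply h values are skipped. The helper names are hypothetical.

```python
from typing import List, Sequence


def continuations(history: Sequence[float], starts: Sequence[int],
                  w: int, h: int) -> List[List[float]]:
    """For each matched pattern starting at index p (length w), return the
    h values that followed it; incomplete continuations are skipped."""
    out = []
    for p in starts:
        follow = list(history[p + w:p + w + h])
        if len(follow) == h:
            out.append(follow)
    return out


def average_rule(conts: Sequence[Sequence[float]], h: int) -> List[float]:
    """X~_{n+t} = (1/d) * sum_i CP_i[n+t] for t = 1..h."""
    d = len(conts)
    if d == 0:
        raise ValueError("no approximate patterns were found")
    return [sum(c[t] for c in conts) / d for t in range(h)]


hist = [10, 20, 30, 40, 50, 60, 70, 80]
conts = continuations(hist, [0, 2, 5], w=2, h=3)
prediction = average_rule(conts, 3)
```

The match starting at index 5 has only one following value, so it does not contribute, and the remaining two continuations average to [40.0, 50.0, 60.0].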

The uniform decline strategy is as follows: analysis shows that the correlation of the data changes with elapsed time, and the most recent patterns give more reliable predictions and are therefore better guides. Each pattern's contribution to the prediction is weighted, and the weight varies with elapsed time. The predicted values are:

\begin{cases} \tilde{X}_{n+1} = \sum_{i=1}^{d} \omega_i' \, CP_i[n+1] \\ \tilde{X}_{n+2} = \sum_{i=1}^{d} \omega_i' \, CP_i[n+2] \\ \quad \vdots \\ \tilde{X}_{n+h} = \sum_{i=1}^{d} \omega_i' \, CP_i[n+h] \end{cases}

Wherein,

\omega_i = \frac{1}{l_i}, \quad \omega_i' = \frac{\omega_i}{\sum_{k=1}^{d} \omega_k}, \quad \sum_{i=1}^{d} \omega_i' = 1;

d is the number of approximate patterns found in the historical data;

n is the length of the given data;

CP_i is the i-th candidate approximate pattern found;

h is the prediction length;

i is the sequence number of the pattern;

l_i is the time span between the i-th pattern and the current pattern.
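A Python sketch of the uniform decline strategy with ω_i = 1/l_i normalised to sum to 1. Here `conts[i]` holds the h values following the i-th approximate pattern and `spans[i]` is its time span l_i, both assumed to be supplied by the caller; the function name is hypothetical.

```python
from typing import List, Sequence


def uniform_decline(conts: Sequence[Sequence[float]],
                    spans: Sequence[float], h: int) -> List[float]:
    """X~_{n+t} = sum_i omega'_i * CP_i[n+t], with omega_i = 1 / l_i
    normalised so that the omega'_i sum to 1."""
    if not conts or len(conts) != len(spans):
        raise ValueError("need one time span per approximate pattern")
    if any(l <= 0 for l in spans):
        raise ValueError("time spans must be positive")
    raw = [1.0 / l for l in spans]
    total = sum(raw)
    weights = [r / total for r in raw]  # omega'_i
    return [sum(wt * c[t] for wt, c in zip(weights, conts)) for t in range(h)]


pred = uniform_decline([[30, 40], [50, 60]], [1, 3], 2)
```

With spans 1 and 3 the weights are 0.75 and 0.25, so the more recent pattern dominates; with equal spans every weight becomes 1/d and the result coincides with the average-rule strategy.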

S5, fusion of prediction results

For a given time series, finding a single suitable pattern length from the data is very difficult. Several pattern lengths are therefore used to guide prediction, and the predicted values obtained for the individual pattern lengths are fused with the AdaBoost machine-learning algorithm, giving the final result:

\begin{cases} \tilde{X}_{n+1} = \sum_{i=1}^{m} \alpha_i' \, \tilde{X}_{n+1}^{i} \\ \tilde{X}_{n+2} = \sum_{i=1}^{m} \alpha_i' \, \tilde{X}_{n+2}^{i} \\ \quad \vdots \\ \tilde{X}_{n+h} = \sum_{i=1}^{m} \alpha_i' \, \tilde{X}_{n+h}^{i} \end{cases}

Wherein h is the prediction length;

m is the number of selected pattern lengths;

α_i' is the weight of the prediction result for the i-th pattern length;

n is the length of the given data.
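The patent states that the weights α_i' come from AdaBoost but gives no formula for them. The Python sketch below assumes they are derived from each pattern length's normalised validation error e_i using the classic AdaBoost learner weight ln((1 - e_i) / e_i) (the same quantity as log(1/β) in AdaBoost.R2), clipped at zero and normalised to sum to 1. This is a simplified, swapped-in stand-in, not the patent's procedure; only the weighted sum in `fuse` follows the formula above, and both function names are hypothetical.

```python
import math
from typing import List, Sequence


def adaboost_weights(errors: Sequence[float]) -> List[float]:
    """alpha_i = max(0, ln((1 - e_i) / e_i)), normalised to alpha'_i.
    errors[i] is a normalised validation error in (0, 1)."""
    if any(not 0.0 < e < 1.0 for e in errors):
        raise ValueError("errors must lie strictly between 0 and 1")
    raw = [max(0.0, math.log((1.0 - e) / e)) for e in errors]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no pattern length beats the 0.5 error level")
    return [r / total for r in raw]


def fuse(predictions: Sequence[Sequence[float]],
         weights: Sequence[float]) -> List[float]:
    """X~_{n+t} = sum_i alpha'_i * X~^i_{n+t}; predictions[i] is the
    h-step forecast obtained with the i-th pattern length."""
    if len(predictions) != len(weights):
        raise ValueError("one weight per pattern length is required")
    h = len(predictions[0])
    return [sum(a * p[t] for a, p in zip(weights, predictions)) for t in range(h)]


w = adaboost_weights([0.2, 0.2, 0.6])
fused = fuse([[40, 50], [44, 54], [90, 90]], w)
```

A pattern length whose error is 0.5 or worse receives zero weight, as in AdaBoost, so here the third forecast is ignored and the result is [42.0, 52.0].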

Targeting the characteristics of server CPU load, the present invention performs multi-step prediction of CPU load. Through pattern merging, similar patterns are fused into general patterns, against which matching is performed; matching combines the trend distance of patterns with the Euclidean distance to find relevant approximate patterns, which then guide multi-step prediction. A weight-based synthesis computes the multi-step prediction result, achieving accurate CPU load prediction and improving the accuracy and reliability of server resource scheduling.

What is disclosed above is only several specific embodiments of the present invention, but the present invention is not limited thereto; any change that a person skilled in the art can conceive shall fall within the protection scope of the present invention.

Claims

1. A CPU load multi-step prediction method based on pattern fusion, characterized in that it comprises the following steps:

Step 1: pattern extraction

Step 2: pattern filtering

Step 3: pattern merging and matching

Step 4: pattern-weighted prediction

Step 5: fusion of prediction results

2. The CPU load multi-step prediction method based on pattern fusion according to claim 1, characterized in that, in step 1:

the time series is a sequence of data x_1, x_2, x_3, ..., x_n in which the data are ordered;

the data pattern is a subsequence C_p = x_p, x_{p+1}, ..., x_{p+w-1} of a given time series that occurs frequently in the historical data.

3. The CPU load multi-step prediction method based on pattern fusion according to claim 2, characterized in that, in step 2, the filtering factor α satisfies the following condition:

\frac{\mathrm{number}(\mathrm{filter}(Q_i))}{\mathrm{number}(Q_i)} \leq 1 - \alpha

wherein Q_i is the set of patterns of length i;

number(Q_i) is the number of patterns in Q_i;

filter(Q_i) is the set of patterns filtered out.

4. The CPU load multi-step prediction method based on pattern fusion according to claim 2, characterized in that, in step 3, pattern matching comprises the following steps:

Step 31: trend matching

\lambda \geq 1 - \frac{\lVert X_i' - m_j' \rVert}{\mathrm{length}(X_i')}

λ is the matching parameter;

X_i' and m_j' are the trend directions of the i-th and j-th patterns;

Step 32: Euclidean distance

\mathrm{dist}(i, j) = \lvert X_i - m_j \rvert

wherein i and j are the sequence numbers of the patterns;

X_i is the i-th pattern;

m_j is the j-th pattern;

|X_i - m_j| is the Euclidean distance between the two patterns.

5. The CPU load multi-step prediction method based on pattern fusion according to claim 4, characterized in that, in step 4, the average-rule strategy is as follows: after several approximate patterns have been found in the history, the values that follow these approximate patterns are directly averaged, and the average is taken as the predicted value:

\begin{cases} \tilde{X}_{n+1} = \frac{1}{d} \sum_{i=1}^{d} CP_i[n+1] \\ \tilde{X}_{n+2} = \frac{1}{d} \sum_{i=1}^{d} CP_i[n+2] \\ \quad \vdots \\ \tilde{X}_{n+h} = \frac{1}{d} \sum_{i=1}^{d} CP_i[n+h] \end{cases}

wherein h is the number of prediction steps;

d is the number of approximate patterns found in the historical data;

n is the length of the given data;

CP_i is the i-th candidate approximate pattern found.

6. The CPU load multi-step prediction method based on pattern fusion according to claim 4, characterized in that, in step 4, the uniform decline strategy is as follows: analysis shows that the correlation of the data changes with elapsed time, and the most recent patterns give more reliable predictions; each pattern is therefore assigned a weight, and the weighted result is taken as the predicted value:

\begin{cases} \tilde{X}_{n+1} = \sum_{i=1}^{d} \omega_i' \, CP_i[n+1] \\ \tilde{X}_{n+2} = \sum_{i=1}^{d} \omega_i' \, CP_i[n+2] \\ \quad \vdots \\ \tilde{X}_{n+h} = \sum_{i=1}^{d} \omega_i' \, CP_i[n+h] \end{cases}

wherein

\omega_i = \frac{1}{l_i}, \quad \omega_i' = \frac{\omega_i}{\sum_{k=1}^{d} \omega_k}, \quad \sum_{i=1}^{d} \omega_i' = 1;

d is the number of approximate patterns found in the historical data;

n is the length of the given data;

CP_i is the i-th candidate approximate pattern found;

h is the prediction length;

i is the sequence number of the pattern;

l_i is the time span between the i-th pattern and the current pattern.

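7. The CPU load multi-step prediction method based on pattern fusion according to claim 5 or 6, characterized in that the final result obtained in step 5 is:

\begin{cases} \tilde{X}_{n+1} = \sum_{i=1}^{m} \alpha_i' \, \tilde{X}_{n+1}^{i} \\ \tilde{X}_{n+2} = \sum_{i=1}^{m} \alpha_i' \, \tilde{X}_{n+2}^{i} \\ \quad \vdots \\ \tilde{X}_{n+h} = \sum_{i=1}^{m} \alpha_i' \, \tilde{X}_{n+h}^{i} \end{cases}

wherein h is the prediction length;

m is the number of selected pattern lengths;

α_i' is the weight of the prediction result for the i-th pattern length;

n is the length of the given data.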