CN103747477A - Network flow analysis and prediction method and device - Google Patents


Info

Publication number
CN103747477A
Authority
CN
China
Prior art keywords
time series
flow
seasonal effect
computing formula
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410019136.7A
Other languages
Chinese (zh)
Other versions
CN103747477B (en)
Inventor
杜翠凤
陆蕊
蒋仕宝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GCI Science and Technology Co Ltd
Original Assignee
GCI Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by GCI Science and Technology Co Ltd
Priority to CN201410019136.7A
Publication of CN103747477A
Application granted
Publication of CN103747477B
Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a network flow analysis and prediction method and device. The method comprises: extracting the global features of the flow time series of each base station to be measured; clustering according to the extracted global features; collecting attribute features of the flow data according to the clustering result; and finally performing flow prediction according to the attribute features of the flow data and the flow at the previous moment. Because the global features of the time series are extracted and the similarity of the time series is reflected by the similarity of those features, the method captures the time-varying dynamic behavior of the series and obtains a more reasonable result. Describing a large time series with a small number of features also improves the robustness of the similarity judgment and reduces the complexity of the clustering computation. Attribute features related to the flow data are collected according to the clustering result, and the flow data are predicted from the flow and the attribute features together, so the prediction draws on more information, the prediction precision improves accordingly, and network resources can be configured reasonably.

Description

Network flow analysis and prediction method and device
Technical field
The present invention relates to the field of communication technology, and in particular to a network flow analysis and prediction method and device.
Background
In communication network optimization, network flow analysis and prediction are very important steps and are significant for the optimal allocation of network resources. Whether the flow prediction is accurate, whether the prediction result is interpretable, and whether the prediction result agrees with the actual flow data all directly affect the investment in and construction scale of the network. The preliminary analysis of the flow is the key to flow prediction and directly affects its accuracy.
In the prior art, flow is analyzed using the original time series: the similarity between time series is measured by Euclidean distance, and clustering is then performed according to this similarity. When predicting flow, unknown flow data are predicted from historical flow data using traditional methods such as regression prediction and time series analysis.
Existing methods consider only the differences between time series values at corresponding time points. Measuring the similarity between time series by Euclidean distance makes the result vulnerable to the values at individual time points, so the result loses robustness. In addition, only flow data are used, so prediction performance is poor.
Summary of the invention
In view of the above, the present invention proposes a network flow analysis and prediction method that can improve prediction precision and allow network resources to be configured reasonably.
To achieve this object, the technical solution of the present invention is as follows.
A network flow analysis and prediction method comprises the following steps:
Extracting the global features of the flow time series of each base station to be measured;
Clustering according to the extracted global features;
Collecting attribute features of the flow data according to the clustering result;
Performing flow prediction according to the attribute features of the flow data and the flow at the previous moment.
To address the problems of the prior art, the present invention also proposes a network flow analysis and prediction device that remedies the poor robustness of existing flow analysis and the low precision of existing flow prediction, and that is suitable for practical application.
Specifically, a network flow analysis and prediction device comprises:
An extraction module, for extracting the global features of the flow time series of each base station to be measured;
A clustering module, for clustering according to the extracted global features;
A collection module, for collecting attribute features of the flow data according to the clustering result;
A prediction module, for performing flow prediction according to the attribute features of the flow data and the flow at the previous moment.
Compared with the prior art, the beneficial effects of the present invention are as follows. The network flow analysis and prediction method and device first extract the global features of the flow time series of each base station to be measured, then cluster according to the extracted global features, then collect attribute features of the flow data according to the clustering result, and finally perform flow prediction according to the attribute features of the flow data and the flow at the previous moment. With this technique, the global features of the time series are extracted and the similarity of the time series is reflected by the similarity of those features, which captures the time-varying dynamic behavior of the series and gives a more reasonable result. Describing a large time series with a small number of features also improves the robustness of the similarity judgment and reduces the complexity of the clustering computation. The attribute features related to the flow data are collected according to the clustering result, and the flow data are predicted from the flow and the attribute features together, so the prediction draws on more information, the prediction precision improves accordingly, and network resources can be configured reasonably.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the network flow analysis and prediction method in an embodiment;
Fig. 2 is a schematic structural diagram of the network flow analysis and prediction device in an embodiment.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the embodiments described herein serve only to explain the present invention and do not limit its scope of protection.
In an embodiment, as shown in Fig. 1, the network flow analysis and prediction method comprises:
Step S101: extracting the global features of the flow time series of each base station to be measured;
Step S102: clustering according to the extracted global features;
Step S103: collecting attribute features of the flow data according to the clustering result;
Step S104: performing flow prediction according to the attribute features of the flow data and the flow at the previous moment.
As can be seen from the above description, the method predicts flow data from the flow and the attribute features together, which improves the precision of network flow prediction and allows network resources to be configured reasonably.
In an embodiment, the global features comprise any one or more of a trend feature, a seasonal feature, a kurtosis feature, a skewness feature, an autocorrelation coefficient feature, a nonlinear feature and a spectral feature.
In an embodiment, the flow time series is obtained by collecting the flow data of each base station to be measured once a day, continuously for half a year.
In an embodiment, the trend feature is measured by the Z statistic: a Z statistic greater than zero indicates an upward trend, and a Z statistic less than zero indicates a downward trend. The Z statistic is computed as:
$$Z = \begin{cases} \dfrac{S - 1}{\sqrt{\mathrm{Var}(S)}}, & S > 0 \\ 0, & S = 0 \\ \dfrac{S + 1}{\sqrt{\mathrm{Var}(S)}}, & S < 0 \end{cases}$$
where S is a statistic that follows a normal distribution and Var(S) is the variance of S, computed as $\mathrm{Var}(S) = T(T-1)(2T+5)/18$. S is computed from the flow time series $x_t$, $t = 1, 2, \dots, T$, where T is the length of the flow time series, $x_j$ is the value of the flow time series at time j and $x_k$ is its value at time k, using the sign function:
$$\operatorname{sgn}(x_j - x_k) = \begin{cases} +1, & x_j - x_k > 0 \\ 0, & x_j - x_k = 0 \\ -1, & x_j - x_k < 0 \end{cases}$$
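The trend statistic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's own implementation: it assumes S is the usual Mann-Kendall double sum of $\operatorname{sgn}(x_j - x_k)$ over all pairs k < j, that Z divides by the square root of Var(S), and that no correction for tied values is applied. The names `sign` and `mann_kendall_z` are illustrative.

```python
import math

def sign(v):
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)

def mann_kendall_z(x):
    """Return (S, Z) for the series x.

    Assumption: S sums sgn(x[j] - x[k]) over all pairs k < j, and
    Var(S) = T(T-1)(2T+5)/18 with no tie correction.
    """
    T = len(x)
    S = sum(sign(x[j] - x[k]) for k in range(T - 1) for j in range(k + 1, T))
    var_s = T * (T - 1) * (2 * T + 5) / 18.0
    if S > 0:
        Z = (S - 1) / math.sqrt(var_s)
    elif S < 0:
        Z = (S + 1) / math.sqrt(var_s)
    else:
        Z = 0.0
    return S, Z
```

A strictly increasing series gives Z > 0 (upward trend), a strictly decreasing one gives Z < 0, and a constant series gives Z = 0.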
The seasonal feature is reflected by the average period, which is calculated as follows. A fast Fourier transform (FFT) is applied to the flow time series $x_t$, $t = 1, 2, \dots, T$, where T is the length of the flow time series, giving $F(k) = \sum_{t=1}^{T} x_t e^{-j 2\pi (k-1) \frac{t-1}{T}}$, where the frequency used is $f_t = 2\pi \frac{t-1}{T}$. The average frequency is then $f_{tm} = \frac{\sum_{t=1}^{T} f_t}{T} = \frac{2\pi \sum_{t=1}^{T} (t-1)}{T^2}$, and the average period is $T_{tm} = \frac{1}{f_{tm}} = \frac{T^2}{2\pi \sum_{t=1}^{T} (t-1)}$;
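As a worked check of the arithmetic, the Python sketch below evaluates the average-frequency and average-period formulas exactly as they are written. The function name `average_period` is illustrative and does not come from the source. Since $f_t$ as defined depends only on t and T, and $\sum_{t=1}^{T} (t-1) = T(T-1)/2$, the result reduces to $T / (\pi (T-1))$.

```python
import math

def average_period(T):
    """Evaluate T_tm = 1 / f_tm with f_t = 2*pi*(t-1)/T for t = 1..T."""
    freqs = [2.0 * math.pi * (t - 1) / T for t in range(1, T + 1)]
    f_mean = sum(freqs) / T  # average frequency f_tm
    return 1.0 / f_mean      # average period T_tm
```

For a half-year daily series one might call `average_period(180)`; the length 180 is only an example value.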
For the kurtosis feature, the kurtosis is computed as:
[Formula image BDA0000457319510000043]
where $x_t$, $t = 1, 2, \dots, T$, is the flow time series, T is the length of the flow time series, $\bar{x}$ is the mean of the flow time series, and σ is the sample standard deviation of the flow time series;
For the skewness feature, the skewness is computed as:
[Formula image BDA0000457319510000045]
where $x_t$, $t = 1, 2, \dots, T$, is the flow time series, T is the length of the flow time series, $\bar{x}$ is the mean of the flow time series, and σ is the sample standard deviation of the flow time series;
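The kurtosis and skewness formulas themselves are not legible in this text, so the Python sketch below falls back on common textbook definitions as an assumption: the mean of the third and fourth powers of the standardized deviations, with σ taken as the sample standard deviation (divisor T − 1). Other conventions exist, for example subtracting 3 from the kurtosis. The function name `skewness_kurtosis` is illustrative.

```python
import math

def skewness_kurtosis(x):
    """Return (skewness, kurtosis) of x.

    Assumed definitions: mean of ((x_t - mean) / sigma)**3 and **4,
    where sigma is the sample standard deviation (divisor T - 1).
    """
    T = len(x)
    mean = sum(x) / T
    sigma = math.sqrt(sum((v - mean) ** 2 for v in x) / (T - 1))
    skew = sum(((v - mean) / sigma) ** 3 for v in x) / T
    kurt = sum(((v - mean) / sigma) ** 4 for v in x) / T
    return skew, kurt
```

A symmetric series yields a skewness of zero, while a series with a long right tail yields a positive skewness.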
The autocorrelation coefficient feature is measured by the Ljung-Box Q statistic, which tests whether the flow time series is a white-noise process. The Ljung-Box Q statistic is computed as:
[Formula image BDA0000457319510000046]
where T is the length of the flow time series, p is the maximum lag order considered, τ is the lag, and $r_\tau$ is the autocorrelation coefficient of the flow time series at lag τ. $r_\tau$ is computed as:
[Formula image BDA0000457319510000047]
where $x_t$, $t = 1, 2, \dots, T$, is the flow time series and $\bar{x}$ is the mean of the flow time series;
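A minimal Python sketch of this feature follows. Because the formulas are not legible in this text, it assumes the standard Ljung-Box form $Q = T(T+2) \sum_{\tau=1}^{p} r_\tau^2 / (T - \tau)$ and the usual sample autocorrelation; both are assumptions rather than the patent's confirmed expressions, and the function names are illustrative.

```python
def autocorr(x, lag):
    """Sample autocorrelation r_tau at the given lag (assumed standard form)."""
    T = len(x)
    mean = sum(x) / T
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, T))
    return num / denom

def ljung_box_q(x, p):
    """Ljung-Box Q statistic over lags 1..p (assumed standard form)."""
    T = len(x)
    return T * (T + 2) * sum(autocorr(x, tau) ** 2 / (T - tau)
                             for tau in range(1, p + 1))
```

A large Q relative to the chi-square reference distribution suggests the series is not white noise; used as a clustering feature, Q simply summarizes how strongly the series is autocorrelated.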
The nonlinear feature is reflected by the BDS test statistic, which tests whether the flow time series is independent and identically distributed. For the flow time series $x_t$, $t = 1, 2, \dots, T$, let the observations at times s and w be $x_s$ and $x_w$; all pairs of observations $(x_s, x_w)$ are arranged as:
$\{(x_s, x_w), (x_{s+1}, x_{w+1}), (x_{s+2}, x_{w+2}), \dots, (x_{s+m-1}, x_{w+m-1})\}$, where m is the embedding dimension. The BDS statistic is computed as $W(N, m, r) = \sqrt{N}\, \dfrac{C(N, m, r) - C(N, 1, r)^m}{\sigma'(N, m, r)}$, where r is the interval size, $C(N, m, r)$ is the correlation integral, and $\sigma'(N, m, r)$ is an estimate of the asymptotic standard deviation of $C(N, m, r) - C(N, 1, r)^m$. The correlation integral is computed as $C(N, m, r) = \dfrac{2}{N(N-1)} \sum_{w < s} H\left(r - \lVert x_w^m - x_s^m \rVert\right)$, where $x_w^m$ and $x_s^m$ are m-dimensional vectors, $x_w^m = (x_{w-m+1}, \dots, x_w)$, $x_s^m = (x_{s-m+1}, \dots, x_s)$, and
$$H\left(r - \lVert x_w^m - x_s^m \rVert\right) = \begin{cases} 0, & r - \lVert x_w^m - x_s^m \rVert \le 0 \\ 1, & r - \lVert x_w^m - x_s^m \rVert > 0 \end{cases}$$
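The correlation integral at the core of the BDS statistic can be sketched as follows. This is an illustration under assumptions: the norm is taken to be the maximum norm, N is taken to be the number of m-dimensional vectors that can be formed from the series, and the standard-deviation estimate σ' is not computed. The function name `correlation_integral` is illustrative.

```python
def correlation_integral(x, m, r):
    """Fraction of pairs of m-dimensional vectors closer than r.

    Assumptions: maximum norm, and N = len(x) - m + 1 vectors
    x_w^m = (x_{w-m+1}, ..., x_w).
    """
    vecs = [x[i:i + m] for i in range(len(x) - m + 1)]
    n = len(vecs)
    close = 0
    for s in range(n):
        for w in range(s):  # pairs with w < s
            dist = max(abs(a - b) for a, b in zip(vecs[w], vecs[s]))
            if r - dist > 0:  # H(.) = 1 only when strictly positive
                close += 1
    return 2.0 * close / (n * (n - 1))
```

The numerator of the BDS statistic is then `correlation_integral(x, m, r) - correlation_integral(x, 1, r) ** m`, which is close to zero for an independent and identically distributed series.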
The spectral feature consists of the first two coefficients of the discrete Fourier transform (DFT). The spectral feature is extracted from the DFT coefficients, and the first n coefficients may be taken as the spectral feature: the high-frequency part of a signal is relatively unimportant, so most of the energy in the frequency domain is concentrated in the first few coefficients.
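Extracting the first n DFT coefficients can be sketched directly from the transform definition. The sketch below uses a plain O(T·n) sum rather than an FFT, uses zero-based indices (so the exponent is $-j 2\pi k t / T$), and returns complex coefficients; whether magnitudes or real and imaginary parts are used as the final feature is left open here. The function name `dft_coefficients` is illustrative.

```python
import cmath

def dft_coefficients(x, n=2):
    """Return the first n DFT coefficients of x as complex numbers."""
    T = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / T) for t in range(T))
            for k in range(n)]
```

For a constant series all the energy sits in the first coefficient, which equals the sum of the series, and the second coefficient is zero.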
In an embodiment, the trend feature is obtained by the linear trend method: the trend component of the time series is separated by the linear trend method, and the slope of the linear function is used as the trend feature of the time series. A regression model of the time series $x_t$, $t = 1, 2, \dots, T$, on time t is established, $x_t = \alpha + \beta t + \varepsilon_t$, where α is the intercept, β is the slope and ε is the error. The least-squares estimate of β is:
[Formula image BDA0000457319510000055]
where
[Formula image BDA0000457319510000056]
and T denotes the length of the flow time series;
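The slope of the linear trend can be sketched with the ordinary least-squares formula for a regression of $x_t$ on t. The closed form used below, $\hat{\beta} = \sum (t - \bar{t})(x_t - \bar{x}) / \sum (t - \bar{t})^2$, is the standard estimator and is assumed to match the formula that is not legible in this text; the function name `trend_slope` is illustrative.

```python
def trend_slope(x):
    """Least-squares slope of x_t = alpha + beta * t + eps_t, t = 1..T."""
    T = len(x)
    t_vals = range(1, T + 1)
    t_mean = sum(t_vals) / T
    x_mean = sum(x) / T
    num = sum((t - t_mean) * (v - x_mean) for t, v in zip(t_vals, x))
    den = sum((t - t_mean) ** 2 for t in t_vals)
    return num / den
```

For the series 3, 5, 7, 9 the slope is 2, since each step adds exactly 2.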
The seasonal feature is obtained by the H-P (Hodrick-Prescott) filter method. The trend component is estimated by minimizing the difference between the time series $x_t$ and the trend value $y_t$: $\min_{t = 1, 2, \dots, T} \left\{ \sum_{t=1}^{T} (x_t - y_t)^2 + \lambda \left[ (y_{t+1} - y_t) - (y_t - y_{t-1}) \right]^2 \right\}$, where T is the length of the flow time series and λ is the penalty factor on fluctuations of the trend component. The periodic component $C_t$ is then obtained by a formula in which L is the lag operator. When $C_t$ has an obvious peak, the time series $x_t$ can be judged to contain a cyclical fluctuation component, and the period corresponding to the peak is the cycle length of the time series;
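The H-P decomposition can be sketched in pure Python by solving the linear system that the minimization leads to. This is an illustration under assumptions: the penalty is summed over the interior points (the standard Hodrick-Prescott form), the periodic component is taken as $C_t = x_t - y_t$, and a dense Gaussian elimination is used for clarity rather than speed. The function name `hp_filter` is illustrative.

```python
def hp_filter(x, lam):
    """Return (trend, cycle) from a Hodrick-Prescott decomposition of x."""
    T = len(x)
    # A = I + lam * D'D, where D is the (T-2) x T second-difference matrix.
    A = [[1.0 if i == j else 0.0 for j in range(T)] for i in range(T)]
    for r in range(T - 2):
        d = {r: 1.0, r + 1: -2.0, r + 2: 1.0}
        for i, di in d.items():
            for j, dj in d.items():
                A[i][j] += lam * di * dj
    # Solve A y = x by Gaussian elimination with partial pivoting.
    M = [row[:] + [float(v)] for row, v in zip(A, x)]
    for c in range(T):
        piv = max(range(c, T), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, T):
            f = M[i][c] / M[c][c]
            for j in range(c, T + 1):
                M[i][j] -= f * M[c][j]
    y = [0.0] * T
    for i in range(T - 1, -1, -1):
        y[i] = (M[i][T] - sum(M[i][j] * y[j] for j in range(i + 1, T))) / M[i][i]
    cycle = [xv - yv for xv, yv in zip(x, y)]
    return y, cycle
```

A perfectly linear series has zero second differences, so its trend equals the series itself and its cycle is zero; a series with a periodic swing leaves that swing in the cycle component.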
The nonlinear feature is reflected by the McLeod-Li test, the bispectral test, the RESET test, the F test or a neural-network-based nonlinearity test statistic.
Other methods of obtaining the above global features are not excluded.
In an embodiment, the clustering comprises K-means clustering: the extracted global features are taken as a new feature vector, so that the flow time series of each base station to be measured corresponds to one new feature vector, and K-means clustering is performed on the new feature vectors.
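K-means clustering of the per-station feature vectors can be sketched as follows. For reproducibility the sketch initializes the centers with the first k vectors, which is an assumption made here for illustration; practical implementations usually use random or k-means++ initialization and standardize the features first. The function name `kmeans` is illustrative.

```python
def kmeans(points, k, iters=100):
    """Cluster feature vectors; return (labels, centers)."""
    centers = [list(p) for p in points[:k]]  # assumed deterministic init
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each vector to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers
```

Each row passed in would be one base station's feature vector, for example its trend, seasonal, kurtosis and skewness values.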
In an embodiment, the clustering comprises FCM (fuzzy c-means) clustering: the extracted global features are taken as a new feature vector, so that the flow time series of each base station to be measured corresponds to one new feature vector, and FCM clustering is performed on the new feature vectors.
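Fuzzy c-means differs from K-means in that every feature vector receives a degree of membership in each cluster. The sketch below uses the standard membership and center updates with fuzzifier m = 2; the deterministic initialization from the first c vectors, the fixed iteration count and the small distance floor `eps` are assumptions made for illustration. The function name `fuzzy_c_means` is illustrative.

```python
def fuzzy_c_means(points, c, m=2.0, iters=100, eps=1e-9):
    """Return (memberships, centers); memberships[n][i] is point n in cluster i."""
    centers = [list(p) for p in points[:c]]  # assumed deterministic init
    dims = len(points[0])
    u = []
    for _ in range(iters):
        # Membership update: closer centers receive higher membership.
        u = []
        for p in points:
            d = [max(sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5, eps)
                 for ctr in centers]
            u.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0))
                                for j in range(c)) for i in range(c)])
        # Center update: weighted mean with weights u**m.
        centers = [
            [sum((u[n][i] ** m) * points[n][k] for n in range(len(points)))
             / sum(u[n][i] ** m for n in range(len(points)))
             for k in range(dims)]
            for i in range(c)
        ]
    return u, centers
```

The memberships of each vector sum to one, and a hard assignment can be recovered by taking the cluster with the largest membership.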
Other clustering methods are not excluded.
For a better understanding of the method, an application example is described in detail below:
A. Collect the flow time series $\{x_t,\ t = 1, 2, \dots, T\}$ of each base station to be predicted once a day, continuously for half a year;
B. Extract the global features of the flow time series of each base station, comprising the trend feature, seasonal feature, kurtosis feature, skewness feature, autocorrelation coefficient feature, nonlinear feature and spectral feature;
C. Take the extracted global features of each base station as a new feature vector, so that the flow time series of each base station corresponds to one new feature vector, and cluster the new feature vectors with the K-means clustering method;
D. For each class of base stations obtained by clustering, select suitable data attributes according to the features of the class: if the flow data exhibit a trend, collect the ARPU value and 3G penetration rate related to the flow data; if the flow data exhibit periodicity, collect the ARPU value, 3G penetration rate and total number of users related to the flow data;
E. Build and train a BP neural network with a three-layer structure and the tansig transfer function;
F. Input the collected attribute features of the flow data to be predicted and the flow at the previous moment into the model trained in the previous step, and compute the flow to be predicted. For example, inputting today's collected attribute features and yesterday's flow predicts today's flow.
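Steps E and F can be sketched with a small back-propagation network in pure Python. This is a minimal illustration under assumptions, not the patent's model: one hidden layer with `tanh` activation (mathematically equivalent to tansig), a linear output unit, plain stochastic gradient descent, and inputs scaled to roughly [0, 1]. The hidden size, learning rate, epoch count and the toy numbers below are all made up for illustration.

```python
import math
import random

def train_bp(samples, targets, hidden=4, lr=0.05, epochs=2000, seed=0):
    """Train a three-layer network (input, tanh hidden, linear output)."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    # The last weight in each row is the bias.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]

    def forward(x):
        inputs = list(x) + [1.0]
        h = [math.tanh(sum(w * v for w, v in zip(row, inputs))) for row in w1]
        y = sum(w * v for w, v in zip(w2, h + [1.0]))
        return h, y

    for _ in range(epochs):
        for x, target in zip(samples, targets):
            h, y = forward(x)
            err = y - target
            # Back-propagate through tanh before the output weights change.
            deltas = [err * w2[i] * (1.0 - h[i] ** 2) for i in range(hidden)]
            for i, v in enumerate(h + [1.0]):
                w2[i] -= lr * err * v
            for i in range(hidden):
                for j, v in enumerate(list(x) + [1.0]):
                    w1[i][j] -= lr * deltas[i] * v

    return lambda x: forward(x)[1]

def mse(predict, samples, targets):
    """Mean squared error of a predictor on a data set."""
    return sum((predict(x) - t) ** 2 for x, t in zip(samples, targets)) / len(samples)

# Hypothetical toy rows: [ARPU, 3G penetration rate, previous-day flow], all scaled.
samples = [[0.1, 0.2, 0.3], [0.4, 0.1, 0.5], [0.7, 0.3, 0.6], [0.9, 0.5, 0.8]]
targets = [0.35, 0.55, 0.70, 0.90]  # hypothetical same-day flow, scaled
```

After training, `train_bp(samples, targets)([0.5, 0.2, 0.55])` would return a predicted flow for a new day. One network would typically be trained per cluster, using the attributes chosen for that cluster in step D.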
In step B, the global features of the flow time series are extracted as follows:
B1. The trend feature is measured by the Z statistic, which is computed as:
$$Z = \begin{cases} \dfrac{S - 1}{\sqrt{\mathrm{Var}(S)}}, & S > 0 \\ 0, & S = 0 \\ \dfrac{S + 1}{\sqrt{\mathrm{Var}(S)}}, & S < 0 \end{cases}$$
where S is a statistic that follows a normal distribution and Var(S) is the variance of S. S is computed as:
[Formula image BDA0000457319510000072]
and Var(S) is computed as $\mathrm{Var}(S) = T(T-1)(2T+5)/18$. Here $x_t$, $t = 1, 2, \dots, T$, is the flow time series, T is the length of the flow time series, $x_j$ is the value of the flow time series at time j, $x_k$ is its value at time k, and the sign function is:
$$\operatorname{sgn}(x_j - x_k) = \begin{cases} +1, & x_j - x_k > 0 \\ 0, & x_j - x_k = 0 \\ -1, & x_j - x_k < 0 \end{cases}$$
B2. The seasonal feature is reflected by the average period, which is calculated as follows. A fast Fourier transform (FFT) is applied to the flow time series $x_t$, $t = 1, 2, \dots, T$, where T is the length of the flow time series, giving $F(k) = \sum_{t=1}^{T} x_t e^{-j 2\pi (k-1) \frac{t-1}{T}}$, where the frequency used is $f_t = 2\pi \frac{t-1}{T}$. The average frequency is then $f_{tm} = \frac{\sum_{t=1}^{T} f_t}{T} = \frac{2\pi \sum_{t=1}^{T} (t-1)}{T^2}$, and the average period is $T_{tm} = \frac{1}{f_{tm}} = \frac{T^2}{2\pi \sum_{t=1}^{T} (t-1)}$;
B3. For the kurtosis feature, the kurtosis is computed as:
[Formula image BDA0000457319510000078]
where $x_t$, $t = 1, 2, \dots, T$, is the flow time series, T is the length of the flow time series, $\bar{x}$ is the mean of the flow time series, and σ is the sample standard deviation of the flow time series;
B4. For the skewness feature, the skewness is computed as:
[Formula image BDA00004573195100000710]
where $x_t$, $t = 1, 2, \dots, T$, is the flow time series, T is the length of the flow time series, $\bar{x}$ is the mean of the flow time series, and σ is the sample standard deviation of the flow time series;
B5. The autocorrelation coefficient feature is measured by the Ljung-Box Q statistic, which is computed as:
[Formula image BDA0000457319510000081]
where T is the length of the flow time series, p is the maximum lag order considered, τ is the lag, and $r_\tau$ is the autocorrelation coefficient of the flow time series at lag τ. $r_\tau$ is computed as:
[Formula image BDA0000457319510000082]
where $x_t$, $t = 1, 2, \dots, T$, is the flow time series and $\bar{x}$ is the mean of the flow time series;
B6. The nonlinear feature is reflected by the BDS test statistic. For the flow time series $x_t$, $t = 1, 2, \dots, T$, let the observations at times s and w be $x_s$ and $x_w$; all pairs of observations $(x_s, x_w)$ are arranged as $\{(x_s, x_w), (x_{s+1}, x_{w+1}), (x_{s+2}, x_{w+2}), \dots, (x_{s+m-1}, x_{w+m-1})\}$, where m is the embedding dimension. The BDS statistic is computed as $W(N, m, r) = \sqrt{N}\, \dfrac{C(N, m, r) - C(N, 1, r)^m}{\sigma'(N, m, r)}$, where r is the interval size, $C(N, m, r)$ is the correlation integral, and $\sigma'(N, m, r)$ is an estimate of the asymptotic standard deviation of $C(N, m, r) - C(N, 1, r)^m$. The correlation integral is computed as $C(N, m, r) = \dfrac{2}{N(N-1)} \sum_{w < s} H\left(r - \lVert x_w^m - x_s^m \rVert\right)$, where $x_w^m$ and $x_s^m$ are m-dimensional vectors, $x_w^m = (x_{w-m+1}, \dots, x_w)$, $x_s^m = (x_{s-m+1}, \dots, x_s)$, and
$$H\left(r - \lVert x_w^m - x_s^m \rVert\right) = \begin{cases} 0, & r - \lVert x_w^m - x_s^m \rVert \le 0 \\ 1, & r - \lVert x_w^m - x_s^m \rVert > 0 \end{cases}$$
B7. The spectral feature consists of the first two coefficients of the discrete Fourier transform (DFT).
In an embodiment, as shown in Fig. 2, the network flow analysis and prediction device comprises:
An extraction module, for extracting the global features of the flow time series of each base station to be measured;
A clustering module, for clustering according to the extracted global features;
A collection module, for collecting attribute features of the flow data according to the clustering result;
A prediction module, for performing flow prediction according to the attribute features of the flow data and the flow at the previous moment.
As shown in Fig. 2, in a preferred embodiment of the connections between the modules of the device, the extraction module, the clustering module, the collection module and the prediction module are connected in sequence.
First, the extraction module extracts the global features of the flow time series of each base station to be measured; then the clustering module clusters according to the extracted global features; next, the collection module collects attribute features of the flow data according to the clustering result; finally, the prediction module inputs the attribute features of the flow data and the flow at the previous moment into a neural network structure and performs flow prediction. The flow analysis of the device is more reasonable, the prediction draws on more information, and the precision is high, so the device is suitable for practical application.
In an embodiment, the global features comprise any one or more of a trend feature, a seasonal feature, a kurtosis feature, a skewness feature, an autocorrelation coefficient feature, a nonlinear feature and a spectral feature.
In an embodiment, the flow time series is obtained by collecting the flow data of each base station to be measured once a day, continuously for half a year.
In an embodiment, the trend feature is measured by the Z statistic: a Z statistic greater than zero indicates an upward trend, and a Z statistic less than zero indicates a downward trend. The Z statistic is computed as:
$$Z = \begin{cases} \dfrac{S - 1}{\sqrt{\mathrm{Var}(S)}}, & S > 0 \\ 0, & S = 0 \\ \dfrac{S + 1}{\sqrt{\mathrm{Var}(S)}}, & S < 0 \end{cases}$$
where S is a statistic that follows a normal distribution and Var(S) is the variance of S, computed as $\mathrm{Var}(S) = T(T-1)(2T+5)/18$. S is computed from the flow time series $x_t$, $t = 1, 2, \dots, T$, where T is the length of the flow time series, $x_j$ is the value of the flow time series at time j and $x_k$ is its value at time k, using the sign function:
$$\operatorname{sgn}(x_j - x_k) = \begin{cases} +1, & x_j - x_k > 0 \\ 0, & x_j - x_k = 0 \\ -1, & x_j - x_k < 0 \end{cases}$$
The seasonal feature is reflected by the average period, which is calculated as follows. A fast Fourier transform (FFT) is applied to the flow time series $x_t$, $t = 1, 2, \dots, T$, where T is the length of the flow time series, giving $F(k) = \sum_{t=1}^{T} x_t e^{-j 2\pi (k-1) \frac{t-1}{T}}$, where the frequency used is $f_t = 2\pi \frac{t-1}{T}$. The average frequency is then $f_{tm} = \frac{\sum_{t=1}^{T} f_t}{T} = \frac{2\pi \sum_{t=1}^{T} (t-1)}{T^2}$, and the average period is $T_{tm} = \frac{1}{f_{tm}} = \frac{T^2}{2\pi \sum_{t=1}^{T} (t-1)}$;
For the kurtosis feature, the kurtosis is computed as:
[Formula image BDA0000457319510000101]
where $x_t$, $t = 1, 2, \dots, T$, is the flow time series, T is the length of the flow time series, $\bar{x}$ is the mean of the flow time series, and σ is the sample standard deviation of the flow time series;
For the skewness feature, the skewness is computed as:
[Formula image BDA0000457319510000102]
where $x_t$, $t = 1, 2, \dots, T$, is the flow time series, T is the length of the flow time series, $\bar{x}$ is the mean of the flow time series, and σ is the sample standard deviation of the flow time series;
The autocorrelation coefficient feature is measured by the Ljung-Box Q statistic, which tests whether the flow time series is a white-noise process. The Ljung-Box Q statistic is computed as:
[Formula image BDA0000457319510000103]
where T is the length of the flow time series, p is the maximum lag order considered, τ is the lag, and $r_\tau$ is the autocorrelation coefficient of the flow time series at lag τ. $r_\tau$ is computed as:
[Formula image BDA0000457319510000104]
where $x_t$, $t = 1, 2, \dots, T$, is the flow time series and $\bar{x}$ is the mean of the flow time series;
The nonlinear feature is reflected by the BDS test statistic, which tests whether the flow time series is independent and identically distributed. For the flow time series $x_t$, $t = 1, 2, \dots, T$, let the observations at times s and w be $x_s$ and $x_w$; all pairs of observations $(x_s, x_w)$ are arranged as:
$\{(x_s, x_w), (x_{s+1}, x_{w+1}), (x_{s+2}, x_{w+2}), \dots, (x_{s+m-1}, x_{w+m-1})\}$, where m is the embedding dimension. The BDS statistic is computed as $W(N, m, r) = \sqrt{N}\, \dfrac{C(N, m, r) - C(N, 1, r)^m}{\sigma'(N, m, r)}$, where r is the interval size, $C(N, m, r)$ is the correlation integral, and $\sigma'(N, m, r)$ is an estimate of the asymptotic standard deviation of $C(N, m, r) - C(N, 1, r)^m$. The correlation integral is computed as $C(N, m, r) = \dfrac{2}{N(N-1)} \sum_{w < s} H\left(r - \lVert x_w^m - x_s^m \rVert\right)$, where $x_w^m$ and $x_s^m$ are m-dimensional vectors, $x_w^m = (x_{w-m+1}, \dots, x_w)$, $x_s^m = (x_{s-m+1}, \dots, x_s)$, and
$$H\left(r - \lVert x_w^m - x_s^m \rVert\right) = \begin{cases} 0, & r - \lVert x_w^m - x_s^m \rVert \le 0 \\ 1, & r - \lVert x_w^m - x_s^m \rVert > 0 \end{cases}$$
The spectral feature consists of the first two coefficients of the discrete Fourier transform (DFT). The spectral feature is extracted from the DFT coefficients, and the first n coefficients may be taken as the spectral feature: the high-frequency part of a signal is relatively unimportant, so most of the energy in the frequency domain is concentrated in the first few coefficients.
In an embodiment, the trend feature is obtained by the linear trend method: the trend component of the time series is separated by the linear trend method, and the slope of the linear function is used as the trend feature of the time series. A regression model of the time series $x_t$, $t = 1, 2, \dots, T$, on time t is established, $x_t = \alpha + \beta t + \varepsilon_t$, where α is the intercept, β is the slope and ε is the error. The least-squares estimate of β is:
[Formula image BDA0000457319510000114]
where T denotes the length of the flow time series;
The seasonal feature is obtained by the H-P (Hodrick-Prescott) filter method. The trend component is estimated by minimizing the difference between the time series $x_t$ and the trend value $y_t$: $\min_{t = 1, 2, \dots, T} \left\{ \sum_{t=1}^{T} (x_t - y_t)^2 + \lambda \left[ (y_{t+1} - y_t) - (y_t - y_{t-1}) \right]^2 \right\}$, where T is the length of the flow time series and λ is the penalty factor on fluctuations of the trend component. The periodic component is then obtained as:
[Formula image BDA0000457319510000117]
where L is the lag operator. When $C_t$ has an obvious peak, the time series $x_t$ can be judged to contain a cyclical fluctuation component, and the period corresponding to the peak is the cycle length of the time series;
The nonlinear feature is reflected by the McLeod-Li test, the bispectral test, the RESET test, the F test or a neural-network-based nonlinearity test statistic.
Other methods of obtaining the above global features are not excluded.
In an embodiment, the clustering comprises K-means clustering: the extracted global features are taken as a new feature vector, so that the flow time series of each base station to be measured corresponds to one new feature vector, and K-means clustering is performed on the new feature vectors.
In an embodiment, the clustering comprises FCM (fuzzy c-means) clustering: the extracted global features are taken as a new feature vector, so that the flow time series of each base station to be measured corresponds to one new feature vector, and FCM clustering is performed on the new feature vectors.
Other clustering methods are not excluded.
The above embodiments express only several implementations of the present invention. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the inventive concept, and these all fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be determined by the appended claims.

Claims (10)

1. A network flow analysis and prediction method, characterized by comprising the following steps:
Extracting the global features of the flow time series of each base station to be measured;
Clustering according to the extracted global features;
Collecting attribute features of the flow data according to the clustering result;
Performing flow prediction according to the attribute features of the flow data and the flow at the previous moment.
2. The network flow analysis and prediction method according to claim 1, characterized in that the global features comprise any one or more of a trend feature, a seasonal feature, a kurtosis feature, a skewness feature, an autocorrelation coefficient feature, a nonlinear feature and a spectral feature.
3. The network flow analysis and prediction method according to claim 1, characterized in that the flow time series is obtained by collecting the flow data of each base station to be measured once a day, continuously for half a year.
4. The network traffic analysis and prediction method according to claim 2, characterized in that the trend feature is measured by the Z statistic, which is computed as:
$$Z = \begin{cases} \dfrac{S-1}{\sqrt{\operatorname{Var}(S)}}, & S > 0 \\ 0, & S = 0 \\ \dfrac{S+1}{\sqrt{\operatorname{Var}(S)}}, & S < 0 \end{cases}$$
where $S$ is a normally distributed statistic and $\operatorname{Var}(S)$ is the variance of $S$; $S$ is computed as:
[Formula image: Figure FDA0000457319500000012]
and $\operatorname{Var}(S)$ is computed as $\operatorname{Var}(S) = T(T-1)(2T+5)/18$; for the traffic time series $x_t$, $t = 1, 2, \ldots, T$, $T$ is the length of the traffic time series, $x_j$ is the value of the traffic time series at time $j$, $x_k$ is the value of the traffic time series at time $k$, and the sign function $\operatorname{sgn}(x_j - x_k)$ is computed as:
$$\operatorname{sgn}(x_j - x_k) = \begin{cases} +1, & (x_j - x_k) > 0 \\ 0, & (x_j - x_k) = 0 \\ -1, & (x_j - x_k) < 0 \end{cases}$$
the seasonal feature is reflected by the average period, which is computed as follows: the fast Fourier transform (FFT) is applied to the traffic time series $x_t$, $t = 1, 2, \ldots, T$, where $T$ is the length of the traffic time series, to obtain
$$F(k) = \sum_{t=1}^{T} x_t \, e^{-j 2\pi (k-1) \frac{t-1}{T}},$$
where the frequency used is $f_t = 2\pi \dfrac{t-1}{T}$; the average frequency is then computed as
$$f_{tm} = \frac{\sum_{t=1}^{T} f_t}{T} = \frac{2\pi \sum_{t=1}^{T} (t-1)}{T^2},$$
and the average period is computed as
$$T_{tm} = \frac{1}{f_{tm}} = \frac{T^2}{2\pi \sum_{t=1}^{T} (t-1)};$$
in the kurtosis feature, the kurtosis is computed as:
[Formula image: Figure FDA0000457319500000023]
where $x_t$ is the traffic time series, $t = 1, 2, \ldots, T$, $T$ is the length of the traffic time series, the formula also uses the mean of the traffic time series, and $\sigma$ is the sample standard deviation of the traffic time series;
in the skewness feature, the skewness is computed as:
[Formula image: Figure FDA0000457319500000024]
where $x_t$ is the traffic time series, $t = 1, 2, \ldots, T$, $T$ is the length of the traffic time series, the symbol shown in [Formula image: Figure FDA00004573195000000210] is the mean of the traffic time series, and $\sigma$ is the sample standard deviation of the traffic time series;
the autocorrelation coefficient feature is measured by the Ljung-Box Q statistic, which is computed as:
[Formula image: Figure FDA0000457319500000025]
where $T$ is the length of the traffic time series, $p$ is the maximum lag order considered, $\tau$ is the lag, and $r_\tau$ is the autocorrelation coefficient of the traffic time series; $r_\tau$ is computed as:
[Formula image: Figure FDA0000457319500000026]
where $x_t$ is the traffic time series, $t = 1, 2, \ldots, T$, and the symbol shown in [Formula image: Figure FDA0000457319500000027] is the mean of the traffic time series;
the nonlinear feature is reflected by the BDS test statistic: for the traffic time series $x_t$, $t = 1, 2, \ldots, T$, the observations at times $s$ and $w$ are $x_s$ and $x_w$, and all observation pairs $(x_s, x_w)$ are arranged as $\{(x_s, x_w), (x_{s+1}, x_{w+1}), (x_{s+2}, x_{w+2}), \ldots, (x_{s+m-1}, x_{w+m-1})\}$, where $m$ is the embedding dimension; the BDS statistic is computed as
$$W(N, m, r) = \sqrt{N}\, \frac{C(N, m, r) - C(N, 1, r)^m}{\sigma'(N, m, r)},$$
where $r$ is the interval size, $C(N, m, r)$ is the correlation integral, and $\sigma'(N, m, r)$ is an estimate of the asymptotic standard deviation of $C(N, m, r) - C(N, 1, r)^m$; $C(N, m, r)$ is computed as
$$C(N, m, r) = \frac{2}{N(N-1)} \sum_{w < s} H\!\left(r - \lVert x_w^m - x_s^m \rVert\right),$$
where $x_w^m$ and $x_s^m$ are both $m$-dimensional vectors, $x_w^m = (x_{w-m+1}, \ldots, x_w)$, $x_s^m = (x_{s-m+1}, \ldots, x_s)$, and
$$H\!\left(r - \lVert x_w^m - x_s^m \rVert\right) = \begin{cases} 0, & r - \lVert x_w^m - x_s^m \rVert \le 0 \\ 1, & r - \lVert x_w^m - x_s^m \rVert > 0; \end{cases}$$
the spectral feature is the first two coefficients of the extracted discrete Fourier transform (DFT).
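As an illustration of two of the global features recited in claim 4, the following Python sketch computes the trend Z statistic and the Ljung-Box Q statistic. The formulas for S, Q and the autocorrelation coefficient appear only as images in the source, so this sketch assumes the standard Mann-Kendall pairwise sum for S and the standard Ljung-Box and sample autocorrelation definitions; the function names are hypothetical and not part of the patent.

```python
import math


def sign(v):
    """Sign function: +1, 0 or -1."""
    return (v > 0) - (v < 0)


def mann_kendall_z(x):
    """Trend Z statistic.

    Assumption: S is the standard Mann-Kendall sum of sgn(x_j - x_k)
    over all pairs k < j (the source gives S only as an image).
    """
    T = len(x)
    s = sum(sign(x[j] - x[k]) for k in range(T - 1) for j in range(k + 1, T))
    var_s = T * (T - 1) * (2 * T + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def autocorr(x, lag):
    """Sample autocorrelation at the given lag (assumed standard definition).

    Not defined for a constant series, where the denominator is zero.
    """
    T = len(x)
    mean = sum(x) / T
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(T - lag))
    return num / denom


def ljung_box_q(x, p):
    """Ljung-Box Q with maximum lag p (assumed standard definition):
    Q = T (T + 2) * sum_{tau=1..p} r_tau^2 / (T - tau).
    """
    T = len(x)
    return T * (T + 2) * sum(
        autocorr(x, tau) ** 2 / (T - tau) for tau in range(1, p + 1)
    )
```

For a strictly increasing series of length 5, every pair contributes +1, so S = 10 and Z = 9 / sqrt(300/18). A larger positive Z suggests a stronger upward trend, and a larger Q suggests stronger autocorrelation up to lag p.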
5. The network traffic analysis and prediction method according to claim 1, characterized in that the clustering comprises K-means clustering: the extracted global features are taken as a new feature vector, the traffic time series of each base station under test corresponds to one new feature vector, and K-means clustering is performed on the new feature vectors.
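The clustering step of claim 5 can be sketched as follows, with one feature vector per base station. This is a minimal illustration of plain K-means (Lloyd's algorithm) in pure Python, not the patent's implementation; the function names, the random initialization, the seed and the iteration cap are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(vectors, k, max_iters=100, seed=0):
    """Cluster feature vectors into k groups; returns (labels, centroids).

    Assumption: initial centroids are k vectors sampled at random.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

In practice each vector would hold the global features of one base station (for example trend Z, kurtosis, skewness). Because those features have different scales, standardizing them before clustering is a common choice, although the claims do not state it.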
6. A network traffic analysis and prediction device, characterized by comprising:
an extraction module, configured to extract the global features of the traffic time series of each base station under test;
a clustering module, configured to perform clustering according to the extracted global features;
an acquisition module, configured to collect attribute features of the traffic data according to the clustering result;
a prediction module, configured to perform traffic prediction according to the attribute features of the traffic data and the traffic at the previous moment.
7. The network traffic analysis and prediction device according to claim 6, characterized in that the global features comprise any one or more of a trend feature, a seasonal feature, a kurtosis feature, a skewness feature, an autocorrelation coefficient feature, a nonlinear feature, and a spectral feature.
8. The network traffic analysis and prediction device according to claim 6, characterized in that the traffic time series is obtained by collecting the traffic data of each base station under test once per day, continuously for half a year.
9. The network traffic analysis and prediction device according to claim 7, characterized in that the trend feature is measured by the Z statistic, which is computed as:
$$Z = \begin{cases} \dfrac{S-1}{\sqrt{\operatorname{Var}(S)}}, & S > 0 \\ 0, & S = 0 \\ \dfrac{S+1}{\sqrt{\operatorname{Var}(S)}}, & S < 0 \end{cases}$$
where $S$ is a normally distributed statistic and $\operatorname{Var}(S)$ is the variance of $S$; $S$ is computed as:
[Formula image: Figure FDA0000457319500000035]
and $\operatorname{Var}(S)$ is computed as $\operatorname{Var}(S) = T(T-1)(2T+5)/18$; for the traffic time series $x_t$, $t = 1, 2, \ldots, T$, $T$ is the length of the traffic time series, $x_j$ is the value of the traffic time series at time $j$, $x_k$ is the value of the traffic time series at time $k$, and the sign function $\operatorname{sgn}(x_j - x_k)$ is computed as:
$$\operatorname{sgn}(x_j - x_k) = \begin{cases} +1, & (x_j - x_k) > 0 \\ 0, & (x_j - x_k) = 0 \\ -1, & (x_j - x_k) < 0 \end{cases}$$
the seasonal feature is reflected by the average period, which is computed as follows: the fast Fourier transform (FFT) is applied to the traffic time series $x_t$, $t = 1, 2, \ldots, T$, where $T$ is the length of the traffic time series, to obtain
$$F(k) = \sum_{t=1}^{T} x_t \, e^{-j 2\pi (k-1) \frac{t-1}{T}},$$
where the frequency used is $f_t = 2\pi \dfrac{t-1}{T}$; the average frequency is then computed as
$$f_{tm} = \frac{\sum_{t=1}^{T} f_t}{T} = \frac{2\pi \sum_{t=1}^{T} (t-1)}{T^2},$$
and the average period is computed as
$$T_{tm} = \frac{1}{f_{tm}} = \frac{T^2}{2\pi \sum_{t=1}^{T} (t-1)};$$
in the kurtosis feature, the kurtosis is computed as:
[Formula image: Figure FDA0000457319500000046]
where $x_t$ is the traffic time series, $t = 1, 2, \ldots, T$, $T$ is the length of the traffic time series, the symbol shown in [Formula image: Figure FDA0000457319500000047] is the mean of the traffic time series, and $\sigma$ is the sample standard deviation of the traffic time series;
in the skewness feature, the skewness is computed as:
[Formula image: Figure FDA0000457319500000048]
where $x_t$ is the traffic time series, $t = 1, 2, \ldots, T$, $T$ is the length of the traffic time series, the symbol shown in [Formula image: Figure FDA00004573195000000412] is the mean of the traffic time series, and $\sigma$ is the sample standard deviation of the traffic time series;
the autocorrelation coefficient feature is measured by the Ljung-Box Q statistic, in whose computing formula $T$ is the length of the traffic time series, $p$ is the maximum lag order considered, $\tau$ is the lag, and $r_\tau$ is the autocorrelation coefficient of the traffic time series; $r_\tau$ is computed as:
[Formula image: Figure FDA00004573195000000410]
where $x_t$ is the traffic time series, $t = 1, 2, \ldots, T$, and the symbol shown in [Formula image: Figure FDA00004573195000000411] is the mean of the traffic time series;
the nonlinear feature is reflected by the BDS test statistic: for the traffic time series $x_t$, $t = 1, 2, \ldots, T$, the observations at times $s$ and $w$ are $x_s$ and $x_w$, and all observation pairs $(x_s, x_w)$ are arranged as $\{(x_s, x_w), (x_{s+1}, x_{w+1}), (x_{s+2}, x_{w+2}), \ldots, (x_{s+m-1}, x_{w+m-1})\}$, where $m$ is the embedding dimension; the BDS statistic is computed as
$$W(N, m, r) = \sqrt{N}\, \frac{C(N, m, r) - C(N, 1, r)^m}{\sigma'(N, m, r)},$$
where $r$ is the interval size, $C(N, m, r)$ is the correlation integral, and $\sigma'(N, m, r)$ is an estimate of the asymptotic standard deviation of $C(N, m, r) - C(N, 1, r)^m$; $C(N, m, r)$ is computed as
$$C(N, m, r) = \frac{2}{N(N-1)} \sum_{w < s} H\!\left(r - \lVert x_w^m - x_s^m \rVert\right),$$
where $x_w^m$ and $x_s^m$ are both $m$-dimensional vectors, $x_w^m = (x_{w-m+1}, \ldots, x_w)$, $x_s^m = (x_{s-m+1}, \ldots, x_s)$, and
$$H\!\left(r - \lVert x_w^m - x_s^m \rVert\right) = \begin{cases} 0, & r - \lVert x_w^m - x_s^m \rVert \le 0 \\ 1, & r - \lVert x_w^m - x_s^m \rVert > 0; \end{cases}$$
the spectral feature is the first two coefficients of the extracted discrete Fourier transform (DFT).
10. The network traffic analysis and prediction device according to claim 6, characterized in that the clustering comprises K-means clustering: the extracted global features are taken as a new feature vector, the traffic time series of each base station under test corresponds to one new feature vector, and K-means clustering is performed on the new feature vectors.
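The correlation integral C(N, m, r) used by the BDS statistic in the claims above can be sketched in a few lines of Python. The claims do not name the vector norm or say how N relates to the series length, so this sketch assumes the maximum (sup) norm and takes N to be the number of m-dimensional vectors that can be formed from the series; the function name is hypothetical.

```python
def correlation_integral(x, m, r):
    """Fraction of pairs of m-dimensional vectors closer than r.

    Assumptions: distance is the maximum norm, and N is the number of
    m-dimensional vectors (x_{w-m+1}, ..., x_w) available in the series.
    """
    vectors = [x[i:i + m] for i in range(len(x) - m + 1)]
    n = len(vectors)
    close_pairs = 0
    for w in range(n):
        for s in range(w + 1, n):
            dist = max(abs(a - b) for a, b in zip(vectors[w], vectors[s]))
            if r - dist > 0:  # H(.) is 1 only when r - dist > 0
                close_pairs += 1
    return 2.0 * close_pairs / (n * (n - 1))
```

For the series [0, 1, 0, 1, 0] with r = 0.5, this gives C = 1/3 for m = 2 and C = 0.4 for m = 1, so the numerator term C(N, 2, r) - C(N, 1, r)^2 is 1/3 - 0.16. The estimate of the asymptotic standard deviation in the denominator is not specified in the claims and is left out here.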
CN201410019136.7A 2014-01-15 2014-01-15 Network traffic analysis and prediction method and device Active CN103747477B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410019136.7A CN103747477B (en) 2014-01-15 2014-01-15 Network traffic analysis and prediction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410019136.7A CN103747477B (en) 2014-01-15 2014-01-15 Network traffic analysis and prediction method and device

Publications (2)

Publication Number Publication Date
CN103747477A (en) 2014-04-23
CN103747477B (en) 2017-08-25

Family

ID=50504455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410019136.7A Active CN103747477B (en) Network traffic analysis and prediction method and device 2014-01-15 2014-01-15

Country Status (1)

Country Link
CN (1) CN103747477B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050213514A1 (en) * 2004-03-23 2005-09-29 Ching-Fong Su Estimating and managing network traffic
CN101252541A (en) * 2008-04-09 2008-08-27 中国科学院计算技术研究所 Method for establishing network flow classified model and corresponding system thereof
CN103368811A (en) * 2012-04-06 2013-10-23 华为终端有限公司 Bandwidth distribution method and equipment
CN103227999A (en) * 2013-05-02 2013-07-31 中国联合网络通信集团有限公司 Network traffic prediction method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
何建: "基于时间序列的网络流量分析与预测", 《中国科技信息》 *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2554323A (en) * 2015-05-29 2018-03-28 Ibm Estimating computational resources for running data-mining services
US11138193B2 (en) 2015-05-29 2021-10-05 International Business Machines Corporation Estimating the cost of data-mining services
WO2016193851A1 (en) * 2015-05-29 2016-12-08 International Business Machines Corporation Estimating computational resources for running data-mining services
US10417226B2 (en) 2015-05-29 2019-09-17 International Business Machines Corporation Estimating the cost of data-mining services
US10585885B2 (en) 2015-05-29 2020-03-10 International Business Machines Corporation Estimating the cost of data-mining services
CN107517166A (en) * 2016-06-16 2017-12-26 中兴通讯股份有限公司 Flow control methods, device and access device
TWI641251B (en) * 2016-11-18 2018-11-11 財團法人工業技術研究院 Method and system for monitoring network flow
US10153952B2 (en) 2016-11-18 2018-12-11 Industrial Technology Research Institute Network traffic monitoring system and method thereof
CN107135126A (en) * 2017-05-22 2017-09-05 安徽师范大学 Flow on-line identification method based on subflow fractal index
CN107135126B (en) * 2017-05-22 2020-03-24 安徽师范大学 Flow online identification method based on sub-flow fractal index
CN110098944B (en) * 2018-01-29 2020-09-08 中国科学院声学研究所 Method for predicting protocol data traffic based on FP-Growth and RNN
CN110098944A (en) * 2018-01-29 2019-08-06 中国科学院声学研究所 A method of protocol data flow is predicted based on FP-Growth and RNN
CN108770002B (en) * 2018-04-27 2021-08-10 广州杰赛科技股份有限公司 Base station flow analysis method, device, equipment and storage medium
CN108770002A (en) * 2018-04-27 2018-11-06 广州杰赛科技股份有限公司 Base station flow analysis method, device, equipment and storage medium
CN108960537B (en) * 2018-08-17 2020-10-13 安吉汽车物流股份有限公司 Logistics order prediction method and device and readable medium
CN108960537A (en) * 2018-08-17 2018-12-07 安吉汽车物流股份有限公司 The prediction technique and device of logistics order, readable medium
CN113037577A (en) * 2019-12-09 2021-06-25 中国电信股份有限公司 Network traffic prediction method, device and computer readable storage medium
CN112235152A (en) * 2020-09-04 2021-01-15 北京邮电大学 Flow size estimation method and device
CN111935766A (en) * 2020-09-15 2020-11-13 之江实验室 Wireless network flow prediction method based on global spatial dependency
CN113225824A (en) * 2021-04-28 2021-08-06 辽宁邮电规划设计院有限公司 Device and method for automatically allocating bandwidths with different service requirements based on 5G technology
CN114330145A (en) * 2022-03-01 2022-04-12 北京蚂蚁云金融信息服务有限公司 Method and device for analyzing sequence based on probability map model
CN114330145B (en) * 2022-03-01 2022-07-12 北京蚂蚁云金融信息服务有限公司 Method and device for analyzing sequence based on probability map model
CN114793197A (en) * 2022-03-29 2022-07-26 广州杰赛科技股份有限公司 Network resource configuration method, device, equipment and storage medium based on NFV
CN114793197B (en) * 2022-03-29 2023-09-19 广州杰赛科技股份有限公司 Network resource allocation method, device, equipment and storage medium based on NFV
CN115949891A (en) * 2023-03-09 2023-04-11 天津佰焰科技股份有限公司 Intelligent control system and control method for LNG (liquefied Natural gas) filling station

Also Published As

Publication number Publication date
CN103747477B (en) 2017-08-25

Similar Documents

Publication Publication Date Title
CN103747477A (en) Network flow analysis and prediction method and device
Hu et al. A hybrid model based on CNN and Bi-LSTM for urban water demand prediction
CN108022001A (en) Short term probability density Forecasting Methodology based on PCA and quantile estimate forest
CN104933622A (en) Microblog popularity degree prediction method based on user and microblog theme and microblog popularity degree prediction system based on user and microblog theme
CN105260803A (en) Power consumption prediction method for system
CN103514369A (en) Regression analysis system and method based on active learning
CN106600037B (en) Multi-parameter auxiliary load prediction method based on principal component analysis
CN103942457A (en) Water quality parameter time series prediction method based on relevance vector machine regression
Cheng et al. Enhanced state estimation and bad data identification in active power distribution networks using photovoltaic power forecasting
Liu et al. Heating load forecasting for combined heat and power plants via strand-based LSTM
CN103268519A (en) Electric power system short-term load forecast method and device based on improved Lyapunov exponent
CN104408667A (en) Method and system for comprehensively evaluating power quality
CN103678869A (en) Prediction and estimation method of flight parameter missing data
CN110991263A (en) Non-invasive load identification method and system for resisting background load interference
CN113283155B (en) Near-surface air temperature estimation method, system, storage medium and equipment
CN108734216A (en) Classification of power customers method, apparatus and storage medium based on load curve form
CN116128141B (en) Storm surge prediction method and device, storage medium and electronic equipment
Zhao et al. Short-term microgrid load probability density forecasting method based on k-means-deep learning quantile regression
CN114692981B (en) Method and system for forecasting medium-long-term runoff based on Seq2Seq model
CN102305792A (en) Nonlinear partial least square optimizing model-based forest carbon sink remote sensing evaluation method
CN111428421A (en) Rainfall runoff simulation method for deep learning guided by physical mechanism
CN102156641A (en) Prediction method and system for confidence interval of software cost
CN103353295A (en) Method for accurately predicating vertical deformation of dam body
Jiang et al. Medium-long term load forecasting method considering industry correlation for power management
CN103020733A (en) Method and system for predicting single flight noise of airport based on weight

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant