CN110210658A - Prophet and Gaussian process user network method for predicting based on wavelet transformation - Google Patents
Prophet and Gaussian process user network method for predicting based on wavelet transformation
- Publication number
- CN110210658A (application number CN201910427803.8A)
- Authority
- CN
- China
- Prior art keywords
- test
- user network
- frequency subsequence
- gaussian process
- prophet
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Marketing (AREA)
- Game Theory and Decision Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Development Economics (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention provides a wavelet-transform-based method for predicting user network traffic with Prophet and Gaussian process models. Because user network traffic time series are non-stationary and time-varying, the wavelet transform is used to preprocess and analyze the series. The transform yields a high-frequency subsequence and a low-frequency subsequence: the high-frequency subsequence reflects the abrupt changes and irregular fluctuations of the traffic series, while the low-frequency subsequence reflects its periodicity and long-term dependence. Based on these characteristics, the invention uses the Prophet model to predict the low-frequency subsequence and a Gaussian process regression model to predict the high-frequency subsequence, then applies the inverse discrete wavelet transform to reconstruct the final network traffic prediction. The proposed method can effectively improve the accuracy of user network traffic prediction.
Description
Technical Field
The invention belongs to the field of wireless communication technology and specifically relates to a wavelet-transform-based method for predicting user network data traffic with Prophet and Gaussian process models.
Background
In recent years, the rapid development of the mobile communication industry has greatly increased users' demand for wireless access. The surge in the number of users raises the demand for network traffic and degrades the user experience. Ensuring overall network stability and QoE (Quality of Experience) has become a major challenge for mobile network operators. Accurately predicting the future traffic of individual users helps wireless networks self-optimize, improves efficiency, and supports a high-quality user experience. Most existing user traffic prediction methods are autoregressive moving average models. However, such models are better suited to short-term prediction of stationary series, whereas the network traffic of individual users is often bursty, so their prediction accuracy is limited.
Summary of the Invention
The technical problem addressed by the invention is to overcome the shortcomings noted in the background by providing a wavelet-transform-based method for predicting the network traffic of individual users with Prophet and Gaussian process models. According to the characteristics of the high-frequency and low-frequency subsequences, the Prophet model predicts the low-frequency subsequence and a Gaussian process regression model predicts the high-frequency subsequence. The inverse discrete wavelet transform then reconstructs the final network traffic prediction, which can effectively improve prediction accuracy.
To solve this problem, the invention adopts the following technical solution:
A wavelet-transform-based Prophet and Gaussian process method for predicting user network traffic, comprising the following steps:
Step 1: Apply the wavelet transform to preprocess and analyze the user network traffic time series, obtaining a high-frequency subsequence and a low-frequency subsequence. The high-frequency subsequence reflects the abrupt changes and irregular fluctuations of the series, and the low-frequency subsequence reflects its periodicity and long-term dependence;
Step 2: Predict the low-frequency subsequence with the Prophet model and the high-frequency subsequence with a Gaussian process regression model;
Step 3: Apply the inverse discrete wavelet transform to reconstruct the final network traffic prediction.
Further, in the proposed method, Step 1 specifically comprises the following steps:
(1) Obtain the network service traffic data of a single user and count the traffic used in each time slot, giving the user network traffic time series u(t), t=1,2,...,L, where u(t) is the traffic used by the user in time slot t;
(2) Compress the scale of the user traffic data by applying the following to u(t):
$$z(t)=\log_{10}\bigl(u(t)+1\bigr) \tag{1}$$
where z(t) is the scale-compressed user network traffic time series;
(3) Center the time series z(t) so that its mean is 0:
$$z'(t)=z(t)-\bar{z}$$
where z′(t) is the zero-mean time series and $\bar{z}=\frac{1}{L}\sum_{t=1}^{L}z(t)$ is the mean of z(t);
(4) Apply the discrete wavelet transform to z′(t) to obtain the low-frequency subsequence c(n) and the high-frequency subsequence d(n);
A single-level (first-order) wavelet decomposition of the preprocessed series z′(t), t=1,2,...,L, gives the subsequences c(n) and d(n):
$$c(n)=\sum_{t=1}^{L}z'(t)\,\phi_{1,n}(t),\qquad d(n)=\sum_{t=1}^{L}z'(t)\,\psi_{1,n}(t)$$
where $\phi_{1,n}(t)=2^{-1/2}\phi(t/2-n)$ and $\psi_{1,n}(t)=2^{-1/2}\psi(t/2-n)$, and φ(t) and ψ(t) are the scaling function and the wavelet function, both determined by the wavelet basis. c(n) is the low-frequency subsequence, which carries the low-frequency information of the series and is called the approximation coefficients; d(n) is the high-frequency subsequence, which carries the high-frequency information of the signal and is called the detail coefficients. After the single-level decomposition, the high-frequency and low-frequency subsequences both have length L/2, that is, n=1,2,...,L/2.
Further, in the proposed method, Step 2 performs Gaussian process regression and prediction on the high-frequency subsequence d(n): a Gaussian process regression model is fitted to d(n) to obtain its prediction $\hat{d}(n)$, through the following process:
(1) Build the sample data for the regression model: for any i=1,2,...,L/2, the input sample of the regression model is $x_i=[d(i-m),d(i-m+1),\ldots,d(i-1)]^{T}$ and the i-th output sample is d(i), where the value of m depends on the specific data. This gives the sample data set D=(X,d), where X is the set of input samples and d is the set of output samples:
$$X=[x_1,x_2,\ldots,x_{L/2}]^{T},\qquad d=[d(1),d(2),\ldots,d(L/2)]^{T}$$
where T denotes the transpose;
(2) Split the data into training and test sets: the first 80% of D=(X,d) forms the training set $D_{train}=(X_{train},d_{train})$ and the last 20% forms the test set $D_{test}=(X_{test},d_{test})$;
(3) Determine the parameters of the Gaussian process regression model: the squared exponential (SE) covariance function is chosen as the Gaussian process covariance function:
$$k(x_i,x_j)=\sigma_f^{2}\exp\!\left(-\frac{\lVert x_i-x_j\rVert^{2}}{2l^{2}}\right)$$
where θ collects the hyperparameters (here the signal variance $\sigma_f^{2}$ and the length scale $l$). The optimal hyperparameters $\theta_{ML}$ can be obtained by maximum likelihood:
$$\theta_{ML}=\arg\min_{\theta}\bigl(-\log p(d_{train}\mid X_{train},\theta)\bigr) \tag{8}$$
(4) Build the Gaussian process regression model: $d_{train}$ is assumed to follow a Gaussian process, expressed as:
$$d_{train}\sim GP\bigl(0,\;k(x_i,x_j)+\sigma_n^{2}\delta_{ij}\bigr)$$
where GP denotes a Gaussian process, $\sigma_n^{2}$ is the noise variance, and $\delta_{ij}$ is the Kronecker delta, with $\delta_{ij}=1$ when i=j and 0 otherwise. The posterior distribution of the test-set outputs $d_{test}$ is Gaussian:
$$d_{test}\mid X_{train},d_{train},X_{test}\sim N(\mu_{test},\Sigma_{test}) \tag{10}$$
where $\mu_{test}$ is the mean of the test-set outputs, which is used as their estimate, and $\Sigma_{test}$ is their covariance:
$$\mu_{test}=K(X_{test},X_{train})\bigl[K(X_{train},X_{train})+\sigma_n^{2}I_n\bigr]^{-1}d_{train}$$
$$\Sigma_{test}=K(X_{test},X_{test})-K(X_{test},X_{train})\bigl[K(X_{train},X_{train})+\sigma_n^{2}I_n\bigr]^{-1}K(X_{train},X_{test})$$
where $K(X_{train},X_{test})=K(X_{test},X_{train})^{T}$ is the covariance matrix between the test-set and training-set input samples, $K(X_{test},X_{test})$ is the covariance of $X_{test}$ with itself, $I_n$ is the identity matrix, and $A^{-1}$ denotes the inverse of a matrix A;
(5) The predicted values of the training-set and test-set outputs are, respectively:
$$\hat{d}_{train}=K(X_{train},X_{train})\bigl[K(X_{train},X_{train})+\sigma_n^{2}I_n\bigr]^{-1}d_{train},\qquad \hat{d}_{test}=\mu_{test}$$
Together these give the predicted high-frequency subsequence $\hat{d}(n)$.
Further, in the proposed method, Step 2 fits the Prophet model to the low-frequency subsequence c(n) obtained from the wavelet decomposition and uses it to produce the low-frequency prediction $\hat{c}(n)$.
The low-frequency subsequence c(n) is decomposed into the sum of g(n), s(n), and h(n):
$$c(n)=g(n)+s(n)+h(n)+\varepsilon_n \tag{15}$$
where c(n) is the original low-frequency subsequence; g(n) is the trend term, which describes the non-periodic changes of the user network traffic time series; the periodic term s(n) describes its periodic changes; h(n) represents the effect of holidays on the series; and the error term $\varepsilon_n$ represents idiosyncratic changes that the model cannot capture and is assumed to be normally distributed;
The trend term g(n) is given by:
$$g(n)=\frac{C}{1+\exp\bigl(-k(n-p)\bigr)}$$
where C is the carrying capacity, the maximum asymptotic value of the time series curve, which is set from market-size data or domain knowledge; k is the growth rate of the curve; and p is the offset parameter;
The periodic term s(n) is given by:
$$s(n)=\sum_{l=-N}^{N}c_l\,e^{\,i\,2\pi l n/P}$$
where P is the period of the target series, $c_l$ are the parameters to be estimated, and 2N is the chosen number of approximation terms, which controls the degree of filtering;
The holiday term h(n) can be expressed as:
$$h(n)=\sum_{i=1}^{M}\kappa_i\,\mathbf{1}(n\in D_i)$$
where, for the i-th holiday, $D_i$ is the time window affected by that holiday, and the indicator function $\mathbf{1}(n\in D_i)$ states whether time n falls within that window: it equals 1 if $n\in D_i$ and 0 otherwise. Each holiday has a parameter $\kappa_i$ describing the size of its effect, with $\kappa_i\sim N(0,\upsilon_i^{2})$. Assuming there are M holidays, the indicators form the vector $Z(n)=[\mathbf{1}(n\in D_1),\ldots,\mathbf{1}(n\in D_M)]$, so that $h(n)=Z(n)\kappa$ with $\kappa=(\kappa_1,\ldots,\kappa_M)^{T}$.
The Prophet algorithm fits the parameters of the trend, periodic, and holiday terms, and the fitted terms are summed to give the predicted low-frequency subsequence of the user network traffic:
$$\hat{c}(n)=\hat{g}(n)+\hat{s}(n)+\hat{h}(n)$$
where $\hat{g}(n)$, $\hat{s}(n)$, and $\hat{h}(n)$ denote the fitted trend, periodic, and holiday terms, respectively. Because fitting a Prophet model is an established technique, it is not described further here.
Further, in the proposed method, the inverse discrete wavelet transform is applied to the predictions of the high-frequency and low-frequency subsequences to reconstruct $\hat{z}'(t)$:
$$\hat{z}'(t)=\sum_{n}\hat{c}(n)\,\phi_{1,n}(t)+\sum_{n}\hat{d}(n)\,\psi_{1,n}(t)$$
where $\hat{c}(n)$ and $\hat{d}(n)$ are the predicted low-frequency and high-frequency subsequences, respectively. Undoing the zero-mean step on $\hat{z}'(t)$ and taking the exponential restores the original scale of the network traffic and gives the final prediction of the user network traffic time series:
$$\hat{u}(t)=10^{\,\hat{z}'(t)+\bar{z}}-1$$
By adopting the above technical solution, the invention has the following advantages over the prior art:
The invention proposes a combined prediction model of Prophet and Gaussian process regression based on the wavelet transform. Wavelet decomposition splits the user network traffic time series into a low-frequency part that represents long-term trend changes and a high-frequency part that represents random abrupt changes, and the two parts are then modeled and predicted with the Prophet model and a Gaussian process regression model, respectively. Compared with the traditional autoregressive moving average model, the method captures the abrupt changes of the time series better and is better suited to predicting the traffic of individual users.
Brief Description of the Drawings
FIG. 1 is a flowchart of the method of the invention.
Detailed Description
The invention is described in further detail below with reference to the drawing and a specific embodiment.
The invention provides a wavelet-transform-based method for predicting user traffic with Prophet and Gaussian process models. Because user network traffic time series are non-stationary and time-varying, the wavelet transform is used to preprocess and analyze the series. The transform yields a high-frequency subsequence and a low-frequency subsequence: the high-frequency subsequence reflects the abrupt changes and irregular fluctuations of the traffic series, while the low-frequency subsequence reflects its periodicity and long-term dependence. Based on these characteristics, the invention uses the Prophet model to predict the low-frequency subsequence and a Gaussian process regression model to predict the high-frequency subsequence, then applies the inverse discrete wavelet transform to reconstruct the final network traffic prediction.
As shown in FIG. 1, the method of the invention comprises the following steps:
Step 1: Data preprocessing. This step comprises the following process:
(1) Obtain the network service traffic data of a single user. Taking, for example, one hour or one day as a time slot, count the traffic used in each time slot to obtain the user network traffic time series u(t), t=1,2,...,L, where u(t) is the traffic used by the user in time slot t.
(2) Compress the scale of the user traffic data by applying the following to u(t):
$$z(t)=\log_{10}\bigl(u(t)+1\bigr) \tag{1}$$
where z(t) is the scale-compressed user network traffic time series.
(3) Center the time series z(t) so that its mean is 0:
$$z'(t)=z(t)-\bar{z}$$
where z′(t) is the zero-mean time series and $\bar{z}=\frac{1}{L}\sum_{t=1}^{L}z(t)$ is the mean of z(t).
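The preprocessing in Step 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming the traffic counts are non-negative numbers held in a plain list and that zero-mean centering means subtracting the sample mean of z(t); the function name `preprocess` is hypothetical and does not come from the patent.

```python
import math

def preprocess(u):
    """Log-compress a traffic series and centre it (hypothetical helper).

    u: list of non-negative traffic values, one per time slot.
    Returns (z_prime, z_mean), where z_prime has zero mean.
    """
    # Scale compression: z(t) = log10(u(t) + 1); the +1 keeps idle slots finite.
    z = [math.log10(v + 1.0) for v in u]
    # Zero-mean centring: subtract the sample mean of z(t).
    z_mean = sum(z) / len(z)
    z_prime = [v - z_mean for v in z]
    return z_prime, z_mean

z_prime, z_mean = preprocess([0, 9, 99, 999])
```

The mean is returned alongside the centered series because it is needed again at the end of the method, when the prediction is mapped back to the original traffic scale.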
Step 2: Apply the discrete wavelet transform to z′(t) to obtain the low-frequency subsequence c(n) and the high-frequency subsequence d(n).
A single-level (first-order) wavelet decomposition of the preprocessed series z′(t), t=1,2,...,L, gives the subsequences c(n) and d(n):
$$c(n)=\sum_{t=1}^{L}z'(t)\,\phi_{1,n}(t),\qquad d(n)=\sum_{t=1}^{L}z'(t)\,\psi_{1,n}(t)$$
where $\phi_{1,n}(t)=2^{-1/2}\phi(t/2-n)$ and $\psi_{1,n}(t)=2^{-1/2}\psi(t/2-n)$, and φ(t) and ψ(t) are the scaling function and the wavelet function, both determined by the wavelet basis. Considering symmetry and regularity, the db4 wavelet is chosen as the wavelet basis. c(n) is the low-frequency subsequence, which carries the low-frequency information of the series and is called the approximation coefficients. d(n) is the high-frequency subsequence, which carries the high-frequency information of the signal and is called the detail coefficients. After the single-level decomposition, the high-frequency and low-frequency subsequences both have length L/2, that is, n=1,2,...,L/2.
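To make the decomposition concrete, the sketch below implements a single-level discrete wavelet transform and its inverse in plain Python. It deliberately substitutes the Haar wavelet for the db4 wavelet named in the text, because the Haar filters fit in two lines; the structure (one approximation and one detail sequence, each of length L/2, with perfect reconstruction) is the same. The function names are hypothetical, and an even-length input is assumed.

```python
import math

def haar_dwt(x):
    """Single-level Haar DWT (stand-in for db4). Returns (c, d), each of length len(x)//2."""
    if len(x) % 2:
        raise ValueError("this sketch assumes an even-length series")
    s = math.sqrt(2.0)
    # Approximation coefficients: scaled pairwise sums (low-frequency content).
    c = [(x[2 * n] + x[2 * n + 1]) / s for n in range(len(x) // 2)]
    # Detail coefficients: scaled pairwise differences (high-frequency content).
    d = [(x[2 * n] - x[2 * n + 1]) / s for n in range(len(x) // 2)]
    return c, d

def haar_idwt(c, d):
    """Inverse of haar_dwt: interleave the reconstructed even and odd samples."""
    s = math.sqrt(2.0)
    x = []
    for a, b in zip(c, d):
        x.append((a + b) / s)
        x.append((a - b) / s)
    return x

c, d = haar_dwt([1.0, 3.0, 2.0, 2.0])
restored = haar_idwt(c, d)
```

In this toy input the second pair is flat, so its detail coefficient is zero, while the jump inside the first pair shows up only in d. A wavelet library with a db4 filter bank would typically replace both functions in practice; boundary handling in such libraries can change the coefficient count, so the L/2 length should be checked against the chosen extension mode.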
Step 3: Perform Gaussian process regression and prediction on the high-frequency subsequence d(n). The values of d(n) can be regarded as Gaussian-distributed, so a Gaussian process regression model can be fitted to d(n) to obtain its prediction $\hat{d}(n)$. This step comprises the following process:
(1) Build the sample data for the regression model. For any i=1,2,...,L/2, the input sample of the regression model is $x_i=[d(i-7),d(i-6),\ldots,d(i-1)]^{T}$ and the i-th output sample is d(i). This gives the sample data set D=(X,d), where X is the set of input samples and d is the set of output samples:
$$X=[x_1,x_2,\ldots,x_{L/2}]^{T},\qquad d=[d(1),d(2),\ldots,d(L/2)]^{T}$$
where T denotes the transpose.
(2) Split the data into training and test sets. The first 80% of D=(X,d) forms the training set $D_{train}=(X_{train},d_{train})$ and the last 20% forms the test set $D_{test}=(X_{test},d_{test})$.
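The sample construction and the 80/20 split can be sketched as follows. The window length of 7 follows the embodiment; everything else is an assumption made for illustration: the function names are hypothetical, and the sketch starts at the first index that has a full 7-value history, since the text does not say how the earliest samples (which lack earlier values) are handled.

```python
def build_samples(d, m=7):
    """Build lagged input windows and targets from the detail sequence d.

    Each input is the m values preceding the target. Samples without a full
    history are skipped (an assumption; the source does not specify this).
    """
    X = [d[i - m:i] for i in range(m, len(d))]
    y = [d[i] for i in range(m, len(d))]
    return X, y

def split_train_test(X, y, train_fraction=0.8):
    """Chronological split: first 80% for training, remaining 20% for testing."""
    cut = int(len(X) * train_fraction)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])

d = [float(v) for v in range(20)]
X, y = build_samples(d, m=7)
(train_X, train_y), (test_X, test_y) = split_train_test(X, y)
```

The split is chronological rather than shuffled, which matches the wording "first 80%" and "last 20%" and avoids leaking future values into the training set.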
(3) Determine the parameters of the Gaussian process regression model. The squared exponential (SE) covariance function is chosen as the Gaussian process covariance function:
$$k(x_i,x_j)=\sigma_f^{2}\exp\!\left(-\frac{\lVert x_i-x_j\rVert^{2}}{2l^{2}}\right)$$
where θ collects the hyperparameters (here the signal variance $\sigma_f^{2}$ and the length scale $l$). The optimal hyperparameters $\theta_{ML}$ can be obtained by maximum likelihood:
$$\theta_{ML}=\arg\min_{\theta}\bigl(-\log p(d_{train}\mid X_{train},\theta)\bigr) \tag{8}$$
(4) Build the Gaussian process regression model. $d_{train}$ is assumed to follow a Gaussian process, expressed as:
$$d_{train}\sim GP\bigl(0,\;k(x_i,x_j)+\sigma_n^{2}\delta_{ij}\bigr)$$
where GP denotes a Gaussian process and $\sigma_n^{2}$ is the noise variance. $\delta_{ij}$ is the Kronecker delta, with $\delta_{ij}=1$ when i=j and 0 otherwise.
The posterior distribution of the test-set outputs $d_{test}$ is Gaussian:
$$d_{test}\mid X_{train},d_{train},X_{test}\sim N(\mu_{test},\Sigma_{test}) \tag{10}$$
where $\mu_{test}$ is the mean of the test-set outputs, which is generally used as their estimate.
$\Sigma_{test}$ is the covariance of the test-set outputs. The two quantities are:
$$\mu_{test}=K(X_{test},X_{train})\bigl[K(X_{train},X_{train})+\sigma_n^{2}I_n\bigr]^{-1}d_{train}$$
$$\Sigma_{test}=K(X_{test},X_{test})-K(X_{test},X_{train})\bigl[K(X_{train},X_{train})+\sigma_n^{2}I_n\bigr]^{-1}K(X_{train},X_{test})$$
where $K(X_{train},X_{test})=K(X_{test},X_{train})^{T}$ is the covariance matrix between the test-set and training-set input samples, $K(X_{test},X_{test})$ is the covariance of $X_{test}$ with itself, and $I_n$ is the identity matrix. $A^{-1}$ denotes the inverse of a matrix A.
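The posterior mean used as the prediction can be sketched in dependency-free Python. This is an illustration under stated assumptions, not the patent's implementation: it assumes the standard squared exponential kernel with a signal scale `sigma_f` and length scale `length`, keeps those hyperparameters fixed instead of fitting them by maximum likelihood, and solves the linear system with a small Gaussian elimination routine. All function and parameter names are hypothetical.

```python
import math

def se_kernel(a, b, sigma_f=1.0, length=1.0):
    """Squared exponential covariance between two input vectors."""
    sq = sum((p - q) ** 2 for p, q in zip(a, b))
    return sigma_f ** 2 * math.exp(-sq / (2.0 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gp_posterior_mean(X_train, d_train, X_test, sigma_n=0.1, sigma_f=1.0, length=1.0):
    """Posterior mean K(X_test, X_train) [K(X_train, X_train) + sigma_n^2 I]^-1 d_train."""
    n = len(X_train)
    K = [[se_kernel(X_train[i], X_train[j], sigma_f, length)
          + (sigma_n ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, d_train)  # alpha = [K + sigma_n^2 I]^-1 d_train
    return [sum(se_kernel(x, X_train[j], sigma_f, length) * alpha[j] for j in range(n))
            for x in X_test]

mu = gp_posterior_mean([[0.0], [1.0], [2.0]], [0.0, 1.0, 0.0], [[0.5], [100.0]])
```

Two properties are worth noting. With a very small noise level the posterior mean passes almost exactly through the training targets, and far from any training input it falls back to the zero prior mean, which is consistent with modeling a centered detail sequence. For series of realistic length, a Cholesky-based solver from a numerical library would usually be preferred over the hand-written elimination shown here.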
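(5) The predicted values of the training-set and test-set outputs are, respectively:
$$\hat{d}_{train}=K(X_{train},X_{train})\bigl[K(X_{train},X_{train})+\sigma_n^{2}I_n\bigr]^{-1}d_{train},\qquad \hat{d}_{test}=\mu_{test}$$
Together these give the predicted high-frequency subsequence $\hat{d}(n)$.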
Step 4: Fit the Prophet model to the low-frequency subsequence c(n) obtained from the wavelet decomposition and use it to produce the low-frequency prediction $\hat{c}(n)$.
The low-frequency subsequence c(n) is decomposed into the sum of g(n), s(n), and h(n):
$$c(n)=g(n)+s(n)+h(n)+\varepsilon_n \tag{15}$$
where c(n) is the original low-frequency subsequence; g(n) is the trend term, which describes the non-periodic changes of the user network traffic time series; the periodic term s(n) describes its periodic changes; and h(n) represents the effect of holidays on the series. The error term $\varepsilon_n$ represents idiosyncratic changes that the model cannot capture and can be assumed to be normally distributed.
The trend term g(n) is given by:
$$g(n)=\frac{C}{1+\exp\bigl(-k(n-p)\bigr)}$$
where C is the carrying capacity, the maximum asymptotic value of the time series curve, such as a total market size or total population. This value is usually set from market-size data or domain knowledge. k is the growth rate of the curve and p is the offset parameter; these are generally fitted automatically by the Prophet algorithm. Here the carrying capacity C is set to 5 times the user's maximum historical traffic.
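As an illustration of the trend term, the sketch below assumes the logistic growth curve used by Prophet, g(n) = C / (1 + exp(-k(n - p))), with the carrying capacity set to 5 times the historical maximum as in this embodiment. The function names and the sample values of k and p are hypothetical; in practice k and p would come from the model fit.

```python
import math

def capacity_from_history(history, factor=5.0):
    """Carrying capacity C as a multiple of the largest historical value."""
    return factor * max(history)

def logistic_trend(n, C, k, p):
    """Saturating growth trend: approaches C as n grows, equals C/2 at n = p."""
    return C / (1.0 + math.exp(-k * (n - p)))

history = [2.0, 4.0, 10.0, 7.0]
C = capacity_from_history(history)  # 5 times the historical maximum
g_mid = logistic_trend(10, C, k=0.3, p=10)
```

The offset p marks the inflection point, where the curve reaches half of the capacity, and k controls how quickly it saturates. Whether the capacity should be taken on the raw traffic scale or on the log-compressed low-frequency scale is not stated in the text, so the input to `capacity_from_history` is left to the caller.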
The periodic term s(n) is given by:
$$s(n)=\sum_{l=-N}^{N}c_l\,e^{\,i\,2\pi l n/P}$$
where P is the period of the target series, $c_l$ are the parameters to be estimated, and 2N is the chosen number of approximation terms, which controls the degree of filtering. P is set to 7, for which N is usually taken as 3.
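The periodic term can be illustrated with a truncated Fourier series. The sketch assumes the complex-exponential form, a sum of c_l · exp(i·2π·l·n/P) over l from -N to N, and further assumes conjugate-symmetric coefficients (c_{-l} is the complex conjugate of c_l) so that the result is real. The coefficient values are made up for the example; in the method they are estimated by the Prophet fit.

```python
import cmath
import math

def seasonal_term(n, coeffs, P=7):
    """Evaluate a truncated Fourier series with period P at time index n.

    coeffs maps each harmonic index l (from -N to N) to its coefficient c_l.
    With conjugate-symmetric coefficients the imaginary parts cancel, so only
    the real part is returned.
    """
    total = sum(c * cmath.exp(2j * math.pi * l * n / P) for l, c in coeffs.items())
    return total.real

# One harmonic with c_1 = c_-1 = 0.5 reduces to cos(2*pi*n/7).
coeffs = {-1: 0.5, 1: 0.5}
weekly = [seasonal_term(n, coeffs) for n in range(14)]
```

With P = 7 the pattern repeats every seven samples, matching a weekly cycle when the series is sampled daily. Raising N adds higher harmonics and lets the term follow sharper within-period shapes, at the cost of more parameters to estimate.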
The holiday term h(n) can be expressed as:
$$h(n)=\sum_{i=1}^{M}\kappa_i\,\mathbf{1}(n\in D_i)$$
where, for the i-th holiday, $D_i$ is the time window affected by that holiday. The indicator function $\mathbf{1}(n\in D_i)$ states whether time n falls within that window: it equals 1 if $n\in D_i$ and 0 otherwise. Each holiday has a parameter $\kappa_i$ describing the size of its effect, with $\kappa_i\sim N(0,\upsilon_i^{2})$. Assuming there are M holidays, the indicators form the vector $Z(n)=[\mathbf{1}(n\in D_1),\ldots,\mathbf{1}(n\in D_M)]$, so that $h(n)=Z(n)\kappa$ with $\kappa=(\kappa_1,\ldots,\kappa_M)^{T}$.
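The holiday term amounts to summing the effects of all holidays whose window contains the current time index. The sketch below assumes that form; the windows and effect sizes are invented for the example, and the function name is hypothetical.

```python
def holiday_term(n, windows, kappa):
    """Sum the effect kappa[i] of every holiday i whose window contains n.

    windows: list of collections of time indices, one per holiday (the D_i).
    kappa:   list of per-holiday effect sizes, in the same order.
    """
    return sum(k for D, k in zip(windows, kappa) if n in D)

# Two hypothetical holidays: one raises traffic, one lowers it.
windows = [range(10, 13), range(40, 42)]
kappa = [0.3, -0.2]
effects = [holiday_term(n, windows, kappa) for n in (9, 11, 40)]
```

Outside every window the term is zero, so it leaves ordinary days to the trend and periodic terms. Overlapping windows simply add their effects.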
The Prophet algorithm fits the parameters of the trend, periodic, and holiday terms, and the fitted terms are summed to give the predicted low-frequency subsequence of the user network traffic:
$$\hat{c}(n)=\hat{g}(n)+\hat{s}(n)+\hat{h}(n)$$
where $\hat{g}(n)$, $\hat{s}(n)$, and $\hat{h}(n)$ denote the fitted trend, periodic, and holiday terms, respectively. Because fitting a Prophet model is an established technique, it is not described further here.
Step 5: Apply the inverse discrete wavelet transform to the predictions of the high-frequency and low-frequency subsequences to reconstruct $\hat{z}'(t)$:
$$\hat{z}'(t)=\sum_{n}\hat{c}(n)\,\phi_{1,n}(t)+\sum_{n}\hat{d}(n)\,\psi_{1,n}(t)$$
where $\hat{c}(n)$ and $\hat{d}(n)$ are the predicted low-frequency and high-frequency subsequences, respectively. Undoing the zero-mean step on $\hat{z}'(t)$ and taking the exponential restores the original scale of the network traffic and gives the final prediction of the user network traffic time series:
$$\hat{u}(t)=10^{\,\hat{z}'(t)+\bar{z}}-1$$
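The final rescaling can be sketched as the exact inverse of the preprocessing. The example assumes that the preprocessing was z(t) = log10(u(t) + 1) followed by subtraction of the mean, so the inverse adds the mean back and then computes 10 to that power minus 1. The function name is hypothetical, and the input is taken to be the series already reconstructed by the inverse wavelet transform.

```python
def restore_traffic(z_prime_hat, z_mean):
    """Map a reconstructed zero-mean log series back to the traffic scale.

    z_prime_hat: output of the inverse wavelet transform (zero-mean, log10 scale).
    z_mean:      the mean removed during preprocessing.
    """
    # Undo the centring, then invert z = log10(u + 1).
    return [10.0 ** (v + z_mean) - 1.0 for v in z_prime_hat]

u_hat = restore_traffic([-1.5, 1.5], z_mean=1.5)
```

Because the forecast is produced on a logarithmic scale, small errors in the reconstructed series translate into multiplicative errors in traffic, which is worth keeping in mind when the accuracy is evaluated on the original scale.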
The specific embodiment described herein merely illustrates the spirit of the invention. Those skilled in the art to which the invention pertains may modify or supplement the described embodiment in various ways, or replace it with similar approaches, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910427803.8A CN110210658B (en) | 2019-05-22 | 2019-05-22 | Prophet and Gaussian process user network flow prediction method based on wavelet transformation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910427803.8A CN110210658B (en) | 2019-05-22 | 2019-05-22 | Prophet and Gaussian process user network flow prediction method based on wavelet transformation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110210658A | 2019-09-06 |
CN110210658B | 2023-10-03 |
Family
ID=67788164
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201910427803.8A | Expired - Fee Related | CN110210658B (en) | 2019-05-22 | 2019-05-22 | Prophet and Gaussian process user network flow prediction method based on wavelet transformation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110210658B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110768825A (en) * | 2019-10-16 | 2020-02-07 | 电子科技大学 | A business traffic prediction method based on network big data analysis |
CN110839016A (en) * | 2019-10-18 | 2020-02-25 | 平安科技(深圳)有限公司 | Abnormal flow monitoring method, device, equipment and storage medium |
CN110839253A (en) * | 2019-11-08 | 2020-02-25 | 西北工业大学青岛研究院 | Method for determining wireless grid network flow |
CN111537938A (en) * | 2020-03-31 | 2020-08-14 | 国网江西省电力有限公司电力科学研究院 | Error short-time prediction method for electronic transformer based on intelligent algorithm |
CN111563776A (en) * | 2020-05-08 | 2020-08-21 | 国网江苏省电力有限公司扬州供电分公司 | Electric quantity decomposition and prediction method based on K neighbor anomaly detection and Prophet model |
CN111884854A (en) * | 2020-07-29 | 2020-11-03 | 中国人民解放军空军工程大学 | Virtual network traffic migration method based on multi-mode hybrid prediction |
CN112232604A (en) * | 2020-12-09 | 2021-01-15 | 南京信息工程大学 | Prediction method of network traffic extraction based on Prophet model |
CN112436975A (en) * | 2020-10-09 | 2021-03-02 | 北京邮电大学 | Method and device for predicting heaven-earth integrated information network flow |
CN112994921A (en) * | 2019-12-17 | 2021-06-18 | 华为数字技术(苏州)有限公司 | Flow prediction method and related device |
CN113037531A (en) * | 2019-12-25 | 2021-06-25 | 中兴通讯股份有限公司 | Flow prediction method, device and storage medium |
CN113472551A (en) * | 2020-03-30 | 2021-10-01 | 中国电信股份有限公司 | Network flow prediction method, device and storage medium |
CN113849374A (en) * | 2021-09-28 | 2021-12-28 | 平安科技(深圳)有限公司 | CPU occupancy rate prediction method, system, electronic device and storage medium |
CN114756604A (en) * | 2022-06-13 | 2022-07-15 | 西南交通大学 | A prediction method of monitoring time series data based on Prophet combination model |
CN119299238A (en) * | 2024-12-12 | 2025-01-10 | 浙江大学 | A network traffic anomaly detection method based on online continuous learning |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109002904A (en) * | 2018-06-21 | 2018-12-14 | 中南大学 | A hospital outpatient visit volume prediction method based on Prophet-ARMA |
2019-05-22: CN application CN201910427803.8A, patent CN110210658B, status: not active (Expired - Fee Related)
Non-Patent Citations (2)
Title |
---|
LAISEN NIE et al.: "Network Traffic Prediction Based on Deep Belief Network in Wireless Mesh Backbone Networks", 《2017 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE》 * |
SEAN J. TAYLOR et al.: "Forecasting at Scale", 《THE AMERICAN STATISTICIAN》 * |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110768825A (en) * | 2019-10-16 | 2020-02-07 | 电子科技大学 | A business traffic prediction method based on network big data analysis |
CN110839016A (en) * | 2019-10-18 | 2020-02-25 | 平安科技(深圳)有限公司 | Abnormal flow monitoring method, device, equipment and storage medium |
WO2021072887A1 (en) * | 2019-10-18 | 2021-04-22 | 平安科技(深圳)有限公司 | Abnormal traffic monitoring method and apparatus, and device and storage medium |
CN110839253A (en) * | 2019-11-08 | 2020-02-25 | 西北工业大学青岛研究院 | Method for determining wireless grid network flow |
CN112994921A (en) * | 2019-12-17 | 2021-06-18 | 华为数字技术(苏州)有限公司 | Flow prediction method and related device |
US11863397B2 (en) | 2019-12-25 | 2024-01-02 | Zte Corporation | Traffic prediction method, device, and storage medium |
CN113037531A (en) * | 2019-12-25 | 2021-06-25 | 中兴通讯股份有限公司 | Flow prediction method, device and storage medium |
CN113472551A (en) * | 2020-03-30 | 2021-10-01 | 中国电信股份有限公司 | Network flow prediction method, device and storage medium |
CN113472551B (en) * | 2020-03-30 | 2022-11-18 | 中国电信股份有限公司 | Network flow prediction method, device and storage medium |
CN111537938A (en) * | 2020-03-31 | 2020-08-14 | 国网江西省电力有限公司电力科学研究院 | Error short-time prediction method for electronic transformer based on intelligent algorithm |
CN111537938B (en) * | 2020-03-31 | 2022-12-09 | 国网江西省电力有限公司电力科学研究院 | A Short-term Prediction Method of Electronic Transformer Error Based on Intelligent Algorithm |
CN111563776A (en) * | 2020-05-08 | 2020-08-21 | 国网江苏省电力有限公司扬州供电分公司 | Electric quantity decomposition and prediction method based on K neighbor anomaly detection and Prophet model |
CN111884854A (en) * | 2020-07-29 | 2020-11-03 | 中国人民解放军空军工程大学 | Virtual network traffic migration method based on multi-mode hybrid prediction |
CN111884854B (en) * | 2020-07-29 | 2022-09-02 | 中国人民解放军空军工程大学 | Virtual network traffic migration method based on multi-mode hybrid prediction |
CN112436975B (en) * | 2020-10-09 | 2022-09-13 | 北京邮电大学 | A method and device for predicting the flow of information network in the integration of space and earth |
CN112436975A (en) * | 2020-10-09 | 2021-03-02 | 北京邮电大学 | Method and device for predicting heaven-earth integrated information network flow |
CN112232604B (en) * | 2020-12-09 | 2021-06-11 | 南京信息工程大学 | Prediction method for extracting network traffic based on Prophet model |
CN112232604A (en) * | 2020-12-09 | 2021-01-15 | 南京信息工程大学 | Prediction method of network traffic extraction based on Prophet model |
CN113849374A (en) * | 2021-09-28 | 2021-12-28 | 平安科技(深圳)有限公司 | CPU occupancy rate prediction method, system, electronic device and storage medium |
CN113849374B (en) * | 2021-09-28 | 2023-06-20 | 平安科技(深圳)有限公司 | CPU occupancy rate prediction method, system, electronic device and storage medium |
CN114756604A (en) * | 2022-06-13 | 2022-07-15 | 西南交通大学 | A prediction method of monitoring time series data based on Prophet combination model |
CN119299238A (en) * | 2024-12-12 | 2025-01-10 | 浙江大学 | A network traffic anomaly detection method based on online continuous learning |
Also Published As
Publication number | Publication date |
---|---|
CN110210658B (en) | 2023-10-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110210658A (en) | Prophet and Gaussian process user network method for predicting based on wavelet transformation | |
CN110232203A (en) | Short-term power outage prediction method, storage medium and device based on a knowledge-distillation-optimized RNN | |
Tripathi et al. | An efficient data characterization and reduction scheme for smart metering infrastructure | |
CN117278073B (en) | Automatic adjustment method for ultra-wideband antenna signals | |
CN108573323B (en) | Method, system, equipment and storage medium for predicting power consumption of energy internet user | |
CN110502806A (en) | A wireless spectrum occupancy prediction method based on LSTM network | |
CN113284001B (en) | Power consumption prediction method and device, computer equipment and storage medium | |
CN112532746B (en) | Cloud edge cooperative sensing method and system | |
CN111371626B (en) | A Bandwidth Prediction Method Based on Neural Network | |
CN112668611B (en) | Kmeans and CEEMD-PE-LSTM-based short-term photovoltaic power generation power prediction method | |
CN117591942B (en) | A method, system, medium and device for detecting abnormality of power load data | |
CN108197795B (en) | Malicious group account identification method, device, terminal and storage medium | |
CN112232604A (en) | Prediction method of network traffic extraction based on Prophet model | |
CN110516792A (en) | Non-stationary Time Series Forecasting Method Based on Wavelet Decomposition and Shallow Neural Network | |
CN104640137B (en) | A QoE optimization method for access selection based on wireless ubiquitous networks | |
CN111506813A (en) | Remote sensing information accurate recommendation method based on user portrait | |
CN110740063A (en) | Network flow characteristic index prediction method based on signal decomposition and periodic characteristics | |
CN114462306A (en) | Non-intrusive electricity load decomposition method based on variable weight time-domain convolutional network | |
CN118427778A (en) | Water quality index prediction method based on frequency domain time domain conversion and seasonal decomposition | |
CN118195138A (en) | A distributed photovoltaic power prediction method based on echo state network | |
Lim et al. | Long-term time series forecasting based on decomposition and neural ordinary differential equations | |
CN117354846A (en) | A 5G power network slicing traffic prediction method | |
CN117354172A (en) | Network traffic prediction method and system | |
CN116842829A (en) | Knowledge extraction and modeling method based on multi-source data analysis for electric power marketing | |
CN114638703A (en) | A Short-Term Forecast Method for Financial Time Series Based on Grey Mixed Model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20231003 |