CN112929215B - Network flow prediction method, system, computer equipment and storage medium - Google Patents
- Publication number
- CN112929215B
- Authority
- CN
- China
- Prior art keywords
- model
- time sequence
- trend
- prediction
- stationary
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/147—Network analysis or design for predicting network behaviour
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a network traffic prediction method, system, computer device and storage medium. The method proceeds in three stages. First, outliers in the original time series are processed and the trend is fitted with a first model. The series is then de-trended to obtain a stationary time series, which a second model learns and predicts. Finally, the outputs of the first and second models are combined to predict future traffic data. The method builds a high-accuracy model by combining two simple models: it decomposes the time series, predicts the trend and the periodic component with the two models respectively, and combines the results to predict network traffic at future times. The invention offers high prediction accuracy, low model complexity, easy engineering implementation and deployment, and low training cost.
Description
Technical Field
The present invention relates to the field of data processing technology, and in particular to a network traffic prediction method and system, a computer device, and a storage medium.
Background
Network traffic prediction models are of great research significance for network security and network planning, and they matter especially to operators. Users in different regions have different preferences: some often watch short videos, while others often play games. Consequently, traffic differs greatly between base stations at different locations. Some base stations may be persistently over-provisioned, while others may run close to or above their peak capacity. An overloaded base station increases network latency and seriously degrades users' Internet experience. A traffic prediction model is therefore needed that can learn users' behavior patterns from their historical data and predict the traffic of an area at fine granularity. With such predictions, the configuration of network devices in different areas can be adjusted in time. This provides a reliable reference for scaling network equipment down or up, improves device utilization, and safeguards the user's Internet experience.
Traffic data form a time series. Traditional time-series prediction models are generally based on continuous- or discrete-time processes and focus mainly on the short-term correlation of the series. Traffic time series, however, exhibit a specific growth trend in addition to short-term correlation, and they are strongly correlated with particular dates. As a result, the predictions of traditional models deviate considerably from actual network traffic. Recently, several deep-learning-based traffic prediction methods have been proposed. They are generally built on long short-term memory (LSTM) networks or convolutional neural networks (CNNs), or combine the two, to learn the spatio-temporal characteristics of network traffic data.
Deep learning methods perform well on time-series problems, but the models are costly in several ways:

- they are highly complex and have a large number of hidden variables;
- they have low interpretability;
- they require more computation and memory;
- they are expensive to train and deploy.

A deep learning method learns characteristics of the series such as trend and periodicity simultaneously. A specific network structure must therefore be designed for each data set, followed by multiple rounds of training and parameter tuning. Training is consequently relatively time-consuming, and parameter tuning is costly.
Disclosure of Invention
To solve these problems, the invention approaches them from the perspective of practical engineering application. It provides a network traffic prediction method, system, computer device and storage medium that reduce the computation of the model while maintaining accuracy. This reduces the time needed to train and update the model and saves hardware cost.
The network traffic prediction method of the invention comprises the following steps:

- processing outliers in the original time series and fitting the trend with a first model;
- de-trending to obtain a stationary time series, and learning and predicting the stationary time series with a second model;
- combining the outputs of the first and second models to predict future traffic data.
Further, the outlier processing comprises smoothing the outliers in the original time series and filling the outliers and missing values with a sliding mean.
Further, the first model performs trend fitting with a linear regression model and learns the overall trend of the original time series. The linear regression model is $y_t = ax + b$, where $a$ and $b$ are the regression coefficients of the model.
Further, the de-trending process comprises subtracting the trend from the outlier-processed original series. The resulting new series is a stationary time series, i.e., the obvious increasing trend has been removed.
Further, the second model uses a random forest model to learn and predict the stationary time series. Multiple decision tree models are trained on the data set by resampling. The predictions of the individual decision trees are then combined by voting or averaging to reduce the prediction variance. Each decision tree repeatedly splits the data set until the target values in a subset are all identical or the subset cannot be split further.
Further, the stationary time series is a continuous target variable, so the target values are fitted with regression trees. Building a regression tree comprises the following steps:
S1. Compute the split-attribute index $\sigma(S)$ of a data set $S$ from its target values, where $y_k$ is the $k$-th target value and $\mu$ is the mean of the target values;
S2. For each candidate split point, compute $\sigma_{A,i}(S) = \sigma(S_1) + \sigma(S_2)$. Here $A$ is a candidate feature, $i$ is the $i$-th candidate split point of feature $A$, and $S_1$ and $S_2$ are the two subsets produced by the split;
S3. Find the split point with the minimum variance, $\min_{A,i} \sigma_{A,i}(S)$. If the node cannot be split further, store it as a leaf node; otherwise perform the binary split and repeat steps S2 and S3 on each resulting subset;
S4. Obtain the regression tree model $T(x)$.
Further, the data set is resampled N times and N regression tree models are trained. Their outputs are averaged to obtain the stationary time-series prediction $\hat{y}_d$.
Further, combining the outputs of the first and second models comprises summing the trend prediction $\hat{y}_t$ and the stationary time-series prediction $\hat{y}_d$. The sum is the future traffic prediction: $\hat{y} = \hat{y}_t + \hat{y}_d$.
The network traffic prediction system of the invention comprises:
a preprocessing module, configured to process outliers in the original time series;
a first model, configured to perform trend fitting on the outlier-processed original time series;
a de-trending module, configured to remove the trend fitted by the first model to obtain a stationary time series;
a second model, configured to learn and predict the stationary time series; and
a prediction module, configured to combine the outputs of the first and second models to predict future traffic data.
The computer device of the invention comprises a memory and a processor. The memory stores a computer program, and the processor implements the network traffic prediction method when executing the computer program.
The computer-readable storage medium of the invention stores a computer program which, when executed by a processor, implements the network traffic prediction method.
The beneficial effects of the invention are as follows:
The method builds a high-accuracy model by combining two simple models. It decomposes the time series, predicts the trend and the periodicity with the two models respectively, and combines the results to predict network traffic at future times. The invention offers high prediction accuracy, low model complexity, easy engineering implementation and deployment, and low training cost.
The invention reduces the computation of the model while maintaining accuracy, which shortens model training and updating and saves hardware cost. Its optimized structural design saves hardware resources, and its low computational complexity makes engineering implementation easy.
Compared with traditional time-series prediction methods, the method is more accurate. Compared with deep learning methods, it requires less computation, has lower complexity and offers stronger model interpretability, and it has performed well on real data.
Drawings
FIG. 1 is a block diagram of the network traffic prediction method according to an embodiment of the present invention;
FIG. 2 shows traffic data over a period of time and the corresponding smoothed time series;
FIG. 3 shows the original time series after outlier processing and the stationary time series obtained by decomposition;
FIG. 4 shows actual prediction results of an embodiment of the present invention.
Detailed Description
To make the technical features, objects and effects of the invention clearer, specific embodiments of the invention are now described. The detailed description and specific examples indicate preferred embodiments of the invention, but they are given for illustration only and are not intended to limit the scope of the invention. All other embodiments that a person skilled in the art obtains from these embodiments without creative effort shall fall within the protection scope of the invention.
As shown in FIG. 1, the network traffic prediction method provided by the invention runs in three stages. It first processes outliers in the original time series and fits the trend with a first model. It then de-trends the series to obtain a stationary time series, which a second model learns and predicts. Finally, the outputs of the first and second models are combined to predict future traffic data.
In a preferred embodiment of the invention, the outlier processing smooths the outliers in the original time series and fills the outliers and missing values with a sliding mean. FIG. 2 shows traffic data over a period of time. The left side shows the originally collected data, which contain obvious outliers that deviate greatly from the distribution of the data. The right side shows the time series after smoothing.
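The outlier-and-missing-value filling described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions. The text does not say how outliers are detected or how wide the sliding window is, so the median-absolute-deviation rule is a hypothetical choice for illustration. The function name `smooth_outliers` and the parameters `window` and `k` are likewise hypothetical.

```python
from statistics import mean, median
from typing import List, Optional, Sequence


def smooth_outliers(series: Sequence[Optional[float]], window: int = 5,
                    k: float = 3.0) -> List[float]:
    """Fill missing values (None) and outliers with a centred sliding mean.

    Assumption (not in the source): a value is an outlier if it lies more than
    k scaled median absolute deviations (MAD) away from the median.
    """
    observed = [v for v in series if v is not None]
    med = median(observed)
    mad = 1.4826 * median([abs(v - med) for v in observed])
    # "Bad" points are missing values and outliers (outlier test skipped if MAD is 0).
    bad = [v is None or (mad > 0 and abs(v - med) > k * mad) for v in series]
    good = [v for v, is_bad in zip(series, bad) if not is_bad]
    fallback = mean(good)
    half = window // 2
    smoothed: List[float] = []
    for i, v in enumerate(series):
        if not bad[i]:
            smoothed.append(float(v))
            continue
        # Sliding mean of the non-bad neighbours inside the centred window.
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        neighbours = [series[j] for j in range(lo, hi) if not bad[j]]
        smoothed.append(float(mean(neighbours)) if neighbours else float(fallback))
    return smoothed
```

For example, `smooth_outliers([10, 11, None, 12, 100, 11, 10], window=3)` replaces both the missing value and the spike at 100 with the mean of their immediate neighbours, 11.5. A median-based rule is used in this sketch because a single large spike inflates the ordinary standard deviation and can hide itself.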
In another preferred embodiment of the invention, the first model uses a linear regression model for trend fitting to learn the overall trend of the original time series. FIG. 3 shows the original time series after outlier processing and the stationary time series after decomposition.
The linear regression model is:
$$y_t = ax + b$$
where $a$ and $b$ are the regression coefficients of the model. Once the regression coefficients have been determined, predicting for a new input is straightforward: the input value is multiplied by the coefficient $a$, and the intercept $b$ is added to obtain the predicted value.
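A least-squares fit is one common way to determine $a$ and $b$. The text does not say which fitting procedure is used, so the sketch below is an assumption. It also assumes that the input $x$ is the integer time index 0, 1, ..., n-1. The names `fit_linear_trend` and `predict_trend` are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear_trend(y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares estimate of a and b in y_t = a*x + b, with x = 0, 1, ..., n-1."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    x_mean = (n - 1) / 2          # mean of the time indices 0..n-1
    y_mean = sum(y) / n
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    a = sxy / sxx                 # slope: covariance of (x, y) / variance of x
    b = y_mean - a * x_mean       # intercept: the line passes through the means
    return a, b


def predict_trend(a: float, b: float, x: float) -> float:
    """Multiply the input by the coefficient a and add the intercept b."""
    return a * x + b
```

Fitting `[1, 3, 5, 7]` gives `(2.0, 1.0)`, and `predict_trend(2.0, 1.0, 4)` extrapolates the trend to 9.0.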
In yet another preferred embodiment of the invention, the de-trending process comprises subtracting the trend from the outlier-processed original series. The resulting new series is a stationary time series, i.e., the obvious increasing trend has been removed. As shown in FIG. 3, the de-trended stationary time series visibly fluctuates around its mean. The de-trending process is formulated as:
$$y_d = y - y_t$$
where
$y_d$: the stationary time series after de-trending;
$y$: the original time series after outlier processing;
$y_t$: the trend of the original time series after outlier processing.
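The de-trending formula amounts to one subtraction per time step. The following is a minimal sketch. It again assumes that $x = 0, 1, 2, \ldots$ is the time index and that $a$ and $b$ come from the trend fit. `detrend` is a hypothetical name.

```python
from typing import List, Sequence


def detrend(y: Sequence[float], a: float, b: float) -> List[float]:
    """Return y_d = y - y_t, where y_t = a*x + b is evaluated at x = 0, 1, ..."""
    return [v - (a * x + b) for x, v in enumerate(y)]
```

For example, `detrend([1.5, 2.5, 5.5, 6.5], a=2.0, b=1.0)` returns `[0.5, -0.5, 0.5, -0.5]`, a series that fluctuates around zero.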
In a further preferred embodiment of the invention, the second model uses a random forest model to learn and predict the stationary time series. Because traffic data are strongly correlated with time, this embodiment constructs time-related features, including month, time of day, day of the week and holidays. The time series is then predicted with the random forest model.
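The time-related features might be built from each timestamp as sketched below. The text lists month, time, weekday and holiday information but not how they are encoded. The following are therefore hypothetical:

- the feature names;
- the reading of "time information" as hour of day;
- the extra weekend flag;
- the `HOLIDAYS` calendar.

```python
from datetime import date, datetime
from typing import Dict, Set

# Hypothetical holiday calendar: the text does not say which holidays are used.
HOLIDAYS: Set[date] = {date(2021, 1, 1), date(2021, 2, 12)}


def time_features(ts: datetime, holidays: Set[date] = HOLIDAYS) -> Dict[str, int]:
    """Build calendar features for one timestamp of the traffic series."""
    return {
        "month": ts.month,
        "hour": ts.hour,                       # assumed meaning of "time information"
        "weekday": ts.weekday(),               # Monday = 0 ... Sunday = 6
        "is_weekend": int(ts.weekday() >= 5),  # extra flag, not in the source
        "is_holiday": int(ts.date() in holidays),
    }
```

Each row of features produced this way would serve as an input vector for the second model, with the de-trended value at the same timestamp as its target.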
A random forest is an ensemble method built on decision tree models. Multiple decision trees are trained on the data set by resampling, and their predictions are combined by voting or averaging to reduce the prediction variance of the individual trees. Each decision tree repeatedly splits the data set until the target values in a subset are all identical or the subset cannot be split further.
The stationary time series is a continuous target variable, so the target values are fitted with regression trees. Building a regression tree comprises the following steps:
S1. Compute the split-attribute index $\sigma(S)$ of a data set $S$ from its target values, where $y_k$ is the $k$-th target value and $\mu$ is the mean of the target values;
S2. For each candidate split point, compute $\sigma_{A,i}(S) = \sigma(S_1) + \sigma(S_2)$. Here $A$ is a candidate feature, $i$ is the $i$-th candidate split point of feature $A$, and $S_1$ and $S_2$ are the two subsets produced by the split;
S3. Find the split point with the minimum variance, $\min_{A,i} \sigma_{A,i}(S)$. If the node cannot be split further, store it as a leaf node; otherwise perform the binary split and repeat steps S2 and S3 on each resulting subset;
S4. Obtain the regression tree model $T(x)$.
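Steps S1 to S4 can be sketched as a small recursive regression tree. The exact form of the split index $\sigma(S)$ is not specified in the text. The sketch therefore assumes the sum of squared deviations $\sum_k (y_k - \mu)^2$, the usual CART regression criterion. It also assumes that candidate split points are midpoints between consecutive distinct feature values. Finally, it assumes a node "cannot be subdivided" when any of the following holds:

- its targets are identical;
- it holds fewer than `min_samples` rows;
- all its feature values are equal.

All function names are hypothetical.

```python
from typing import Any, Optional, Sequence, Tuple

Row = Sequence[float]


def sigma(ys: Sequence[float]) -> float:
    """S1 (assumed form): sum of squared deviations of targets from their mean."""
    mu = sum(ys) / len(ys)
    return sum((y - mu) ** 2 for y in ys)


def best_split(X: Sequence[Row], y: Sequence[float]) -> Optional[Tuple[int, float]]:
    """S2 + S3: return the (feature A, threshold) minimising sigma(S1) + sigma(S2)."""
    best: Optional[Tuple[int, float]] = None
    best_score = float("inf")
    for a in range(len(X[0])):
        values = sorted({row[a] for row in X})
        # Candidate split points: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [v for row, v in zip(X, y) if row[a] <= t]
            right = [v for row, v in zip(X, y) if row[a] > t]
            score = sigma(left) + sigma(right)
            if score < best_score:
                best, best_score = (a, t), score
    return best


def build_tree(X: Sequence[Row], y: Sequence[float], min_samples: int = 2) -> Any:
    """Grow the tree; a leaf is the mean target, an inner node is (A, t, left, right)."""
    if len(set(y)) == 1 or len(y) < min_samples:
        return sum(y) / len(y)
    split = best_split(X, y)
    if split is None:  # all feature values identical: node cannot be subdivided
        return sum(y) / len(y)
    a, t = split
    li = [i for i, row in enumerate(X) if row[a] <= t]
    ri = [i for i, row in enumerate(X) if row[a] > t]
    return (a, t,
            build_tree([X[i] for i in li], [y[i] for i in li], min_samples),
            build_tree([X[i] for i in ri], [y[i] for i in ri], min_samples))


def predict_tree(tree: Any, x: Row) -> float:
    """S4: evaluate T(x) by walking from the root to a leaf."""
    while isinstance(tree, tuple):
        a, t, left, right = tree
        tree = left if x[a] <= t else right
    return tree
```

On the toy data `X = [[0], [1], [2], [3]]`, `y = [1, 1, 5, 5]`, the root splits feature 0 at 1.5, where both subsets have zero deviation. The two leaves then predict 1.0 and 5.0.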
The data set is resampled N times and N regression tree models are trained. Their outputs are averaged to obtain the stationary time-series prediction $\hat{y}_d$.
The trend prediction $\hat{y}_t$ and the stationary time-series prediction $\hat{y}_d$ are summed to obtain the future traffic prediction:
$$\hat{y} = \hat{y}_t + \hat{y}_d$$
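Resampling, averaging and the final summation can be sketched as below. This is plain bootstrap aggregation (bagging) with a pluggable base learner. Many random-forest implementations additionally sample a random subset of features at each split, which the text does not mention. The names `bagged_fit`, `combined_forecast` and `mean_learner` are hypothetical. The trivial `mean_learner` stands in for a regression tree only to keep the example self-contained.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Predictor = Callable[[Row], float]
Learner = Callable[[List[Row], List[float]], Predictor]


def bagged_fit(X: Sequence[Row], y: Sequence[float], fit: Learner,
               n_models: int = 10, seed: int = 0) -> Predictor:
    """Train n_models learners on bootstrap resamples; predict with their mean."""
    rng = random.Random(seed)
    n = len(y)
    models: List[Predictor] = []
    for _ in range(n_models):
        # Resample n rows with replacement (one bootstrap sample per model).
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))
    # Regression ensemble output: mean of the individual predictions.
    return lambda x: sum(m(x) for m in models) / len(models)


def combined_forecast(a: float, b: float, t: float,
                      stationary: Predictor, features: Row) -> float:
    """Future traffic = trend prediction (a*t + b) + stationary prediction."""
    return (a * t + b) + stationary(features)


def mean_learner(X: List[Row], y: List[float]) -> Predictor:
    """Toy base learner (stand-in for a regression tree): predicts the mean of y."""
    m = sum(y) / len(y)
    return lambda x: m
```

For example, bagging `mean_learner` on a constant stationary series of 0.5 yields a forest that predicts 0.5. Adding the trend value $2 \cdot 4 + 1$ then gives `combined_forecast(2.0, 1.0, 4, forest, [1]) == 9.5`.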
FIG. 4 shows the actual prediction results of this embodiment. The left side of the graph shows historical data, and the right side shows the predicted future traffic. The root mean square error (RMSE) is 0.02.
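For reference, RMSE is the square root of the mean squared difference between observed and predicted values. The following is a minimal sketch; `rmse` is a hypothetical name. The text does not state the scale or units of the data behind the reported 0.02, so the figure is only comparable with errors computed on the same scale.

```python
from math import sqrt
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted traffic."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```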
The invention further provides a network traffic prediction system, comprising:
a preprocessing module, configured to process outliers in the original time series;
a first model, configured to perform trend fitting on the outlier-processed original time series;
a de-trending module, configured to remove the trend fitted by the first model to obtain a stationary time series;
a second model, configured to learn and predict the stationary time series; and
a prediction module, configured to combine the outputs of the first and second models to predict future traffic data.
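The five modules can be wired together as sketched below. This is a hypothetical architecture, not the patented implementation. Each module is a pluggable callable, and the trend uses $x = 0, 1, \ldots$ as the time index. The second model is simplified to a function of that index rather than of the calendar features used in the embodiment. `TrafficPredictionSystem` and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Series = List[float]


@dataclass
class TrafficPredictionSystem:
    """Hypothetical wiring of the five modules as pluggable callables."""
    preprocess: Callable[[Sequence[float]], Series]             # preprocessing module
    fit_trend: Callable[[Series], Tuple[float, float]]          # first model -> (a, b)
    fit_stationary: Callable[[Series], Callable[[int], float]]  # second model

    def predict(self, history: Sequence[float], horizon: int) -> Series:
        y = self.preprocess(history)
        a, b = self.fit_trend(y)
        # De-trending module: y_d = y - y_t, with x = 0, 1, ... as the time index.
        y_d = [v - (a * x + b) for x, v in enumerate(y)]
        stationary = self.fit_stationary(y_d)
        # Prediction module: trend forecast plus stationary forecast per future step.
        start = len(y)
        return [a * x + b + stationary(x) for x in range(start, start + horizon)]
```

For example, consider a fixed trend `(2.0, 1.0)` and a second model that always predicts 0. With these, `predict([1, 3, 5, 7], horizon=2)` returns `[9.0, 11.0]`. Keeping the modules as separate callables mirrors the system description, so either model can be replaced without touching the others.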
In addition, the invention provides a computer device and a computer-readable storage medium. The computer device comprises a memory and a processor. The memory stores a computer program, and the processor implements the network traffic prediction method when executing the computer program. The computer-readable storage medium stores a computer program that, when executed by a processor, implements the network traffic prediction method.
In summary, the invention builds a high-accuracy model by combining two simple models. It decomposes the time series, predicts the trend and the periodicity with the two models respectively, and combines the results to predict network traffic at future times. The method offers high prediction accuracy, low model complexity, easy engineering implementation and deployment, and low training cost.
The foregoing describes preferred embodiments of the invention. The invention is not limited to the forms disclosed herein and should not be regarded as excluding other embodiments. It may be used in various other combinations, modifications and environments. It may also be modified within the scope of the concepts described herein, through the above teachings or the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art without departing from the spirit and scope of the invention shall fall within the protection scope of the appended claims.
In the description of the invention, the terms "first", "second", "third" and the like are used only to distinguish items and are not intended to indicate or imply relative importance.
Claims (8)
1. A network traffic prediction method, characterized in that outliers in an original time series are processed and the trend is fitted by a first model; the series is de-trended to obtain a stationary time series, which is learned and predicted by a second model; and the outputs of the first model and the second model are combined to predict future traffic data;
the first model performs trend fitting with a linear regression model and learns the overall trend of the original time series; the linear regression model is $y_t = ax + b$, where $a$ and $b$ are the regression coefficients of the model;
the de-trending process comprises subtracting the trend from the outlier-processed original series to obtain a new series, which is a stationary time series, i.e., the obvious increasing trend is removed;
the second model uses a random forest model to learn and predict the stationary time series: multiple decision tree models are trained on a data set by resampling, and the predictions of the individual decision trees are combined by voting or averaging to reduce the prediction variance; each decision tree repeatedly splits the data set until the target values are all identical or the data set cannot be split further.
2. The method of claim 1, wherein the outlier processing comprises smoothing the outliers in the original time series and filling the outliers and missing values with a sliding mean.
3. The method of claim 1, wherein the stationary time series is a continuous target variable and the target values are fitted with a regression tree, the creation of the regression tree comprising the following steps:
S1. computing the split-attribute index $\sigma(S)$ of a data set $S$ from its target values, where $y_k$ is the $k$-th target value and $\mu$ is the mean of the target values;
S2. computing, for each candidate split point, $\sigma_{A,i}(S) = \sigma(S_1) + \sigma(S_2)$, where $A$ is a candidate feature, $i$ is the $i$-th candidate split point of feature $A$, and $S_1$ and $S_2$ are the two subsets produced by the split;
S3. finding the split point with the minimum variance, $\min_{A,i} \sigma_{A,i}(S)$; if the node cannot be split further, storing it as a leaf node; otherwise performing the binary split and repeating steps S2 and S3 on each resulting subset;
S4. obtaining the regression tree model $T(x)$.
6. A network traffic prediction system based on the network traffic prediction method according to any one of claims 1 to 5, comprising:
a preprocessing module, configured to process outliers in the original time series;
a first model, configured to perform trend fitting on the outlier-processed original time series;
a de-trending module, configured to remove the trend fitted by the first model to obtain a stationary time series;
a second model, configured to learn and predict the stationary time series; and
a prediction module, configured to combine the outputs of the first model and the second model to predict future traffic data.
7. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the network traffic prediction method of any one of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the network traffic prediction method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110152403.8A CN112929215B (en) | 2021-02-04 | 2021-02-04 | Network flow prediction method, system, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110152403.8A CN112929215B (en) | 2021-02-04 | 2021-02-04 | Network flow prediction method, system, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112929215A CN112929215A (en) | 2021-06-08 |
CN112929215B true CN112929215B (en) | 2022-10-21 |
Family
ID=76170221
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110152403.8A Active CN112929215B (en) | 2021-02-04 | 2021-02-04 | Network flow prediction method, system, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112929215B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112929214A (en) * | 2021-02-02 | 2021-06-08 | Beijing Mingchao Wanda Technology Co., Ltd. (北京明朝万达科技股份有限公司) | Model construction method, device, equipment and storage medium |
CN115473821B (en) * | 2021-06-11 | 2023-09-08 | China Mobile Group Guangdong Co., Ltd. (中国移动通信集团广东有限公司) | Network capacity prediction method and device, electronic equipment and storage medium |
CN113726557A (en) * | 2021-08-09 | 2021-11-30 | State Grid Fujian Electric Power Co., Ltd. (国网福建省电力有限公司) | Network transmission control optimization method based on flow demand |
CN113726558A (en) * | 2021-08-09 | 2021-11-30 | State Grid Fujian Electric Power Co., Ltd. (国网福建省电力有限公司) | Network equipment flow prediction system based on random forest algorithm |
CN115454807A (en) * | 2022-11-11 | 2022-12-09 | Cloudwise (Beijing) Technology Co., Ltd. (云智慧(北京)科技有限公司) | Capacity prediction method, device and equipment of operation and maintenance system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109587713A (en) * | 2018-12-05 | 2019-04-05 | Guangzhou Shurui Intelligent Technology Co., Ltd. (广州数锐智能科技有限公司) | Network index prediction method, device and storage medium based on ARIMA model |
CN109977098A (en) * | 2019-03-08 | 2019-07-05 | Beijing Technology and Business University (北京工商大学) | Non-stationary time-series data prediction method, system, storage medium and computer equipment |
CN110879921A (en) * | 2019-11-25 | 2020-03-13 | Dalian University (大连大学) | Satellite network flow prediction method based on time-space correlation |
CN111539764A (en) * | 2020-04-17 | 2020-08-14 | Nanjing University of Posts and Telecommunications (南京邮电大学) | Big data multiple access selection method based on submodular function |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102369689B (en) * | 2011-08-11 | 2014-05-07 | Huawei Technologies Co., Ltd. (华为技术有限公司) | Long-term forecasting method and device of network flow |
GB2547712A (en) * | 2016-02-29 | 2017-08-30 | Fujitsu Ltd | Method and apparatus for generating time series data sets for predictive analysis |
US10715393B1 (en) * | 2019-01-18 | 2020-07-14 | Goldman Sachs & Co. LLC | Capacity management of computing resources based on time series analysis |
CN111428789A (en) * | 2020-03-25 | 2020-07-17 | Guangdong Polytechnic Normal University (广东技术师范大学) | Network traffic anomaly detection method based on deep learning |
Non-Patent Citations (4)
Title |
---|
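A single-step network traffic prediction method with adaptive time granularity; Li Shaoqing (李绍庆); Network New Media Technology (《网络新媒体技术》); 2015-09-15 (No. 05); full text * |
Yu Jing (于静) et al. Network traffic prediction based on a combined model. Computer Engineering and Applications (《计算机工程与应用》). 2012, (No. 08). * |
Network traffic prediction based on a combined model; Yu Jing (于静) et al.; Computer Engineering and Applications (《计算机工程与应用》); 2012-01-16 (No. 08); pp. 1-3, Fig. 2 * |
Network traffic prediction combining wavelet decomposition and a combined model; Bao Ping (包萍); Laser Journal (《激光杂志》); 2014-12-25 (No. 12); full text * |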
Also Published As
Publication number | Publication date |
---|---|
CN112929215A (en) | 2021-06-08 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |