CN112929215B - Network flow prediction method, system, computer equipment and storage medium

Info

Publication number: CN112929215B
Application number: CN202110152403.8A
Authority: CN
Inventors: 王敏; 王京辉; 程涛木; 陈鑫; 黄强; 李钢
Original assignee: Broid Technology Co ltd
Current assignee: Broid Technology Co ltd
Priority date: 2021-02-04
Filing date: 2021-02-04
Publication date: 2022-10-21
Anticipated expiration: 2041-02-04
Also published as: CN112929215A

Abstract

The invention discloses a network traffic prediction method, system, computer device, and storage medium. The method first applies outlier processing to the original time series and fits its trend with a first model; it then removes the trend to obtain a stationary time series, which a second model learns and predicts; finally, it combines the outputs of the first and second models to predict future traffic data. The method builds a high-precision model from two simple models: it decomposes the time series, predicts its trend and its periodicity with separate models, and combines the results to predict network traffic at future times. The invention offers high prediction accuracy, low model complexity, easy engineering implementation and deployment, and low training cost.

Description

Network flow prediction method, system, computer equipment and storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a method and a system for predicting network traffic, a computer device, and a storage medium.

Background

Network traffic prediction models are important for network security and network planning, and especially for operators. Users in different regions have different preferences: some mostly watch short videos, while others mostly play games. As a result, traffic differs greatly between base stations at different locations. Some base stations always have spare capacity, while others constantly run close to or above their peak load, which increases network delay and seriously degrades the user's internet experience. A traffic prediction model is therefore needed that learns user behavior patterns from historical data, predicts the traffic of an area at fine granularity, and allows the configuration of network devices in different areas to be adjusted in time. Such a model provides a reliable reference for scaling network equipment down or up, improves equipment utilization, and safeguards the user's internet experience.

Traffic data is a time series. Traditional time series prediction models are generally based on continuous or discrete time processes and focus mainly on the short-term correlation of the series. Traffic time series data, however, also has a specific growth trend and is strongly correlated with specific dates, so the predictions of traditional models differ substantially from actual network traffic. Recently, a number of deep-learning-based traffic prediction methods have also been proposed. These mostly use a long short-term memory network or a convolutional neural network, or combine the two, to learn the spatio-temporal characteristics of network traffic data.

Deep learning methods perform well on time series problems, but deep learning models are highly complex, have a large number of hidden variables, and are hard to interpret. They also require more computation and memory, so training and deploying them is expensive. Because a deep learning method learns characteristics such as trend and period simultaneously, a specific network structure has to be designed for each kind of data, followed by multiple rounds of training and parameter tuning. Training is therefore time-consuming and parameter tuning is costly.

Disclosure of Invention

To solve these problems, the invention provides a network traffic prediction method, system, computer device, and storage medium from the perspective of practical engineering application. It reduces the computation required by the model while preserving accuracy, thereby reducing the effort of model training and updating and saving hardware cost.

The network traffic prediction method of the invention comprises: performing outlier processing on an original time series and fitting its trend with a first model; removing the trend to obtain a stationary time series, and learning and predicting the stationary time series with a second model; and combining the outputs of the first model and the second model to predict future traffic data.

Further, the outlier processing comprises: smoothing the outliers in the original time series, and filling the outliers and missing values with a sliding mean.

Further, the first model uses a linear regression model to fit the trend and learn the overall trend of the original time series; the linear regression model is $y_t = ax + b$, where $a$ and $b$ are the regression coefficients of the model.

Further, the detrending process comprises: subtracting the trend from the original series after outlier processing to obtain a new series that is a stationary time series, that is, removing the obvious growth trend.

Further, the second model uses a random forest model to learn and predict the stationary time series: multiple decision tree models are trained on the data set by resampling, and their predictions are combined by voting or averaging to reduce the prediction variance of the decision tree model; each decision tree keeps splitting the data set until the target variables are all the same or the data can no longer be split.

Further, the stationary time series is a continuous target variable, so the target value is fitted by a regression tree. Creating the regression tree comprises the following steps:

S1. Calculate the feature-splitting attribute index:

where $y_k$ is the $k$-th target value and $\mu$ is the mean of the $k$ target values;

S2. Calculate the index for each candidate split point: $\sigma_{A,i}(S) = \sigma(S_1) + \sigma(S_2)$, where $A$ is a candidate feature and $i$ is the $i$-th candidate split point of the candidate feature $A$;

S3. Find the split point with the minimum variance, $\min(\sigma_{A,i}(S))$. If the node at the split point cannot be split further, store it as a leaf node; otherwise perform a binary split and repeat steps S2 and S3 on the resulting data sets;

S4. Obtain the regression tree model $T(x)$.
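The formula for the index in step S1 does not appear in the text. One reading that is consistent with the definitions of $y_k$ and $\mu$ and with the minimum-variance criterion of step S3 is the sum of squared deviations of the target values in a node. The form below is an assumption, in particular the absence of a normalizing factor and the symbol $K$ for the number of target values in the node:

```latex
\sigma(S) = \sum_{k=1}^{K} (y_k - \mu)^2,
\qquad
\mu = \frac{1}{K} \sum_{k=1}^{K} y_k

\sigma_{A,i}(S) = \sigma(S_1) + \sigma(S_2),
\qquad
(A^{*}, i^{*}) = \arg\min_{A,\, i} \; \sigma_{A,i}(S)
```

Under this reading, $S_1$ and $S_2$ are the two subsets produced by splitting $S$ at the $i$-th candidate split point of feature $A$, and the chosen split is the one whose two subsets are jointly the most homogeneous.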

Further, the data set is resampled N times and N regression tree models are trained to obtain the stationary time series prediction result.

Further, combining the outputs of the first model and the second model comprises: summing the trend prediction result and the stationary time series prediction result to obtain the future traffic data prediction result.

The network traffic prediction system of the invention comprises:

the preprocessing module is used for processing abnormal values of the original time sequence;

the first model is used for performing trend fitting on the original time sequence after the abnormal value processing;

the de-trend processing module is used for performing de-trend processing on the result of the first model trend fitting to obtain a stable time sequence;

the second model is used for learning and predicting the stationary time series;

and the prediction module is used for combining the outputs of the first model and the second model to predict future flow data.

The computer equipment comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the network flow prediction method when executing the computer program.

A computer-readable storage medium of the present invention stores a computer program which, when executed by a processor, implements the network traffic prediction method.

The invention has the beneficial effects that:

The method builds a high-precision model from two simple models: it decomposes the time series, predicts its trend and its periodicity with separate models, and combines the results to predict network traffic at future times. The invention offers high prediction accuracy, low model complexity, easy engineering implementation and deployment, and low training cost.

The invention reduces the computation required by the model while preserving accuracy, thereby reducing the effort of model training and updating and saving hardware cost; the structural design is optimized, which saves hardware resources; and the computational complexity is low, which makes engineering implementation easy.

Compared with traditional time series prediction methods, the method is more accurate; compared with deep learning methods, it requires less computation, is less complex, and is more interpretable, and it achieves good results on real data.

Drawings

FIG. 1 is a block diagram of a method for predicting network traffic in accordance with an embodiment of the present invention;

FIG. 2 shows flow data over a certain period of time and a smoothed time series;

FIG. 3 illustrates an original time series after outlier processing and a stationary time series after decomposition;

FIG. 4 shows actual predicted results for an embodiment of the present invention.

Detailed Description

In order to more clearly understand the technical features, objects, and effects of the present invention, specific embodiments of the present invention will now be described. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

The invention provides a network traffic prediction method. As shown in FIG. 1, the method first performs outlier processing on the original time series and fits its trend with a first model; it then removes the trend to obtain a stationary time series, which a second model learns and predicts; finally, it combines the outputs of the first model and the second model to predict future traffic data.

In a preferred embodiment of the present invention, the outlier processing smooths the outliers in the original time series and fills the outliers and missing values with a sliding mean. FIG. 2 shows the traffic data for a certain period: the left side is the originally collected data, which contains obvious outliers that deviate greatly from the distribution of the data; the right side is the time series after smoothing.
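The outlier processing can be sketched as follows. The text does not say how outliers are detected, so the rule used here (distance from the median greater than `k` times the median absolute deviation) is an assumption, as are the function name, the window size, and the use of `None` to mark missing values.

```python
from statistics import mean, median

def fill_with_sliding_mean(series, window=5, k=5.0):
    """Replace missing values (None) and outliers with a sliding mean.

    Assumed detection rule: a value is an outlier when it lies more than
    k median-absolute-deviations from the median of the observed values.
    """
    observed = [v for v in series if v is not None]
    med = median(observed)
    mad = median(abs(v - med) for v in observed)
    # If mad is 0 the rule flags nothing; only missing values are filled.
    bad = [v is None or (mad > 0 and abs(v - med) > k * mad) for v in series]

    half = window // 2
    cleaned = list(series)
    for i, is_bad in enumerate(bad):
        if not is_bad:
            continue
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        neighbours = [series[j] for j in range(lo, hi) if not bad[j]]
        if neighbours:
            cleaned[i] = mean(neighbours)
        else:
            # No usable neighbour in the window: fall back to all good points.
            cleaned[i] = mean(v for v, b in zip(series, bad) if not b)
    return cleaned

raw = [10, 11, 10, 500, 12, None, 11, 10]
smoothed = fill_with_sliding_mean(raw)
```

Here the spike of 500 and the missing sample are both replaced by the mean of the good values inside a five-point window centred on them, while all other samples are left unchanged.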

In another preferred embodiment of the present invention, the first model uses a linear regression model for trend fitting to learn the overall trend of the original time series. Fig. 3 shows the original time series after the outlier processing and the stationary time series after the decomposition.

The linear regression model is:

$y_t = ax + b$

where $a$ and $b$ are the regression coefficients of the model. Once the regression coefficients are determined, a prediction for a new input is straightforward: the input value is multiplied by the regression coefficient $a$, and $b$ is added to obtain the predicted value.
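A minimal sketch of the trend fit, assuming ordinary least squares and assuming that the input $x$ is the sample index 0, 1, 2, and so on (the text does not define $x$). The function names are illustrative.

```python
def fit_trend(y):
    """Least-squares fit of y_t = a*x + b, with x = 0, 1, ..., n-1 (assumed)."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y))
    a = sxy / sxx              # slope
    b = y_mean - a * x_mean    # intercept
    return a, b

def predict_trend(a, b, x):
    """Multiply the input by a and add b."""
    return a * x + b

a, b = fit_trend([1.0, 3.0, 5.0, 7.0])
next_value = predict_trend(a, b, 4)
```

For this perfectly linear toy series the fit recovers a slope of 2 and an intercept of 1, so the prediction for the next index is 9.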

In yet another preferred embodiment of the present invention, the detrending process comprises: subtracting the trend from the original series after outlier processing to obtain a new series that is a stationary time series, that is, removing the obvious growth trend. As shown in FIG. 3, the detrended stationary time series appears graphically as a process that fluctuates around its mean. The detrending process is formulated as follows:

$y_d = y - y_t$

$y_d$: stationary time series after detrending;

$y$: original time series after outlier processing;

$y_t$: trend of the original time series after outlier processing.
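As an illustration of the detrending formula, the sketch below subtracts a linear trend from a short series. The coefficients and sample values are made up for the example, and the sample index is again assumed to be the regression input.

```python
def detrend(y, a, b):
    """Return y_d = y - y_t, where y_t = a*x + b and x is the sample index."""
    return [v - (a * x + b) for x, v in enumerate(y)]

def restore_trend(y_d, a, b):
    """Inverse operation: y = y_d + y_t."""
    return [d + (a * x + b) for x, d in enumerate(y_d)]

y = [1.5, 2.5, 5.5, 6.5]          # series after outlier processing (toy values)
y_d = detrend(y, a=2.0, b=1.0)    # fluctuates around zero
y_back = restore_trend(y_d, a=2.0, b=1.0)
```

The detrended values alternate between 0.5 and -0.5, which is the fluctuation around the mean that the second model is asked to learn; adding the trend back recovers the original series.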

In a further preferred embodiment of the invention, the second model uses a random forest model to learn and predict the stationary time series. Traffic data is strongly correlated with time, so this embodiment constructs time-related features, including month, time of day, day of week, and holidays. The time series is then predicted by the random forest model.
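One way to build such time-related features from a timestamp is sketched below. The exact encoding is not given in the text, so the choice of hour of day for the time feature, the integer encodings, and the `HOLIDAYS` set are all assumptions made for illustration.

```python
from datetime import date, datetime

# Hypothetical holiday calendar; a real deployment would load its own.
HOLIDAYS = {date(2021, 1, 1), date(2021, 2, 12)}

def time_features(ts):
    """Map a timestamp to [month, hour, weekday, is_holiday]."""
    return [
        ts.month,                      # 1..12
        ts.hour,                       # 0..23
        ts.weekday(),                  # 0 = Monday .. 6 = Sunday
        int(ts.date() in HOLIDAYS),    # 1 on a listed holiday, else 0
    ]

features = time_features(datetime(2021, 2, 12, 20, 30))
```

Each sample of the stationary series is then paired with the feature vector of its timestamp, which gives the random forest the inputs it needs to learn daily, weekly, and holiday patterns.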

A random forest is an ensemble method based on decision tree models: multiple decision tree models are trained on the data set by resampling, and their predictions are combined by voting or averaging, which reduces the prediction variance of the decision tree model. Each decision tree keeps splitting the data set until the target variables are all the same or the data can no longer be split.

The stationary time series is a continuous target variable, so the target value is fitted by a regression tree. Creating the regression tree comprises the following steps:

S1. Calculate the feature-splitting attribute index:

where $y_k$ is the $k$-th target value and $\mu$ is the mean of the $k$ target values;

S2. Calculate the index for each candidate split point: $\sigma_{A,i}(S) = \sigma(S_1) + \sigma(S_2)$, where $A$ is a candidate feature and $i$ is the $i$-th candidate split point of the candidate feature $A$;

S3. Find the split point with the minimum variance, $\min(\sigma_{A,i}(S))$. If the node at the split point cannot be split further, store it as a leaf node; otherwise perform a binary split and repeat steps S2 and S3 on the resulting data sets;

S4. Obtain the regression tree model $T(x)$.
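The split search of the regression tree can be sketched as below. The index $\sigma$ is taken here to be the sum of squared deviations from the mean, which is an assumption because the formula itself is not given in the text; the function names and the "less than or equal to threshold" split convention are also illustrative.

```python
def sse(values):
    """Assumed form of sigma(S): sum of squared deviations from the mean."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values)

def best_split(rows, targets):
    """Return (feature index A, threshold, score) minimising
    sigma(S1) + sigma(S2), or None if no split is possible."""
    best = None
    for a in range(len(rows[0])):
        for threshold in sorted({r[a] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[a] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[a] > threshold]
            if not left or not right:
                continue  # not a real split
            score = sse(left) + sse(right)
            if best is None or score < best[2]:
                best = (a, threshold, score)
    return best

rows = [[0, 5], [1, 3], [2, 8], [3, 1]]   # two toy features per sample
targets = [1.0, 1.0, 5.0, 5.0]
split = best_split(rows, targets)
```

In this toy data the first feature separates the two target levels exactly, so the search selects feature 0 with threshold 1 and a score of 0. Applying the same search recursively to the two subsets yields the tree.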

The data set is resampled N times and N regression tree models are trained to obtain the stationary time series prediction result.
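The resampling and averaging step can be sketched as follows. This is a simplified illustration, not the embodiment's implementation: it uses bootstrap resampling with replacement (an assumed reading of "resampling"), takes the sum of squared deviations as the split index, and omits the random feature subsampling that random forest libraries usually add.

```python
import random

def sse(values):
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values)

def build_tree(rows, targets):
    """Grow a regression tree by repeated minimum-SSE binary splits."""
    best = None
    if len(set(targets)) > 1:
        for a in range(len(rows[0])):
            for thr in sorted({r[a] for r in rows}):
                left = [i for i, r in enumerate(rows) if r[a] <= thr]
                right = [i for i, r in enumerate(rows) if r[a] > thr]
                if not left or not right:
                    continue
                score = (sse([targets[i] for i in left])
                         + sse([targets[i] for i in right]))
                if best is None or score < best[0]:
                    best = (score, a, thr, left, right)
    if best is None:
        # Targets all equal or no valid split: leaf holding the mean.
        return sum(targets) / len(targets)
    _, a, thr, left, right = best
    return (a, thr,
            build_tree([rows[i] for i in left], [targets[i] for i in left]),
            build_tree([rows[i] for i in right], [targets[i] for i in right]))

def predict_tree(tree, x):
    while isinstance(tree, tuple):      # internal node
        a, thr, left, right = tree
        tree = left if x[a] <= thr else right
    return tree                         # leaf value

def fit_forest(rows, targets, n_trees=20, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx]))
    return forest

def predict_forest(forest, x):
    """Average the predictions of the N trees."""
    return sum(predict_tree(t, x) for t in forest) / len(forest)

# Toy stationary series: high during hours 8..19, low otherwise, four days.
rows = [[h] for h in range(24)] * 4
targets = [1.0 if 8 <= r[0] < 20 else 0.0 for r in rows]
forest = fit_forest(rows, targets)
```

Calling `predict_forest(forest, [12])` gives a value close to 1 and `predict_forest(forest, [2])` a value close to 0, so the averaged trees reproduce the daily pattern.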

The trend prediction result and the stationary time series prediction result are summed to obtain the future traffic data prediction result.
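The combination of the two models can be written out as follows. The hatted symbols, the index $n$, and the bold $\mathbf{x}$ for the time-feature vector are notation assumed here for illustration; the averaging form follows the description of the random forest as a mean over the $N$ regression trees.

```latex
\hat{y}_t = a x + b,
\qquad
\hat{y}_d = \frac{1}{N} \sum_{n=1}^{N} T_n(\mathbf{x}),
\qquad
\hat{y} = \hat{y}_t + \hat{y}_d
```

This is the inverse of the detrending step $y_d = y - y_t$: the trend that was subtracted before training the second model is added back to its prediction.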

FIG. 4 shows the actual prediction result of this embodiment: the left side of the graph is the historical data, the right side is the predicted future traffic, and the root mean square error (RMSE) is 0.02.
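For reference, the root mean square error can be computed as in the sketch below. The function name and the sample numbers are illustrative only; the scale of the data behind the reported value of 0.02 is not stated in the text.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))

error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```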

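The invention also provides a network traffic prediction system, which comprises:

a preprocessing module for performing outlier processing on the original time series;

a first model for fitting the trend of the original time series after outlier processing;

a detrending module for removing the trend fitted by the first model to obtain a stationary time series;

a second model for learning and predicting the stationary time series;

and a prediction module for combining the outputs of the first model and the second model to predict future traffic data.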

In addition, the invention also provides a computer device and a computer readable storage medium, the computer device comprises a memory and a processor, the memory stores a computer program, and the processor realizes the network flow prediction method when executing the computer program. The computer-readable storage medium stores a computer program that, when executed by a processor, implements a network traffic prediction method.

In summary, the invention constructs a high-precision model by combining two simple models, decomposes the time sequence, predicts the time sequence trend and periodicity respectively by the two models, and then combines the results to obtain the prediction of the network traffic at the future time. The method has the advantages of high prediction precision, low model complexity, easy engineering realization and deployment and low training cost.

The foregoing describes preferred embodiments of the present invention. It is to be understood that the invention is not limited to the forms disclosed herein, which are not to be construed as excluding other embodiments; the invention may be used in various other combinations, modifications, and environments, and may be modified within the scope of the concepts described herein through the above teachings or the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the invention fall within the protection scope of the appended claims.

In the description of the present invention, it should be noted that the terms "first", "second", "third", and the like are used only for distinguishing the description, and are not intended to indicate or imply relative importance.

Claims

1. A network flow prediction method is characterized in that abnormal value processing is carried out on an original time sequence, and trend fitting is carried out through a first model; performing trend removing processing to obtain a stationary time sequence, and performing learning prediction on the stationary time sequence through a second model; merging the outputs of the first model and the second model to predict future flow data;

the first model adopts a linear regression model to perform trend fitting, and learns the overall trend of the original time sequence; the linear regression model is: $y_t = ax + b$, where $a$ and $b$ are regression coefficients of the model;

the de-trending process includes: subtracting the trend from the original sequence after abnormal value processing to obtain a new sequence which is a stationary time sequence, namely, removing the obvious increasing trend;

the second model adopts a random forest model to learn and predict the stationary time sequence: training a plurality of decision tree models on a data set by a resampling method, and voting or calculating the mean value of the prediction result of each decision tree model to reduce the prediction variance of the decision tree model; the decision tree continually slices the dataset until the target variables are all the same or cannot be sliced.

2. The method of claim 1, wherein the outlier processing comprises: smoothing the abnormal values in the original time sequence, and filling the abnormal values and the missing values by a sliding mean.

3. The method of claim 1, wherein the stationary time series is a continuous target variable, and the target value is fitted by a regression tree, the creation of the regression tree comprising the steps of:

S1, calculating a feature-splitting attribute index:

wherein $y_k$ is the $k$-th target value and $\mu$ is the mean of the $k$ target values;

S2, calculating the index for each candidate split point: $\sigma_{A,i}(S) = \sigma(S_1) + \sigma(S_2)$, wherein $A$ is a candidate feature and $i$ is the $i$-th candidate split point of the candidate feature $A$;

S3, finding the split point with the minimum variance, $\min(\sigma_{A,i}(S))$; if the node at the split point cannot be split further, storing the node as a leaf node; otherwise performing a binary split and repeating steps S2 and S3 on the resulting data sets;

and S4, obtaining a regression tree model $T(x)$.

4. The method of claim 3, wherein the data set is resampled N times, and N regression tree models are trained to obtain a stationary time series prediction result.

5. The method of claim 4, wherein combining the outputs of the first model and the second model comprises: summing the trend prediction result and the stationary time series prediction result to obtain a future traffic data prediction result.

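6. A network traffic prediction system based on the network traffic prediction method according to any one of claims 1 to 5, characterized by comprising:

a preprocessing module for performing outlier processing on the original time series;

a first model for fitting the trend of the original time series after outlier processing;

a detrending module for removing the trend fitted by the first model to obtain a stationary time series;

a second model for learning and predicting the stationary time series;

and a prediction module for combining the outputs of the first model and the second model to predict future traffic data.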

7. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the network traffic prediction method of any one of claims 1-5 when executing the computer program.

8. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the network traffic prediction method of any of claims 1-5.