CN111030889B - Network traffic prediction method based on GRU model - Google Patents

Network traffic prediction method based on GRU model

Info

Publication number
CN111030889B
CN111030889B (application CN201911343425.1A)
Authority
CN
China
Prior art keywords
gru
model
neural network
data
adagrad
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911343425.1A
Other languages
Chinese (zh)
Other versions
CN111030889A (en)
Inventor
赵炜
尚立
杨会峰
李井泉
江明亮
王旭蕊
刘惠
纪春华
杨杨
郭少勇
喻鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Beijing University of Posts and Telecommunications
Information and Telecommunication Branch of State Grid Hebei Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Beijing University of Posts and Telecommunications
Information and Telecommunication Branch of State Grid Hebei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Beijing University of Posts and Telecommunications, Information and Telecommunication Branch of State Grid Hebei Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN201911343425.1A priority Critical patent/CN111030889B/en
Publication of CN111030889A publication Critical patent/CN111030889A/en
Application granted granted Critical
Publication of CN111030889B publication Critical patent/CN111030889B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/08Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876Network utilisation, e.g. volume of load or congestion level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Environmental & Geological Engineering (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a network traffic prediction method based on a GRU model, relating to the technical field of information communication. A network traffic data sequence is input into a GRU neural network model and the prediction of the network traffic is completed, which improves the accuracy and effect of network traffic prediction.

Description

Network traffic prediction method based on GRU model
Technical Field
The invention relates to the technical field of information communication, in particular to a network traffic prediction method based on a GRU model.
Background
The electric power data communication network is a comprehensive wide-area network transmission platform and an important component of the electric power information infrastructure. With the rapid development of power data networks, the network scale keeps growing, and sufficient, reliable information support is increasingly required to ensure their safe and reliable operation. Predicting the network traffic of a power data network provides important information for its safe operation; in particular, traffic anomalies and abnormal operating states can be sensed in advance to safeguard operation, so the topic has significant research value and application prospects. In general, network traffic data are affected by various complex and random factors, and are in essence nonlinear time-series data.
The characteristics of the modern internet make network traffic prediction important for improving network efficiency, reliability and adaptability. In recent years many researchers have studied network traffic prediction, and many prediction methods have been proposed; current prediction models include time-series models, neural network models, and the like. However, because the network traffic sequence is influenced by many uncertain factors whose data are hard to express, it exhibits the complex characteristics of high nonlinearity and non-stationarity, which traditional time-series and neural network models handle poorly. A simple prediction model therefore yields low accuracy, which in turn affects reasonable network planning and allocation.
Therefore, how to improve the accuracy of network traffic prediction to improve the reliability of the network is a problem to be solved by those skilled in the art.
To assess the state of the prior art, existing patents and literature were searched, compared and analyzed, and the following technical information with high relevance to the invention was identified:
Patent scheme 1: 201510793377.1, Network traffic prediction method based on traffic trend
This patent provides a network traffic prediction method based on traffic trends. The method comprises the following steps: extracting the network traffic trend in a time period before the current period; predicting the network traffic trend at a future moment from the extracted trend; calculating the error between the extracted network traffic value and its trend, and predicting the traffic error; and predicting the network traffic value at the future moment from the predicted trend and the predicted error. The invention greatly reduces the number of training samples required for traffic-error prediction and traffic estimation, saving training time; the extracted traffic trend both highlights the periodic characteristics of the traffic in each time period and preserves its local structural characteristics.
Patent scheme 2: 201611249158.8, Neural-network-based network traffic prediction system and traffic prediction method
This patent provides a network traffic prediction method based on a BP (back propagation) neural network. The data are normalized so that sample values lie between 0 and 1, the parameters of the BP neural network are initialized, the network is pre-trained and optimized with the BP algorithm, and finally the trained BP neural network is used for prediction. The method can extract the characteristics of the data and optimize the network with the BP algorithm, easing the problems of a complex network structure and difficult training and improving the accuracy of traffic prediction to a certain extent. The invention can monitor, detect and analyze various backbone networks, detect abnormal network events in backbone networks in real time, and realize early warning of abnormal network conditions.
Patent scheme 3: 201810011664.6, Traffic prediction method based on neural network
This patent provides a neural-network-based traffic prediction method: computer data are sampled over a set sampling period and the window length of the training set is determined, and the combination of data sampling, data-set construction, LSTM model training and data judgment enables abnormal traffic to be prevented and detected. The method comprises the following steps: sampling computer data over the set sampling period; dividing the data into a training set and a verification set; training and verifying an LSTM model; and sampling the computer traffic to be predicted and feeding it into the trained LSTM model for prediction. The method features a high degree of automation, high detection speed and a wide application range.
The defects of the above patent scheme 1: the scheme extracts the network traffic trend over a period before the current moment and predicts the trend for a future period from real-time traffic data; it then calculates the error between the past traffic and its trend and predicts the future traffic error; finally, it predicts the future traffic value from the predicted trend and the predicted error. The predefined cycle time strongly influences the future-traffic prediction and therefore the accuracy of the predicted value, and the scheme has difficulty expressing highly complex nonlinear sequences, so its generality is limited.
The defects of the above patent scheme 2: the scheme proposes a network traffic prediction method based on a BP neural network. A BP neural network is easy to build and train and has some capacity to express complex data sequences: the data are first normalized, the network is pre-trained and optimized with the BP algorithm, and the trained network is then used to obtain predictions. However, the BP neural network has poor memory of the traffic data, which limits the attainable improvement in traffic prediction accuracy.
The defect of the above patent scheme 3: the scheme proposes a neural-network-based traffic prediction method that samples computer data over a set sampling period and trains and predicts with an LSTM model. However, the scheme uses only a single LSTM model; although the LSTM model expresses nonlinear sequences well, in practice a defect of the gradient descent method is that too large a learning rate can skip the optimal point, so there remains room to improve prediction accuracy.
Problems with the prior art and considerations:
how to solve the technical problem of improving the accuracy and the effect of predicting the network flow.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a network traffic prediction method based on a GRU model, which improves the accuracy and effect of network traffic prediction by inputting a network traffic data sequence into a GRU neural network model and completing the prediction of the network traffic.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows: a network flow prediction method based on a GRU model inputs a network flow data sequence into a GRU neural network model and completes the prediction of network flow.
The further technical scheme is as follows: the GRU-Adam neural network model is a neural network model using an SGD gradient descent algorithm, the GRU-Adam model is a neural network model using an Adam gradient descent algorithm, the GRU-AdaGrad model is a neural network model using an AdaGrad gradient descent algorithm, the GRU-AdaGrad model is predicted by respectively using the GRU-SGD model, the GRU-Adam model and the GRU-AdaGrad model, and data predicted by each model are added to calculate an average value to obtain predicted network flow data.
The further technical scheme is as follows: specifically comprises the steps S1 to S5,
S1, obtaining historical network traffic data;
S2, determining training data and verification data in the historical network traffic data;
S3, bringing the training data into each GRU neural network model for training;
S4, predicting the verification data through the three GRU neural network models;
S5, adding the predicted data and averaging to obtain the predicted network traffic data.
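The S4–S5 ensemble step above can be sketched in a few lines. This is a minimal illustration of the averaging only, using hypothetical `StubModel` stand-ins for the three trained GRU models (the patent does not specify an implementation):

```python
import numpy as np

def ensemble_predict(models, x):
    """Steps S4-S5: each model predicts, then the predictions are averaged."""
    preds = [m.predict(x) for m in models]
    return np.mean(preds, axis=0)

# Hypothetical stand-in models whose predict() just shifts the input,
# so only the averaging step is exercised here.
class StubModel:
    def __init__(self, offset):
        self.offset = offset
    def predict(self, x):
        return np.asarray(x, dtype=float) + self.offset

# Stand-ins for the trained GRU-SGD, GRU-Adam and GRU-AdaGrad models.
models = [StubModel(-1.0), StubModel(0.0), StubModel(1.0)]
pred = ensemble_predict(models, [10.0, 20.0])
```

In a real run the three stand-ins would be replaced by the trained GRU networks; the averaging line is the only part the scheme itself prescribes.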
The further technical scheme is as follows: wherein the step S3 specifically comprises the step S31,
s31, three GRU neural network models are respectively a GRU-SGD model, a GRU-Adam model and a GRU-AdaGrad model, the GRU-SGD model is a neural network model using an SGD gradient descent algorithm, the GRU-Adam model is a neural network model using an Adam gradient descent algorithm, the GRU-AdaGrad model is a neural network model using an AdaGrad gradient descent algorithm, training data are respectively input into each GRU neural network model, and the training data are firstly transmitted in the GRU neural network model in a forward direction.
The further technical scheme is as follows: wherein the step S3 further comprises a step S32,
S32, calculating the loss function at each time step.
The further technical scheme is as follows: wherein the step S3 further comprises a step S33,
S33, iterating sequentially with reverse chain-rule differentiation until the loss function converges.
The further technical scheme is as follows: wherein the step S3 further comprises a step S34,
s34, the GRU-SGD model is updated by using an SGD gradient descent algorithm, the GRU-Adam model is updated by using an Adam gradient descent algorithm, and the GRU-AdaGrad model is updated by using an AdaGrad gradient descent algorithm.
The further technical scheme is as follows: wherein the step S3 further comprises a step S35,
S35, repeating steps S31 to S34 and continuing to update until the loss function is less than 0.2, at which point model training is finished.
The further technical scheme is as follows: wherein the step S34 specifically comprises the steps S341 to S343,
S341, the GRU-SGD model calculates the descent amount of each parameter with the SGD gradient descent algorithm, and updates it;
S342, the GRU-Adam model calculates the descent amount of each parameter with the Adam gradient descent algorithm, and updates it;
S343, the GRU-AdaGrad model calculates the descent amount of each parameter with the AdaGrad gradient descent algorithm, and updates it.
The further technical scheme is as follows: the method is run on a server basis.
The beneficial effects produced by the above technical scheme are:
firstly, the accuracy and effect of network traffic prediction are improved by inputting a network traffic data sequence into a GRU neural network model and completing the prediction of the network traffic;
secondly, the GRU-SGD, GRU-Adam and GRU-AdaGrad models predict respectively, and the data predicted by each model are added and averaged to obtain the predicted network traffic data, further improving the accuracy and effect of network traffic prediction.
See detailed description of the preferred embodiments.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a block diagram of a GRU neural network model in the present invention;
FIG. 3 is a graph comparing predicted flow rate data with actual flow rate data in the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the application, its application, or uses. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways than those described herein, and it will be appreciated by those skilled in the art that the present application may be practiced without departing from the spirit and scope of the present application, and that the present application is not limited to the specific embodiments disclosed below.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application unless specifically stated otherwise. Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description. Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate. In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be discussed further in subsequent figures.
In the description of the present application, it is to be understood that the orientation or positional relationship indicated by the directional terms such as "front, rear, upper, lower, left, right", "lateral, vertical, horizontal" and "top, bottom", etc., are generally based on the orientation or positional relationship shown in the drawings, and are used for convenience of description and simplicity of description only, and in the case of not making a reverse description, these directional terms do not indicate and imply that the device or element being referred to must have a particular orientation or be constructed and operated in a particular orientation, and therefore, should not be considered as limiting the scope of the present application; the terms "inner and outer" refer to the inner and outer relative to the profile of the respective component itself.
For ease of description, spatially relative terms such as "over", "above", "on the upper surface of" and "on" may be used herein to describe the spatial relationship of one device or feature to another device or feature as shown in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. For example, if a device in the figures is turned over, devices described as "above" or "on" other devices or configurations would then be oriented "below" or "under" the other devices or configurations. Thus, the exemplary term "above" can encompass both an orientation of "above" and one of "below". The device may be otherwise oriented (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein interpreted accordingly.
It should be noted that the terms "first", "second", and the like are used to define the components, and are only used for convenience of distinguishing the corresponding components, and the terms have no special meanings unless otherwise stated, and therefore, the scope of protection of the present application is not to be construed as being limited.
As shown in fig. 1, the present invention discloses a network traffic prediction method based on a GRU model, which includes steps S1 to S5, and the network traffic data sequence is input into a GRU neural network model and the prediction of the network traffic is completed, specifically as follows:
the GRU-Adam neural network model is a neural network model using an Adam gradient descent algorithm, the GRU-AdaGrad model is a neural network model using an AdaGrad gradient descent algorithm, the GRU-AdaGrad model is predicted by respectively using the GRU-SGD model, the GRU-AdaGrad model and the GRU-AdaGrad model, and data predicted by each model are added to calculate an average value to obtain predicted network flow data.
S1, obtaining historical network flow data.
And S2, determining training data and verification data in the historical network flow data.
And S3, bringing the training data into each GRU neural network model for training.
S31, three GRU neural network models are respectively a GRU-SGD model, a GRU-Adam model and a GRU-AdaGrad model, the GRU-SGD model is a neural network model using an SGD gradient descent algorithm, the GRU-Adam model is a neural network model using an Adam gradient descent algorithm, the GRU-AdaGrad model is a neural network model using an AdaGrad gradient descent algorithm, training data are respectively input into each GRU neural network model, and the training data are firstly transmitted in the GRU neural network model in a forward direction.
S32, the loss function at each time step is calculated.
S33, iterate sequentially with reverse chain-rule differentiation until the loss function converges.
S34, updating the GRU-SGD model by using an SGD gradient descent algorithm, updating the GRU-Adam model by using an Adam gradient descent algorithm, and updating the GRU-AdaGrad model by using an AdaGrad gradient descent algorithm.
S341, the GRU-SGD model calculates the descent amount of each parameter with the SGD gradient descent algorithm, and updates it.
S342, the GRU-Adam model calculates the descent amount of each parameter with the Adam gradient descent algorithm, and updates it.
S343, the GRU-AdaGrad model calculates the descent amount of each parameter with the AdaGrad gradient descent algorithm, and updates it.
S35, steps S31 to S34 are repeated, continuing to update until the loss function is less than 0.2, at which point model training is finished.
And S4, predicting the verification data through three GRU neural network models.
And S5, adding the predicted data, and averaging to obtain the predicted network flow data.
The GRU neural network model, the SGD gradient descent algorithm, the Adam gradient descent algorithm, and the AdaGrad gradient descent algorithm are prior art and are not described herein again.
Description of the drawings:
First, the variables used in the GRU-based network traffic prediction method are explained. The variables are as follows:
z_t: the update gate at time t;
r_t: the reset gate at time t;
h̃_t: the memory (candidate) information at time t;
h_t: the output information of the GRU unit at time t;
y_t: the final output information at time t;
Δθ_t: the gradient-descent amount of a parameter.
The GRU-based network traffic prediction method inputs a network traffic data sequence into GRU neural networks, trains different GRU neural network models with different gradient descent algorithms, and finally adds and averages the data predicted by the models to restore the predicted network traffic data. Using the variables defined above, the solution of the invention is explained in detail below with reference to fig. 1.
As shown in fig. 1, the steps are described as follows:
S1, acquiring historical network traffic data;
S2, determining training data and verification data in the historical network traffic data;
S3, bringing the training data into the GRU models for training;
S4, predicting the verification data with the GRU neural network models;
S5, adding the predicted data and averaging to obtain the predicted network traffic data.
Wherein, step S3 specifically includes:
S31, the training data x(t) are input into the GRU neural network models. There are three GRU models, GRU-SGD, GRU-Adam and GRU-AdaGrad, differing only in the gradient descent algorithm used. The training data are first propagated forward through the GRU units.
A GRU consists of an update gate and a reset gate. At time step t, the update gate z_t is computed as:
z_t = σ(W_hz*h_{t-1} + W_xz*x_t)   (1)
where x_t is the input vector at the t-th time step, h_{t-1} is the information of the previous time step t-1, W_hz and W_xz are weight matrices, and σ is the sigmoid activation function, which compresses information into (0, 1); its formula and derivative are:
σ(z) = 1/(1 + e^{-z})   (2)
σ′(z) = σ(z)(1 - σ(z))   (3)
As shown in fig. 2, the update gate mainly determines how much information from past time steps is retained into subsequent time steps.
The reset gate mainly determines how much past time-step information is forgotten. The reset gate r_t is computed as:
r_t = σ(W_hr*h_{t-1} + W_xr*x_t)   (4)
where W_hr and W_xr are weight matrices. The memory information h̃_t stores the information of past time steps through the reset gate and is computed as:
h̃_t = tanh(W_xc*x_t + W_hc*(r_t ∘ h_{t-1}))   (5)
where W_xc and W_hc are weight matrices, ∘ is the Hadamard product, and tanh is the activation function; its formula and derivative are:
tanh(z) = (e^z - e^{-z})/(e^z + e^{-z})   (6)
tanh′(z) = 1 - tanh²(z)   (7)
The GRU output unit h_t is:
h_t = (1 - z_t) ∘ h_{t-1} + z_t ∘ h̃_t   (8)
where h_{t-1} is the output-unit information of the previous time step, z_t is the update-gate information, and h̃_t is the memory information.
The final output value y_t passes through the sigmoid activation function again, with W_o a weight matrix:
y_t = σ(W_o*h_t)   (9)
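The forward pass just described can be sketched as a minimal NumPy step. The weight shapes and toy dimensions below are illustrative assumptions, and bias terms are omitted as in the patent's formulas; the hidden-state combination follows the standard GRU convention:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, Wo):
    """One GRU forward step following the update gate (1), reset gate (4),
    memory information, output unit, and final output (9) described above.

    W holds the six weight matrices named in the text
    (W_hz, W_xz, W_hr, W_xr, W_hc, W_xc); biases are omitted.
    """
    z_t = sigmoid(W["hz"] @ h_prev + W["xz"] @ x_t)             # update gate
    r_t = sigmoid(W["hr"] @ h_prev + W["xr"] @ x_t)             # reset gate
    h_cand = np.tanh(W["xc"] @ x_t + W["hc"] @ (r_t * h_prev))  # memory information
    h_t = (1.0 - z_t) * h_prev + z_t * h_cand                   # GRU output unit
    y_t = sigmoid(Wo @ h_t)                                     # final output
    return h_t, y_t

# Toy dimensions: 1 input feature, 4 hidden units (illustrative only).
rng = np.random.default_rng(0)
n_in, n_hid = 1, 4
W = {k: rng.standard_normal((n_hid, n_hid if k[0] == "h" else n_in)) * 0.1
     for k in ("hz", "xz", "hr", "xr", "hc", "xc")}
Wo = rng.standard_normal((1, n_hid)) * 0.1
h, y = gru_step(np.array([0.5]), np.zeros(n_hid), W, Wo)
```

Because the final output passes through a sigmoid, y always lies in (0, 1), which matches the normalization of the traffic series implied by the loss formulation.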
s32, the formula in the forward propagation process shows that the parameter to be learned has Whz、Wxz、Whr、Wxr、Whc、Wxc、WoThe final output of the output layer is ytCalculating a loss function for a certain time:
Figure BDA0002332704710000095
wherein y isdAre true values. The loss of a single sequence is then:
Figure BDA0002332704710000096
and S33, gradually solving the derivative of each loss function to each parameter W by using the reverse chain derivation:
Figure BDA0002332704710000097
Figure BDA0002332704710000098
Figure BDA0002332704710000099
Figure BDA00023327047100000910
Figure BDA0002332704710000101
Figure BDA0002332704710000102
Figure BDA0002332704710000103
wherein each intermediate parameter is:
δy,t=(yd-yt)*σ′ (19)
δh,t=δy,tWoz,t+1Whzt+1Whc*rt+1h,t+1Whrh,t+1*(1-zt+1) (20)
Figure BDA0002332704710000104
Figure BDA0002332704710000105
Figure BDA0002332704710000106
after the partial derivatives for each parameter are calculated, the parameters may be updated, and the iterations may be performed until the loss function converges.
S34, the three different GRU network models update their parameters with different gradient descent algorithms.
Step S34 comprises:
S341, the GRU-SGD model calculates the descent amount Δθ_t of each parameter with the SGD gradient descent algorithm and thereby updates the parameter W:
g_t = ∇_θ E(θ_t)   (24)
Δθ_t = -η*g_t   (25)
where g_t is the weight gradient, η is the learning rate, and Δθ_t is the amount by which the parameter W is decreased.
S342, the GRU-Adam model calculates the descent amount Δθ_t of each parameter with the Adam gradient descent algorithm and thereby updates the parameter W.
The Adam algorithm adjusts the learning rate of each parameter through first-order and second-order moment estimates of the gradient. Adam bias-corrects both moment estimates, so the learning rate of each iteration stays within a stable range and the parameters change smoothly.
m_t = μ*m_{t-1} + (1 - μ)*g_t   (26)
n_t = ν*n_{t-1} + (1 - ν)*g_t²   (27)
m̂_t = m_t/(1 - μ^t)   (28)
n̂_t = n_t/(1 - ν^t)   (29)
Δθ_t = -η*m̂_t/(√n̂_t + ε)   (30)
where g_t is the weight gradient; m_t and n_t are the first- and second-order moment estimates of the parameter partial derivatives; μ and ν are exponential decay rates in [0, 1), typically 0.9 and 0.999; m̂_t and n̂_t are the bias-corrected values; Δθ_t is the amount by which the parameter W is decreased; and η is the learning rate.
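The Adam moment estimates and bias correction described above can be sketched as follows (the hyperparameter values are illustrative; the patent does not state the per-model settings):

```python
import numpy as np

def adam_update(W, g, state, lr=0.001, mu=0.9, nu=0.999, eps=1e-8):
    """One Adam step: decayed moment estimates, bias correction, scaled update."""
    state["t"] += 1
    t = state["t"]
    state["m"] = mu * state["m"] + (1 - mu) * g       # first-moment estimate m_t
    state["n"] = nu * state["n"] + (1 - nu) * g * g   # second-moment estimate n_t
    m_hat = state["m"] / (1 - mu ** t)                # bias-corrected m
    n_hat = state["n"] / (1 - nu ** t)                # bias-corrected n
    return W - lr * m_hat / (np.sqrt(n_hat) + eps)    # parameter step

W = np.zeros(2)
state = {"m": np.zeros(2), "n": np.zeros(2), "t": 0}
for _ in range(3):  # constant gradient: +1 on the first weight, -1 on the second
    W = adam_update(W, np.array([1.0, -1.0]), state, lr=0.1)
```

With a constant gradient the bias-corrected ratio m̂/√n̂ is close to ±1, so each weight moves roughly one learning-rate step per iteration, in opposite directions here.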
S343, the GRU-AdaGrad model calculates the descent amount Δθ_t of each parameter with the AdaGrad gradient descent algorithm and thereby updates the parameter W.
The AdaGrad algorithm forms a constraint term by recursion: in the early stage, when g_t is small, the constraint term is large and amplifies the gradient; in the later stage, when g_t is large, the constraint term is small and constrains the gradient.
n_t = n_{t-1} + g_t²   (31)
Δθ_t = -η*g_t/(√n_t + ε)   (32)
where g_t is the weight gradient, n_t is the second-order moment estimate of the weight gradient, Δθ_t is the amount by which the weight is decreased, η is the learning rate, and ε guarantees that the denominator is not 0.
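The AdaGrad accumulation described above can be sketched as follows; even with identical gradients, the step taken shrinks as the squared gradients accumulate (the learning rate and gradient values are illustrative):

```python
import numpy as np

def adagrad_update(W, g, n, lr=0.1, eps=1e-8):
    """One AdaGrad step: accumulate squared gradients, shrink later steps."""
    n = n + g * g                         # running sum of squared gradients n_t
    W = W - lr * g / (np.sqrt(n) + eps)   # per-parameter scaled descent
    return W, n

W, n = np.array([1.0]), np.zeros(1)
history = []
for _ in range(3):  # identical gradients, yet the effective step keeps shrinking
    W_prev = W
    W, n = adagrad_update(W, np.array([2.0]), n)
    history.append(float(W_prev[0] - W[0]))  # actual step taken this iteration
```

This is the "constraint term" behavior in the text: the growing √n_t denominator damps parameters that have already seen large gradients.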
S35, steps S31 to S34 are repeated, continuously updating the parameters W, until the loss function E is less than 0.2, at which point model training is finished.
Example data for the present invention are illustrated below:
S1, in the present embodiment, 14776 pieces of network traffic sequence data are collected as the data set.
S2, in the present embodiment, the first 12000 entries of the network traffic sequence data set are used as the training set train(t), and the last 2776 entries are used as the validation set val(t).
S3, in this embodiment, the training sequence train(t) is input into the three GRU neural network models for training. Each GRU neural network contains 32 GRU units, the random batch size is set to 128, and training runs for 100 epochs. The GRU-SGD model performs gradient descent with SGD, the GRU-Adam model with Adam, and the GRU-AdaGrad model with AdaGrad. After training, the trained models model_sgd, model_Adam, and model_AdaGrad are obtained.
S4, the validation set data val(t) is input into the trained GRU neural network models model_sgd, model_Adam, and model_AdaGrad, which output the predicted data pre_sgd(t), pre_Adam(t), and pre_AdaGrad(t).
S5, the predicted data are summed and averaged to obtain the final predicted network traffic data:

pre(t) = (pre_sgd(t) + pre_Adam(t) + pre_AdaGrad(t)) / 3
As shown in Fig. 3, the predicted traffic value pre(t) is compared with the actual traffic value val(t).
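Step S5's sum-and-average combination can be sketched as follows; the three prediction arrays are hypothetical stand-ins for the outputs of model_sgd, model_Adam, and model_AdaGrad:

```python
import numpy as np

# Hypothetical per-model predictions over three validation points;
# in the method these come from the three trained GRU models.
pre_sgd     = np.array([10.2,  9.8, 11.1])
pre_adam    = np.array([10.0, 10.1, 10.8])
pre_adagrad = np.array([ 9.7, 10.0, 11.4])

# S5: element-wise sum of the three predictions, divided by three
pre = (pre_sgd + pre_adam + pre_adagrad) / 3
```

Averaging the three models' outputs damps the idiosyncratic errors of any single optimizer, which is the rationale for the multi-model design.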
The purpose of the invention is as follows:
Network traffic prediction aims to accurately forecast traffic changes in a future network and to provide reliable data for network planning and maintenance. Most existing network traffic prediction models construct either a linear mathematical model or a neural network model; because they process the data with a single linear or nonlinear method, prediction accuracy and real-time performance are difficult to guarantee. To solve these problems, this patent provides a GRU-based network traffic prediction method: the network traffic data sequence is input into several constructed GRU network models, each adopting a different gradient descent algorithm; each GRU neural network model predicts separately, and the predictions are then added and averaged to obtain the predicted network traffic data. The GRU neural network is chosen because it has good memory of, and expressive capability for, time sequences. Ordinary network traffic data is influenced by many factors that are difficult to express; its sequence is highly nonlinear and non-stationary, and it exhibits different characteristics under different network environments. The disclosed method therefore predicts the data with GRU neural network models that use different gradient descent algorithms, and finally adds the results and computes the average to obtain the predicted network traffic data. The invention aims to overcome the defects of the prior art and further improve the accuracy of network traffic sequence prediction.
The technical contribution of the invention is as follows:
Network traffic prediction is widely applied across network domains. The network traffic data sequence is inherently a nonlinear time sequence, and the influence of many uncertain factors makes it highly unstable and difficult to express, which in turn makes planning and maintaining future networks difficult. For this reason, network traffic prediction is of paramount importance. The invention provides a GRU-based network traffic prediction method. Compared with prior work, the main contributions of the invention are the following:
(1) The invention predicts the network traffic sequence with a GRU neural network algorithm; the GRU neural network has memory and can better predict nonlinear time sequences.
(2) To keep good prediction accuracy when predicting network traffic under different situations, the proposed method adds and averages the predictions of multiple models: three GRU models predict with different gradient descent algorithms, and the final prediction is the sum of their outputs divided by three.
Description of the effects of the invention:
the method utilizes the GRU neural network to predict the network traffic data sequence, simultaneously adopts three GRU models with different gradient descent algorithms, and finally obtains the final predicted network traffic data by summing and averaging the predicted values of the three models.
Through the GRU neural network, the invention can memorize the change rule of past network traffic data. The GRU network is relatively simple, occupies few resources, and has a strong capability for expressing nonlinear traffic sequences; predicting each subsequence separately further improves the prediction effect.
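For illustration, a single forward step of a standard GRU cell can be sketched as below. This uses one common gate convention; the random weights and the 32-unit size follow the embodiment, but this is a hedged sketch, not the patented implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, p):
    """One forward step of a standard GRU cell:
    update gate z, reset gate r, candidate state h~, new hidden state."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
    return (1 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
d_in, d_h = 1, 32                      # 32 GRU units, as in the embodiment
p = {k: rng.standard_normal((d_h, d_in if k[0] == "W" else d_h)) * 0.1
     for k in ["Wz", "Wr", "Wh", "Uz", "Ur", "Uh"]}
p.update({k: np.zeros(d_h) for k in ["bz", "br", "bh"]})

h = np.zeros(d_h)
for x_t in [0.1, 0.4, 0.2]:            # a tiny traffic subsequence
    h = gru_step(np.array([x_t]), h, p)
```

The update gate z interpolates between the old state and the tanh candidate, which is how the cell "memorizes" past traffic while staying cheaper than an LSTM (two gates instead of three).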
The invention adopts three GRU models with different gradient descent methods in order to adapt to prediction in different scenarios.

Claims (2)

1. A network traffic prediction method based on GRU models, characterized in that: a network traffic data sequence is input into GRU neural network models to complete the prediction of network traffic; there are three GRU neural network models, namely a GRU-SGD model, a GRU-Adam model, and a GRU-AdaGrad model, wherein the GRU-SGD model is a neural network model using the SGD gradient descent algorithm, the GRU-Adam model is a neural network model using the Adam gradient descent algorithm, and the GRU-AdaGrad model is a neural network model using the AdaGrad gradient descent algorithm; the three models predict separately, and the data predicted by each model are added and averaged to obtain the predicted network traffic data; the method specifically comprises steps S1 to S5: S1, obtaining historical network traffic data; S2, determining training data and validation data in the historical network traffic data; S3, feeding the training data into each GRU neural network model for training; S4, predicting the validation data with the three GRU neural network models; S5, adding the predicted data and averaging to obtain the predicted network traffic data; step S3 specifically comprises steps S31 to S35: S31, inputting the training data into each of the three GRU neural network models, the training data being propagated forward through the GRU neural networks; S32, calculating the loss function; S33, applying reverse chain-rule differentiation, iterating in turn until the loss function converges; S34, updating the GRU-SGD model with the SGD gradient descent algorithm, the GRU-Adam model with the Adam gradient descent algorithm, and the GRU-AdaGrad model with the AdaGrad gradient descent algorithm; step S34 specifically comprises steps S341 to S343: S341, the GRU-SGD model calculates the decrease of each parameter with the SGD gradient descent algorithm and updates accordingly; S342, the GRU-Adam model calculates the decrease of each parameter with the Adam gradient descent algorithm and updates accordingly; S343, the GRU-AdaGrad model calculates the decrease of each parameter with the AdaGrad gradient descent algorithm and updates accordingly; S35, repeating steps S31 to S34, continuously updating until the loss function is less than 0.2, at which point model training is complete.
2. The method of claim 1, characterized in that: the method runs on a server.
CN201911343425.1A 2019-12-24 2019-12-24 Network traffic prediction method based on GRU model Active CN111030889B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911343425.1A CN111030889B (en) 2019-12-24 2019-12-24 Network traffic prediction method based on GRU model

Publications (2)

Publication Number Publication Date
CN111030889A CN111030889A (en) 2020-04-17
CN111030889B true CN111030889B (en) 2022-11-01

Family

ID=70211860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911343425.1A Active CN111030889B (en) 2019-12-24 2019-12-24 Network traffic prediction method based on GRU model

Country Status (1)

Country Link
CN (1) CN111030889B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111932010B (en) * 2020-08-10 2023-09-22 重庆大学 Shared bicycle flow prediction method based on riding context information
CN111970206A (en) * 2020-08-21 2020-11-20 北京浪潮数据技术有限公司 FC network flow control method, device and related components
CN113094860B (en) * 2021-04-29 2023-09-01 北京邮电大学 Industrial control network flow modeling method based on attention mechanism
CN113746696A (en) * 2021-08-02 2021-12-03 中移(杭州)信息技术有限公司 Network flow prediction method, equipment, storage medium and device
US20230114810A1 (en) * 2021-10-11 2023-04-13 Samsung Electronics Co., Ltd. Method of communication traffic prediction via continual learning with knowledge distillation, and an apparatus for the same
CN117060984B (en) * 2023-10-08 2024-01-09 中国人民解放军战略支援部队航天工程大学 Satellite network flow prediction method based on empirical mode decomposition and BP neural network

Citations (6)

Publication number Priority date Publication date Assignee Title
CN109062901A (en) * 2018-08-14 2018-12-21 第四范式(北京)技术有限公司 Neural network training method and device and name entity recognition method and device
CN109325624A (en) * 2018-09-28 2019-02-12 国网福建省电力有限公司 A kind of monthly electric power demand forecasting method based on deep learning
CN109799533A (en) * 2018-12-28 2019-05-24 中国石油化工股份有限公司 A kind of method for predicting reservoir based on bidirectional circulating neural network
CN109816095A (en) * 2019-01-14 2019-05-28 湖南大学 Based on the network flow prediction method for improving gating cycle neural network
CN109889391A (en) * 2019-03-13 2019-06-14 南京理工大学 A kind of network short term traffic forecasting method based on built-up pattern
WO2019208998A1 (en) * 2018-04-27 2019-10-31 한국과학기술원 Gru-based cell structure design robust to missing data and noise in time series data in recurrent neural network




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant