CN115499344B - Network traffic real-time prediction method and system - Google Patents

Network traffic real-time prediction method and system

Info

Publication number
CN115499344B
Authority
CN
China
Prior art keywords
network
data
flow
time
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211029883.XA
Other languages
Chinese (zh)
Other versions
CN115499344A (en
Inventor
王经伟
黄宏钦
王雨
游侃民
林厚宏
杨涵铄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peng Cheng Laboratory
Original Assignee
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876: Network utilisation, e.g. volume of load or congestion level
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14: Network analysis or design
    • H04L41/147: Network analysis or design for predicting network behaviour
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence

Abstract

The invention provides a method and a system for real-time prediction of network traffic. Historical network traffic data and real-time network traffic data of each network node are acquired, wherein the historical network traffic data comprise historical network traffic data obtained from a public data set and/or historical network traffic data collected by each network node. A preset neural network model is trained with the historical and real-time network traffic data, and the trained offline traffic prediction model outputs offline traffic prediction data. An online machine learning platform performs online learning on the historical and real-time network traffic data to obtain online traffic prediction data. By combining deep learning on historical network traffic data with online machine learning on real-time network traffic data, the method and system improve the prediction of network traffic that changes in real time.

Description

Network traffic real-time prediction method and system
Technical Field
The present invention relates to the field of network traffic control technologies, and in particular, to a method and a system for predicting network traffic in real time.
Background
In network traffic research, the traffic matrix (TM) is commonly the object of study. The network traffic prediction problem is then a TM prediction problem: predicting the TM at future times from historical TM information. Common methods include linear prediction methods such as ARMA/ARIMA models and nonlinear methods such as neural networks.
In practice, network traffic is bursty and random. The above prediction methods all learn from offline historical data; when a burst occurs, the network characteristics change greatly, so these methods reflect the traffic characteristics only to a limited extent and cannot accurately predict the bursty real-time traffic state.
Accordingly, there is a need for improvement and development in the art.
Disclosure of Invention
The technical problem to be solved by the invention is that existing methods predict network traffic from offline data and therefore cannot accurately predict network traffic under burst conditions.
The technical scheme adopted by the invention for solving the problems is as follows:
In a first aspect, this embodiment provides a method for predicting network traffic in real time, where the method includes:
acquiring historical network traffic data and real-time network traffic data of each network node; wherein the historical network traffic data comprises: historical network traffic data obtained from a public data set and/or historical network traffic data collected by each network node;
training a preset neural network model by utilizing the historical network flow data and the real-time network flow data to obtain offline flow prediction data output by a trained offline flow prediction model;
performing online learning on the historical network flow data and the real-time network flow data by using an online machine learning platform to obtain online flow prediction data;
and obtaining real-time flow prediction data based on the offline flow prediction data and the online flow prediction data.
Optionally, the step of training the preset neural network model by using the historical network traffic data and the real-time network traffic data to obtain offline traffic prediction data output by the trained offline traffic prediction model includes:
presetting a neural network model, constructing a training set according to the historical network flow data, constructing a testing set according to the real-time network flow data, and training to obtain a preliminary offline flow prediction model; wherein the training set comprises: a network topology and a network traffic matrix;
constructing a training set and a test set from the real-time network traffic data and continuing to train the preliminary offline traffic prediction model to obtain the offline traffic prediction model;
and inputting time sequence data formed by the network topological graph and the network flow matrix into the offline flow prediction model after training is completed, so as to obtain offline flow prediction data output by the offline flow prediction model.
Optionally, the preset neural network model includes: at least one graph convolution neural network and at least one 3D convolution network; the step of inputting the time sequence data formed by the network topological graph and the network flow matrix to the offline flow prediction model after training is completed, and obtaining the offline flow prediction data output by the offline flow prediction model comprises the following steps:
inputting time sequence data formed by the network topological graph and the network flow matrix into a graph convolution neural network in a preset neural network model to obtain spatial characteristics in the time sequence data extracted by the graph convolution neural network;
and inputting the time sequence data formed by the network topological graph and the network flow matrix into a 3D convolution network in a preset neural network model to obtain the time sequence characteristics in the time sequence data extracted by the 3D convolution network.
Optionally, each graph convolution neural network is connected with each 3D convolution network in sequence in a one-to-one correspondence manner, and each 3D convolution network is connected in sequence;
the step of inputting the time sequence data formed by the network topological graph and the network flow matrix to the offline flow prediction model after training is completed, and obtaining the offline flow prediction data output by the offline flow prediction model comprises the following steps:
sequentially inputting each item of the time-series data into a first graph convolutional neural network, a second graph convolutional neural network, ..., and an Nth graph convolutional neural network to obtain a first spatial feature, a second spatial feature, ..., and an Nth spatial feature output by the respective graph convolutional neural networks; wherein N is an integer greater than zero;
each graph convolutional neural network inputs its spatial feature (the first, the second, ..., the Nth) into the 3D convolutional network connected to it, and each 3D convolutional network passes the time-series feature it outputs to the next 3D convolutional network, so that the Nth 3D convolutional network outputs the offline traffic prediction data.
Optionally, the step of obtaining online traffic prediction data by online learning the historical network traffic data and the real-time network traffic data using an online machine learning platform includes:
performing online learning on the historical network traffic data and the real-time network traffic data by using the Alink machine learning platform to obtain the online traffic prediction data.
Optionally, the step of acquiring historical network traffic data and real-time network traffic data of each network node includes:
collecting network traffic data of each network node according to a preset data collection policy configuration; wherein the network traffic data comprises: network node device information and network link traffic information;
saving the network traffic data to a Kafka cluster; wherein the different kinds of network traffic data are stored in different Kafka topics;
aggregating the network traffic data within a preset computing period and storing the aggregated data in a Hive table; wherein the aggregated data comprises: network traffic data and node load data.
Optionally, the method further comprises:
acquiring state data according to the acquisition strategy configuration of the network flow data at the current moment; wherein the status data comprises: network link state information, network topology information, and network traffic information;
calculating a reward function according to the network node load at the current moment and the predicted real-time flow prediction data;
Determining the acquisition strategy configuration of the network flow data at the next moment according to the reward function and the state data;
and controlling flow data acquisition by using the determined acquisition strategy configuration of the network flow data at the next moment.
Optionally, the step of calculating the reward function includes:
calculating a first loss value $loss_{t-1}$ between the real-time traffic prediction data and the network traffic data of the previous time step, and a second loss value $loss_t$ between the real-time traffic prediction data and the true value of the next time step, and obtaining a traffic standardized difference value between the first loss value and the second loss value;
calculating a standardized difference value between the total network load of the previous time sequence and the total network load of the current time sequence to obtain a load standardized difference value;
and calculating a reward function according to the flow standardization difference value and the load standardization difference value.
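The text does not state how the standardized differences are computed or how they are combined into the reward. The following is a minimal sketch under two assumptions that are not from the source: the standardized difference is taken as a relative change, and the reward is a weighted sum of the two terms. The names `standardized_diff`, `reward`, `w_flow` and `w_load` are hypothetical.

```python
def standardized_diff(previous: float, current: float, eps: float = 1e-9) -> float:
    """Relative change from previous to current, bounded in [-1, 1] (assumed form)."""
    return (previous - current) / (abs(previous) + abs(current) + eps)


def reward(loss_prev: float, loss_curr: float,
           load_prev: float, load_curr: float,
           w_flow: float = 0.5, w_load: float = 0.5) -> float:
    """Weighted sum of the two differences (assumed combination).

    The result is positive when the prediction loss and the total
    network load both decrease from the previous step to the current one.
    """
    flow_term = standardized_diff(loss_prev, loss_curr)
    load_term = standardized_diff(load_prev, load_curr)
    return w_flow * flow_term + w_load * load_term
```

With this form, a collection policy that lowers prediction loss without raising node load is rewarded, and one that raises both is penalized; the weights express the trade-off between accuracy and load.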
Optionally, the configuration of the collection policy of the network traffic data includes: a plurality of time settings for collecting network traffic data; each time setting is incremented in turn.
In a second aspect, the present embodiment further provides a system for online predicting network traffic in real time, where the system includes:
the data acquisition module is used for acquiring historical network traffic data and real-time network traffic data of each network node; wherein the historical network traffic data comprises: historical network traffic data obtained from a public data set and/or historical network traffic data collected by each network node;
The offline data prediction module is used for training a preset neural network model by utilizing the historical network flow data and the real-time network flow data to obtain offline flow prediction data output by the offline flow prediction model after training;
the online data prediction module is used for online learning the historical network flow data and the real-time network flow data by utilizing an online machine learning platform to obtain online flow prediction data;
and the data integration module is used for obtaining real-time flow prediction data based on the offline flow prediction data and the online flow prediction data.
The beneficial effects of the invention are as follows. The invention provides a method and a system for real-time prediction of network traffic. Historical network traffic data and real-time network traffic data of each network node are acquired, wherein the historical network traffic data comprise historical network traffic data obtained from a public data set and/or historical network traffic data collected by each network node. A preset neural network model is trained with the historical and real-time network traffic data, and the trained offline traffic prediction model outputs offline traffic prediction data. An online machine learning platform performs online learning on the historical and real-time network traffic data to obtain online traffic prediction data. By combining deep learning on historical network traffic data with online machine learning on real-time network traffic data, the method and system improve the prediction of network traffic that changes in real time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings may be obtained according to the drawings without inventive effort to those skilled in the art.
FIG. 1 is a schematic diagram of steps of a method for predicting network traffic in real time according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for predicting network traffic in real time according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating steps for collecting and storing traffic data according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the preset neural network used in the method according to the embodiment of the present invention;
FIG. 5 is a flowchart of the learning steps of historical network traffic data in an embodiment of the invention;
FIG. 6 is a flowchart of the learning steps of real-time network traffic data in an embodiment of the present invention;
FIG. 7 is a flow chart of steps of a method for controlling network traffic in an embodiment of the present invention;
FIG. 8 is a flow chart of the steps of a specific application embodiment of the method of the present invention;
FIG. 9 is a schematic diagram of an embodiment of an application of network traffic collection in an embodiment of the invention;
FIG. 10 is a schematic block diagram of a system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
It should be noted that, if directional indications (such as up, down, left, right, front, and rear … …) are included in the embodiments of the present invention, the directional indications are merely used to explain the relative positional relationship, movement conditions, etc. between the components in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indications are correspondingly changed.
Network traffic is an important indicator of the operating state of a network, and accurate real-time traffic prediction is important for network design, resource management and network security. Perceiving network traffic effectively and estimating future traffic reasonably is likewise important for network planning tasks such as configuring routes and detecting congestion in advance.
In the prior art, network traffic prediction takes the traffic matrix as its object of study, and the algorithms generally learn from historical network traffic data. However, network characteristics vary strongly, and historical network traffic data cannot reflect the burstiness of network traffic, so network traffic cannot be predicted accurately from historical data alone.
To overcome the above problems in the prior art, this embodiment discloses a network traffic prediction method and system. As shown in fig. 1, historical network traffic data and real-time network traffic data of each network node are first acquired: the real-time traffic data are collected and stored in a Kafka cluster, the data stored in the Kafka cluster are passed to the Flink computing engine, and the collected data are written to Hive partitions. A preset neural network model then learns from the historical and real-time network traffic data and outputs offline traffic prediction data. In addition, offline training on the historical network traffic data yields the parameters of a local offline model, and online learning on the real-time network traffic data, starting from those parameters, yields online traffic prediction data. Finally, the offline traffic prediction data and the online traffic prediction data are combined to obtain real-time traffic prediction data.
When the real-time traffic prediction data are obtained by combining the results of offline and online learning, deep reinforcement learning is used to dynamically adjust the node collection policy in real time, and the network controller issues the adjusted collection policy to each SDN network node, so as to balance network node resource load against prediction accuracy.
The embodiment of the invention combines the historical network flow data and the real-time flow data to predict the network flow, improves the accuracy of flow prediction under the network emergency, and provides technical support for the technical fields related to the network running state, such as network resource management, network safety protection and the like.
The method and system of the present invention will be described in further detail below by taking application examples of the method and system of the present embodiment as examples.
Exemplary method
The embodiment provides a network traffic prediction method, as shown in fig. 2, including:
step S100, acquiring historical network flow data and real-time network flow data of each network node; wherein the historical network traffic data comprises: historical network traffic data acquired in the public data set and/or historical network traffic data acquired by each network node.
A network contains a plurality of network nodes. A network node is a device connected to the network that has an independent address and can send or receive data. A node may be a workstation, a client, a network user or a personal computer, or a server, a printer or another network-connected device. Every workstation, server, terminal device and network device, that is, every device with its own unique network address, is a network node. The whole network is composed of many network nodes connected by communication lines into a certain geometric arrangement, namely the computer network topology.
In this step, the historical network traffic data and real-time network traffic data of each network node to be predicted are first acquired. In one implementation, as shown in fig. 3, the historical network traffic data may come from the public data set stored in Hive. Hive holds three raw data tables. The first is the public data set table GEANT, partitioned by date and collection time; the second is the whole-network traffic information table, partitioned by date and Flink data aggregation time; the third is the whole-network node load information table, partitioned by date and Flink data aggregation time.
The real-time network traffic data may be collected by periodically invoking a Python script on each network node. The script parses the collection policy configuration on the current node, separates the collected network node device information and link traffic information into different topics, and uploads them to Kafka. The network node device information comprises the CPU and memory utilization of the network node. The network link traffic information comprises the traffic passing through the node and the traffic on the links connected to the node.
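As an illustration of the per-node collection step, the sketch below builds one message per topic for a single collection round. It is only a sketch: the topic names, field names and the `build_messages` helper are hypothetical, and the actual hand-off to a Kafka producer is left out.

```python
import json
import time

# Hypothetical topic names; the text only says the two kinds of data
# are uploaded to different topics.
DEVICE_TOPIC = "node-device-info"
LINK_TOPIC = "node-link-traffic"


def build_messages(node_id, cpu_util, mem_util, node_bytes, link_bytes, ts=None):
    """Return (topic, payload) pairs for one collection round on one node."""
    ts = time.time() if ts is None else ts
    # Device information: CPU and memory utilization of the node.
    device = {"node": node_id, "ts": ts, "cpu": cpu_util, "mem": mem_util}
    # Link traffic information: traffic through the node and per-link traffic.
    link = {"node": node_id, "ts": ts, "node_bytes": node_bytes, "links": link_bytes}
    return [
        (DEVICE_TOPIC, json.dumps(device).encode("utf-8")),
        (LINK_TOPIC, json.dumps(link).encode("utf-8")),
    ]
```

Each returned pair can be handed to whichever Kafka producer client the deployment uses.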
Acquisition index table
Acquisition configuration table
Step S200, training a preset neural network model by utilizing the historical network traffic data and the real-time network traffic data to obtain offline traffic prediction data output by the trained offline traffic prediction model.
Training the preset neural network model with the historical network traffic data and the real-time network traffic data yields a trained historical-data prediction model and the offline traffic prediction data output by that model.
Specifically, the step of obtaining offline flow prediction data by the preset network model according to the historical network flow data and the real-time network flow data includes:
presetting a neural network model, constructing a training set according to the historical network flow data, constructing a testing set according to the real-time network flow data, and training to obtain a preliminary offline flow prediction model; wherein the training set comprises: a network topology and a network traffic matrix;
constructing a training set and a test set from the real-time network traffic data and continuing to train the preliminary offline traffic prediction model to obtain the offline traffic prediction model;
and inputting time sequence data formed by the network topological graph and the network flow matrix into the offline flow prediction model after training is completed, so as to obtain offline flow prediction data output by the offline flow prediction model.
Network traffic varies strongly in real time, and the main purpose of the invention is to improve the prediction of such traffic, so the deep learning model should predict well on data with large variations. Therefore, in one implementation, two data sources are used in the initial stage of system operation: the public data set GEANT serves as the training set and data collected in real time serve as the test set. Once the offline model's test results are stable and the system has collected enough data, the model is trained further from the already trained parameters using the collected data, with the most recently collected data as the test set. A model obtained in this way is more robust.
As shown in fig. 4, in this embodiment, the preset neural network is composed of at least one graph convolution neural network and at least one 3D convolution network, and the step of inputting time sequence data composed of the network topology graph and the network traffic matrix to the offline traffic prediction model after training is completed, to obtain offline traffic prediction data output by the offline traffic prediction model includes:
Inputting time sequence data formed by the network topological graph and the network flow matrix into a graph convolution neural network in a preset neural network model to obtain spatial characteristics in the time sequence data extracted by the graph convolution neural network;
and inputting the time sequence data formed by the network topological graph and the network flow matrix into a 3D convolution network in a preset neural network model to obtain the time sequence characteristics in the time sequence data extracted by the 3D convolution network.
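The text does not specify which graph convolution variant is used. As a minimal sketch of spatial feature extraction on one time step, the code below assumes the common symmetric-normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 X W), written in plain Python. The function names are hypothetical, and the 3D convolution stage is not shown.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix (nested lists)."""
    n = len(adj)
    # Add self-loops so every node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    propagated = matmul(normalize_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]
```

Here `adj` would come from the network topology graph and `features` from one time step of the traffic matrix (one row per node); each node's output mixes its own traffic with that of its neighbours.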
Further, each graph convolution neural network is connected with each 3D convolution network in sequence in a one-to-one correspondence manner, and each 3D convolution network is connected in sequence;
the step of inputting the time sequence data formed by the network topological graph and the network flow matrix to the offline flow prediction model after training is completed, and obtaining the offline flow prediction data output by the offline flow prediction model comprises the following steps:
sequentially inputting each item of the time-series data into a first graph convolutional neural network, a second graph convolutional neural network, ..., and an Nth graph convolutional neural network to obtain a first spatial feature, a second spatial feature, ..., and an Nth spatial feature output by the respective graph convolutional neural networks; wherein N is an integer greater than zero;
each graph convolutional neural network inputs its spatial feature (the first, the second, ..., the Nth) into the 3D convolutional network connected to it, and each 3D convolutional network passes the time-series feature it outputs to the next 3D convolutional network, so that the Nth 3D convolutional network outputs the offline traffic prediction data.
Referring to fig. 3, the inputs to the preset neural network are: a network topology graph G = (V, E), where V is the set of network nodes and E is the set of edges connecting network nodes; and a network traffic matrix X. The network traffic prediction problem is to learn a mapping function f based on the network topology graph G and the network traffic matrix X such that:
$$[X_{t+1}, \ldots, X_{t+T}] = f\big(G, (X_{t-n+1}, \ldots, X_t)\big);$$
the output of the preset neural network is the predicted network traffic for the next T time steps.
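The mapping above pairs n past traffic matrices with the T matrices that follow them. A minimal sketch of how such training samples could be cut from a recorded sequence is given below; the helper name `make_windows` and its parameters are hypothetical.

```python
def make_windows(series, n, horizon):
    """Split a list of traffic matrices into (history, future) pairs.

    history holds n consecutive matrices X_{t-n+1} .. X_t,
    future holds the next `horizon` matrices X_{t+1} .. X_{t+T}.
    """
    pairs = []
    for end in range(n, len(series) - horizon + 1):
        pairs.append((series[end - n:end], series[end:end + horizon]))
    return pairs
```

Each pair is one input/target sample for f; the graph G is the same for every sample as long as the topology does not change.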
With reference to fig. 5, training proceeds in stages. In the first stage, the public GEANT data set is used as the training set and data collected in real time are used as the test set. Once enough real-time data have been collected or the model has been trained, the trained model and its parameters are retrieved, and training continues on training and test sets split from the real-time collected data until training is complete, yielding the offline traffic prediction data output by the model.
Step S300, online learning is carried out on the historical network traffic data and the real-time network traffic data by using an online machine learning platform, so as to obtain online traffic prediction data.
After the preset neural network has been trained in the above steps, the online machine learning platform performs offline training on the historical network traffic data to obtain a model and its parameters, and then performs online learning on the real-time network traffic data starting from those parameters, thereby obtaining online traffic prediction data.
As shown in fig. 6, this step comprises offline training and online training. In offline training, an offline task pulls data from the Hive data table at a fixed time every day and trains on them; when training completes, the model parameters are stored in a parameter controller. In online training, sample data are read from Kafka and trained in mini-batches: for example, every hour or every half hour the model parameters are pulled from the parameter controller, and after training the updated parameters are written back to the parameter controller.
As in step S200, two data sources are used in the initial stage of system operation because network traffic varies strongly in real time and the model should predict well on data with large variations: the public data set GEANT serves as the training set and data collected in real time serve as the test set. Once the offline model's test results are stable and the system has collected enough data, the model is trained further from the already trained parameters using the collected data, with the most recently collected data as the test set. A model obtained in this way is more robust.
Step S400, obtaining real-time traffic prediction data based on the offline traffic prediction data and the online traffic prediction data.
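The pull, train, push cycle of online training can be sketched as follows. This is not the Alink API: the `ParameterStore` class is a hypothetical in-memory stand-in for the parameter controller, and a linear model trained by stochastic gradient descent stands in for whatever model the platform actually uses.

```python
class ParameterStore:
    """In-memory stand-in for the parameter controller (hypothetical interface)."""

    def __init__(self, weights, bias=0.0):
        self._params = (list(weights), bias)

    def pull(self):
        weights, bias = self._params
        return list(weights), bias

    def push(self, weights, bias):
        self._params = (list(weights), bias)


def online_update(store, batch, lr=0.01):
    """Pull parameters, run one SGD pass over a mini-batch, push them back."""
    weights, bias = store.pull()
    for features, target in batch:
        pred = sum(w * x for w, x in zip(weights, features)) + bias
        err = pred - target
        # Gradient step on squared error for a linear model.
        weights = [w - lr * err * x for w, x in zip(weights, features)]
        bias -= lr * err
    store.push(weights, bias)
    return weights, bias
```

In the described system the daily offline task would seed the store, and each mini-batch read from Kafka would trigger one call to `online_update`.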
The real-time network traffic prediction pred_real is composed of two parts.
The first part, pred_history, is the prediction for real-time data made by the model that combines a graph neural network with a 3D-CNN and is trained on historical network traffic data; this model has a complex structure and strong time-series feature extraction capability.
The second part, pred_alink, is the prediction for real-time data made by Alink online machine learning; this model is structurally simpler but predicts real-time data better.
The final real-time traffic prediction result is pred_real = a × pred_history + b × pred_alink, where a and b each lie in the range 0 to 1 and are initialized to a = 0.9 and b = 0.1. After the system is running, a and b are tuned to achieve the best effect.
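The weighted combination can be written directly. The sketch below applies it element-wise to two predicted traffic matrices given as nested lists; the function name `combine_predictions` is hypothetical, while the default weights are the initial values stated above.

```python
def combine_predictions(pred_history, pred_alink, a=0.9, b=0.1):
    """Element-wise pred_real = a * pred_history + b * pred_alink for two matrices."""
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        raise ValueError("a and b must lie in [0, 1]")
    return [
        [a * h + b * o for h, o in zip(row_h, row_o)]
        for row_h, row_o in zip(pred_history, pred_alink)
    ]
```

For example, with the initial weights an offline prediction of 10 and an online prediction of 20 for the same link combine to 0.9 × 10 + 0.1 × 20 = 11.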
Further, during data collection, the step of obtaining the historical network traffic data and the real-time network traffic data of each network node includes:
collecting network traffic data of each network node according to a preset data collection rule; wherein the network traffic data comprises: network node device information and network link traffic information;
saving the network traffic data to a Kafka cluster; wherein the different kinds of network traffic data are stored in different Kafka topics;
aggregating the network traffic data within a preset computing period and storing the aggregated data in a Hive table; wherein the aggregated data comprises: network traffic data and node load data.
In order to reduce the load pressure on each network node, this embodiment further discloses, on the basis of the above method, a method for adjusting the acquisition strategy configuration used to collect data from each network node based on the predicted real-time network traffic data. As shown in fig. 7, the method comprises the following steps:
step H100, collecting state data according to the acquisition strategy configuration of the network traffic data at the current moment; wherein the state data comprises: network link state information, network topology information, and network traffic information;
step H200, calculating a reward function according to the network node load at the current moment and the predicted real-time traffic prediction data;
step H300, determining the acquisition strategy configuration of the network traffic data at the next moment according to the reward function and the state data;
step H400, controlling traffic data acquisition by using the determined acquisition strategy configuration of the network traffic data at the next moment.
Further, the step of calculating the reward function includes:
calculating a first loss value loss_{t-1} between the real-time traffic prediction data and the network traffic data of the previous time step, and a second loss value loss_t between the real-time traffic prediction data and the true value of the next time step, and obtaining a traffic standardized difference between the first loss value and the second loss value;
calculating a standardized difference between the total network load of the previous time step and the total network load of the current time step to obtain a load standardized difference;
and calculating the reward function from the traffic standardized difference and the load standardized difference.
Specifically, the reward function of the present invention includes two parts:
Prediction index of the real-time prediction model: on the test set, loss_{t-1} is the loss between the model's predicted values and the true values at the previous time step, and loss_t is the loss between the current model's predicted values and the true values after the current action has been applied. If loss_t < loss_{t-1}, the prediction performance of the model has improved. The standardized difference between the two values forms the first reward component r_1 (the formula is not reproduced in this text).
Network node load index: load_{t-1} is the total network load at the previous time step and load_t is the total network load at the current time step. If load_t < load_{t-1}, the network load is decreasing. The standardized difference between the two values forms the second reward component r_2 (the formulas for r_2 and for the total network load are not reproduced in this text).
The total reward function is:
r = r_1 - α·r_2
where the parameter α ∈ [0, 1] adjusts the influence of each component of the reward function: the higher α is, the more weight is placed on network load; the lower α is, the more weight is placed on prediction model performance. The goal of the DDPG controller is to make the reward as large as possible, balancing network load against prediction model performance.
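The structure of the reward can be sketched in Python as follows. The exact standardization formulas are not available in this text, so the sketch assumes a simple relative difference: r_1 is taken as the relative decrease in loss, and r_2 as the relative increase in total load, so that subtracting α·r_2 penalizes load growth. The helper names `normalized_drop` and `reward` and the sign convention for r_2 are assumptions made for illustration only.

```python
def normalized_drop(previous, current, eps=1e-12):
    """Relative decrease from `previous` to `current`.
    Positive when `current` is smaller. (Assumed form of the
    'standardized difference'; the source formula is unavailable.)"""
    return (previous - current) / (abs(previous) + eps)


def reward(loss_prev, loss_curr, load_prev, load_curr, alpha=0.5):
    """Total reward r = r_1 - alpha * r_2 with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    r1 = normalized_drop(loss_prev, loss_curr)    # prediction improvement
    r2 = -normalized_drop(load_prev, load_curr)   # relative load increase (assumed sign)
    return r1 - alpha * r2


# Loss fell from 1.0 to 0.8 while total load rose from 50 to 60.
r = reward(1.0, 0.8, 50.0, 60.0, alpha=0.5)
```

Under these assumptions, a larger α makes the same load increase cost more reward, which matches the role the text assigns to α.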
In order to achieve balanced adjustment of the acquisition policy configuration, the acquisition policy configuration of the network traffic data comprises: a plurality of time settings for collecting network traffic data; each time setting is incremented in turn.
In one implementation, the acquisition strategy of each network node has seven levels, numbered 0 to 6. Reasonable values for the acquisition strategy need to be determined by testing in a real environment; only the idea is described here. Policy 0 (data uploaded every 30 minutes), policy 1 (data uploaded every 20 minutes), policy 2 (data uploaded every 10 minutes), policy 3 (data uploaded every 5 minutes), policy 4 (data uploaded every 3 minutes), policy 5 (data uploaded every 2 minutes), policy 6 (data uploaded every 1 minute). The acquisition strategy for each network node may be selected from the array [0,1,2,3,4,5,6].
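The policy levels listed above can be written as a small lookup table. The following Python sketch is illustrative only: the names `UPLOAD_INTERVAL_MINUTES` and `adjust_policy` are hypothetical, and clamping an out-of-range level to the nearest valid level is an assumption rather than something the text specifies.

```python
# Policy level -> upload interval in minutes, as listed in the example.
UPLOAD_INTERVAL_MINUTES = {0: 30, 1: 20, 2: 10, 3: 5, 4: 3, 5: 2, 6: 1}


def adjust_policy(current, delta):
    """Shift a node's policy level by `delta` and clamp it to the
    valid range of levels (clamping is an assumption)."""
    lowest = min(UPLOAD_INTERVAL_MINUTES)
    highest = max(UPLOAD_INTERVAL_MINUTES)
    return max(lowest, min(highest, current + delta))


# Raising policy 2 by one level shortens the interval from 10 to 5 minutes.
new_level = adjust_policy(2, 1)
new_interval = UPLOAD_INTERVAL_MINUTES[new_level]
```

A higher level always means a shorter upload interval, so increasing the level trades node load for fresher data.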
In implementation, the Kafka system supports processing of both offline data and real-time data and serves as a stream-processing tool. Kafka serves two main purposes in the system:
Different topics store the acquisition information. Each network node uploads 5 types of acquisition index data, which are stored in different topic partitions. This helps the subsequent Flink job aggregate and classify the data from the different topics to obtain the whole-network traffic information required by the system and the resource load condition of each network node.
Kafka can process data streams arriving in real time in a timely manner. During online real-time prediction, if the data flow arriving at the current moment is too large, the message queue in Kafka can persist the data until it has been completely processed, which avoids data loss when data processing fails. Kafka also guarantees that data is ordered locally, which is useful when processing time-series data in real time, and its buffering and asynchronous processing mechanisms handle data efficiently during traffic peaks.
Flink batch data processing: in the data acquisition module, the Flink computing engine performs data aggregation. In one embodiment, the aggregated network traffic data follows the GEANT data set, whose network consists of 23 routers and 38 links. The data set spans more than 4 months of observation at an acquisition frequency of once every 15 minutes and comprises 10773 samples; each sample holds 23 × 23 entries, and each entry contains a node name and the traffic value of the link connected to that node. Flink reads the three kinds of traffic data in Kafka and aggregates the whole-network traffic data within a fixed computing period into the GEANT data format. Because the acquisition interval of a network node is at most half an hour, the Flink computing period is set to aggregate once every half hour.
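The aggregation step can be illustrated without Flink by a plain Python sketch that groups link records into fixed half-hour windows and builds one N × N traffic matrix per window. This is not Flink code; the record layout `(timestamp_seconds, source, destination, volume)` and the function name `aggregate_traffic` are assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_traffic(records, nodes, period_s=1800):
    """Sum link traffic per window and return {window_start: N x N matrix}.

    records: iterable of (timestamp_seconds, src, dst, volume) tuples
             (hypothetical layout).
    nodes:   ordered list of node names defining matrix rows and columns.
    """
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    # Each new window starts with a fresh all-zero N x N matrix.
    windows = defaultdict(lambda: [[0.0] * n for _ in range(n)])
    for ts, src, dst, volume in records:
        window_start = (ts // period_s) * period_s
        windows[window_start][index[src]][index[dst]] += volume
    return dict(sorted(windows.items()))


records = [
    (0, "A", "B", 10.0),
    (600, "A", "B", 5.0),     # same half-hour window as the first record
    (1800, "B", "A", 7.0),    # next window
]
matrices = aggregate_traffic(records, nodes=["A", "B"])
```

With 23 nodes the same function would yield 23 × 23 matrices, matching the sample shape described for the GEANT format.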
Node load information is divided into three classes by a simple formula: network node load = CPU load at the latest moment × 50 + memory load at the latest moment × 50. If the network node load is below 20, the load condition is low; if it is between 20 and 50, the load condition is normal; if it exceeds 50, the load condition is high. In this way, the load condition of every node in the network at the latest moment of each computing period is obtained.
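The load classification can be sketched in a few lines of Python. The sketch assumes that the CPU and memory loads are fractions in the range 0 to 1 and that the two weighted terms are added, giving a score between 0 and 100; the function names `node_load` and `load_condition` are hypothetical.

```python
def node_load(cpu_load, mem_load):
    """Load score from the latest CPU and memory utilization.
    Assumes both inputs are fractions in [0, 1] and the terms are summed."""
    return cpu_load * 50 + mem_load * 50


def load_condition(load):
    """Map a load score to one of the three classes in the text."""
    if load < 20:
        return "low"
    if load <= 50:
        return "normal"
    return "high"


# A node at 60% CPU and 70% memory utilization scores 65, classified as high.
condition = load_condition(node_load(0.6, 0.7))
```

The boundary values 20 and 50 are treated as normal here, following the "20-50" range in the text.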
Two factors drive the acquisition strategy configuration. The first is the network load index: the worse the load, the lower the acquisition frequency should be. The second is the quality of the real-time online prediction model's results on the test set: the larger the gap between prediction and true value, the higher the acquisition frequency should be. The main purpose of the acquisition strategy configuration is therefore to reduce the load pressure on network nodes as far as possible while still achieving good real-time traffic prediction. When the network model is sufficiently powerful, the volume and diversity of the network traffic data are the key determinants of model performance; raising a node's acquisition frequency provides more, and more varied, traffic data, but also increases that node's load.
In a specific application embodiment, for a network system that is highly dynamic and real-time, a policy-based deep reinforcement learning method for continuous control and optimization (Deep Deterministic Policy Gradient, DDPG) can be used to control the network continuously in real time and optimize the acquisition strategy of the network nodes. As shown in fig. 8 and fig. 9, the method steps are as follows:
First, the data acquisition module acquires the network link state information, network topology information, and network traffic information at the current moment to generate the state data S_t.
Second, based on the current input state S_t and the reward R_{t-1} of the previous moment, the DDPG agent determines the acquisition strategy of each network node, that is, the output action a_t = [freq_1, ..., freq_n], where freq_i is the acquisition strategy of network node i and takes one of the values [0,1,2,3,4,5,6] described for the data acquisition module. The strategy value is then increased or decreased according to the output action a_t to obtain a new acquisition strategy, and the network controller finally issues the new acquisition strategy configuration.
Third, data acquisition is carried out according to the new data acquisition strategy, yielding the load condition of all network nodes at this moment and the test-set performance after the real-time data of this moment has been uploaded. In the invention, when the model is sufficiently complex, data is the key factor in whether the model can improve: more frequent data acquisition increases the load on the network nodes, but uploads more varied data and strengthens the model's ability to extract time-series features. On this premise, two indexes are obtained: the network node load and the prediction result of the real-time prediction model.
Fourth, based on the two feedback indexes obtained above, namely the network node load and the prediction result of the real-time prediction model, deep reinforcement learning again adaptively and dynamically adjusts the acquisition strategy configuration. The design of the reward function directly affects the policy selection of the deep reinforcement learning agent. The first to fourth steps are repeated to achieve adaptive adjustment of the acquisition strategy configuration.
The method provided by this embodiment uses the deep reinforcement learning algorithm DDPG to adjust the acquisition strategy of the network nodes adaptively and dynamically according to the network node load, the prediction model performance, and the current network traffic state. This strikes a balance between network load and prediction model performance, improving prediction accuracy while limiting the load pressure caused by real-time network traffic collection.
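One iteration of the control loop in the four steps above can be sketched in Python as follows. The sketch deliberately does not implement DDPG: `HeuristicAgent` is a hypothetical fixed-rule stand-in used only to show where the agent sits in the loop, and the state layout (a dictionary with a `node_loads` list) is likewise an assumption.

```python
MAX_LEVEL = 6  # policy levels 0..6, as in the example configuration


class HeuristicAgent:
    """Stand-in for the DDPG agent. This is NOT reinforcement learning,
    only a fixed rule that produces one level change per node."""

    def act(self, state, last_reward):
        # Collect less often on highly loaded nodes (score > 50),
        # more often on all other nodes. `last_reward` is unused by
        # this rule but would be consumed by a learning agent.
        return [-1 if load > 50 else 1 for load in state["node_loads"]]


def control_step(agent, policies, state, last_reward):
    """Apply the agent's per-node actions and clamp to valid levels."""
    actions = agent.act(state, last_reward)
    return [max(0, min(MAX_LEVEL, p + a)) for p, a in zip(policies, actions)]


policies = [3, 3, 6]                          # current level per node
state = {"node_loads": [70.0, 10.0, 30.0]}    # hypothetical load scores
new_policies = control_step(HeuristicAgent(), policies, state, last_reward=0.0)
```

In a full system the agent would be trained from the reward signal, and `control_step` would run once per collection cycle with freshly collected state.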
Exemplary apparatus
As shown in fig. 10, an embodiment of the present invention provides a network traffic real-time prediction system, which includes:
The data acquisition module 100 is configured to acquire historical network traffic data and real-time network traffic data of each network node, wherein the historical network traffic data comprises: historical network traffic data obtained from a public data set and/or historical network traffic data collected by each network node; its function is as described in step S100.
The offline data prediction module 200 is configured to train a preset neural network model by using the historical network flow data and the real-time network flow data, so as to obtain offline flow prediction data output by the trained offline flow prediction model; the function of which is as described in step S200.
The online data prediction module 300 is configured to perform online learning on the historical network traffic data and the real-time network traffic data by using an online machine learning platform to obtain online traffic prediction data; the function of which is as described in step S300.
The data integration module 400 is configured to obtain real-time flow prediction data based on the offline flow prediction data and the online flow prediction data, and the function of the data integration module is as described in step S400.
It will be appreciated by those skilled in the art that the schematic block diagram shown in fig. 10 is merely a block diagram of a portion of the structure associated with the present inventive arrangements and is not limiting of the smart terminal to which the present inventive arrangements are applied, and that a particular smart terminal may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
The invention provides a network traffic real-time prediction method and system. Historical network traffic data and real-time network traffic data of each network node are acquired, wherein the historical network traffic data comprises: historical network traffic data obtained from a public data set and/or historical network traffic data collected by each network node. A preset neural network model is trained with the historical network traffic data and the real-time network traffic data to obtain offline traffic prediction data output by the trained offline traffic prediction model. An online machine learning platform performs online learning on the historical network traffic data and the real-time network traffic data to obtain online traffic prediction data. The method and system combine deep network learning on historical network traffic data with online machine learning on real-time network traffic data, and effectively improve the prediction of network traffic that changes in real time.
It is to be understood that the invention is not limited in its application to the examples described above, but is capable of modification and variation in light of the above teachings by those skilled in the art, and that all such modifications and variations are intended to be included within the scope of the appended claims.

Claims (9)

1. The network traffic real-time prediction method is characterized by comprising the following steps of:
acquiring historical network traffic data and real-time network traffic data of each network node; wherein the historical network traffic data comprises: historical network traffic data obtained from a public data set and/or historical network traffic data collected by each network node;
training a preset neural network model by utilizing the historical network flow data and the real-time network flow data to obtain offline flow prediction data output by a trained offline flow prediction model;
performing online learning on the historical network flow data and the real-time network flow data by using an online machine learning platform to obtain online flow prediction data;
obtaining real-time flow prediction data based on the offline flow prediction data and the online flow prediction data;
the preset neural network model comprises: at least one graph convolutional neural network and at least one 3D convolution network; the graph convolutional neural networks are connected to the 3D convolution networks in one-to-one correspondence, and the 3D convolution networks are connected in sequence;
the step of training the preset neural network model by utilizing the historical network flow data and the real-time network flow data to obtain offline flow prediction data output by the trained offline flow prediction model comprises the following steps:
for the preset neural network model, constructing a training set from the historical network traffic data and a test set from the real-time network traffic data, and training to obtain a preliminary offline traffic prediction model; wherein the training set comprises: a network topology and a network traffic matrix;
for the preliminary offline traffic prediction model, constructing a training set and a test set from the real-time network traffic data, and training to obtain the offline traffic prediction model;
and inputting the real-time network flow data into the offline flow prediction model after training is completed, and obtaining offline flow prediction data output by the offline flow prediction model.
2. The method for predicting network traffic in real time according to claim 1, wherein the time sequence data consisting of a network topology map and a network traffic matrix is constructed based on the real-time network traffic data;
the step of inputting the time sequence data formed by the network topological graph and the network flow matrix to the offline flow prediction model after training is completed, and obtaining the offline flow prediction data output by the offline flow prediction model comprises the following steps:
inputting time sequence data formed by the network topological graph and the network flow matrix into a graph convolution neural network in a preset neural network model to obtain spatial characteristics in the time sequence data extracted by the graph convolution neural network;
And inputting the time sequence data formed by the network topological graph and the network flow matrix into a 3D convolution network in a preset neural network model to obtain the time sequence characteristics in the time sequence data extracted by the 3D convolution network.
3. The method for predicting network traffic in real time according to claim 2, wherein the step of inputting the time series data composed of the network topology map and the network traffic matrix to the offline traffic prediction model after training to obtain the offline traffic prediction data output by the offline traffic prediction model comprises:
sequentially inputting each item of time-series data into a first graph convolutional neural network, a second graph convolutional neural network, ..., and an Nth graph convolutional neural network to obtain a first spatial feature, a second spatial feature, ..., and an Nth spatial feature output by the respective graph convolutional neural networks; wherein N is an integer greater than zero;
each graph convolutional neural network inputs its spatial feature (the first, the second, ..., the Nth) into the 3D convolution network connected to it, and each preceding 3D convolution network inputs the time-series feature it outputs into the following 3D convolution network, to obtain the offline traffic prediction data output by the Nth 3D convolution network.
4. The method for predicting network traffic in real time according to claim 2, wherein the step of obtaining online traffic prediction data by online learning the historical network traffic data and the real-time network traffic data using an online machine learning platform comprises:
and performing online learning on the historical network flow data and the real-time network flow data by using an alink machine learning platform to obtain the online flow prediction data.
5. The method according to claim 1, wherein the step of acquiring the historical network traffic data and the real-time network traffic data of each network node comprises:
network flow data of each network node are collected according to a preset data collection strategy configuration; wherein the network traffic data comprises: network node equipment information and network link traffic information;
saving the network traffic data to a Kafka cluster; wherein the Kafka cluster stores the network traffic data in different topics;
performing, by a Flink computing engine, data aggregation on the network traffic data within a preset computing period, and storing the aggregated data in a Hive table, wherein the aggregated data comprises: network traffic data and node load data.
6. The method of claim 5, further comprising:
acquiring state data according to the acquisition strategy configuration of the network flow data at the current moment; wherein the status data comprises: network link state information, network topology information, and network traffic information;
calculating a reward function according to the network node load at the current moment and the predicted real-time flow prediction data;
determining the acquisition strategy configuration of the network flow data at the next moment according to the reward function and the state data;
and controlling flow data acquisition by using the determined acquisition strategy configuration of the network flow data at the next moment.
7. The method of claim 6, wherein the step of calculating the reward function comprises:
calculating a first loss value loss_{t-1} between the real-time traffic prediction data and the network traffic data of the previous time step, and a second loss value loss_t between the real-time traffic prediction data and the network traffic data of the next time step, and obtaining a traffic standardized difference between the first loss value and the second loss value;
calculating a standardized difference between the total network load of the previous time step and the total network load of the current time step to obtain a load standardized difference;
and calculating the reward function from the traffic standardized difference and the load standardized difference.
8. The method according to claim 6 or 7, wherein the network traffic data acquisition policy is configured to: and setting a plurality of groups of acquisition time intervals, wherein the acquisition time intervals of each group are sequentially increased.
9. A network traffic real-time prediction system, comprising:
the data acquisition module is used for acquiring historical network traffic data and real-time network traffic data of each network node; wherein the historical network traffic data comprises: historical network traffic data obtained from a public data set and/or historical network traffic data collected by each network node;
the offline data prediction module is used for training a preset neural network model by utilizing the historical network flow data and the real-time network flow data to obtain offline flow prediction data output by the offline flow prediction model after training;
the online data prediction module is used for online learning the historical network flow data and the real-time network flow data by utilizing an online machine learning platform to obtain online flow prediction data;
The data integration module is used for obtaining real-time flow prediction data based on the offline flow prediction data and the online flow prediction data;
the preset neural network model comprises: at least one graph convolutional neural network and at least one 3D convolution network; the graph convolutional neural networks are connected to the 3D convolution networks in one-to-one correspondence, and the 3D convolution networks are connected in sequence; for the preset neural network model, constructing a training set from the historical network traffic data and a test set from the real-time network traffic data, and training to obtain a preliminary offline traffic prediction model; wherein the training set comprises: a network topology and a network traffic matrix;
for the preliminary offline traffic prediction model, constructing a training set and a test set from the real-time network traffic data, and training to obtain the offline traffic prediction model;
and inputting the real-time network flow data into the offline flow prediction model after training is completed, and obtaining offline flow prediction data output by the offline flow prediction model.
CN202211029883.XA 2022-08-25 2022-08-25 Network traffic real-time prediction method and system Active CN115499344B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211029883.XA CN115499344B (en) 2022-08-25 2022-08-25 Network traffic real-time prediction method and system

Publications (2)

Publication Number Publication Date
CN115499344A CN115499344A (en) 2022-12-20
CN115499344B true CN115499344B (en) 2024-03-19

Family

ID=84466262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211029883.XA Active CN115499344B (en) 2022-08-25 2022-08-25 Network traffic real-time prediction method and system

Country Status (1)

Country Link
CN (1) CN115499344B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110851782A (en) * 2019-11-12 2020-02-28 南京邮电大学 Network flow prediction method based on lightweight spatiotemporal deep learning model
CN111638988A (en) * 2019-04-28 2020-09-08 上海伽易信息技术有限公司 Cloud host fault intelligent prediction method based on deep learning
CN111970163A (en) * 2020-06-30 2020-11-20 网络通信与安全紫金山实验室 Network flow prediction method of LSTM model based on attention mechanism
CN111970206A (en) * 2020-08-21 2020-11-20 北京浪潮数据技术有限公司 FC network flow control method, device and related components
CN113259313A (en) * 2021-03-30 2021-08-13 浙江工业大学 Malicious HTTPS flow intelligent analysis method based on online training algorithm
CN114548592A (en) * 2022-03-01 2022-05-27 重庆邮电大学 Non-stationary time series data prediction method based on CEMD and LSTM

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11134016B2 (en) * 2018-10-26 2021-09-28 Hughes Network Systems, Llc Monitoring a communication network
US20220110021A1 (en) * 2020-10-05 2022-04-07 Continual Ltd. Flow forecasting for mobile users in cellular networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
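Research on network traffic prediction and SFC mapping algorithms in NFV scenarios; Yu Fenglei; China Excellent Master's Theses Full-text Database; I136-495 *
Research on a real-time online traffic flow prediction method based on Optima; Li Yinghong; Pan Jiaqi; Journal of Transportation Systems Engineering and Information Technology (No. 02); 123-129 *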

Also Published As

Publication number Publication date
CN115499344A (en) 2022-12-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant