CN116866202A - Network traffic prediction method and device and storage medium - Google Patents

Network traffic prediction method and device and storage medium Download PDF

Info

Publication number
CN116866202A
Authority
CN
China
Prior art keywords
network
characteristic information
network traffic
historical
traffic data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310826327.3A
Other languages
Chinese (zh)
Inventor
段含婷
张乐
吴艳芹
杨昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Technology Innovation Center
China Telecom Corp Ltd
Original Assignee
China Telecom Technology Innovation Center
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Technology Innovation Center, China Telecom Corp Ltd filed Critical China Telecom Technology Innovation Center
Priority to CN202310826327.3A
Publication of CN116866202A
Legal status: Pending


Classifications

    • H  ELECTRICITY
    • H04  ELECTRIC COMMUNICATION TECHNIQUE
    • H04L  TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00  Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/16  Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06N  COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00  Computing arrangements based on biological models
    • G06N 3/02  Neural networks
    • G06N 3/04  Architecture, e.g. interconnection topology
    • G06N 3/045  Combinations of networks
    • H  ELECTRICITY
    • H04  ELECTRIC COMMUNICATION TECHNIQUE
    • H04L  TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00  Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/14  Network analysis or design
    • H04L 41/147  Network analysis or design for predicting network behaviour
    • H  ELECTRICITY
    • H04  ELECTRIC COMMUNICATION TECHNIQUE
    • H04L  TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 43/00  Arrangements for monitoring or testing data switching networks
    • H04L 43/08  Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L 43/0876  Network utilisation, e.g. volume of load or congestion level


Abstract

The disclosure provides a network traffic prediction method and device and a storage medium, and relates to the field of big data. The method comprises the following steps: acquiring network traffic data at a plurality of historical moments to be processed; processing the network traffic data at the plurality of historical moments by using a first neural network composed of a plurality of scalar neurons to obtain first characteristic information; processing the first characteristic information by using a second neural network composed of a plurality of vector neurons to obtain second characteristic information; and processing the second characteristic information by using a third neural network with a long short-term memory structure to obtain predicted network traffic data at a future time. The method is thereby better adapted to long-term prediction of network traffic with unstable characteristics.

Description

Network traffic prediction method and device and storage medium
Technical Field
The disclosure relates to the field of big data, and in particular relates to a network traffic prediction method and device and a storage medium.
Background
With the increasing scale of servers in the telecommunications industry, the number of users and the volume of business data keep growing, and the industry needs to plan network traffic resources properly. Traffic prediction builds a model that predicts future traffic data from current traffic data, so that server resources can be allocated reasonably in advance and server resource utilization is improved efficiently.
In some related art, an autoregressive model (AR), a moving average model (MA), an autoregressive integrated moving average model (ARIMA), or the like is constructed as the traffic prediction model.
These traffic prediction models handle linear time-series data well. However, actual telecommunication network traffic exhibits complex variation: the variation process is non-stationary, and surges or dips may occur. Such prediction models therefore have difficulty predicting nonlinear time-series data accurately.
Disclosure of Invention
Aiming at the unstable characteristics of network traffic, embodiments of the disclosure first process network traffic data at a plurality of historical moments with a first neural network composed of scalar neurons to obtain first characteristic information related to the time series. The first characteristic information is then processed with a second neural network composed of vector neurons to obtain second characteristic information combining temporal and spatial features, giving the network richer feature-learning capability. Finally, the second characteristic information is processed with a third neural network having a long short-term memory structure to obtain predicted network traffic data at a future time. The model therefore performs well over longer time series and is better suited to long-term prediction of network traffic.
Some embodiments of the present disclosure provide a network traffic prediction method, including: acquiring network traffic data at a plurality of historical moments to be processed; processing the network traffic data at the plurality of historical moments by using a first neural network composed of a plurality of scalar neurons to obtain first characteristic information; processing the first characteristic information by using a second neural network composed of a plurality of vector neurons to obtain second characteristic information; and processing the second characteristic information by using a third neural network with a long short-term memory structure to obtain predicted network traffic data at a future time.
In some embodiments, obtaining the network traffic data at the plurality of historical moments to be processed includes: constructing a plurality of feature vectors according to term frequency-inverse document frequency (TF-IDF) information of the network traffic data at each historical moment in a historical period, and calculating distances between different feature vectors; determining the feature vector with the largest TF-IDF information as a first initial clustering center; determining further initial clustering centers according to the sum of the distances from each feature vector to the existing initial clustering centers; clustering the plurality of feature vectors according to the determined initial clustering centers; and compressing the network traffic data at each historical moment in the historical period based on the clustering result to obtain the network traffic data at the plurality of historical moments to be processed.
In some embodiments, determining additional initial cluster centers based on the sum of the distances of each feature vector from the existing initial cluster centers comprises: and determining the feature vector with the largest sum of the distances to the existing initial cluster centers as the next initial cluster center, and iteratively executing the step of determining the next initial cluster center until the preset number of initial cluster centers are determined.
In some embodiments, compressing the network traffic data at each historical time in the historical period based on the clustering result to obtain the network traffic data at the plurality of historical times to be processed includes: and extracting part of network traffic data from the network traffic data of each historical moment of the same cluster, and taking the network traffic data extracted from each cluster as the network traffic data of a plurality of historical moments to be processed.
In some embodiments, obtaining network traffic data for a plurality of historical moments to be processed further comprises: evaluating the clustering result according to the inter-class distance between clusters and the intra-class distance in the clusters; and under the condition that the evaluation result shows that the clustering result does not meet the clustering requirement, changing the number of initial clustering centers, re-determining the initial clustering centers and re-clustering.
In some embodiments, processing the network traffic data at the plurality of historical moments using the first neural network composed of a plurality of scalar neurons to obtain the first characteristic information includes: processing the network traffic data at the plurality of historical moments using a plurality of convolution kernels of different scales in the first neural network, and concatenating the output results of the convolution kernels; and determining the first characteristic information according to the concatenated output results of the plurality of convolution kernels.
In some embodiments, determining the first characteristic information according to the concatenated output results of the plurality of convolution kernels comprises: processing the concatenated output results with a multi-path residual network in the first neural network to obtain the first characteristic information, where the convolution kernels adopted by different paths of the multi-path residual network have different scales.
In some embodiments, processing the first characteristic information using the second neural network composed of a plurality of vector neurons to obtain the second characteristic information includes: processing the first characteristic information with a capsule network to obtain the second characteristic information.
In some embodiments, processing the second characteristic information using the third neural network having a long short-term memory structure to obtain the predicted network traffic data at the future time comprises: processing the second characteristic information with a long short-term memory network to obtain the predicted network traffic data at the future time.
In some embodiments, the first characteristic information comprises time-series characteristic information and the second characteristic information comprises spatial characteristic information.
Some embodiments of the present disclosure provide a network traffic prediction apparatus, including: an acquisition unit configured to acquire network traffic data at a plurality of history times to be processed; a first processing unit configured to process network traffic data at the plurality of historical moments by using a first neural network composed of a plurality of scalar neurons to obtain first characteristic information; a second processing unit configured to process the first characteristic information by using a second neural network composed of a plurality of vector neurons, to obtain second characteristic information; and the prediction unit is configured to process the second characteristic information by using a third neural network with a long-short-term memory structure to obtain predicted network traffic data at a future time.
In some embodiments, the acquisition unit is configured to: construct a plurality of feature vectors according to term frequency-inverse document frequency (TF-IDF) information of the network traffic data at each historical moment in a historical period, and calculate distances between different feature vectors; determine the feature vector with the largest TF-IDF information as a first initial clustering center; determine further initial clustering centers according to the sum of the distances from each feature vector to the existing initial clustering centers; cluster the plurality of feature vectors according to the determined initial clustering centers; and compress the network traffic data at each historical moment in the historical period based on the clustering result to obtain the network traffic data at the plurality of historical moments to be processed.
In some embodiments, the first neural network includes a plurality of convolution kernels of different scales, or includes a plurality of convolution kernels of different scales together with a multi-path residual network cascaded to the outputs of the convolution kernels, where the convolution kernels adopted by different paths of the multi-path residual network have different scales; or the second neural network is a capsule network; or the third neural network is a long short-term memory network.
Some embodiments of the present disclosure provide a network traffic prediction apparatus, including: a memory; and a processor coupled to the memory, the processor configured to perform a network traffic prediction method based on instructions stored in the memory.
Some embodiments of the present disclosure propose a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of a network traffic prediction method.
Drawings
The drawings that are required for use in the description of the embodiments or the related art will be briefly described below. The present disclosure will be more clearly understood from the following detailed description with reference to the accompanying drawings.
It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without inventive faculty.
Fig. 1 illustrates a schematic diagram of a network traffic prediction model of some embodiments of the present disclosure.
Fig. 2 illustrates a schematic diagram of a first neural network and its multi-scale convolution kernels according to some embodiments of the present disclosure.
Fig. 3 illustrates a schematic diagram of a multipath residual network of some embodiments of the present disclosure.
Fig. 4 shows a schematic diagram of a capsule network of some embodiments of the present disclosure.
Fig. 5 shows a schematic diagram of one cell of an LSTM network in accordance with some embodiments of the present disclosure.
Fig. 6 illustrates a schematic diagram of a network traffic prediction method of some embodiments of the present disclosure.
Fig. 7 shows a schematic diagram of raw flow data used in the experiments of the present disclosure.
FIG. 8 shows a comparative schematic of a baseline model and the model of the present disclosure with respect to MAE.
Fig. 9 shows a comparative schematic of the baseline model and the model of the present disclosure with respect to RMSE.
FIG. 10 shows a comparative schematic of the baseline model and the model of the present disclosure with respect to MAPE.
FIG. 11 shows a comparison of a baseline model and the model of the present disclosure with respect to accuracy.
Fig. 12 illustrates a schematic diagram of a network traffic prediction device according to some embodiments of the present disclosure.
Fig. 13 illustrates a schematic diagram of a network traffic prediction device according to some embodiments of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure.
Unless specifically stated otherwise, the descriptions of "first," "second," and the like in this disclosure are used for distinguishing between different objects and are not used for indicating a meaning of size or timing, etc.
Fig. 1 illustrates a schematic diagram of a network traffic prediction model of some embodiments of the present disclosure.
As shown in fig. 1, the network traffic prediction model of this embodiment includes: a first neural network composed of a plurality of scalar neurons, a second neural network composed of a plurality of vector neurons, and a third neural network having a long short-term memory structure. The input of the first neural network serves as the input of the network traffic prediction model, the output of the first neural network is connected to the input of the second neural network, the output of the second neural network is connected to the input of the third neural network, and the output of the third neural network serves as the output of the model. The first neural network includes a plurality of convolution kernels of different scales and, as needed, may also include a multi-path residual network cascaded to the outputs of the convolution kernels, where the convolution kernels adopted by different paths of the multi-path residual network have different scales. The second neural network is, for example, a capsule network. The third neural network is, for example, a long short-term memory (LSTM) network. A traffic prediction model based on time-series feature extraction and the fusion of temporal and spatial information is thus constructed.
The various portions of the network traffic prediction model are described in detail below.
Fig. 2 illustrates a schematic diagram of a first neural network and its multi-scale convolution kernels according to some embodiments of the present disclosure.
As shown in fig. 2, the first neural network of this embodiment includes a plurality of convolution kernels of different scales and, as needed, a multi-path residual network. The convolution kernels are arranged in parallel, and the branch outputs are concatenated along the channel dimension. The concatenated outputs of the convolution kernels are fed to the input of the multi-path residual network. After the multi-scale convolution, the data enter the multi-path residual network to learn the characteristics of the time-series data.
Because the receptive field of a convolution operation is limited, extracting features from time-series data with a single fixed convolution scale is quite limiting; a multi-scale convolution structure extracts richer and more comprehensive feature information and thus improves the accuracy of network traffic prediction.
For example, convolution kernels of three scales, 1×1, 5×5 and 7×7 (denoted conv1×1, conv5×5 and conv7×7 respectively), extract features of the input at different levels, so that the network traffic prediction model captures hierarchical information across multiple receptive fields in parallel and extracts global feature information more comprehensively.
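As a minimal sketch (not the patented implementation), the parallel multi-scale convolution can be illustrated in NumPy; the series is reduced to 1-D for a univariate traffic sequence, and the random kernel weights are illustrative stand-ins for learned parameters:

```python
import numpy as np

def multi_scale_conv(x, kernel_sizes=(1, 5, 7), seed=0):
    """Apply parallel 1-D convolutions of different scales to a traffic
    series and stack the branch outputs along a channel axis."""
    rng = np.random.default_rng(seed)
    branches = []
    for k in kernel_sizes:
        w = rng.standard_normal(k)            # illustrative kernel weights
        # 'same' padding keeps every branch aligned with the input length
        branches.append(np.convolve(x, w, mode="same"))
    return np.stack(branches)                 # shape: (num_scales, len(x))

traffic = np.sin(np.linspace(0.0, 6.28, 32)) # toy "historical traffic" series
features = multi_scale_conv(traffic)
```

Because each branch uses "same" padding, the outputs stay aligned and can be concatenated channel-wise as described above.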
Fig. 3 illustrates a schematic diagram of a multipath residual network of some embodiments of the present disclosure.
As shown in fig. 3, the multi-path residual network of this embodiment includes a plurality of residual blocks; each residual block can be regarded as one path of the residual network, and the residual blocks of different paths employ convolution kernels of different scales. For example, the three residual blocks in fig. 3, from top to bottom, use 3×3, 5×5 and 7×7 convolution kernels respectively (conv1 denotes the 3×3 kernel, conv2 the 5×5 kernel, and conv3 the 7×7 kernel). Each residual block contains a skip-connection structure: the upper branch is the residual part and the lower branch is the direct-mapping part. The residual part may include multiple convolution operations, such as two or three, with batch normalization (BN) between different convolution operations; the direct-mapping part performs no operation. Within a residual block, the output of the residual part and the output of the direct-mapping part are added to form the block output, and the outputs of all residual blocks are added to form the output of the multi-path residual network. Each residual block shares weights by combining global and local residual self-learning, which improves feature-extraction accuracy. The multi-path residual network may use, for example, a PReLU (Parametric Rectified Linear Unit) as the activation function.
If a convolution layer trains very slowly or the network converges slowly, the over-parameterized convolution layer can be bypassed directly through a skip connection. The advantage of the residual structure is that a slowly converging convolution layer does not drag down the convergence speed or training quality of the whole network.
The multi-path residual network fuses low-level and high-level information, giving the network richer feature-learning capability and improving its learning efficiency.
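A hypothetical sketch of one residual path, assuming the two-convolution residual branch, identity mapping, and PReLU activation described above (the kernel weights are again purely illustrative):

```python
import numpy as np

def prelu(x, a=0.25):
    """Parametric ReLU activation used in the multi-path residual network."""
    return np.where(x > 0, x, a * x)

def residual_block(x, w1, w2):
    """One path: two convolutions form the residual branch, the identity
    branch maps the input directly, and the two are summed (skip connection)."""
    h = prelu(np.convolve(x, w1, mode="same"))
    h = np.convolve(h, w2, mode="same")
    return x + h

def multi_path_residual(x, kernels):
    """Add the outputs of residual blocks whose kernels differ in scale."""
    return sum(residual_block(x, w, w) for w in kernels)

rng = np.random.default_rng(1)
x = rng.standard_normal(16)                       # toy feature sequence
kernels = [rng.standard_normal(k) * 0.1 for k in (3, 5, 7)]
y = multi_path_residual(x, kernels)
```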
Fig. 4 shows a schematic diagram of a capsule network of some embodiments of the present disclosure.
As shown in fig. 4, the capsule network uses vector neurons, unlike the scalar neurons of a conventional neural network. It converts each neuron from scalar form into vector form to store spatial position: the length of the vector represents the probability that a feature exists, and its direction represents the pose information of the feature; the result is packaged as the capsule's output. A capsule vector thus encodes both spatial information and the probability that an object exists.
When shallow capsules are converted into deep capsules, correspondences between different capsules are established by increasing the dimensionality, and the weights are updated during this process by a dynamic routing algorithm.
The weight-updating process of the capsule network is divided into four stages.

In the first stage, each input vector is multiplied by an affine matrix to obtain a prediction vector:

u_{j|i} = W_{ij} u_i  (1)

where u_i (i = 1, 2, ..., n) denotes a low-level capsule, n the number of capsules, W_{ij} the affine matrix encoding positional relationships, and u_{j|i} the output prediction vector.

In the second stage, the prediction vectors are summed with weights to obtain the input vector s_j:

s_j = Σ_i c_{ij} u_{j|i}  (2)

where c_{ij} is a coupling coefficient satisfying Σ_j c_{ij} = 1, computed from the factor b_{ij} by a softmax:

c_{ij} = exp(b_{ij}) / Σ_k exp(b_{ik})  (3)

In the third stage, the nonlinear activation function squash compresses the length of the vector into the interval (0, 1):

v_j = (‖s_j‖² / (1 + ‖s_j‖²)) · (s_j / ‖s_j‖)  (4)

where v_j is the output vector of the squash function.

In the fourth stage, the dynamic routing algorithm updates b_{ij} and c_{ij} to obtain the optimal output vector:

b_{ij} = b_{ij} + u_{j|i} · v_j  (5)

where b_{ij} is a log prior probability, initialized to 0.
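The four routing stages above can be sketched as follows; the capsule counts, vector dimension and iteration count (3, a common choice) are assumptions of this illustration, not values fixed by the disclosure:

```python
import numpy as np

def squash(s, eps=1e-9):
    """Nonlinear squash: keeps the direction, maps the length into (0, 1)."""
    norm2 = np.sum(s * s, axis=-1, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def dynamic_routing(u_hat, num_iters=3):
    """u_hat: prediction vectors u_{j|i}, shape (n_low, n_high, dim).
    Returns the high-level output vectors v_j after routing-by-agreement."""
    n_low, n_high, _ = u_hat.shape
    b = np.zeros((n_low, n_high))                 # log priors, initialised to 0
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over j
        s = np.einsum("ij,ijd->jd", c, u_hat)                 # weighted sum
        v = squash(s)                                         # squash to (0, 1)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)             # agreement update
    return v

rng = np.random.default_rng(2)
u_hat = rng.standard_normal((8, 4, 16))           # 8 low-level, 4 high-level capsules
v = dynamic_routing(u_hat)
```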
Fig. 5 shows a schematic diagram of one cell of an LSTM network in accordance with some embodiments of the present disclosure.
As shown in fig. 5, the inputs to one cell of the LSTM network are C_{t-1}, h_{t-1} and x_t: the cell state and hidden state at time t-1 and the input at time t. The outputs are C_t and h_t: the cell state and hidden state at time t. One cell of the LSTM network has three gates, called the forget gate, the input gate and the output gate. The forget gate discards part of the cell state through a sigmoid (σ) operation. The input gate produces the candidate cell state through sigmoid (σ) and tanh operations. The updated cell state C_t is obtained by element-wise multiplication (×) of C_{t-1} with the forget-gate output f_t, followed by addition (+) of the element-wise product of the input-gate output and the candidate cell state. The output gate produces the hidden state h_t at time t through a sigmoid (σ) operation, an element-wise multiplication (×) and a tanh operation.
The LSTM's "constant error flow" mechanism keeps the error between time steps at a more constant level, making it possible for the network to learn causal relationships over long distances.
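A minimal NumPy sketch of one LSTM cell step as described above; the stacked weight layout and gate ordering are assumptions of this illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W: (4*H, D+H) stacked gate weights, b: (4*H,).
    Gate order assumed here: forget, input, candidate, output."""
    H = h_prev.size
    z = W @ np.concatenate([x_t, h_prev]) + b
    f = sigmoid(z[:H])                     # forget gate
    i = sigmoid(z[H:2 * H])                # input gate
    g = np.tanh(z[2 * H:3 * H])            # candidate cell state
    o = sigmoid(z[3 * H:])                 # output gate
    c_t = f * c_prev + i * g               # updated cell state C_t
    h_t = o * np.tanh(c_t)                 # hidden state h_t
    return h_t, c_t

rng = np.random.default_rng(3)
D, H = 6, 4                                # toy input and hidden sizes
W = rng.standard_normal((4 * H, D + H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.standard_normal((5, D)):    # five time steps of feature input
    h, c = lstm_cell(x_t, h, c, W, b)
```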
After the network traffic prediction model is constructed, the network traffic prediction model may be trained. Forming training data based on historical network flow data, inputting the training data into a network flow prediction model, determining loss based on a gap between a network flow prediction value output by the network flow prediction model and a network flow label value provided by the training data, and updating parameters of the network flow prediction model based on the loss until the network flow prediction model meeting certain requirements is obtained.
After training the network traffic prediction model, network traffic prediction may be performed based on the trained network traffic prediction model.
Fig. 6 illustrates a schematic diagram of a network traffic prediction method of some embodiments of the present disclosure.
As shown in fig. 6, the network traffic prediction method of this embodiment includes steps 610-650. The data compression of step 610 is optional depending on the data volume: if the data volume is relatively large, step 610 may be performed; if it is relatively small, step 610 may be skipped.
In step 610, the raw historical network traffic data obtained is compressed.
Clustering the original historical network flow data, and compressing the original historical network flow data based on a clustering result. For example, the collected network traffic data at each historical moment in the preset historical period is taken as the original historical network traffic data.
In some embodiments, the clustering and compression process includes:
a: and constructing a plurality of feature vectors according to word Frequency-inverse document word Frequency (Term Frequency-Inverse Document Frequency, TF-IDF) information of the network flow data at each historical moment in the historical period, and calculating the distance between different feature vectors.
The network traffic data at each historical moment is regarded as a word, and then the TF-IDF value of the network traffic data at each historical moment is calculated according to the TF-IDF calculation method. Each TF-IDF value is constructed as a feature vector. And calculating the distance between different feature vectors by using the cosine similarity.
B: and determining the feature vector with the largest word frequency-inverse document word frequency information as a first initial clustering center. Thus, a first initial cluster center is determined based on the TF-IDF density peak.
C: and determining other initial clustering centers according to the sum of the distances from each feature vector to the existing initial clustering centers.
For example, the feature vector having the largest sum of distances to the existing initial cluster centers is determined as the next initial cluster center, and the step of determining the next initial cluster center is iteratively performed until a preset number of initial cluster centers are determined.
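Steps B and C admit a short sketch; the TF-IDF weights and feature vectors below are toy stand-ins, and the cosine distance follows the similarity measure mentioned in step A:

```python
import numpy as np

def cosine_distance(a, b, eps=1e-12):
    """1 - cosine similarity between two feature vectors."""
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)

def init_centers(vectors, tfidf, k):
    """First center: vector with the largest TF-IDF value; each further
    center: vector with the largest sum of distances to existing centers."""
    centers = [int(np.argmax(tfidf))]
    while len(centers) < k:
        sums = [sum(cosine_distance(v, vectors[c]) for c in centers)
                for v in vectors]
        for c in centers:                  # never pick an existing center twice
            sums[c] = -np.inf
        centers.append(int(np.argmax(sums)))
    return centers

rng = np.random.default_rng(4)
vectors = rng.random((10, 5))              # toy TF-IDF feature vectors
tfidf = vectors.sum(axis=1)                # stand-in per-vector TF-IDF weight
centers = init_centers(vectors, tfidf, k=3)
```

The selected indices can then seed a conventional clustering algorithm such as K-Means, as step D describes.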
D: and clustering the plurality of feature vectors according to the determined plurality of initial clustering centers.
After determining the plurality of initial cluster centers, the plurality of feature vectors may be clustered according to a conventional clustering algorithm, such as the K-Means algorithm.
E: evaluating the clustering result according to the inter-class distance between clusters and the intra-class distance in the clusters; and (3) changing the number of initial clustering centers to redefine the initial clustering centers (i.e. re-executing step C) and re-clustering (i.e. re-executing step D) under the condition that the evaluation result shows that the clustering result does not meet the clustering requirement.
The evaluation method is formulated as follows:

DVI = min_{i≠j} d(i, j) / max_k d̂(k)

where DVI denotes the evaluation value, d(i, j) the inter-class distance between clusters i and j, and d̂(k) the intra-class distance within cluster k. The inter-class distance d(i, j) may be any distance measure, for example the distance between the center points of two clusters; the intra-class distance d̂(k) may likewise be measured in different ways, for example as the maximum distance between any two points in cluster k. Using the ratio of the minimum inter-class distance to the maximum intra-class distance as the evaluation value drives the clustering result toward maximum separation between classes and maximum compactness within classes. The larger the evaluation value DVI, the stronger the cluster validity.
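Under one common choice of the distance measures (cluster-center distance between classes, maximum pairwise distance within a class), the DVI evaluation can be computed as:

```python
import numpy as np

def dvi(clusters):
    """Dunn validity index: minimum inter-class distance divided by maximum
    intra-class distance, using cluster-center distance between classes and
    maximum pairwise distance within a class (one common choice of measures)."""
    centers = [c.mean(axis=0) for c in clusters]
    inter = min(np.linalg.norm(centers[i] - centers[j])
                for i in range(len(clusters))
                for j in range(i + 1, len(clusters)))
    intra = max(np.linalg.norm(p - q)
                for c in clusters for p in c for q in c)
    return inter / intra

a = np.array([[0.0, 0.0], [0.1, 0.0]])     # tight cluster near the origin
b = np.array([[5.0, 5.0], [5.1, 5.0]])     # tight cluster far away
c = np.array([[0.0, 5.0], [0.1, 5.0]])     # third tight cluster
score = dvi([a, b, c])                     # large value: valid clustering
```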
F: and compressing the network traffic data at each historical moment in the historical period based on the clustering result to obtain the network traffic data at a plurality of historical moments to be processed.
Part of the network traffic data is extracted from the network traffic data at the historical moments of each cluster, and the network traffic data extracted from all clusters is taken as the network traffic data at the plurality of historical moments to be processed. The extraction may be random.
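A sketch of the per-cluster random extraction; the retained fraction is an illustrative parameter, not one specified by the disclosure:

```python
import numpy as np

def compress_by_cluster(labels, frac=0.5, seed=0):
    """Randomly keep a fraction of the historical time points from each
    cluster; the union forms the compressed set to be processed."""
    rng = np.random.default_rng(seed)
    kept = []
    for lab in np.unique(labels):
        members = np.flatnonzero(labels == lab)
        n_keep = max(1, int(round(frac * members.size)))
        kept.extend(rng.choice(members, size=n_keep, replace=False))
    return np.sort(np.array(kept))

labels = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])  # toy cluster assignment
idx = compress_by_cluster(labels, frac=0.5)        # indices of kept moments
```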
Redundancy of input data is reduced by data compression.
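The compression step might be sketched as follows; the function name, the `keep_ratio` parameter and the fixed seed are illustrative choices, not part of the original method.

```python
import random

def compress_by_cluster(samples, labels, keep_ratio=0.5, seed=0):
    """Randomly keep a fraction of the samples from each cluster, so every
    cluster stays represented while redundant near-duplicates are dropped.
    `samples` is a list of traffic records, `labels` the cluster id of each
    record; `keep_ratio` and the fixed `seed` are illustrative."""
    rng = random.Random(seed)
    by_cluster = {}
    for s, lbl in zip(samples, labels):
        by_cluster.setdefault(lbl, []).append(s)
    kept = []
    for members in by_cluster.values():
        # keep at least one sample per cluster so no cluster vanishes
        k = max(1, int(len(members) * keep_ratio))
        kept.extend(rng.sample(members, k))
    return kept
```

Because extraction is per cluster, the compressed data still cover every traffic pattern found by the clustering, which is the stated goal of reducing input redundancy.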
In step 620, network traffic data for a plurality of historical moments to be processed is obtained.
If step 610 is performed, the compressed historical network traffic data serve as the network traffic data at the plurality of historical moments to be processed; if step 610 is not performed, the original historical network traffic data (the network traffic data at each historical moment in the historical period) are used instead.
In step 630, the network traffic data at the plurality of historical moments is processed using a first neural network comprised of a plurality of scalar neurons to obtain first characteristic information.
The network traffic data at the plurality of historical moments are processed by a plurality of convolution kernels of different scales in the first neural network, and the outputs of the convolution kernels are spliced; the first characteristic information is then determined from the spliced outputs. The first characteristic information includes time-series characteristic information: the multi-scale convolution kernels extract time-series features at different scales.
Determining the first characteristic information from the spliced outputs of the convolution kernels includes: processing the spliced outputs with a multi-path residual network in the first neural network to obtain the first characteristic information, where the convolution kernels used by different paths of the multi-path residual network have different scales. The multi-path residual network fuses low-level and high-level information, giving the network richer feature-learning capability and improving its learning efficiency.
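As an illustration of splicing multi-scale convolution outputs, the sketch below applies 1-D kernels of different sizes to one input sequence and concatenates the results; the kernel values are placeholders, and a real implementation would use learned kernels in a deep-learning framework.

```python
def conv1d_valid(x, kernel):
    """Plain 1-D 'valid' convolution (strictly, cross-correlation, as in
    most deep-learning frameworks)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def multi_scale_features(x, kernels):
    """Apply kernels of different sizes to the same input sequence and
    splice (concatenate) their outputs, mimicking how the first network
    extracts time-series features at several scales."""
    outputs = [conv1d_valid(x, k) for k in kernels]
    return [v for out in outputs for v in out]
```

A kernel of length 1 sees individual points while a longer kernel aggregates a window, so the spliced vector mixes fine-grained and coarse-grained temporal features.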
In step 640, the first characteristic information is processed using a second neural network comprising a plurality of vector neurons to obtain second characteristic information.
The first characteristic information is processed by a capsule network to obtain the second characteristic information. The second characteristic information includes spatial characteristic information, so that features correlated in both time and space are obtained.
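The vector neurons of the capsule network output vectors rather than scalars. A minimal sketch of the squashing non-linearity commonly used in capsule networks is given below; the patent does not spell out this function, so its use here is an assumption about the standard capsule formulation.

```python
import math

def squash(v):
    """Capsule-network squashing non-linearity: preserves the direction of
    the capsule's output vector but scales its length into [0, 1), so the
    length can be read as the probability that the feature is present."""
    norm_sq = sum(x * x for x in v)
    if norm_sq == 0:
        return [0.0 for _ in v]
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq)  # shrinks short vectors, saturates long ones
    return [scale * x / norm for x in v]
```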
In step 650, the second characteristic information is processed by using a third neural network with a long-short term memory structure, so as to obtain predicted network traffic data at a future time.
And processing the second characteristic information by using a long-short-term memory network to obtain predicted network flow data at the future time.
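The gate mechanism of the third neural network can be illustrated with a single-unit LSTM step in pure Python; the scalar weight names in `w` are illustrative, and a practical model would use vector-valued gates from a deep-learning framework.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a single-unit LSTM.  The forget gate f decides what part
    of the old cell state to discard, the input gate i what part of the
    candidate g to store, and the output gate o what part of the cell state
    to expose as the new hidden state.  `w` maps illustrative weight names
    to scalar values."""
    f = sigmoid(w['wf'] * x + w['uf'] * h_prev + w['bf'])   # forget gate
    i = sigmoid(w['wi'] * x + w['ui'] * h_prev + w['bi'])   # input gate
    o = sigmoid(w['wo'] * x + w['uo'] * h_prev + w['bo'])   # output gate
    g = math.tanh(w['wg'] * x + w['ug'] * h_prev + w['bg']) # candidate state
    c = f * c_prev + i * g       # memory cell update
    h = o * math.tanh(c)         # new hidden state
    return h, c
```

With all weights zero, every gate opens halfway (sigmoid(0) = 0.5) and the candidate is zero, so the cell state is simply halved at each step, which makes the gating arithmetic easy to verify by hand.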
Addressing the non-stationary nature of network traffic, the embodiment of the disclosure first processes the network traffic data at a plurality of historical moments with a first neural network composed of a plurality of scalar neurons to obtain first characteristic information related to the time series; it then processes the first characteristic information with a second neural network composed of a plurality of vector neurons to obtain second characteristic information related to both time and space, giving the network rich feature-learning capability; finally, it processes the second characteristic information with a third neural network having a long-short-term memory structure to obtain the predicted network traffic data at the future time. The model therefore retains good prediction performance over longer time series and is well suited to long-term prediction of network traffic.
According to the embodiment of the disclosure, the traffic data are first clustered before being input into the network, and new learning samples are extracted from the clustering result, reducing the redundancy of the input data. The multi-scale convolution and multi-path residual model can then efficiently select reasonable network parameters, helping to improve model accuracy and reduce training time. The capsule network combines the idea of vector neurons with a dynamic routing algorithm: it converts the feature maps output by the multi-scale convolution and multi-path residual network into capsules and then performs prediction, obtaining both local and global features so as to extract different spatial characteristic information. The LSTM network model has three different gate structures and a memory cell that regulate one another; it learns the variation characteristics of the traffic data and, by jointly constructing a forget gate, an input gate and an output gate, determines which historical information to store and which to discard, thereby greatly alleviating the gradient vanishing and gradient explosion problems common in time-series training.
Traffic data is typical time-series data and serves as an important reference in system performance management, risk prevention, anomaly detection and the like. The experimental data are traffic data of a telecommunication network: base station traffic data from September 1 to October 15 were collected for training, testing and verification, with a data sampling interval of five minutes and 1200 groups of data extracted in total. The data were first preprocessed, cleaning and screening out 880 groups of valid data; 600 groups were used to train the model and the remaining 280 groups to test the performance of the network model. The actual raw traffic data are shown in Fig. 7.
To evaluate the performance of the various models, the following four evaluation indexes were selected, where x_i denotes the original data value, x′_i the predicted value, and n the number of time-series data objects.
Root mean square error (Root Mean Square Error, RMSE) is the square root of the mean squared difference between the predicted and actual values:

RMSE = sqrt( (1/n) · Σ_{i=1..n} (x_i − x′_i)² )
Mean absolute error (Mean Absolute Error, MAE) is the mean of the absolute errors:

MAE = (1/n) · Σ_{i=1..n} |x_i − x′_i|
Mean absolute percentage error (Mean Absolute Percentage Error, MAPE) measures the deviation of the predicted value from the true value as a percentage:

MAPE = (100% / n) · Σ_{i=1..n} |(x_i − x′_i) / x_i|
Accuracy (Accuracy) measures how close the predicted values are to the true values; it takes values in [0, 1], with 1 indicating a perfect prediction.
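The four indexes can be sketched in pure Python as below. Since the text does not reproduce the Accuracy formula, the version here (one minus the relative L2 error between the actual and predicted series, consistent with the stated range [0, 1] and optimum 1) is an assumption.

```python
import math

def rmse(actual, pred):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mae(actual, pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mape(actual, pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 / len(actual) * sum(abs((a - p) / a) for a, p in zip(actual, pred))

def accuracy(actual, pred):
    """Assumed definition: 1 minus the relative L2 error of the prediction."""
    num = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)))
    den = math.sqrt(sum(a ** 2 for a in actual))
    return 1.0 - num / den
```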
The evaluation methods are detailed in Table 1 below:

Table 1 Evaluation methods

Method      Prediction result evaluation angle    Range      Optimal prediction
RMSE        Root mean square error                [0, +∞)    0
MAE         Mean absolute error                   [0, +∞)    0
MAPE        Mean absolute percentage error        [0, +∞)    0%
Accuracy    Prediction accuracy                   [0, 1]     1
Prediction performance of the baseline models (ARIMA, multi-scale convolutional network, LSTM and capsule network) and the model of the present disclosure was compared on the real data set, averaging over the three prediction horizons of 20, 40 and 60 minutes (min). RMSE, MAE and MAPE reflect prediction error, where smaller values indicate better results; Accuracy reflects prediction accuracy, where larger values indicate better results.
To analyze the overall prediction effectiveness of the disclosed model, experiments were run on the same data set with several baseline models, presenting the evaluation indicators per prediction horizon; the data are detailed in Table 2.
Table 2 model predictive performance comparison
Verification shows that the disclosed model is applicable and more accurate. In the histogram clusters of Figs. 8-11, the columns from left to right represent ARIMA, multi-scale convolution, LSTM, the capsule network and the model of the present disclosure, in that order.
FIG. 8 shows a comparison of the baseline models and the model of the present disclosure with respect to MAE. In the 20 min prediction task, the mean absolute error (MAE) of the disclosed model was 27.9%, 20.5%, 24.4% and 22.6% lower than that of ARIMA, multi-scale convolution, LSTM and the capsule network, respectively. Similarly, in the 40 min prediction task MAE was reduced by 23.6%, 21.6%, 6.8% and 14.8%, respectively, and in the 60 min prediction task by 47.8%, 42.8%, 36.1% and 34%.
Fig. 9 shows a comparison of the baseline models and the model of the present disclosure with respect to RMSE. As the figure shows, the RMSE of the disclosed model is significantly lower than that of the baselines: about 0.172 lower than the ARIMA model, 0.142 lower than the multi-scale convolution model, 0.102 lower than the LSTM model, and 0.112 lower than the capsule network. In the 60 min prediction task, the RMSE was reduced by 46.5%, 41.8%, 36.1% and 34%, respectively. This downward trend suggests that the disclosed model captures time dependence better, demonstrating its superior performance in network traffic prediction with spatio-temporal characteristics.
FIG. 10 shows a comparison of the baseline models and the model of the present disclosure with respect to MAPE. The experimental data show that the disclosed model has the smallest mean absolute percentage error across the one-hour range, indicating higher prediction accuracy.
FIG. 11 shows a comparison of the baseline models and the model of the present disclosure with respect to accuracy. The disclosed model achieves the highest accuracy over the 20 min to 60 min range, reaching 0.87 at the 60 min horizon. This trend clearly indicates that the disclosed model holds an advantage over the baseline models.
The ARIMA network cannot fully exploit time-series information: its predictions diverge from the measured values, with particularly large errors at peaks and troughs, so its results deviate considerably from reality. Single-model architectures such as the multi-scale convolutional network and the capsule network are prone to gradient vanishing and gradient explosion, and can only reflect the approximate trend of the traffic data, since the prediction model requires temporally correlated input. When predicting network traffic data, the disclosed network model handles traffic fluctuations better; compared with the other models, its prediction curve is closer to the real traffic curve, its prediction error is smaller, and its prediction effect is better.
Fig. 12 illustrates a schematic diagram of a network traffic prediction device according to some embodiments of the present disclosure. As shown in fig. 12, the network traffic prediction apparatus 1200 of this embodiment includes:
an acquiring unit 1210 configured to acquire network traffic data at a plurality of history times to be processed;
a first processing unit 1220 configured to process the network traffic data at the plurality of historical moments with a first neural network composed of a plurality of scalar neurons, resulting in first feature information;
a second processing unit 1230 configured to process the first characteristic information using a second neural network composed of a plurality of vector neurons, resulting in second characteristic information;
a prediction unit 1240, configured to process the second characteristic information by using a third neural network with a long-short term memory structure, so as to obtain predicted network traffic data at a future time.
In some embodiments, the acquisition unit 1210 is configured to:
constructing a plurality of feature vectors according to word frequency-inverse document word frequency information of the network flow data at each historical moment in the historical period, and calculating the distance between different feature vectors;
determining a feature vector with maximum word frequency-inverse document word frequency information as a first initial clustering center;
Determining other initial clustering centers according to the sum of the distances from each feature vector to the existing initial clustering centers;
clustering the plurality of feature vectors according to the determined plurality of initial clustering centers;
and compressing the network traffic data at each historical moment in the historical period based on the clustering result to obtain the network traffic data at a plurality of historical moments to be processed.
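A minimal sketch of the acquisition unit's center-selection logic is given below, under the assumption that "maximum word frequency-inverse document word frequency information" can be approximated by the largest feature magnitude; the function name and the Euclidean distance are illustrative choices.

```python
import math

def select_initial_centers(vectors, k):
    """Pick k initial cluster centers: the vector with the largest feature
    magnitude first (standing in for 'largest TF-IDF information'), then
    repeatedly the vector whose summed distance to all already-chosen
    centers is largest."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    remaining = list(vectors)
    first = max(remaining, key=lambda v: sum(abs(x) for x in v))
    centers = [first]
    remaining.remove(first)
    while len(centers) < k and remaining:
        # next center: feature vector farthest (in summed distance) from all
        # existing centers
        nxt = max(remaining, key=lambda v: sum(dist(v, c) for c in centers))
        centers.append(nxt)
        remaining.remove(nxt)
    return centers
```

This deterministic farthest-point seeding plays the same role as K-Means++-style initialization: spreading the initial centers apart so the subsequent clustering converges to well-separated clusters.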
In some embodiments, the first neural network includes a plurality of convolution kernels of different scales, or the first neural network includes a plurality of convolution kernels of different scales and a multi-path residual network cascaded with the outputs of the plurality of convolution kernels, where the scales of the convolution kernels employed by different paths in the multi-path residual network are different.
In some embodiments, the second neural network is a capsule network;
in some embodiments, the third neural network is a long short-term memory network.
Fig. 13 illustrates a schematic diagram of a network traffic prediction device according to some embodiments of the present disclosure. As shown in fig. 13, the network traffic prediction apparatus 1300 of this embodiment includes: a memory 1310 and a processor 1320 coupled to the memory 1310, the processor 1320 being configured to perform the network traffic prediction method of any of the embodiments described above based on instructions stored in the memory 1310.
The apparatus 1300 may also include an input/output interface 1330, a network interface 1340, a storage interface 1350, and the like. These interfaces 1330, 1340, 1350 and between memory 1310 and processor 1320 may be connected, for example, by bus 1360.
The memory 1310 may include, for example, system memory, fixed nonvolatile storage media, and so forth. The system memory stores, for example, an operating system, application programs, boot Loader (Boot Loader), and other programs.
Processor 1320 may be implemented as discrete hardware components such as a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, or discrete gate or transistor logic.
The input/output interface 1330 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. Network interface 1340 provides a connection interface for various networking devices. Storage interface 1350 provides a connection interface for external storage devices such as SD cards and USB flash drives. Bus 1360 may employ any of a variety of bus structures, including, but not limited to, an industry standard architecture (Industry Standard Architecture, ISA) bus, a micro channel architecture (Micro Channel Architecture, MCA) bus, and a peripheral component interconnect (Peripheral Component Interconnect, PCI) bus.
(1) A network traffic prediction method, comprising:
acquiring network flow data of a plurality of historical moments to be processed;
processing network flow data at a plurality of historical moments by using a first neural network formed by a plurality of scalar neurons to obtain first characteristic information;
processing the first characteristic information by using a second neural network formed by a plurality of vector neurons to obtain second characteristic information;
and processing the second characteristic information by using a third neural network with a long-short-term memory structure to obtain predicted network flow data at a future time.
(2) According to (1), acquiring network traffic data for a plurality of historical moments to be processed includes:
constructing a plurality of feature vectors according to word frequency-inverse document word frequency information of the network flow data at each historical moment in the historical period, and calculating the distance between different feature vectors;
determining a feature vector with maximum word frequency-inverse document word frequency information as a first initial clustering center;
determining other initial clustering centers according to the sum of the distances from each feature vector to the existing initial clustering centers;
clustering the plurality of feature vectors according to the determined plurality of initial clustering centers;
And compressing the network traffic data at each historical moment in the historical period based on the clustering result to obtain the network traffic data at a plurality of historical moments to be processed.
(3) According to (2), determining further initial cluster centers from the sum of distances of each feature vector to the existing initial cluster centers comprises: and determining the feature vector with the largest sum of the distances to the existing initial cluster centers as the next initial cluster center, and iteratively executing the step of determining the next initial cluster center until the preset number of initial cluster centers are determined.
(4) According to (2-3), compressing the network traffic data at each of the history times in the history period based on the clustering result to obtain the network traffic data at the plurality of history times to be processed includes: and extracting part of network traffic data from the network traffic data of each historical moment of the same cluster, and taking the network traffic data extracted from each cluster as the network traffic data of a plurality of historical moments to be processed.
(5) According to (2-4), obtaining network traffic data for a plurality of historical moments to be processed further comprises: evaluating the clustering result according to the inter-class distance between clusters and the intra-class distance in the clusters; and under the condition that the evaluation result shows that the clustering result does not meet the clustering requirement, changing the number of initial clustering centers, re-determining the initial clustering centers and re-clustering.
(6) According to (1-5), processing the network traffic data at the plurality of historical moments using a first neural network comprised of a plurality of scalar neurons, the obtaining first characteristic information comprising: processing the network flow data at a plurality of historical moments by utilizing a plurality of convolution kernels with different scales in the first neural network, and splicing output results of the convolution kernels; and determining first characteristic information according to the output results of the plurality of spliced convolution kernels.
(7) According to (6), determining the first characteristic information from the output results of the plurality of concatenated convolution kernels comprises: and processing the output results of the plurality of spliced convolution kernels by utilizing a multi-path residual error network in the first neural network to obtain first characteristic information, wherein the convolution kernels adopted by different paths in the multi-path residual error network have different scales.
(8) According to (1-7), processing the first characteristic information using a second neural network composed of a plurality of vector neurons, to obtain second characteristic information includes: and processing the first characteristic information by using a capsule network to obtain second characteristic information.
(9) According to (1-8), processing the second characteristic information using a third neural network having a long-short term memory structure, to obtain predicted network traffic data at a future time comprises: and processing the second characteristic information by using a long-short-term memory network to obtain predicted network flow data at the future time.
(10) According to (1-9), the first characteristic information comprises time-series characteristic information and the second characteristic information comprises spatial characteristic information.
It will be appreciated by those skilled in the art that embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more non-transitory computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flowchart and/or block of the flowchart illustrations and/or block diagrams, and combinations of flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing describes preferred embodiments of the present disclosure and is not intended to limit it; any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present disclosure shall fall within its scope of protection.

Claims (15)

1. A method for predicting network traffic, comprising:
Acquiring network flow data of a plurality of historical moments to be processed;
processing network flow data at a plurality of historical moments by using a first neural network formed by a plurality of scalar neurons to obtain first characteristic information;
processing the first characteristic information by using a second neural network formed by a plurality of vector neurons to obtain second characteristic information;
and processing the second characteristic information by using a third neural network with a long-short-term memory structure to obtain predicted network flow data at a future time.
2. The method of claim 1, wherein obtaining network traffic data for a plurality of historical moments to be processed comprises:
constructing a plurality of feature vectors according to word frequency-inverse document word frequency information of the network flow data at each historical moment in the historical period, and calculating the distance between different feature vectors;
determining a feature vector with maximum word frequency-inverse document word frequency information as a first initial clustering center;
determining other initial clustering centers according to the sum of the distances from each feature vector to the existing initial clustering centers;
clustering the plurality of feature vectors according to the determined plurality of initial clustering centers;
And compressing the network traffic data at each historical moment in the historical period based on the clustering result to obtain the network traffic data at a plurality of historical moments to be processed.
3. The method of claim 2, wherein determining additional initial cluster centers based on the sum of distances of each feature vector from an existing initial cluster center comprises:
determining the feature vector with the largest sum of the distances to the existing initial cluster centers as the next initial cluster center,
the step of determining the next initial cluster center is performed iteratively until a preset number of initial cluster centers are determined.
4. The method of claim 2, wherein compressing the network traffic data at each of the historical moments in the historical period based on the clustering result to obtain the network traffic data at the plurality of historical moments to be processed comprises:
and extracting part of network traffic data from the network traffic data of each historical moment of the same cluster, and taking the network traffic data extracted from each cluster as the network traffic data of a plurality of historical moments to be processed.
5. The method of claim 2, wherein obtaining network traffic data for a plurality of historical moments to be processed further comprises:
Evaluating the clustering result according to the inter-class distance between clusters and the intra-class distance in the clusters;
and under the condition that the evaluation result shows that the clustering result does not meet the clustering requirement, changing the number of initial clustering centers, re-determining the initial clustering centers and re-clustering.
6. The method of any of claims 1-5, wherein processing the network traffic data at the plurality of historical moments using a first neural network comprised of a plurality of scalar neurons to obtain first characteristic information comprises:
processing the network flow data at a plurality of historical moments by utilizing a plurality of convolution kernels with different scales in the first neural network, and splicing output results of the convolution kernels;
and determining first characteristic information according to the output results of the plurality of spliced convolution kernels.
7. The method of claim 6, wherein determining the first characteristic information based on the output of the concatenated plurality of convolution kernels comprises:
and processing the output results of the plurality of spliced convolution kernels by utilizing a multi-path residual error network in the first neural network to obtain first characteristic information, wherein the convolution kernels adopted by different paths in the multi-path residual error network have different scales.
8. The method of any of claims 1-5, wherein processing the first characteristic information using a second neural network of vector neurons to obtain second characteristic information comprises:
and processing the first characteristic information by using a capsule network to obtain second characteristic information.
9. The method of any of claims 1-5, wherein processing the second characteristic information using a third neural network having a long-short-term memory structure to obtain predicted network traffic data at a future time comprises:
and processing the second characteristic information by using a long-short-term memory network to obtain predicted network flow data at the future time.
10. The method of any of claims 1-5, wherein the first characteristic information comprises time-series characteristic information and the second characteristic information comprises spatial characteristic information.
11. A network traffic prediction apparatus, comprising:
an acquisition unit configured to acquire network traffic data at a plurality of history times to be processed;
a first processing unit configured to process network traffic data at the plurality of historical moments by using a first neural network composed of a plurality of scalar neurons to obtain first characteristic information;
A second processing unit configured to process the first characteristic information by using a second neural network composed of a plurality of vector neurons, to obtain second characteristic information;
and the prediction unit is configured to process the second characteristic information by using a third neural network with a long-short-term memory structure to obtain predicted network traffic data at a future time.
12. The apparatus of claim 11, wherein the acquisition unit is configured to:
constructing a plurality of feature vectors according to word frequency-inverse document word frequency information of the network flow data at each historical moment in the historical period, and calculating the distance between different feature vectors;
determining a feature vector with maximum word frequency-inverse document word frequency information as a first initial clustering center;
determining other initial clustering centers according to the sum of the distances from each feature vector to the existing initial clustering centers;
clustering the plurality of feature vectors according to the determined plurality of initial clustering centers;
and compressing the network traffic data at each historical moment in the historical period based on the clustering result to obtain the network traffic data at a plurality of historical moments to be processed.
13. The device according to claim 11 or 12, wherein,
the first neural network comprises a plurality of convolution kernels with different scales, or the first neural network comprises a plurality of convolution kernels with different scales and a multi-path residual error network cascaded with the output ends of the convolution kernels, wherein the scales of the convolution kernels adopted by different paths in the multi-path residual error network are different;
alternatively, the second neural network is a capsule network;
or, the third neural network is a long-term and short-term memory network.
14. A network traffic prediction device, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the network traffic prediction method of any of claims 1-10 based on instructions stored in the memory.
15. A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the network traffic prediction method of any of claims 1-10.
CN202310826327.3A 2023-07-06 2023-07-06 Network traffic prediction method and device and storage medium Pending CN116866202A (en)


Publications (1)

Publication Number Publication Date
CN116866202A 2023-10-10

Family

ID=88224682

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310826327.3A Pending CN116866202A (en) 2023-07-06 2023-07-06 Network traffic prediction method and device and storage medium

Country Status (1)

Country Link
CN (1) CN116866202A (en)

Similar Documents

Publication Publication Date Title
CN111008640A (en) Image recognition model training and image recognition method, device, terminal and medium
CN111352965B (en) Training method of sequence mining model, and processing method and equipment of sequence data
CN111523640A (en) Training method and device of neural network model
CN113435208B (en) Training method and device for student model and electronic equipment
Moya Rueda et al. Neuron pruning for compressing deep networks using maxout architectures
CN114490065A (en) Load prediction method, device and equipment
CN114328277A (en) Software defect prediction and quality analysis method, device, equipment and medium
WO2023159756A1 (en) Price data processing method and apparatus, electronic device, and storage medium
CN110335160B (en) Medical care migration behavior prediction method and system based on grouping and attention improvement Bi-GRU
CN115294397A (en) Classification task post-processing method, device, equipment and storage medium
CN115034315A (en) Business processing method and device based on artificial intelligence, computer equipment and medium
CN111783688B (en) Remote sensing image scene classification method based on convolutional neural network
CN113642727A (en) Training method of neural network model and processing method and device of multimedia information
CN116703466A (en) System access quantity prediction method based on improved wolf algorithm and related equipment thereof
CN116844573A (en) Speech emotion recognition method, device, equipment and medium based on artificial intelligence
CN110705638A (en) Credit rating prediction classification method using deep network learning fuzzy information feature technology
CN113361621B (en) Method and device for training model
CN115062769A (en) Knowledge distillation-based model training method, device, equipment and storage medium
CN115601042A (en) Information identification method and device, electronic equipment and storage medium
CN116866202A (en) Network traffic prediction method and device and storage medium
CN110852066A (en) Multi-language entity relation extraction method and system based on confrontation training mechanism
CN114610953A (en) Data classification method, device, equipment and storage medium
CN113935413A (en) Distribution network wave recording file waveform identification method based on convolutional neural network
CN111382761B (en) CNN-based detector, image detection method and terminal
CN113627514A (en) Data processing method and device of knowledge graph, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination