CN113905391B - Integrated learning network traffic prediction method, system, equipment, terminal and medium

Publication number
CN113905391B
Authority
CN
China
Prior art keywords: space, time, prediction, network, model
Prior art date
Legal status: Active
Application number
CN202111135948.4A
Other languages
Chinese (zh)
Other versions
CN113905391A (en)
Inventor
严灵毓
赵羽茜
王春枝
夏金耀
郑坤鹏
周显敬
Current Assignee
Wuhan Zhuoer Information Technology Co ltd
Hubei University of Technology
Original Assignee
Wuhan Zhuoer Information Technology Co ltd
Hubei University of Technology
Priority date
Filing date
Publication date
Application filed by Wuhan Zhuoer Information Technology Co ltd, Hubei University of Technology filed Critical Wuhan Zhuoer Information Technology Co ltd
Priority to CN202111135948.4A priority Critical patent/CN113905391B/en
Publication of CN113905391A publication Critical patent/CN113905391A/en
Application granted granted Critical
Publication of CN113905391B publication Critical patent/CN113905391B/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W 16/22 Traffic simulation tools or models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/14 Network analysis or design
    • H04L 41/147 Network analysis or design for predicting network behaviour


Abstract

The invention belongs to the technical field of network management and discloses an integrated learning network traffic prediction method, system, device, terminal and medium. The integrated learning network traffic prediction method comprises the following steps: constructing a network traffic prediction model based on time and space; determining the framework of an integrated learning network traffic prediction model based on a multi-layer perceptron; and performing network traffic spatio-temporal modeling based on multi-layer perceptron integrated learning, obtaining the prediction result through the integrated learning network traffic prediction model based on the multi-layer perceptron. The invention provides an applied study of network traffic data prediction based on multi-layer perceptron integrated learning: 1) the space and time of network traffic data are modeled with a convolutional neural network and GRU gating units; 2) an integrated learning network traffic prediction model based on a multi-layer perceptron is proposed; 3) spatio-temporal modeling is introduced into the field of network traffic prediction for the first time. The prediction results are highly accurate, the method adapts to the influence of complex factors, and control is more precise.

Description

Integrated learning network traffic prediction method, system, equipment, terminal and medium
Technical Field
The invention belongs to the technical field of network management, and particularly relates to an integrated learning network traffic prediction method, an integrated learning network traffic prediction system, integrated learning network traffic prediction equipment, an integrated learning network traffic prediction terminal and an integrated learning network traffic prediction medium.
Background
At present, with the rapid development of networks, the service applications carried on them are increasingly abundant, people's lives are ever more closely tied to mobile networks, and the number of users consuming network traffic grows year by year. An important aspect of network management is therefore real-time monitoring of network traffic, which makes it possible to grasp traffic information effectively and to improve the running speed and utilization of the network. Problems such as traffic overload, network congestion and network collapse can be addressed through network traffic prediction. As network structures are continuously optimized and network performance improves, high-precision network traffic prediction becomes particularly important.
Rapidly growing user demand and network scale steadily increase the use of network traffic; network operators improve their data processing capability and build more base stations, which requires accurate network traffic prediction. However, the volume of network traffic data is huge, and in a network traffic data set the sample points of a port may be discontinuous over days, caused for example by newly added port routes, equipment fault maintenance, or suspended service. The data set therefore needs to be screened first to remove discontinuous port samples, and many similar factors influence network traffic prediction. Traditional network traffic prediction can no longer accommodate the rapid growth of network scale and such complex data changes. Based on the research situation at home and abroad in recent years, a thorough analysis of the literature on network traffic prediction shows that traditional network traffic prediction methods currently suffer from low prediction accuracy on traffic data with multiple influencing factors.
(1) Classification of network traffic prediction methods in domestic and foreign research
Classified from the perspective of commonly used traffic prediction models, methods can be divided into those based on neural networks and those based on gray models. A neural network processes information mainly by adjusting the interconnections among a large number of internal nodes. Its learning modes can be divided into supervised learning and unsupervised learning. In supervised learning, the sample data given to the system serve as the standard for supervised classification or imitation; in unsupervised learning, only the basic mode and rules of learning are specified, the specific learning content may differ with the environment of the learning system, and the system can automatically and rapidly discover basic characteristics and rules of its learning environment, which resembles the automatic learning of the human brain. If a system itself still exhibits ambiguity in its various temporal hierarchies, structures and spatial relationships, randomness of dynamic changes, and imperfection of index data, these characteristics can be referred to as gray system characteristics.
From the viewpoint of the emphasis of network traffic prediction research, prediction methods can basically be divided into two main categories: linear prediction methods and nonlinear prediction methods. Linear prediction methods are based on time series; the data must be analyzed to judge the long-term trend in order to obtain the required prediction. The principle is to find the rules in the ordered data and then predict through mathematical modeling. Typical linear prediction models are the Autoregressive (AR) model, the Moving Average (MA) model, and their modifications, including the Autoregressive Moving Average (ARMA) model, the Autoregressive Integrated Moving Average (ARIMA) model, the Fractional Autoregressive Integrated Moving Average (FARIMA) model, and the like. Nonlinear prediction analysis is based on neural networks, which, thanks to distributed data storage, parallel processing, good robustness, self-adaptation and automatic learning, have wide application and development prospects in the field of modern computer technology. The practical modeling theory and methods of neural networks have a certain universality, and nonlinear neural prediction models have achieved good research results in neural network modeling; nonlinear models commonly applied to network traffic prediction include Support Vector Machines (SVM), the Grey Model (GM), Neural Networks (NN), and the like. The prediction results of nonlinear models are satisfactory, but a drawback remains: multi-step prediction cannot be performed effectively, whereas the advantage in single-step prediction is obvious.
From the perspective of development stages, prediction methods can be divided into traditional time-series prediction methods and neural-network-based prediction methods. Traditional time-series prediction methods include linear regression, the autoregressive moving average model, and the like. A space-based convolutional neural network or a time-series-based recurrent neural network may also be chosen. Convolutional neural networks have been successfully applied in image segmentation, semantic segmentation, machine translation and other fields. The spatial information of the data can be extracted through a convolution operation, which can be one-dimensional, two-dimensional or three-dimensional depending on the data; two-dimensional convolution is employed in the image field. Convolutional neural networks extract spatial information among data well, and residual neural networks are adopted to improve prediction accuracy. The special structure of recurrent networks gives the model a memory function, so that relations among the input data can be remembered. Recurrent networks are widely applied in machine translation, speech recognition, text similarity and other fields, in particular machine translation, where the outputs are strongly related to the input data.
In recent years, network scale has grown steadily, the number of users consuming network traffic keeps increasing, the change of network traffic has become complicated, and the factors influencing it are diverse; nevertheless, network traffic still follows certain patterns, and network traffic prediction is usually realized by establishing a suitable mathematical prediction model. Among the numerous solutions currently proposed for network traffic prediction, most are improvements to the prediction model. The following describes the situation from both domestic and foreign perspectives.
(2) Current state of domestic network traffic prediction research
In recent years, new network prediction system models and analysis methods have been used more and more. The techniques for traffic prediction in neural network applications mainly include traffic regression prediction models, time-series prediction models, gray prediction analysis methods, neural networks, fuzzy prediction theory, wavelet theory, and the like. Analysis of the Poisson network model was for a long time the prediction analysis most widely applied to network traffic on the mobile Internet. Traffic data at that time followed an exponential distribution, and the composition of network traffic data was also relatively simple. The Poisson distribution describes the frequency and number of times a random event may occur within a specific unit of time, and it is typically used when the dependent variable is a count variable.
Later, the autoregressive AR model was introduced into the field of traffic prediction, and models such as ARMA, ARIMA and FARIMA were proposed as improvements of the AR model. In classical regression, to examine the association between things, a regression model is used to build a functional relationship between different variables. The premise for obtaining a prediction is that rules existing in the data are found and a mathematical model is built from them; time-series-based prediction models show their advantage here, because they can be built from a single variable without establishing a causal model, so time-series analysis is widely applicable as a modeling approach. In China, Wuhan University has also done a great deal of research on network performance, traffic and route measurement, implemented some of the modules, and proposed a network performance analysis and measurement support system. The basis of network traffic modeling is the study of network traffic characteristics: the literature "Research on network traffic modeling and prediction based on time correlation" compares several common traffic models and then proposes the C-ON/OFF model and the EMD-ARMA model to address the problems of high computational complexity and unclear physical meaning. Because most current traffic prediction models address single-step prediction while multi-step prediction is not yet well solved, improving prediction precision with respect to errors in single-step prediction is a research direction with high application value for short-term network traffic prediction.
However, as network scale expands, user traffic grows greatly and the factors affecting traffic data become diverse, so the composition of network traffic data is no longer simple and the traffic gradually shows complicated nonlinear variation. Neural networks have therefore brought such nonlinear intelligent algorithms into traffic prediction. The artificial neural network discipline is a highly comprehensive interdisciplinary field with wide application value; its research and development cover neurophysiology, mathematical physics, information technology and computer science. It has produced breakthroughs in many technical fields such as signal processing, pattern recognition, target tracking, robot monitoring and network management. In general, the BP (Back Propagation) neural network and its improved algorithms are often used for prediction. An obvious feature of neural networks is that an acceptable error in the system does not directly affect the normal operation of the whole system, and good results can still be obtained. This feature makes neural network models more suitable than other models for self-similar business traffic with burst characteristics, and the simple learning method makes them a powerful tool for simulating system traffic. Liu Jie, in a related academic paper, first proposed a nonlinear prediction model based on a BP neural network for traffic fluctuation over time. Chen Zhenwei studied improvements of the traditional neural-network traffic prediction model, introduced a wavelet function into the BP-neural-network traffic prediction model and its hidden layer, and established a new wavelet neural network traffic prediction model. Wavelet transform processing is not only a continuously developing technical field in China but also combines deep theory with practical data application. The method performs a local time-frequency transformation of the signal, so that a large amount of information can be obtained efficiently and extracted directly from the signal, and the signal and its frequency content are calculated, refined and comprehensively analyzed through mathematical operations such as filtering, stretching and frequency translation. Wavelet analysis techniques have achieved notable success in many specialized areas. Mixed wavelet transformation means that various signals interlaced together into a mixed signal are decomposed into components of different time frequencies, which then exhibit good characteristics in different time domains and spatial frequency bands.
Most domestic research focuses on improvements to existing models, most of the analysis is theoretical, and a practical system has not yet been developed.
(3) Current state of research of foreign network flow prediction
Network traffic prediction was studied early abroad, and there is a great deal of research literature. The study of network traffic prediction starts from the traffic data and then finds a proper mathematical model according to the rules in the data. The traditional network traffic data model is a short-range-dependent model, but with the computer age, both network structures and people's demand for network traffic have grown, and network traffic exhibits long-range dependence, so researchers have turned to neural networks rather than traditional modeling. Neural networks have obvious advantages in network traffic prediction research, can analyze traffic at multiple scales, and have become an important tool.
The literature "Wavelet Analysis of Long-Range-Dependent Traffic", IEEE Transactions on Information Theory, 1998, 44(1): 2-15, describes a wavelet analysis tool for long-range dependence and introduces the related Hurst parameter. It can analyze very large data sets directly and efficiently and is highly robust against the presence of deterministic trends. For example, Beran et al demonstrate that variable-bit-rate traffic has long-range dependence over very large time scales when long-range dependence describes the process of frame-arrival transients. Connecting wavelet theory with neural networks yields the representative wavelet neural network.
The document "Multiresolution FIR Neural-Network-Based Learning Algorithm Applied to Network Traffic Prediction" proposes an FIR neural network algorithm and derives a learning algorithm with adaptive gain for the activation function of each layer. Because a neural network has self-learning capability, it can construct a nonlinear model from output and input data alone, and is therefore widely used for flow control and traffic prediction in computer networks. Artificial neural networks have comprehensive information-processing capabilities such as learning and generalization; the BP network is widely considered one of the most successful artificial neural networks at present, its main advantages being a simple network structure and strong plasticity. However, the BP neural network also has technical shortcomings such as slow convergence, so improved BP algorithms aim for better convergence and robustness, which can enhance the learning capability of the neural network.
Although many researchers have done a great deal of work on network traffic prediction, difficulties remain in adapting to, and predicting with high accuracy, network traffic affected by multiple factors. Meanwhile, the research situation at home and abroad shows that traditional prediction methods are no longer suitable for current network traffic prediction, and recent network traffic prediction literature mainly studies how to improve prediction precision through neural network methods. To improve the prediction accuracy of network traffic prediction, adjustments can be made through improved data feature extraction and optimized algorithm models.
Through the above analysis, the problems and defects existing in the prior art are as follows:
(1) The change of network traffic data over time is nonlinear and non-stationary, traffic overload can occur, and traffic load balancing at base station nodes affects the traffic data of the whole base station network; uncontrollable factors can appear during network traffic prediction and greatly influence the traffic within a base station, and traditional methods struggle to adapt to the nonlinear fluctuations caused by multiple influencing factors.
(2) Traditional network traffic prediction methods learn slowly, which affects the real-time performance of prediction; moreover, with the rapid development of computers, network scale has become extremely large and complex, and traditional network traffic prediction methods cannot cope with the complexity and burstiness of traffic data, leading to problems such as low prediction accuracy.
The difficulty of solving these problems and defects is as follows:
A large number of research results show that, compared with classical network traffic prediction methods, neural network prediction achieves better results. Summarizing the key factors that influence network traffic prediction and the methods for judging prediction error, and comparing related schemes and attributes, shows that further improving neural network prediction holds great potential. Although the problems are difficult to solve, ideal results can be obtained in view of the current development of network technology.
The significance of solving these problems and defects is as follows:
In view of current network development, high-quality network traffic prediction is becoming more and more important and urgent. Verifying the prediction performance of various models and finding a prediction model that better fits the actual scene with higher precision is of great significance for the management, planning and design of large-scale networks.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention provides an integrated learning network flow prediction method, an integrated learning network flow prediction system, integrated learning network flow prediction equipment, an integrated learning network flow prediction terminal and an integrated learning network flow prediction medium, and particularly relates to an integrated learning network flow prediction method and an integrated learning network flow prediction system based on a multi-layer perceptron.
The invention is realized in such a way that the method for predicting the flow of the integrated learning network comprises the following steps:
based on the frame thought of sequence-to-Seq, the convolutional neural network and the GRU gating unit are used for modeling the space and time of network flow data respectively, a residual neural network ResNet and a batch normalization technology BatchNorm are used in a space model, and a Attention mechanism Attention is used in a time model.
Constructing an integrated learning network flow prediction model based on a multi-layer perceptron; obtaining spatial and temporal feature codes; based on the integrated learning thought, the space and time characteristics are integrated and learned through a multi-layer perceptron; after the space-time characteristics of the integrated learning are obtained, the space-time characteristics are input into a decoding part based on a GUR gate control unit to obtain a prediction result; and adding an attention mechanism and a Teacher-force mechanism, and finally determining an integrated learning network flow prediction model framework structure based on the multi-layer perceptron.
Further, the method for predicting the integrated learning network traffic comprises the following steps:
firstly, constructing a network flow prediction model based on time and space;
step two, determining an integrated learning network flow prediction model framework structure based on a multi-layer perceptron;
thirdly, network flow space-time modeling based on the multi-layer perceptron integrated learning is carried out, and a prediction result is obtained through an integrated learning network flow prediction model based on the multi-layer perceptron.
Further, in the first step, the constructing the network traffic prediction model based on time and space includes:
(1) Determining spatial and temporal dependencies of network traffic
(1) Spatial dependence of data
The data format reflects urban living characteristics: it records the traffic usage of different residents in a city, with traffic forwarded and transmitted through base stations distributed throughout the city. In the network traffic prediction problem, the spatial distribution of the base stations closely resembles a Euclidean space, and the source points and sink points of all traffic present a mesh distribution.
(2) Time dependence of data
The change in network traffic data over time is nonlinear and non-stationary; among all months, February and October have the highest traffic.
(2) Spatial dependence modeling based on convolutional neural network
For the spatial dependence of the data, modeling with a convolutional neural network (CNN) is proposed; spatial information among the data is extracted through the convolutional neural network, and a residual neural network is adopted.
For sequence data, the convolutional neural network (CNN) processes the input differently. The input samples are defined as [M, C, N], where M is the number of input samples, N is the feature dimension of the samples, and C is set to 1, representing one-dimensional data. The samples are transformed by a one-dimensional convolutional network whose number of output channels is set to the number of city base stations, 120, so that the data format becomes [M, 120, N]; the format is then changed to [M, 120, N, 1] so that the spatial feature information of each base station can be extracted; finally, the result is passed to a two-dimensional convolutional neural network for spatial feature extraction, as sketched below.
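For illustration, the reshaping pipeline can be sketched in PyTorch as follows (the kernel sizes, the 120-channel setting and the class/tensor names are assumptions for this sketch, not the patent's reference implementation):

```python
import torch
import torch.nn as nn

class SpatialFeatureExtractor(nn.Module):
    """Sketch: a 1-D conv lifts [M, 1, N] to [M, 120, N], then a 2-D CNN
    extracts per-base-station spatial features from [M, 120, N, 1]."""
    def __init__(self, num_stations: int = 120):
        super().__init__()
        # 1-D convolution: 1 input channel -> one output channel per base station
        self.conv1d = nn.Conv1d(in_channels=1, out_channels=num_stations,
                                kernel_size=3, padding=1)
        # 2-D convolution over the [N, 1] map of each station
        self.conv2d = nn.Conv2d(in_channels=num_stations, out_channels=num_stations,
                                kernel_size=(3, 1), padding=(1, 0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [M, 1, N]  (M samples, C = 1, feature dimension N)
        x = self.conv1d(x)       # -> [M, 120, N]
        x = x.unsqueeze(-1)      # -> [M, 120, N, 1]
        return self.conv2d(x)    # -> [M, 120, N, 1] spatial features

# usage sketch
feats = SpatialFeatureExtractor()(torch.randn(8, 1, 24))
```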
The classical ResNet50 network is adopted, and its layer parameters are described as follows:
ZEROPAD block: zero padding of (3, 3), i.e. 3 rows and 3 columns of zeros; for example, an original input data matrix of size (2, 2) has size (5, 5) after padding, and the added part is all zeros;
CONV block: 64 convolution kernels of size (7, 7), two-dimensional convolution with stride (2, 2);
BatchNorm block: batch normalization;
RELU block: the ReLU activation function, defined as

ReLU(x) = max(0, x)

MAXPOOL, AVGPOOL blocks: max pooling layer of size (3, 3) with stride (2, 2); average pooling of size (2, 2) with stride (1, 1);
CONVBLOCK block: the result x of the previous layers is added (matrix addition) through a shortcut path, the data are scaled to the [-1, 1] interval by a two-dimensional convolution operation and a batch normalization operation, and the result is finally output through a ReLU activation function;
IDBLOCK x n block: x is added directly (matrix addition) to the output of the previous layer through a shortcut path; n denotes n identical IDBLOCK blocks linked together;
FLATTEN block: flattens the input into one-dimensional data of size (M, -1), where M is the number of samples and -1 denotes that the remaining input sample matrix data are merged into one dimension;
FC block: fully connected layer of size (H, N), where H is the input dimension of the previous layer and N is the desired output dimension of the predicted data.
The residual neural network ResNet uses the "skip connection" technique to address the problems that arise in deep neural networks.
The reasons for the good performance of the ResNet network are summarized as follows. Assume a deep neural network whose input is x and whose output at layer l is a^l, and add a residual block structure, where the activation functions in the network are ReLU functions, i.e. all outputs are greater than or equal to zero.
Layer1 and Layer2 are the additionally added residual block layers on top of the deep network. Let z^{l+2} denote the output of Layer2 before the activation function; the output a^{l+2} is then defined as:

a^{l+2} = g(z^{l+2} + a^l)

where g denotes the ReLU activation function. Expanding the formula gives:

a^{l+2} = g(w^{l+2} a^{l+1} + b^{l+2} + a^l)

where w^{l+2} and b^{l+2} are the weights and bias of the Layer2 layer. If w^{l+2} = 0 and at the same time b^{l+2} = 0, then a^{l+2} equals a^l, so adding the residual block does not degrade the performance of the network.
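A minimal residual-block sketch in the same PyTorch style is given below (channel counts and layer names are assumptions, not the patent's exact ResNet50 configuration):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Sketch of an identity block: the input is added back onto the conv path,
    so with zero weights the block reduces to the identity mapping."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shortcut = x                        # a^l carried over the skip connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))     # z^{l+2}
        return self.relu(out + shortcut)    # a^{l+2} = g(z^{l+2} + a^l)
```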
Assume that the outputs of a layer of the neural network before batch normalization are z_i, where i = 1, 2, 3, ..., n and n is the number of samples in the batch, and let z̃_i denote the output after batch normalization. The calculation is defined as follows:

μ = (1/n) Σ_{i=1}^{n} z_i

σ² = (1/n) Σ_{i=1}^{n} (z_i − μ)²

ẑ_i = (z_i − μ) / sqrt(σ² + ε)

z̃_i = η ẑ_i + β

where ε is a very small number not smaller than zero, and η and β are parameters learned by the neural network. Let the input of the neural network be x_t and the selected batch size be γ, where 0 < γ ≤ m and m is the total number of samples; the number of batches is then m / γ.
Taking the traditional mini-batch gradient descent algorithm as an example, η and β are obtained by the following steps:
1. For t = 1, 2, 3, ..., n:
2. perform forward propagation on all x_t;
3. use the batch normalization technique to obtain z̃^l, where l denotes the l-th layer of the neural network;
4. calculate the gradients with the back-propagation technique: dw^l, dη^l, dβ^l;
5. update the parameters: w^l = w^l − α·dw^l, η^l = η^l − α·dη^l, β^l = β^l − α·dβ^l, where α is the learning rate.
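The normalization step itself can be sketched as follows (a from-scratch illustration under the notation above; in practice a library layer would be used):

```python
import torch

def batch_norm(z: torch.Tensor, eta: torch.Tensor, beta: torch.Tensor,
               eps: float = 1e-5) -> torch.Tensor:
    """Sketch of batch normalization over a batch of layer outputs z of shape
    [n, features]; eta and beta are the learned scale and shift parameters."""
    mu = z.mean(dim=0)                        # per-feature batch mean
    var = z.var(dim=0, unbiased=False)        # per-feature batch variance
    z_hat = (z - mu) / torch.sqrt(var + eps)  # normalized output
    return eta * z_hat + beta                 # scaled and shifted output

# usage sketch
out = batch_norm(torch.randn(32, 120), eta=torch.ones(120), beta=torch.zeros(120))
```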
(3) Time dependent modeling based on gating unit
For the dependence of the data on time, time-series modeling based on GRU gating units is proposed. The model adopts the sequence-to-sequence (Seq2Seq) framework, where the encoder part uses bidirectional GRU units and the decoder part uses single GRU units; an attention mechanism is added before the decoder decodes.
In the encoder part, the initial state a<0> is a zero vector matrix. Bi-GRU denotes a bidirectional GRU unit whose output is twice that of a unidirectional GRU unit; the forward and backward outputs of a<1> are denoted a→<1> and a←<1> respectively, and Attention denotes the attention mechanism. In the decoder part, the initial value of s<0> is the hidden-layer state of the last Bi-GRU unit of the encoder, i.e. the memory state.
The working principle of the attention mechanism is as follows: a Concatenate layer splices the inputs x<t'> and s<t-1> column-wise into a new matrix; this matrix is passed through a fully connected layer with a softmax activation function to obtain the weights a<t,t'>; the Context of the attention mechanism is then derived from the inputs by the following formula:

Context<t> = Σ_{t'} a<t,t'> · x<t'>

where a<t,t'> represents the attention that the t-th Context should pay to the t'-th input, and over all t' = 1, 2, ..., T the values a<t,t'> sum to 1; x<t'> denotes each of the inputs x<1>, x<2>, ....
The Context is the output of the attention mechanism, and the final required prediction result is obtained by feeding the sequence of Contexts into the decoder.
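A minimal sketch of this attention step follows (tensor shapes and layer names are illustrative assumptions):

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Sketch: score each encoder output x<t'> against the previous decoder
    state s<t-1>, softmax the scores, and return the weighted Context."""
    def __init__(self, enc_dim: int, dec_dim: int):
        super().__init__()
        self.score = nn.Linear(enc_dim + dec_dim, 1)

    def forward(self, enc_outputs: torch.Tensor, s_prev: torch.Tensor) -> torch.Tensor:
        # enc_outputs: [batch, T, enc_dim], s_prev: [batch, dec_dim]
        T = enc_outputs.size(1)
        s_rep = s_prev.unsqueeze(1).expand(-1, T, -1)     # repeat s<t-1> for each t'
        energy = self.score(torch.cat([enc_outputs, s_rep], dim=-1))  # [batch, T, 1]
        alphas = torch.softmax(energy, dim=1)             # a<t,t'>, sums to 1 over t'
        return (alphas * enc_outputs).sum(dim=1)          # Context<t> = Σ a<t,t'> x<t'>
```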
Further, in the second step, the network traffic space-time modeling based on the multi-layer perceptron integrated learning includes:
the multi-layer perceptron integrated learning is based on the operation of weighting the intermediate result by combining the principle of the multi-layer perceptron on the prediction results of the space model and the time model; the design idea follows the framework from sequence to sequence Seq-to-Seq, and the prediction results of space and time are fused in the encoder part; making predictions of the final result at the decoder section; and meanwhile, the space and time factors are referenced, so that better effect than that of a single space and time model is achieved.
Further, the encoder section based on multi-layer perceptron ensemble learning and the decoder section based on multi-layer perceptron ensemble learning include:
(1) Encoder section based on multi-layer perceptron ensemble learning
Integrated learning combines a plurality of weak learners into a stronger learner. The principle is that the weak learners are assigned weights by an integration algorithm, each weak learner is given a different weight, and the weights of all weak learners sum to 1. Based on this idea, the multi-layer perceptron gives different weights to the feature results predicted by the spatial and temporal models for integrated learning, so that the fused spatio-temporal features serve as the input of a more powerful spatio-temporal model.
After the temporal and spatial output features are obtained, integrated learning based on the multi-layer perceptron is performed. The temporal and spatial features are spliced by matrix concatenation to obtain the input features of the multi-layer perceptron:

x_c = [x_time ; x_space]

where x_time and x_space represent the temporal and spatial feature vectors respectively, and x_c represents the spliced feature vector.
x_c is then input into the multi-layer perceptron for integrated learning to obtain the integrated spatio-temporal output ŷ, calculated as follows:

ŷ = Tanh(w · x_c)

where Tanh() represents the Tanh activation function and w is the weight matrix of the multi-layer perceptron.
The multi-layer perceptron is a three-layer perceptron comprising an input layer, a hidden layer and an output layer, as sketched below.
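A sketch of the fusion step (feature dimensions and the names x_time / x_space are illustrative assumptions):

```python
import torch
import torch.nn as nn

class SpatioTemporalFusion(nn.Module):
    """Sketch of the three-layer perceptron that fuses the temporal and spatial
    feature vectors into one spatio-temporal encoding."""
    def __init__(self, time_dim: int, space_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(time_dim + space_dim, hidden_dim),  # input layer -> hidden layer
            nn.Tanh(),
            nn.Linear(hidden_dim, out_dim),               # hidden layer -> output layer
            nn.Tanh(),
        )

    def forward(self, x_time: torch.Tensor, x_space: torch.Tensor) -> torch.Tensor:
        x_c = torch.cat([x_time, x_space], dim=-1)  # x_c = [x_time ; x_space]
        return self.mlp(x_c)                        # fused spatio-temporal feature

# usage sketch
fused = SpatioTemporalFusion(64, 64, 128, 64)(torch.randn(8, 64), torch.randn(8, 64))
```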
(2) Decoder part based on multi-layer perceptron integrated learning
The decoder section derives the final prediction result while keeping the complexity of the model low. After the integrated-learned spatio-temporal features are obtained, the decoder performs the attention mechanism operation and outputs the reconstructed features.
During training, the encoded spatio-temporal feature output is first input into bidirectional Bi-GRU gating units to obtain a matrix a_g; a_g is input into the attention mechanism layer to obtain a matrix c_g, and c_g is dimension-matched through a fully connected layer to obtain a matrix f_g; f_g is then input into unidirectional GRU gating units. The structure has 14 bidirectional Bi-GRU units in total, and the number of single GRU units that finally output the prediction result equals the number of days to be predicted; this structure is relatively fixed, while the other internal structures can be changed and adjusted, as sketched below.
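A minimal decoder sketch under the structure just described (hidden sizes, the prediction horizon and the variable names are assumptions; the attention layer reuses the AdditiveAttention sketch above):

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Sketch: Bi-GRU over the fused encoding -> attention -> dimension-matching
    FC layer -> unidirectional GRU, one step per predicted day."""
    def __init__(self, feat_dim: int, hidden_dim: int, horizon: int):
        super().__init__()
        self.horizon = horizon
        self.bi_gru = nn.GRU(feat_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.attention = AdditiveAttention(2 * hidden_dim, hidden_dim)  # sketch above
        self.fc_match = nn.Linear(2 * hidden_dim, hidden_dim)           # c_g -> f_g
        self.gru_cell = nn.GRUCell(hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, fused_seq: torch.Tensor) -> torch.Tensor:
        # fused_seq: [batch, T, feat_dim] integrated spatio-temporal features
        a_g, _ = self.bi_gru(fused_seq)                 # [batch, T, 2*hidden]
        s = torch.zeros(fused_seq.size(0), self.gru_cell.hidden_size,
                        device=fused_seq.device)
        preds = []
        for _ in range(self.horizon):                   # one GRU step per predicted day
            c_g = self.attention(a_g, s)                # attention context
            f_g = self.fc_match(c_g)                    # dimension matching
            s = self.gru_cell(f_g, s)
            preds.append(self.out(s))
        return torch.stack(preds, dim=1)                # [batch, horizon, 1]
```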
Further, the Teacher-Forcing training mechanism comprises:
The Teacher-Forcing mechanism is a technique for correcting the inputs of the model during training and accelerating convergence. During training, the decoder produces an output after receiving the input at the initial time; when this output does not match the expected output, a question arises: should this output be taken as the input at the next moment, or should the correct result be taken as the input at the next moment? This question produces two training modes:
(1) Regardless of whether the output at the previous moment is correct, always take the expected correct result as the input at the current moment;
(2) The input at the current moment depends on the output at the previous moment, and is always the output at the previous moment.
At each time step, Random is a random function whose output is a decimal between 0 and 1, and R is a preset probability value in the range 0 to 1. If R equals 0, the second training mode is adopted; if R equals 1, the first training mode is adopted; the two training modes are balanced by adjusting the value of R.
The Teacher-Forcing mechanism thus controls, through a random number and a probability, whether the input at the current moment is the output at the previous moment or the expected correct result.
During training, the Teacher-Forcing mechanism is added and the preset probability is chosen as 0.5, as sketched below.
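A training-loop sketch of this choice (the decoder cell, output layer and data names are placeholders; only the R = 0.5 coin flip is taken from the text above):

```python
import random
import torch

R = 0.5  # preset Teacher-Forcing probability

def decode_with_teacher_forcing(decoder_cell, out_layer, s, first_input, targets):
    """Sketch: at each step, feed either the ground-truth value (teacher forcing)
    or the model's own previous prediction, chosen with probability R."""
    preds, step_input = [], first_input
    for t in range(targets.size(1)):
        s = decoder_cell(step_input, s)
        y_t = out_layer(s)                   # prediction for day t
        preds.append(y_t)
        use_truth = random.random() < R      # Random in [0, 1) compared with R
        step_input = targets[:, t] if use_truth else y_t.detach()
    return torch.stack(preds, dim=1)
```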
Another object of the present invention is to provide an ensemble learning network traffic prediction system to which the ensemble learning network traffic prediction method is applied, the ensemble learning network traffic prediction system including:
the prediction model construction module is used for constructing a network flow prediction model based on time and space;
the model framework structure determining module is used for determining an integrated learning network flow prediction model framework structure based on the multi-layer perceptron;
and the network flow prediction module is used for carrying out network flow space-time modeling based on the multi-layer perceptron integrated learning, and obtaining a prediction result through the multi-layer perceptron integrated learning network flow prediction model.
It is a further object of the present invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
Based on the sequence-to-sequence (Seq2Seq) framework, a convolutional neural network and GRU gating units are used to model the space and time of network traffic data respectively; a residual neural network (ResNet) and batch normalization (BatchNorm) are used in the spatial model, and an Attention mechanism is used in the temporal model.
Constructing an integrated learning network traffic prediction model based on a multi-layer perceptron: obtaining the spatial and temporal feature codes; based on the integrated learning idea, performing integrated learning of the spatial and temporal features through a multi-layer perceptron; after the integrated-learned spatio-temporal features are obtained, inputting them into a decoding part based on GRU gating units to obtain the prediction result; and adding an attention mechanism and a Teacher-Forcing mechanism, finally determining the framework of the integrated learning network traffic prediction model based on the multi-layer perceptron.
Another object of the present invention is to provide a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
based on the frame thought of sequence-to-Seq, the convolutional neural network and the GRU gating unit are used for modeling the space and time of network flow data respectively, a residual neural network ResNet and a batch normalization technology BatchNorm are used in a space model, and a Attention mechanism Attention is used in a time model.
Constructing an integrated learning network flow prediction model based on a multi-layer perceptron; obtaining spatial and temporal feature codes; based on the integrated learning thought, the space and time characteristics are integrated and learned through a multi-layer perceptron; after the space-time characteristics of the integrated learning are obtained, the space-time characteristics are input into a decoding part based on a GUR gate control unit to obtain a prediction result; and adding an attention mechanism and a Teacher-force mechanism, and finally determining an integrated learning network flow prediction model framework structure based on the multi-layer perceptron.
Another object of the present invention is to provide an information data processing terminal for implementing the ensemble learning network traffic prediction system.
Combining all the above technical schemes, the advantages and positive effects of the invention are as follows: the invention provides an integrated learning network traffic prediction method and proposes network traffic spatio-temporal modeling based on multi-layer perceptron integrated learning, in which the integrated learning based on the multi-layer perceptron weights the intermediate results of the spatial-model and temporal-model predictions according to the principle of the multi-layer perceptron. The design takes spatial and temporal factors into account simultaneously, achieving a better effect than a single spatial or temporal model. The prediction results are highly accurate, the method adapts to the influence of complex factors, and control is more precise.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a voting mechanism provided by an embodiment of the present invention.
Fig. 2 is a flowchart of a bagging method according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a three-layer perceptron provided in an embodiment of the present invention.
Fig. 4 is a schematic diagram of a convolution process according to an embodiment of the present disclosure.
Fig. 5 is a schematic diagram of a GRU unit according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of an encoding-decoding framework provided by an embodiment of the present invention.
FIG. 7 is a schematic diagram of a sequence-to-sequence basic framework provided by an embodiment of the present invention.
FIG. 8 is a flow chart of spatial dependence modeling based on convolutional neural networks provided by an embodiment of the present invention.
Fig. 9 is a schematic diagram of a structure of a res net50 according to an embodiment of the present invention.
Fig. 10 is a schematic diagram of a convolutional link module according to an embodiment of the present invention.
Fig. 11 is a schematic diagram of a jump link according to an embodiment of the present invention.
Fig. 12 is a schematic diagram of a residual block provided in an embodiment of the present invention.
Fig. 13 is a schematic diagram of a time series model according to an embodiment of the present invention.
FIG. 14 is a schematic diagram of an attention mechanism provided by an embodiment of the present invention.
Fig. 15 is a schematic diagram of learning by integrating perceptrons according to an embodiment of the present invention.
Fig. 16 is a schematic diagram of a decoder according to an embodiment of the present invention.
FIG. 17 is a schematic diagram of a Teacher-forming mechanism process provided by an embodiment of the present invention.
Fig. 18 is a schematic diagram of an integrated learning architecture of a multi-layer perceptron provided by an embodiment of the present invention.
Fig. 19 is a flowchart of an integrated learning network traffic prediction method according to an embodiment of the present invention.
Fig. 20 is a block diagram of an integrated learning network traffic prediction system according to an embodiment of the present invention.
FIG. 21 is a schematic diagram of predicted values and actual values of seven-day data for all samples provided in an embodiment of the present invention.
Fig. 22 is a schematic diagram of sample sampling provided by an embodiment of the present invention.
FIG. 23 is a schematic diagram of predicted values and actual values of seven-day data for all samples provided in an embodiment of the present invention.
Fig. 24 is a schematic diagram of sample sampling provided by an embodiment of the present invention.
FIG. 25 is a schematic diagram of a model loss function provided by an embodiment of the present invention.
FIG. 26 is a schematic diagram of predicted values and actual values of seven-day data for all samples provided by an embodiment of the present invention.
FIG. 27 is a schematic illustration of a sample provided by an embodiment of the present invention;
in the figure: 1. a prediction model construction module; 2. a model framework structure determination module; 3. and a network traffic prediction module.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Spatial features and temporal features are two important factors in network traffic prediction, and network traffic prediction is modeled for each of them: a convolutional neural network performs spatial modeling of the network traffic, a recurrent neural network performs temporal modeling, and an attention mechanism is added. Both models adopt a sequence-to-sequence framework to generate the prediction results. Meanwhile, a residual neural network (ResNet) and batch normalization (BatchNorm) are used in the spatial model, and an Attention mechanism is used in the temporal model, which enhances the accuracy of model prediction. Then, guided by the integrated learning idea, the spatio-temporal features of the two models are fused through a multi-layer perceptron to obtain a more powerful and accurate spatio-temporal prediction model. In addition, a Teacher-Forcing mechanism is adopted during model training to accelerate the convergence of the model and enhance its robustness. To simplify the model, these steps are based on the sequence-to-sequence framework, while the attention mechanism is added to improve the accuracy of the model and the Teacher-Forcing mechanism is added to improve the convergence rate. Finally, the framework of the integrated learning network traffic prediction model based on the multi-layer perceptron is presented.
The spatio-temporal network traffic data prediction model based on optimized integrated learning is verified experimentally. The network data sets required by the invention and the related processing of the data sets are described; the experimental environment and model parameters are set, and common regression prediction indexes are given; comparison experiments are then carried out against the temporal model and the spatial model. The accuracy of the spatio-temporal model based on optimized integrated learning is demonstrated by evaluating the performance of the proposed model against the basic evaluation indexes of several regression models. First, the network traffic data set and the data sets required in the experiment are processed; the experimental environment, experimental model parameters and deep learning hyperparameters are set, common evaluation indexes of regression models are given, and the comparison methods used in the experiment are described; the experimental results are analyzed, and the proposed model is compared experimentally with the comparison methods on the data set, which illustrates the important influence of spatio-temporal modeling on network traffic prediction results. In addition, the performance of the proposed model is analyzed, proving its feasibility and superiority.
Aiming at the problems existing in the prior art, the invention provides an integrated learning network flow prediction method and system, and the invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 19, the method for predicting the integrated learning network traffic provided by the embodiment of the invention includes the following steps:
s101, constructing a network flow prediction model based on time and space;
s102, determining an integrated learning network flow prediction model framework structure based on a multi-layer perceptron;
and S103, performing network flow space-time modeling based on the multi-layer perceptron integrated learning, and obtaining a prediction result through an integrated learning network flow prediction model based on the multi-layer perceptron.
As shown in fig. 20, the integrated learning network traffic prediction system provided by the embodiment of the present invention includes:
the prediction model construction module 1 is used for constructing a network traffic prediction model based on time and space;
the model framework structure determining module 2 is used for determining an integrated learning network flow prediction model framework structure based on the multi-layer perceptron;
the network flow prediction module 3 is used for performing network flow space-time modeling based on the multi-layer perceptron integrated learning, and obtaining a prediction result through the multi-layer perceptron integrated learning network flow prediction model.
The technical scheme of the invention is further described below with reference to specific embodiments.
1. Summary of the invention
In order to solve the problems, the invention aims at the integrated learning method based on the multi-layer perceptron. The integrated learning is to combine a plurality of individual machine learning or deep learning models through a strategy, so as to obtain a more powerful and accurate model learning process. The multi-layer perceptron has very good performance on nonlinear data, and the prediction accuracy can be greatly improved.
Therefore, the invention provides network traffic spatio-temporal modeling based on multi-layer perceptron integrated learning. The multi-layer-perceptron-based integrated learning weights the intermediate results of the spatial-model and temporal-model predictions according to the principle of the multi-layer perceptron. The design concept follows a sequence-to-sequence framework, fusing spatial and temporal predictions in the encoder part; the final result is then predicted in the decoder part. The design takes spatial and temporal factors into account simultaneously, achieving a better effect than a single spatial or temporal model. In addition, a Teacher-Forcing mechanism is adopted during model training to accelerate convergence, so as to obtain a more powerful and accurate spatio-temporal prediction model.
Because the network flow prediction data belongs to time series data, the integrated learning network flow prediction modeling of the multi-layer perceptron can be adopted. However, the traditional prediction method has low prediction accuracy on large-scale data and complex influencing factors, so that the space-time model is fused with the space-time characteristic through the multi-layer perceptron at this time to obtain a more powerful and accurate space-time prediction model. The technical scheme of the prediction method comprises the following steps:
1. the convolutional neural network and the GRU gating unit are used for modeling the network traffic data in space and time respectively, wherein both models are based on a sequence-to-sequence (Seq-to-Seq) frame idea, and meanwhile, a residual neural network (ResNet) and a batch normalization technology (BatchNorm) used in a space model and an Attention mechanism (Attention) used in a time model are introduced, so that the accuracy of model prediction is enhanced.
2. An integrated learning network traffic prediction model based on a multi-layer perceptron is proposed. First, how to obtain the spatial and temporal feature codes is described in detail; then, based on the integrated learning idea, the spatial and temporal features are integrated through a multi-layer perceptron. After the integrated-learned spatio-temporal features are obtained, they are input into a decoding part based on GRU gating units to obtain the prediction result. To simplify the model, these steps are based on the sequence-to-sequence framework, while the attention mechanism is added to improve the accuracy of the model and the Teacher-Forcing mechanism is added to improve the convergence rate. Finally, the framework of the integrated learning network traffic prediction model based on the multi-layer perceptron is presented.
2. Background of the art
2.1 Integrated learning
The ensemble learning (Ensemble Learning) is a learning process that combines multiple individual machine learning or deep learning models through a strategy to obtain a more powerful and accurate model. The current stage of integrated learning is divided into two main categories according to the difference of individual learners: one is serialized ensemble learning, i.e., there are strong dependencies between each individual learner, and the other is parallelized ensemble learning, i.e., there are no or weak dependencies between each individual learner. Of course, the core of integrated learning is how to combine each different learner and produce better learning results, i.e. what combination strategy is adopted to combine different or the same individual learners. Three common binding strategies exist: voting (Voting), bagging (Bagging), and adaptive boosting (AdaBoost).
Voting method: a common and simple voting mechanism in the fields of deep learning and machine learning is the majority voting mechanism, i.e. the result selected by most classifiers, supported by more than 50% of the predictions, is chosen. In theory this voting mechanism is limited to binary classification scenarios, but the problem can be solved by a relative (plurality) vote. The voting algorithm flow chart is shown in FIG. 1, where C_1, C_2, ..., C_m are the individual model learners, P_1, P_2, ..., P_m are the prediction values of each model learner, and P_n is the final value after voting.
The voting mechanism of fig. 1 is a voting mechanism for classification models; additional rule restrictions need to be imposed on the voting mechanism of a regression model to obtain the final overall prediction result.
Bagging: closely related to the voting mechanism. It is mainly a method of processing the training data set and putting the resulting subsets into individual learners for training. In general, random extraction with replacement is performed on the training data set, dividing the whole training data set into equally sized small data sets T_1, T_2, ..., T_m, which are respectively put into the individual learners C_1, C_2, ..., C_m for prediction; finally the prediction results P_1, P_2, ..., P_m are voted on to obtain the final prediction result P_n. The flow chart is shown in fig. 2.
Fig. 2 shows the method for classification models; the method commonly used for regression models is to take the average of all predicted values as the final prediction result.
Adaptive boosting is considerably more complex than the two methods described above, and involves both the initial sampling of the data set and the final voting mechanism. The difference is that correct and incorrect predictions carry corresponding weights, and this weight information is carried into each individual learner in the next round of training. The method thus boosts on the mispredicted data; the original boosting procedure is as follows:
Step 1: extract a random subset d_1 of training samples from training set D without replacement, and use it to train weak learner C_1;
Step 2: extract a second random subset d_2 of training samples from training set D without replacement, add 50% of the samples previously predicted incorrectly to this subset, and use it to train weak learner C_2;
Step 3: extract from training set D the samples on which d_1 and d_2 disagree to form training set d_3, and use it to train weak learner C_3;
Step 4: combine the weak learners C_1, C_2 and C_3 through a majority voting mechanism.
The above is the basic concept of adaptive boosting (AdaBoost); its binary-classification pseudocode is as follows:
1. Give all samples an initial equal weight w = 1/i, where i is the number of samples.
2. Boost for j in m rounds, performing the following operations:
a. train a weak learner: C_j = train(X, y, w), where C_j is the weak learner, X are the input samples, and y are the true data labels;
b. predict the classification labels: ŷ = predict(C_j, X);
c. calculate the weighted error rate: ε = w · (ŷ ≠ y);
d. calculate the coefficient: α_j = 0.5 · log((1 − ε)/ε);
e. update the weights: w := w × exp(−α_j × ŷ × y);
f. normalize the weights so that they sum to one: w := w / Σ w.
3. Calculate the final prediction result: ŷ_final = (Σ_j α_j × ŷ_j > 0).
A prediction greater than zero is classified as positive, otherwise as negative.
For regression models, the adaptive boosting (AdaBoost) algorithm requires rules, set according to the nature of the data, to decide whether a predicted value is correct, and likewise to form the final prediction result.
In summary, ensemble learning algorithms are friendlier to classification models than to regression models; a regression model needs corresponding rules formulated for the nature of its data.
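To make the boosting steps above concrete, the following is a minimal NumPy sketch of the binary AdaBoost procedure described in this section; the weak-learner training and prediction functions are passed in as callables and are illustrative assumptions rather than part of the original scheme.

```python
import numpy as np

def adaboost_train(X, y, m_rounds, train_weak, predict_weak):
    """Minimal AdaBoost sketch; labels y must be in {-1, +1}."""
    n_samples = len(y)
    w = np.full(n_samples, 1.0 / n_samples)      # 1. equal initial weights
    learners, alphas = [], []
    for _ in range(m_rounds):                    # 2. boost for m rounds
        learner = train_weak(X, y, w)            # a. train a weak learner
        y_hat = predict_weak(learner, X)         # b. predict labels
        eps = np.sum(w * (y_hat != y))           # c. weighted error rate
        alpha = 0.5 * np.log((1 - eps) / (eps + 1e-12))   # d. coefficient
        w = w * np.exp(-alpha * y_hat * y)       # e. reweight samples
        w = w / w.sum()                          # f. normalize to sum 1
        learners.append(learner)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(learners, alphas, X, predict_weak):
    # 3. final prediction: sign of the weighted vote of all weak learners
    score = sum(a * predict_weak(c, X) for c, a in zip(learners, alphas))
    return np.where(score > 0, 1, -1)
```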
2.2 multilayer perceptron
The multi-layer perceptron (MLP, Multilayer Perceptron) is an artificial neural network (ANN, Artificial Neural Network) having an input layer, one or more hidden layers and an output layer. The simplest MLP is a three-layer structure consisting of an input layer, a hidden layer and an output layer, as shown in fig. 3.
As can be seen from fig. 3, all layers in the multi-layer perceptron are fully connected; Layer L1 is the input layer, Layer L2 is the hidden layer, and Layer L3 is the output layer. Each circle in the figure represents a neuron node. If the vector of the input layer is denoted by X, then f(w_1 X + b_1) is the output of the hidden layer, where w_1 represents the weights of the fully connected layer, b_1 represents the bias, and f represents the activation function. The activation function is typically a nonlinear function such as the Sigmoid, Tanh or Relu function; a linear function cannot be used as the activation function, because no matter how many hidden layers there are, the result would simply be a linear combination of vectors. In practice the most commonly selected activation function is the Relu function, since the output values of the other two activation functions are restricted to certain intervals.
In practical applications, the MLP generally adapts its output layer to meet the requirements of the model; for example, an output layer of 7 neurons meets the invention's need to predict the data values of 7 days in the future, i.e.

ŷ = f(w_2 · f(w_1 X + b_1) + b_2)

Thus, the parameters of the multi-layer perceptron generally consist of w_1, b_1, w_2 and b_2. A gradient descent method is generally employed to obtain their optimal values: first all weights and biases are initialized, then gradients are calculated through continuous iterative training and the weights and biases are updated; the search for optimal values stops when the maximum number of iterations is reached or the error is small enough.
2.3 space-based convolutional neural networks
Convolutional neural networks (Convolutional Neural Networks, CNNs) have been successfully applied in fields such as image segmentation, semantic segmentation and machine translation. The spatial information of the data can be extracted through the convolution operation, and the convolution can be divided into one-dimensional, two-dimensional and three-dimensional convolution for different kinds of data. Two-dimensional convolution is employed in the image field; for the data of the present invention, one-dimensional convolution is employed.
CNNS can perform supervised and unsupervised learning with parameters (weights) shared between their convolution kernels. In order to express the complex characteristics of the data, the convolution layer also comprises a nonlinear activation function, and the commonly used activation function is a Relu function. As shown in fig. 4, a one-dimensional convolutional network is shown, and the principle of other dimension convolutional networks is the same.
As shown in fig. 4, by convolution, one-dimensional 1×5 input data is passed through a convolution kernel of 1×3, and finally one-dimensional 1×3 output data is output. The calculation formula is as follows:
y=H(wX+b) (1)
where w represents the convolution kernel parameters (weights), consisting of a matrix of 1 row and 3 columns, b represents the bias, and H represents the activation function. The parameters of the convolution kernel are multiplied element-wise by the first 3 values of the input data, and the results are added to obtain the first output value. In the next convolution step, the convolution kernel moves one position to the right and the operation is repeated to obtain the second output value, and so on until the convolution kernel exceeds the maximum length of the input data. The input data and output data sizes are related by the following formula:
L_out = (N − f)/S + 1 (2)

where L_out represents the length of the output data, N represents the length of the input data, f represents the convolution kernel length, and S represents the stride (length per shift).
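A small NumPy sketch of formulas (1) and (2) follows, applying a 1 × 3 kernel to 1 × 5 input data as in fig. 4; the Relu activation is assumed here in place of a generic H.

```python
import numpy as np

def conv1d_output_length(N, f, S):
    """Formula (2): output length for input length N, kernel length f, stride S."""
    return (N - f) // S + 1

def conv1d(x, w, b, stride=1):
    """Formula (1): y = H(wX + b) applied at every kernel position (H = Relu here)."""
    f = len(w)
    out_len = conv1d_output_length(len(x), f, stride)
    y = np.empty(out_len)
    for i in range(out_len):
        window = x[i * stride : i * stride + f]
        y[i] = np.maximum(0.0, np.dot(w, window) + b)
    return y

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # 1 x 5 input data
w = np.array([0.2, 0.5, 0.3])                 # 1 x 3 convolution kernel
print(conv1d(x, w, b=0.1))                    # 1 x 3 output, as in fig. 4
```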
2.4 time series based recurrent neural networks
2.4.1 recurrent neural network concepts
The recurrent neural network (Recurrent Neural Network, RNN) is a type of recursive neural network whose input data are sequential, which recurses in the direction of the sequence and whose nodes are chained together. In brief, the output of each node of an RNN is related to the output of the previous node, i.e. the output of the previous neuron is taken as part of the input of the next neuron. Thus each RNN input consists of both its own data and the output of the previous neuron node. It is this special structure that gives the RNN a memory function, so that the relationships between the input data can be remembered. RNNs find wide application in fields such as machine translation, speech recognition and text similarity, in particular machine translation, where the input data are strongly linked to one another.
2.4.2 gating recursion unit
GRU (Gate Recurrent Unit) is a variant of a cyclic network which solves the problem of long-term dependence between data and to some extent alleviates the problem of gradient extinction during back propagation. GRU construction is simpler than LSTM (Long-Short Term Memory). In the LSTM, three gating units control long-term dependency of data, and two gating units control the GRU, which are respectively: update gate and reset gate as shown in fig. 5.
As shown in fig. 5, x_t is the input at the current moment t, and C_{t-1} is the state of the last memory cell, which contains the information of the previous node cell. Through a number of gating operations, the GRU unit obtains from x_t and C_{t-1} its output and the state C_t passed to the next cell.
The GRU obtains two gating states from the information of the last node, C_{t-1}, and the input data of the current node, x_t. The candidate state C̃_t represents the current state information, and G_u represents the update gate, as shown in the following formulas:

C̃_t = tanh(w_c [C_{t-1}, x_t] + b_c) (3)

G_u = σ(w_u [C_{t-1}, x_t] + b_u) (4)

where w_c, w_u, b_c and b_u represent the weights and biases of each gating cell, respectively, and σ represents the sigmoid activation function, which scales the data to the [0,1] interval.
After obtaining C̃_t and G_u, the GRU unit obtains the new state C_t by controlling whether the information is updated. The formula is as follows:

C_t = G_u ⊙ C̃_t + (1 − G_u) ⊙ C_{t-1} (5)

From the above formula, the value range of G_u is [0,1]. When it equals 0, the current candidate state C̃_t is to be 'forgotten' and the last state C_{t-1} is kept; conversely, when it equals 1, the current candidate state C̃_t is to be 'remembered' and the last state C_{t-1} is discarded. This is also the core meaning of the GRU: whether to 'forget' or 'remember' the information of the current or the last unit. In most cases G_u takes a value between 0 and 1; the closer it is to 0, the less important the information of the current unit, which should be appropriately discarded. Otherwise, the current information is important and should be appropriately retained.
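The following NumPy sketch implements one step of the gating logic in formulas (3)–(5); it mirrors only the simplified update-gate formulation given above (the reset gate of the full GRU is not shown), and the weight shapes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(C_prev, x_t, w_c, b_c, w_u, b_u):
    """One GRU step following formulas (3)-(5).

    C_prev : previous cell state C_{t-1}, shape (hidden,)
    x_t    : input at the current moment, shape (inp,)
    w_c/w_u: weight matrices acting on the concatenation [C_{t-1}, x_t]
    """
    concat = np.concatenate([C_prev, x_t])
    C_tilde = np.tanh(w_c @ concat + b_c)        # formula (3): candidate state
    G_u = sigmoid(w_u @ concat + b_u)            # formula (4): update gate in [0, 1]
    C_t = G_u * C_tilde + (1.0 - G_u) * C_prev   # formula (5): remember or forget
    return C_t

hidden, inp = 4, 3
rng = np.random.default_rng(0)
C = np.zeros(hidden)
C = gru_step(C, rng.normal(size=inp),
             rng.normal(size=(hidden, hidden + inp)), np.zeros(hidden),
             rng.normal(size=(hidden, hidden + inp)), np.zeros(hidden))
```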
2.5 sequence-to-sequence model
The sequence-to-sequence (sequence to sequence, Seq2Seq) model framework can solve the long-term prediction problem well. It is a transformation model framework that transforms data from one sequence to another, and it performs excellently in the field of machine translation. The sequence-to-sequence model is mostly implemented in an Encoder-Decoder framework. The framework is divided into an encoding part and a decoding part; the input data of the encoding part can be in formats such as video, images or speech, and the decoding part outputs the required format, such as classification labels or continuous values. Inside the framework, recurrent neural networks, convolutional neural networks, long short-term memory units, gating units and the like are used. An attention mechanism (Attention) is typically added after the data is encoded by the encoder, and the result is then put into the decoder. Fig. 6 shows a classical Encoder-Decoder framework.
Through recent years of research, the Seq2Seq framework has evolved greatly. At present, the most widely accepted is the use of long and short term memory networks as encoders and decoders. Under this framework, the encoder receives a fixed length vector representation input, and the decoder receives the output of the encoder and then outputs a fixed length vector. In the field of machine translation, this model is very successful. The specific form is shown in fig. 7.
3. Network traffic prediction model based on time and space
3.1 spatial and temporal dependence of network traffic
3.1.1 spatial dependence of data
The data format of the invention has a certain urban-life characteristic: it reflects the traffic usage of different residents in a certain city, and the traffic is ultimately forwarded and transmitted through the base stations distributed throughout the city. In the network traffic prediction problem, the spatial distribution of the base stations is very similar to a Euclidean space: the source points and sink points of all traffic show a mesh distribution.
3.1.2 time dependence of data
The change in network traffic data over time is nonlinear, non-stationary. For example, if a traffic overload happens suddenly at a node in the entire base station distribution network, traffic of the node in the base station will be balanced, so as to affect traffic data of the entire base station network. In addition, there are some uncontrollable factors, such as holiday time in spring festival, etc., which can have a great influence on the traffic of the base station network. In the data of the present invention, the 2 month and 10 month flows are the highest two months of all month flows.
3.2 convolutional neural network based spatial dependency modeling
Convolutional neural network (Convolutional Neural Networks, CNNS) modeling was proposed for the spatial dependence of the present data. The convolutional neural network can well extract spatial information between data, and a residual neural network (Resnet) is adopted to improve the accuracy of prediction. The flow chart is shown in fig. 8.
For sequence data, convolutional Neural Networks (CNNs) have different processing for the inputs. As shown in fig. 8, a one-dimensional convolutional network can process sequential data, however, in order to get more spatial features in conformity with the base station distribution information, it is necessary to redefine the samples and take a two-dimensional convolutional neural network. The input samples are thus defined as [ M, C, N ], where M represents the number of input samples, N represents the characteristic dimension of the input samples, and for the present data C is set to 1, representing one-dimensional data. The sample is converted by data of a one-dimensional convolution network, and the output channel number is set to 120 of city base stations, so that the data format is [ M,120, N ]. In order to extract the spatial feature information to each base station, the input sample format is changed to [ M,120, N,1]. And finally, delivering the space feature extraction result to a two-dimensional convolutional neural network for space feature extraction. In order to deepen the neural network and prevent the model effect from being attenuated, the invention adopts a residual neural network. The network structure is shown in fig. 9.
As shown in fig. 9, the classical ResNet50 network is adopted; its layer parameters are described as follows:
ZEROPAD: the matrix (3, 3), zero-padding of 3 rows and 3 columns. I.e. the original input data matrix is of size (2, 2) and the size after filling is of size (5, 5). The excess portion is all zero.
Conv block: the 64 convolution kernels are of size (7, 7), and the convolution steps are two-dimensional convolutions of (2, 2).
BatchNorm block: batch normalization.
Relu: the Relu activation function. The formula is defined as follows:

Relu(x) = max(0, x) (6)
MAXPOOL, AVGPOOL block: maximum pooling layer, size (3, 3), step size (2, 2). Average pooling, size (2, 2), step size (1, 1).
CONVBLOCK Block: the modular structure is shown in fig. 10.
In FIG. 10, the ⊕ operation indicates that x from the earlier layers is added, as a matrix addition, to the result of the later layers through a shortcut 'path'; here a two-dimensional convolution operation (CONV2D) and a batch normalization operation (BatchNorm) are performed to scale the data to the [-1,1] interval, and the result is finally output through a Relu activation function.
Idblock x n block: the modular structure is shown in fig. 11.
In FIG. 11, the ⊕ operation means that x directly performs a matrix addition with the output result of the immediately preceding layer through a 'short path'. n indicates that several identical IDBLOCK blocks are linked together.
Flat block: the input is flattened into one-dimensional data. The size is (M, -1), where M represents the number of samples and -1 represents flattening the remaining dimensions of the input sample matrix; for example, an input of (3, 3, 3), i.e. 3 matrices of 3 rows and 3 columns, gives -1 = 3 × 3 × 3 = 27.
Fc block: full connection layer, size (H, N), H represents the last layer input dimension, N represents the desired predicted data output dimension. For example, the present invention requires predicting the flow for one week in the future and N is 7.
The residual neural network (ResNet) introduced the 'skip connection' technique to address problems that arise in deep neural networks. Theoretically, the deeper the neural network, i.e. the more network layers, the more feature information is extracted and the better the performance of the network model should be. In practice this is not the case, because deeper neural networks are more prone to gradient vanishing and gradient explosion. Even when these problems do not occur, the error at first decreases markedly while an optimization algorithm such as gradient descent optimizes the parameters of the network model, but after a certain time or number of iterations the error starts to increase again, i.e. the model effect becomes worse. Adding a residual neural network (ResNet) solves this problem well. Fig. 11 illustrates the ResNet 'skip connection' technique.
The reasons for the good performance of the ResNet network are summarized below. Assume a deeper neural network with input x and output a_l. In order to deepen the network, a residual block structure is added, and the activation functions in the network are Relu activation functions, i.e. all the outputs are greater than or equal to zero. The network structure is shown in fig. 12.
In FIG. 12, BigRNN is the deep neural network, and Layer1 and Layer2 are the additionally added residual block layers. Assume that z_{l+2} is the output of Layer2 before the activation function; the output a_{l+2} is then defined by the following formula:
a_{l+2} = g(z_{l+2} + a_l) (7)
where g represents the Relu activation function. The further expansion formula becomes:
a_{l+2} = g(w_{l+2} x + b_{l+2} + a_l) (8)

where w_{l+2} and b_{l+2} are the weights and bias of the Layer2 layer. In formula (8), if w_{l+2} = 0 and at the same time b_{l+2} = 0, then a_{l+2} will be equal to a_l, and the performance of the network is not changed after the residual block is added; of course, if the residual block has learned some features, the model performance is further improved. For a residual neural network, learning the identity in the above formula, i.e. a_{l+2} = a_l, is very easy, whereas an ordinary network would have to add extra weight information to achieve the same effect, which makes training of the network very difficult. The residual block can be added not only at the end of the network but also in the middle or at the front end of the network.
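The identity-shortcut behaviour discussed above (formula (7)) can be sketched as the following PyTorch module in the style of an IDBLOCK; the channel count and kernel size are illustrative, and the real ResNet50 blocks contain more convolution layers.

```python
import torch
import torch.nn as nn

class IdentityBlock(nn.Module):
    """Residual block sketch: main path Conv2d + BatchNorm, identity shortcut, then Relu."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        z = self.bn(self.conv(x))     # z_{l+2}: output of the added layers before activation
        return self.relu(z + x)       # a_{l+2} = g(z_{l+2} + a_l), formula (7)

block = IdentityBlock(channels=64)
out = block(torch.randn(8, 64, 16, 16))   # shape is preserved so the addition is valid
```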
The batch normalization (BatchNorm) technique has two advantages. First, like input normalization, it not only makes the mean of the input features 0 and the variance 1 to accelerate network learning, but also performs the same operation on the hidden unit layers. Second, batch normalization makes the weights of later or deeper layers of the network more robust to changes in earlier layers, which facilitates building deeper networks: the deeper the network, the more the distribution of the data input to each layer inevitably changes, while batch normalization re-normalizes the distribution so that the distribution seen by later layers stays consistent. Intuitively, batch normalization lets each layer of the network learn relatively independently, without being influenced by the weight coefficients of the preceding layers.
Assume that the output of a neural network layer without batch normalization is z_i, where i = 1, 2, 3, ..., n, and let ẑ_i denote the output after batch normalization. The calculation formulas are defined as follows:

μ = (1/γ) Σ_{i=1}^{γ} z_i (9)

σ² = (1/γ) Σ_{i=1}^{γ} (z_i − μ)² (10)

z_norm,i = (z_i − μ) / √(σ² + ε) (11)

ẑ_i = η · z_norm,i + β (12)

In the above formulas, ε represents a very small non-negative number whose purpose is to prevent division by zero in the denominator. η and β are parameters learned by the neural network, updated just as the neural network weights are. Let the input of the neural network be x_t and the selected batch size be γ, where 0 < γ ≤ m and m is the total number of samples; ẑ_i is then the batch-normalized output.
Taking the traditional Mini-Batch gradient descent algorithm as an example, the calculation process for η and β is as follows:
1. For t = 1, 2, 3, ..., n;
2. Perform forward propagation on all x_t;
3. Use the batch normalization technique to obtain ẑ^{[l]}, where l represents the l-th layer of the neural network;
4. Calculate the respective gradients using the back propagation technique: dw_l, dη_l, dβ_l;
5. Update the parameters: w_l = w_l − α·dw_l, η_l = η_l − α·dη_l, β_l = β_l − α·dβ_l, where α represents the learning rate.
Of course, the batch normalization technique is also applicable to other optimization algorithms, such as Adam optimization algorithm.
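The batch normalization formulas above can be sketched in NumPy as follows; η and β are taken as per-feature vectors here, which is an assumption about their shape.

```python
import numpy as np

def batch_norm_forward(z, eta, beta, eps=1e-8):
    """Normalize a mini-batch of activations, then scale by eta and shift by beta."""
    mu = z.mean(axis=0)                        # batch mean
    var = z.var(axis=0)                        # batch variance
    z_norm = (z - mu) / np.sqrt(var + eps)     # zero mean, unit variance
    return eta * z_norm + beta                 # learnable scale and shift

z = np.random.randn(100, 64)                   # a batch of 100 hidden-layer activations
eta, beta = np.ones(64), np.zeros(64)          # learned like ordinary network weights
z_bn = batch_norm_forward(z, eta, beta)        # same shape, re-normalized distribution
```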
3.3 time dependent modeling based on gating cells
For the time dependence of the data of the invention, time series modeling based on the GRU gating unit is provided. The model framework takes a sequence-to-sequence (sequence to sequence, Seq2Seq) framework, in which the Encoder part takes bidirectional GRU units and the Decoder part takes unidirectional GRU units. Before the Decoder decodes, an attention mechanism is added to improve the model accuracy. The framework is shown in fig. 13.
As shown in fig. 13, in the Encoder section, a<0> is initialized as a zero vector matrix. Bi-GRU represents a bidirectional GRU unit, whose output is twice the output of a unidirectional GRU unit; for example, a<1> is the concatenation of the forward and reverse outputs of the first Bi-GRU unit. Attention represents the attention mechanism. In the Decoder section, the initial value of s<0> is the hidden layer state (memory state) of the last Bi-GRU unit of the Encoder.
Attention mechanisms (Attention) have been widely used in various fields of deep learning in recent years. Because of its excellent performance, especially on sequence-type data, the attention mechanism holds an irreplaceable position. The attention mechanism mimics human observation behaviour: when looking at something, a person first observes the more important places or the special parts that differ from the rest, and then pays attention to these particular places to obtain specific relationships from them and deepen the overall understanding of the whole. Fig. 14 illustrates the basic model of the attention mechanism.
FIG. 14 is one sub-module of the Attention mechanism (Attention) module; there are seven such sub-modules in total, since the present invention needs to predict the network traffic trend for one week in the future. In fig. 14, s(t-1) is the hidden layer state (memory state) of the last Bi-GRU unit of the Encoder, and x1, x2, ..., xt are the inputs at each time step.
The Attention mechanism (Attention) works as follows: first, the input x_t and s(t-1) are spliced column-wise by a concatenate layer into a new matrix; this matrix is then passed through a fully connected layer and a softmax activation function layer to obtain the output a_t. Finally, the inputs x_{t′} and the weights a_t are combined through the following formula to obtain the Context of the attention mechanism:

Context_t = Σ_{t′} a<t,t′> · x_{t′} (13)

where a<t,t′> represents the attention that the t-th Context should pay to input t′; for all t′ = 1, 2, ..., t, the a<t,t′> values sum to 1, and x_{t′} denotes each of the inputs x1, x2, ..., xt.
The Context is the output of the Attention (Attention) mechanism, and the final required prediction result can be obtained by inputting a plurality of contexts into a Decoder (Decoder) part.
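The Context computation can be sketched in NumPy as follows; scoring the concatenation [x_{t′}, s(t-1)] with a single weight vector before the softmax is an illustrative simplification of the fully connected layer described above.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_context(xs, s_prev, w_att, b_att):
    """Context = sum over t' of a<t,t'> * x_t' (formula (13)).

    xs     : encoder outputs x_1..x_T, shape (T, d)
    s_prev : decoder state s(t-1), shape (ds,)
    w_att  : scoring weights over the concatenation [x_t', s_prev], shape (d + ds,)
    """
    scores = np.array([w_att @ np.concatenate([x, s_prev]) + b_att for x in xs])
    a = softmax(scores)           # attention weights a<t,t'>, summing to 1
    return a @ xs                 # weighted sum of the encoder outputs

rng = np.random.default_rng(0)
ctx = attention_context(rng.normal(size=(7, 8)), rng.normal(size=4),
                        rng.normal(size=12), 0.0)    # shape (8,)
```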
4. Network flow space-time modeling based on multi-layer perceptron integrated learning
The multi-layer perceptron-based integrated learning weights the intermediate results, namely the prediction results of the spatial model and the temporal model, according to the principle of the multi-layer perceptron. The design follows the sequence-to-sequence (Seq-to-Seq) framework, fusing the spatial and temporal predictions in the Encoder section; the final result is then predicted in the Decoder section. Because the design takes both spatial and temporal factors into account, it achieves a better effect than a single spatial or temporal model. The following describes in detail the network structure of the Encoder part, the integration part and the Decoder part of the multi-layer perceptron based integrated learning.
4.1 encoder part based on Multi-layer perceptron Integrated learning
The integrated learning can integrate a plurality of weak learners into a stronger strong learner, and the principle is that the weak learners are subjected to weight distribution through a certain integrated algorithm, different weights are given to each weak learner, and the total weight of all the weak learners is 1. Based on the thought, the invention respectively gives different weights to the result features of the space and time model prediction through the multi-layer perceptron to carry out integrated learning, thereby obtaining the fused space-time features as input to obtain a more powerful space-time model.
After the temporal and spatial output features are obtained, integrated learning based on the multi-layer perceptron is carried out on them. First, a matrix splicing operation is performed on the temporal features and the spatial features to obtain the input features of the multi-layer perceptron:

h_c = concat(h_time, h_space) (14)

where h_time and h_space represent the temporal and spatial feature vectors, respectively, and h_c represents the spliced feature vector.
h_c is then input into the multi-layer perceptron for integrated learning to obtain the final prediction result h_st. The calculation formula is as follows:

h_st = Tanh(w · h_c) (15)

where Tanh() represents the Tanh activation function and w is the weight matrix of the multi-layer perceptron.
The multi-layer perceptron used is a three-layer perceptron, comprising an input layer, a hidden layer and an output layer; its network structure diagram is shown in figure 15.
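A PyTorch sketch of this three-layer fusion perceptron follows; the sizes (7-day temporal and spatial predictions concatenated to 14 inputs, 98 hidden units, 7 outputs) follow the experiment section later in this document, while the exact layer arrangement is an assumption.

```python
import torch
import torch.nn as nn

class SpatioTemporalFusion(nn.Module):
    """Ensemble the temporal and spatial prediction features with a three-layer perceptron."""
    def __init__(self, horizon=7, hidden=98):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * horizon, hidden),   # input layer over concat(time, space)
            nn.Tanh(),                        # Tanh activation as in formula (15)
            nn.Linear(hidden, horizon),       # output layer: fused space-time feature
        )

    def forward(self, h_time, h_space):
        h_c = torch.cat([h_time, h_space], dim=-1)    # formula (14): matrix splicing
        return self.mlp(h_c)

fusion = SpatioTemporalFusion()
h_st = fusion(torch.randn(32, 7), torch.randn(32, 7))   # shape (32, 7)
```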
4.2 decoder portion based on Multi-layer perceptron Integrated learning
The Decoder part is used to derive the final prediction result while keeping the complexity of the model low. After the integrated space-time features are obtained, the decoder first performs an attention mechanism operation and then outputs the reconstructed features.
The encoded space-time feature output h_st obtained during training is first input into a bidirectional Bi-GRU gating unit to obtain a matrix a_g; a_g is then input into the attention mechanism layer to obtain a matrix c_g, and c_g is dimension-matched through a fully connected layer to obtain a matrix f_g; finally, f_g is input into a unidirectional GRU gating unit. The structure contains 14 bidirectional Bi-GRU units in total, and the number of unidirectional GRUs that finally output the prediction result equals the number of days to be predicted. These parts of the structure are relatively fixed, while the other internal structures can be changed and adjusted, such as the number of neurons in the GRU gating units, the number of fully connected layers and the number of neural units. The block diagram of this part is shown in fig. 16.
4.3 Teacher-force training mechanism
The Teacher-Force mechanism is a technique that 'corrects' the input of the model during training and speeds up convergence. Taking the Seq-to-Seq model framework as an example: during training, after the decoder receives its input at the initial time step, it produces an output result, and this result will not necessarily match the expected output. A question arises: should this result be taken as the input at the next time step, or should the correct (expected) result be taken as the input at the next time step? In fact, this question gives rise to two training modes:
Regardless of whether the output result at the previous time is correct or not, the correct result expected by us is always taken as the input at the current time.
The input at the current time and the output at the previous time are correlated, and the input at the current time is always the output at the previous time.
Both of the above training methods can cause certain problems.
If the first training mode is taken, then the model is constrained by the correct results during decoding and is effectively told whether each generated result is the correct one. This constraint reduces the divergence of the model and accelerates convergence during training, but it kills the diversity of the results and can cause the model to over-fit, making it insensitive to new data and degrading the accuracy of its predictions; in severe cases it can directly cause the model building to fail.
If the second training mode is adopted, then as soon as the prediction at some previous time step is incorrect, the prediction at the current time step will also be incorrect, because the incorrect result is fed back in at the next step; the model's predictions then drift further and further from the expected results, and the model becomes difficult to converge.
The Teacher-Force mechanism is a compromise between the two modes: through a random number and a probability, it controls whether the input at the current time step is the output of the previous time step or the expected correct result.
As shown in fig. 17, the above process is performed in each time step, where Random is a Random function whose output is a fraction between 0 and 1, and R is a preset probability value in the range of 0 to 1. If R is equal to 0, a second training mode is adopted. If R is equal to 1, the first training mode is adopted. The two training modes are balanced by adjusting the value of R.
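The Random/R rule described above can be sketched as the following training-time loop; decoder_step is a placeholder for one decoding step of the model and is an assumption for illustration.

```python
import random
import torch

def decode_with_teacher_force(decoder_step, y_true, y0, R=0.5):
    """Run the decoder step by step, mixing the two training modes with probability R.

    decoder_step : callable taking the previous output and returning the current output
    y_true       : ground-truth target sequence, shape (T, ...)
    y0           : initial decoder input
    """
    outputs, prev = [], y0
    for t in range(len(y_true)):
        y_hat = decoder_step(prev)
        outputs.append(y_hat)
        # Random < R: feed the expected correct result (first training mode);
        # otherwise feed the model's own output (second training mode).
        prev = y_true[t] if random.random() < R else y_hat
    return torch.stack(outputs)

# With R = 1 this is always the first mode, with R = 0 always the second.
preds = decode_with_teacher_force(lambda p: p * 0.9, torch.arange(5.0), torch.tensor(1.0))
```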
4.4 Overall network traffic prediction model
This section summarizes the network traffic prediction model obtained by combining the modules described above. In summary, this chapter models the spatial dependence and the temporal dependence of the network traffic to obtain spatial feature codes and temporal feature codes, then performs multi-layer perceptron based integrated learning on the two feature codes following the integrated learning idea, and finally obtains the prediction result. The spatial dependence is modeled with a convolutional neural network: the original data is recoded by a one-dimensional convolution to extract spatial information and then input into a two-dimensional convolutional network, and a residual neural network is added so that the model accuracy does not degrade as the network depth increases, yielding the spatial feature codes. For the temporal dependence, a GRU network following the sequence-to-sequence idea is adopted: in the encoding part a bidirectional GRU unit extracts the temporal relations between the sequences and an attention mechanism is added, while in the decoder part a unidirectional GRU unit forms the temporal feature codes. Finally, the spatial feature codes and the temporal feature codes are jointly integrated through the multi-layer perceptron and input into a GRU unit with a memory function to obtain the prediction result.
During training, a Teacher-Force mechanism is added to accelerate the convergence rate and accuracy of the model, with the preset probability chosen as 0.5. The flow of the whole network traffic prediction model is shown in fig. 18.
The invention provides a network flow data prediction application research based on multi-layer perceptron integrated learning: 1) Modeling the space and time of the network flow data by using a convolutional neural network and a GRU gating unit; 2) An integrated learning network flow prediction model based on a multi-layer perceptron is provided; 3) Space-time modeling is introduced into the field of network traffic prediction for the first time. The method has high accuracy of the prediction result, adapts to the influence of complex factors, and is more accurate in control.
The technical scheme of the invention is further described below with reference to specific application examples.
The invention performs predictive verification on a network traffic data set acquired in the real world. The sample is node flow data of a certain city of China provided by China mobile, and the data volume is huge. In a network traffic dataset there may be situations where port sample points are not continuous in days, which may occur due to newly added port routes, equipment failure maintenance suspension services, etc. Thus, a preliminary screening of the dataset is required to remove the discontinuous port samples. At the same time, some missing value problems occur in the data set, and we set the following rules for the data: if the data missing condition is more than 10% of the total days, the sample point is directly deleted, otherwise, the Lagrange interpolation is adopted for interpolation operation.
Through the operation, 20,041 data samples meet the requirements. And finally, carrying out normalization processing on the data set. 80% of the dataset was divided into training sets and 20% into validation sets.
The main parameters involved in the experiment of the network flow prediction model provided by the invention are as follows:
(1) The number of neural network layers. In time series based modeling, single-layer GRU units are taken. The number of layers of the neural network is not excessive, and a single-layer neural network achieves a good effect.
(2) GRU settings in a neural network. In the encoder, 161 GRU units are provided, corresponding to data of one day, respectively, while the GRUs are set in a bidirectional mode. The bi-directional looping mode facilitates mining of data dependencies by attention mechanisms in the decoder. In the decoder, 7 GRU units are respectively set, corresponding to data of one week predicted in the future, respectively, wherein the GRU is set in a unidirectional mode. Finally, the number of neuron nodes of the hidden layer in the GRU unit is set to 64.
(3) And (5) setting a multi-layer perceptron. To link the output dimensions of the temporal model and the spatial model, the hidden layer unit number of the perceptron is set to 98, i.e. the matrix size is 14X 7. And 14 is the concatenation of 7-day data of the predicted output of the temporal and spatial model. And 7 is the predicted output dimension.
(4) And (5) loss function setting. The loss function takes Mean Square Error (MSE).
(5) Deep learning hyper-parameter settings. The initial learning step size is set to 0.001. When the number of iterations reaches 300, the learning step decays to 0.0001, with a decay rate of 10%. A Mini-Batch training mode is adopted, the Batch-Size is set to 100, and the Relu function is adopted as the activation function of the neural network. The training process adopts the Teacher-Force mechanism, and the probability is set to 50%.
Common regression prediction indexes:
Let X = [x_1, x_2, x_3, ..., x_n], where n = 1, 2, 3, ..., 7, be the true values for one week in the future, and let X̂ = [x̂_1, x̂_2, ..., x̂_n] be the predicted values for one week in the future; m is the number of observed samples. The common regression prediction indexes are as follows:
(1) Mean absolute error (MAE):

MAE = (1/m) Σ_i |x̂_i − x_i|

(2) Root mean square error (RMSE):

RMSE = √( (1/m) Σ_i (x̂_i − x_i)² )

(3) Symmetric mean absolute percentage error (SMAPE):

SMAPE = (100%/m) Σ_i |x̂_i − x_i| / ((|x̂_i| + |x_i|)/2)

(4) R² determination coefficient (R²-Square):

R² = 1 − Σ_i (x_i − x̂_i)² / Σ_i (x_i − x̄)²

where x̄ is the mean of the true values and the sums run over all predicted points.
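A NumPy sketch of the four regression indexes follows; it averages over all predicted points, which matches the formulas above up to the exact indexing convention.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, SMAPE (in %) and the R^2 determination coefficient."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    smape = np.mean(np.abs(err) / ((np.abs(y_true) + np.abs(y_pred)) / 2)) * 100
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, rmse, smape, r2

print(regression_metrics([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```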
The experiment adopts two deep learning methods for comparison. These two methods are prediction of data based on temporal and spatial features, respectively.
AGRU: the method adopts a sequence-to-sequence model (Seq-to-Seq), the encoder adopts a bidirectional GRU neural network, the decoder adopts an Attention mechanism (Attention), and a random Teacher-Force mechanism is added during training to accelerate algorithm convergence.
CNN-ResNet: a convolutional neural network (CNN) is adopted to extract the spatial information of the data, and a residual neural network (ResNet) is then added to deepen the network, ensure that low-dimensional features are not lost, and reduce the risk of gradient vanishing to a certain extent.
In the experiment, the predicted trend graph of all network nodes for the next week is displayed first, and random sampling is then performed several times to display the trend graphs of the selected consecutive samples; the number of randomly extracted samples is 20. For the time-series-based prediction model (AGRU), fig. 21 shows the line chart of all predicted and actual values for one week in the future, from which the fit of the predicted values to the true values can be seen clearly; all sample points were randomly sampled multiple times, each time extracting 20 consecutive points, and fig. 22 shows one such random extraction. For the space-based prediction model (CNN-Resnet), fig. 23 shows the line chart of all predicted and actual values for one week in the future; all sample points were again randomly sampled multiple times, each time extracting 20 consecutive points, and fig. 24 shows one such random extraction.
By analyzing the experimental results, it can be found that:
(1) The R²-Square of the time series model AGRU is 0.108 higher than that of the spatial model CNN-Resnet, which means that the dependence of the network traffic data on time is higher than its dependence on space. The other two indexes, MAE and RMSE, also support this conclusion.
(2) The SMAPE index shows that the spatial model CNN-Resnet is more stable than the time series model AGRU, which indicates that the samples exhibit a certain regularity in space.
In order to study the prediction effect of the space-time model, loss function change diagrams are provided for the time model, the space model and the space-time model obtained by multi-layer perceptron integrated learning. As shown in fig. 25, the MAE index is taken to represent the loss function of the models. Fig. 26 and 27 show the seven-day predicted and actual values for all samples and the predicted and actual values for 20 randomly extracted sample points, respectively.
From the experimental results, it can be seen that space-time modeling of the network traffic data has a considerable influence on prediction. The space-time modeling considers both the spatially stable factors, i.e. the distribution of the resident population of living places, and the temporal traffic usage, i.e. the traffic usage peak periods.
In summary, the space-time model based on the multi-layer perceptron integrated learning has obvious advantages in network flow prediction.
In the description of the present invention, unless otherwise indicated, the meaning of "a plurality" is two or more; the terms "upper," "lower," "left," "right," "inner," "outer," "front," "rear," "head," "tail," and the like are used as an orientation or positional relationship based on that shown in the drawings, merely to facilitate description of the invention and to simplify the description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and therefore should not be construed as limiting the invention. Furthermore, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When used in whole or in part, is implemented in the form of a computer program product comprising one or more computer instructions. When loaded or executed on a computer, produces a flow or function in accordance with embodiments of the present invention, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL), or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), etc.
The foregoing is merely illustrative of specific embodiments of the present invention, and the scope of the invention is not limited thereto, but any modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present invention will be apparent to those skilled in the art within the scope of the present invention.

Claims (7)

1. The integrated learning network traffic prediction method is characterized by comprising the following steps of:
based on the sequence-to-sequence Seq-to-Seq framework idea, modeling the space and time of network traffic data by using a convolutional neural network and a GRU gating unit respectively, wherein a residual neural network ResNet and a batch normalization technology BatchNorm are used in the space model and an Attention mechanism Attention is used in the time model;
constructing an integrated learning network flow prediction model based on a multi-layer perceptron; obtaining spatial and temporal feature codes; based on the integrated learning thought, the space and time characteristics are integrated and learned through a multi-layer perceptron; after the space-time characteristics of the integrated learning are obtained, the space-time characteristics are input into a decoding part based on a GRU gate control unit to obtain a prediction result; adding an attention mechanism and a Teacher-force mechanism, and finally determining an integrated learning network flow prediction model framework structure based on a multi-layer perceptron;
The integrated learning network flow prediction method comprises the following steps:
firstly, constructing a network flow prediction model based on time and space;
step two, determining an integrated learning network flow prediction model framework structure based on a multi-layer perceptron;
thirdly, carrying out network flow space-time modeling based on multi-layer perceptron integrated learning, and obtaining a prediction result through an integrated learning network flow prediction model based on the multi-layer perceptron;
in the first step, the construction of the network traffic prediction model based on time and space comprises the following steps:
(1) Determining spatial and temporal dependencies of network traffic
(1) Spatial dependence of data
Aiming at a data format with a certain urban living characteristic, the method is used for indicating traffic use conditions of different residents in a certain city, and forwarding and transmitting traffic through base stations distributed throughout the city; in the network traffic prediction problem, the spatial distribution of the base station is very similar to Euclidean space, and the output points and the sink points of all traffic represent a net distribution;
(2) time dependence of data
The change in network traffic data over time is nonlinear, non-stationary; wherein the flow rates of 2 months and 10 months are the highest two months of all month flows;
(2) Spatial dependence modeling based on convolutional neural network
Aiming at the dependence of data on space, a convolutional neural network CNNS modeling is provided; extracting spatial information between data through a convolutional neural network, and adopting a residual neural network;
for sequence data, convolutional neural network CNN has different treatments for the input; the input samples are defined as [ M, C, N ], wherein M represents the number of the input samples, N represents the characteristic dimension of the input samples, C is set to be 1, and one-dimensional data is represented; the sample is converted through data of a one-dimensional convolution network, and the output channel number is set to be 120 of city base stations, so that the data format is [ M,120, N ]; changing the input sample format into [ M,120, N,1], and extracting the space characteristic information of each base station; finally, the space feature extraction work is carried out by a two-dimensional convolutional neural network;
classical ResNet50 networks were adopted, whose level parameters were described as follows:
ZEROPAD: filling the matrix (3, 3), zero filling of 3 rows and 3 columns; namely, the original input data matrix is (2, 2) in size, and the size after filling is (5, 5); the excessive part is all zero;
conv block: the 64 convolution kernels have the size of (7, 7), and the convolution step length is the two-dimensional convolution of (2, 2);
BatchNorm block: batch normalization;
relu: the Relu activation function, the formula of which is defined as follows:

Relu(x) = max(0, x);
MAXPOOL, AVGPOOL block: maximum pooling layer, size (3, 3), step size (2, 2); average pooling, size (2, 2), step size (1, 1);
CONVBLOCK Block: performing matrix addition operation on the results of the previous layers of x through a short 'path', wherein the data size is scaled to the [ -1,1] interval through a two-dimensional convolution operation and batch normalization operation, and finally, the data size is output through a Relu activation function;
idblock x n block: x directly performs matrix addition operation on the output result of the previous layer through a short path; n represents a plurality of identical IDBLOCK blocks linked together;
flat block: flattening the input into one-dimensional data; the size is (M, -1), M represents the number of samples, and, -1 represents the input sample matrix data synthesis;
fc block: the full-connection layer is (H, N) in size, H represents the input dimension of the upper layer, and N represents the output dimension of the required prediction data;
the residual neural network ResNet is used as a 'skip connection' technique to solve problems that occur in deep neural networks;
the reasons for the good performance of the ResNet network are summarized below: assume a deeper neural network with input x and output a_l; a residual block structure is added, and the activation functions in the network are Relu activation functions, i.e. all the outputs are greater than or equal to zero;
BigRNN is the deep neural network, and Layer1 and Layer2 are the additionally added residual block layers; assuming z_{l+2} is the output of Layer2 before the activation function, the output a_{l+2} is defined by the following formula:
a_{l+2} = g(z_{l+2} + a_l);
wherein g represents the Relu activation function; the extended formula becomes:
a_{l+2} = g(w_{l+2} x + b_{l+2} + a_l);
wherein w_{l+2} and b_{l+2} are the weights and bias of the Layer2 layer; if w_{l+2} = 0 and at the same time b_{l+2} = 0, then a_{l+2} will be equal to a_l, and the performance of the network is not changed after the residual block is added;
assuming that the output of a neural network layer without batch normalization is z_i, where i = 1, 2, 3, ..., n, n being the number of samples, and letting ẑ_i denote the output after batch normalization, the calculation formulas are defined as follows:
μ = (1/γ) Σ_{i=1}^{γ} z_i ;
σ² = (1/γ) Σ_{i=1}^{γ} (z_i − μ)² ;
z_norm,i = (z_i − μ) / √(σ² + ε) ;
ẑ_i = η · z_norm,i + β ;
wherein ε represents a very small non-negative number, and η, β are parameters learned by the neural network; let the input of the neural network be x_t and the selected batch size be γ, wherein 0 < γ ≤ m and m is the total number of samples; ẑ_i is the batch-normalized output;
Taking the traditional mini-batch gradient descent algorithm as an example, the calculation steps for η and β are as follows:
1. For t = 1, 2, 3, ..., n;
2. Perform forward propagation on all x_t;
3. Use the batch normalization technique to obtain ẑ^{[l]}, where l represents the l-th layer of the neural network;
4. Calculate the respective gradients using the back propagation technique: dw_l, dη_l, dβ_l;
5. Update the parameters: w_l = w_l − α·dw_l, η_l = η_l − α·dη_l, β_l = β_l − α·dβ_l, wherein α represents the learning rate;
(3) Time dependent modeling based on gating unit
Aiming at the dependence of data on time, providing a time sequence modeling based on a GRU gating unit; the model framework adopts a sequence-to-sequence Seq2Seq framework, wherein an encoder part adopts a bidirectional GRU unit, and a decoder part adopts a single GRU unit; before the decoder decodes, add the attention mechanism;
in the encoder section, a<0> is initially a zero vector matrix; Bi-GRU represents a bidirectional GRU cell whose output is 2 times the output of a unidirectional GRU cell, e.g. a<1> is the concatenation of the forward and reverse outputs of the first Bi-GRU cell; Attention represents the attention mechanism; in the decoder section, the initial value of s<0> is the hidden layer state, namely the memory state, of the last Bi-GRU unit of the encoder;
the working principle of the attention mechanism is as follows: the input x_t and s(t−1) are spliced column-wise by a concatenate layer into a new matrix; the matrix is passed through a fully connected layer and a softmax activation function layer to obtain the output a_t; the inputs x_{t′} and the weights a_t are combined by the following formula to derive the Context of the attention mechanism:
Context_t = Σ_{t′} a<t,t′> · x_{t′}
wherein a<t,t′> represents the attention that the t-th Context should pay to input t′, for all t′ = 1, 2, ..., t the a<t,t′> values sum to 1, and x_{t′} represents each of the inputs x1, x2, ..., xt;
the Context is the output of the attention mechanism, and a plurality of Contexts are input into the decoder part to obtain the final required prediction result;
in the second step, the network traffic space-time modeling based on the multi-layer perceptron integrated learning comprises the following steps:
the multi-layer perceptron integrated learning is based on the operation of weighting the intermediate result by combining the principle of the multi-layer perceptron on the prediction results of the space model and the time model; the design idea follows the framework from sequence to sequence Seq-to-Seq, and the prediction results of space and time are fused in the encoder part; making predictions of the final result at the decoder section; and meanwhile, the space and time factors are referenced, so that better effect than that of a single space and time model is achieved.
2. The ensemble learning network traffic prediction method as claimed in claim 1, wherein the encoder section based on the ensemble learning of the multi-layer perceptron and the decoder section based on the ensemble learning of the multi-layer perceptron, comprises:
(1) Encoder section based on multi-layer perceptron ensemble learning
The integrated learning integrates a plurality of weak learners into a stronger strong learner, the principle is that the weak learners are subjected to weight distribution through a certain integrated algorithm, each weak learner is given different weight, and the total weight of all the weak learners is 1; based on the idea, the multi-layer perceptron respectively gives different weights to the result features of the space and time model prediction to carry out integrated learning, so that the fused space-time features are used as input to obtain a more powerful space-time model;
after the temporal and spatial output features are obtained, integrated learning based on the multi-layer perceptron is performed on them; a matrix splicing operation is performed on the temporal features and the spatial features to obtain the input features of the multi-layer perceptron:
h_c = concat(h_time, h_space)
wherein h_time and h_space represent the temporal and spatial feature vectors, respectively, and h_c represents the spliced feature vector;
h_c is input into the multi-layer perceptron for integrated learning to obtain the final prediction result h_st, the calculation formula being as follows:
h_st = Tanh(w · h_c)
wherein Tanh() represents the Tanh activation function and w is the weight matrix of the multi-layer perceptron;
the multi-layer perceptron is a three-layer perceptron which comprises an input layer, a hidden layer and an output layer respectively;
(2) Decoder part based on multi-layer perceptron integrated learning
The decoder section derives the final prediction result while keeping the complexity of the model low; after the ensemble-learned spatio-temporal features are obtained, the decoder performs the attention-mechanism operation and outputs the reconstructed features;
during training, the encoded spatio-temporal feature output is first fed into a bidirectional Bi-GRU gated unit to obtain a matrix a_g; a_g is input to the attention-mechanism layer to obtain a matrix c_g, and c_g is dimension-matched through a fully connected layer to obtain a matrix f_g; f_g is then input to a unidirectional GRU gated unit; the structure contains 14 bidirectional Bi-GRU units in total, and the number of single GRUs that finally output the prediction result equals the number of days to be predicted; this overall structure is relatively fixed, while the other internal structures may be changed and adjusted.
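For illustration, a rough sketch of this Bi-GRU, attention, fully connected and unidirectional GRU decoder path is given below, assuming PyTorch; the class name DecoderSketch, the simple additive attention scoring and all dimensions are assumptions, with 14 input steps and 7 prediction days used only as example values:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderSketch(nn.Module):
    """Rough sketch of the decoder path described above: a Bi-GRU over the
    fused spatio-temporal features, an attention layer, a fully connected
    layer for dimension matching, then a unidirectional GRU whose number of
    output steps equals the number of days to predict."""

    def __init__(self, d_in, d_hid, pred_days):
        super().__init__()
        self.bi_gru = nn.GRU(d_in, d_hid, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * d_hid, 1)      # simple attention scoring
        self.fc = nn.Linear(2 * d_hid, d_hid)    # dimension matching (f_g)
        self.gru = nn.GRU(d_hid, d_hid, batch_first=True)
        self.head = nn.Linear(d_hid, 1)
        self.pred_days = pred_days

    def forward(self, feats):                     # feats: (batch, steps, d_in)
        a_g, _ = self.bi_gru(feats)               # (batch, steps, 2*d_hid)
        w = F.softmax(self.attn(a_g), dim=1)      # attention weights over steps
        c_g = (w * a_g).sum(dim=1, keepdim=True)  # attention context
        f_g = self.fc(c_g).repeat(1, self.pred_days, 1)
        out, _ = self.gru(f_g)
        return self.head(out).squeeze(-1)         # (batch, pred_days)

# toy usage: 14 encoder steps, 7 predicted days (example values only)
dec = DecoderSketch(d_in=16, d_hid=32, pred_days=7)
print(dec(torch.randn(4, 14, 16)).shape)  # torch.Size([4, 7])
```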
3. The integrated learning network traffic prediction method as claimed in claim 1, wherein the Teacher-forcing training mechanism includes:
the Teacher-forcing mechanism is a technique for correcting the model input during training and accelerating convergence; during training, after the decoder receives the data at the initial moment it produces an output, and when this output does not match the expected output a question arises: should this output, or the correct result, be taken as the input at the next moment? This question gives rise to two training modes:
(1) Regardless of whether the output at the previous moment is correct, the expected correct result is always taken as the input at the current moment;
(2) The input at the current moment is tied to the output at the previous moment: the input at the current moment is always the output at the previous moment;
At each time step, Random is a random function whose output is a decimal between 0 and 1, and R is a preset probability value ranging from 0 to 1; if R equals 0, the second training mode is adopted; if R equals 1, the first training mode is adopted; the two training modes are balanced by adjusting the value of R;
the Teacher-forcing mechanism thus controls, through a random number and a probability, whether the input at the current moment is the output at the previous moment or the expected correct result;
during training, the Teacher-forcing mechanism is added, and the preset probability is chosen to be 0.5.
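A minimal sketch of this Teacher-forcing schedule is shown below, assuming PyTorch and an assumed one-step decoder callable decoder_step; it only illustrates how the random number and the preset probability R balance the two training modes:

```python
import random
import torch

def decode_with_teacher_forcing(decoder_step, y_true, y0, R=0.5):
    """Sketch of the Teacher-forcing schedule described above: at each time
    step a random number in [0, 1) is compared with the preset probability R;
    if it falls below R the expected (ground-truth) value is fed in next,
    otherwise the model's own previous output is used."""
    inputs, outputs = y0, []
    for t in range(y_true.size(1)):
        pred = decoder_step(inputs)           # one-step prediction
        outputs.append(pred)
        if random.random() < R:               # R = 1 -> always ground truth
            inputs = y_true[:, t]             # training mode (1)
        else:                                 # R = 0 -> always own output
            inputs = pred.detach()            # training mode (2)
    return torch.stack(outputs, dim=1)

# toy usage with a trivial "decoder" that scales its input
step = lambda x: x * 0.9
y_true = torch.randn(4, 7)
print(decode_with_teacher_forcing(step, y_true, torch.zeros(4), R=0.5).shape)
```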
4. An integrated learning network traffic prediction system applying the integrated learning network traffic prediction method according to any one of claims 1 to 3, characterized in that the integrated learning network traffic prediction system comprises:
a prediction model construction module, used for constructing a network traffic prediction model based on time and space;
a model framework structure determination module, used for determining the framework structure of the multi-layer perceptron based integrated learning network traffic prediction model;
and a network traffic prediction module, used for carrying out network traffic spatio-temporal modeling based on multi-layer perceptron integrated learning and obtaining a prediction result through the multi-layer perceptron integrated learning network traffic prediction model.
5. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
based on the sequence-to-sequence (Seq-to-Seq) framework idea, modeling the space and the time of network traffic data with a convolutional neural network and a GRU gated unit respectively, wherein a residual neural network (ResNet) and the batch normalization technique (BatchNorm) are used in the spatial model, and an Attention mechanism is used in the temporal model;
constructing an integrated learning network traffic prediction model based on the multi-layer perceptron; obtaining the spatial and temporal feature codes; based on the idea of integrated learning, performing integrated learning on the spatial and temporal features through the multi-layer perceptron; after the ensemble-learned spatio-temporal features are obtained, inputting them into a decoding part based on the GRU gated unit to obtain a prediction result; and adding the attention mechanism and the Teacher-forcing mechanism, so as to finally determine the framework structure of the multi-layer perceptron based integrated learning network traffic prediction model.
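Purely as an illustration of the spatial (convolution + ResNet + BatchNorm) and temporal (GRU) branches named in these steps, a minimal PyTorch sketch follows; the class names, channel counts and feature dimensions are assumptions and not the claimed architecture:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block with BatchNorm, standing in for the ResNet and
    BatchNorm techniques named above; channel counts are assumptions."""

    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(ch)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(ch)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)            # residual connection

class SpatialEncoder(nn.Module):
    """Sketch of the spatial branch: a convolution followed by residual
    blocks, producing one feature vector per traffic grid."""

    def __init__(self, in_ch=1, ch=16, d_out=32):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, ch, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(ResidualBlock(ch), ResidualBlock(ch))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(ch, d_out)

    def forward(self, grid):                   # grid: (batch, 1, H, W)
        h = self.blocks(self.stem(grid))
        return self.proj(self.pool(h).flatten(1))

class TemporalEncoder(nn.Module):
    """Sketch of the temporal branch: a GRU over the traffic sequence."""

    def __init__(self, d_in=1, d_out=32):
        super().__init__()
        self.gru = nn.GRU(d_in, d_out, batch_first=True)

    def forward(self, seq):                    # seq: (batch, steps, d_in)
        _, h = self.gru(seq)
        return h[-1]                           # (batch, d_out)

# toy usage with assumed shapes: a 10x10 traffic grid and a 14-step sequence
f_space = SpatialEncoder()(torch.randn(4, 1, 10, 10))
f_time = TemporalEncoder()(torch.randn(4, 14, 1))
print(f_space.shape, f_time.shape)  # torch.Size([4, 32]) torch.Size([4, 32])
```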
6. A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
based on the sequence-to-sequence (Seq-to-Seq) framework idea, modeling the space and the time of network traffic data with a convolutional neural network and a GRU gated unit respectively, wherein a residual neural network (ResNet) and the batch normalization technique (BatchNorm) are used in the spatial model, and an Attention mechanism is used in the temporal model;
constructing an integrated learning network traffic prediction model based on the multi-layer perceptron; obtaining the spatial and temporal feature codes; based on the idea of integrated learning, performing integrated learning on the spatial and temporal features through the multi-layer perceptron; after the ensemble-learned spatio-temporal features are obtained, inputting them into a decoding part based on the GRU gated unit to obtain a prediction result; and adding the attention mechanism and the Teacher-forcing mechanism, so as to finally determine the framework structure of the multi-layer perceptron based integrated learning network traffic prediction model.
7. An information data processing terminal for implementing the ensemble learning network traffic prediction system of claim 4.
CN202111135948.4A 2021-09-27 2021-09-27 Integrated learning network traffic prediction method, system, equipment, terminal and medium Active CN113905391B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111135948.4A CN113905391B (en) 2021-09-27 2021-09-27 Integrated learning network traffic prediction method, system, equipment, terminal and medium

Publications (2)

Publication Number Publication Date
CN113905391A CN113905391A (en) 2022-01-07
CN113905391B true CN113905391B (en) 2023-05-23

Family

ID=79029611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111135948.4A Active CN113905391B (en) 2021-09-27 2021-09-27 Integrated learning network traffic prediction method, system, equipment, terminal and medium

Country Status (1)

Country Link
CN (1) CN113905391B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114363951B (en) * 2022-03-21 2022-06-07 北京科技大学 Inter-cell flow collaborative prediction method and device
CN114826949B (en) * 2022-05-04 2023-06-09 北京邮电大学 Communication network condition prediction method
CN115348215B (en) * 2022-07-25 2023-11-24 南京信息工程大学 Encryption network traffic classification method based on space-time attention mechanism
CN114997543B (en) * 2022-08-03 2023-01-13 通号通信信息集团有限公司 People flow prediction method, electronic device and readable medium
CN115545315A (en) * 2022-10-12 2022-12-30 重庆移通学院 PM2.5 prediction method based on three-dimensional convolutional neural network and gated cyclic unit
CN115720212A (en) * 2022-11-11 2023-02-28 吉林大学 Network flow prediction and automatic optimization balancing method based on multi-source data fusion
CN115883424B (en) * 2023-02-20 2023-05-23 齐鲁工业大学(山东省科学院) Method and system for predicting flow data between high-speed backbone networks
CN116743635B (en) * 2023-08-14 2023-11-07 北京大学深圳研究生院 Network prediction and regulation method and network regulation system
CN117060984B (en) * 2023-10-08 2024-01-09 中国人民解放军战略支援部队航天工程大学 Satellite network flow prediction method based on empirical mode decomposition and BP neural network
CN117274779A (en) * 2023-11-21 2023-12-22 南开大学 Target detection method based on modal self-adaptive gating recoding network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111130839B (en) * 2019-11-04 2021-07-16 清华大学 Flow demand matrix prediction method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109862585A (en) * 2019-01-31 2019-06-07 湖北工业大学 A kind of dynamic heterogeneous network flow prediction method based on depth Space-time Neural Network
CN112910690A (en) * 2021-01-18 2021-06-04 武汉烽火技术服务有限公司 Network traffic prediction method, device and equipment based on neural network model
CN112910695A (en) * 2021-01-22 2021-06-04 湖北工业大学 Network fault prediction method based on global attention time domain convolutional network
CN112801404A (en) * 2021-02-14 2021-05-14 北京工业大学 Traffic prediction method based on self-adaptive spatial self-attention-seeking convolution
CN113316163A (en) * 2021-06-18 2021-08-27 东南大学 Long-term network traffic prediction method based on deep learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Network Traffic Prediction based on Seq2seq Model; Zeyu Ma; The 16th International Conference on Computer Science &amp; Education (ICCSE 2021); full text *
A traffic flow prediction model based on sequence-to-sequence spatio-temporal attention learning; 杜圣东; 李天瑞; 杨燕; 王浩; 谢鹏; 洪西进; Journal of Computer Research and Development (08); full text *
Applied research on network traffic prediction based on optimized neural networks; 王鑫; China Masters' Theses Full-text Database, Information Science and Technology; full text *
Multi-step network traffic prediction based on a full attention mechanism; 郭佳; 余永斌; 杨晨阳; Journal of Signal Processing (05); full text *
A network traffic prediction model based on deep belief networks; 任玮; Shanxi Electronic Technology (01); full text *


Similar Documents

Publication Publication Date Title
CN113905391B (en) Integrated learning network traffic prediction method, system, equipment, terminal and medium
CN113053115B (en) Traffic prediction method based on multi-scale graph convolution network model
CN106897268B (en) Text semantic understanding method, device and system
CN111914085B (en) Text fine granularity emotion classification method, system, device and storage medium
WO2020048389A1 (en) Method for compressing neural network model, device, and computer apparatus
CN112863180B (en) Traffic speed prediction method, device, electronic equipment and computer readable medium
CN111079931A (en) State space probabilistic multi-time-series prediction method based on graph neural network
CN114519145A (en) Sequence recommendation method for mining long-term and short-term interests of users based on graph neural network
WO2021042857A1 (en) Processing method and processing apparatus for image segmentation model
CN110110372B (en) Automatic segmentation prediction method for user time sequence behavior
CN112215604A (en) Method and device for identifying information of transaction relationship
CN111353534B (en) Graph data category prediction method based on adaptive fractional order gradient
CN114330541A (en) Road traffic accident risk prediction deep learning algorithm
CN116051388A (en) Automatic photo editing via language request
CN112016616A (en) High-frequency financial time sequence multi-class prediction method
CN112766603A (en) Traffic flow prediction method, system, computer device and storage medium
CN109583659A (en) User's operation behavior prediction method and system based on deep learning
CN113127604B (en) Comment text-based fine-grained item recommendation method and system
US20240037133A1 (en) Method and apparatus for recommending cold start object, computer device, and storage medium
CN111783688B (en) Remote sensing image scene classification method based on convolutional neural network
Ma et al. Temporal pyramid recurrent neural network
CN115599918B (en) Graph enhancement-based mutual learning text classification method and system
Ma et al. Estimation of Gaussian overlapping nuclear pulse parameters based on a deep learning LSTM model
CN115348182A (en) Long-term spectrum prediction method based on depth stack self-encoder
CN115204171A (en) Document-level event extraction method and system based on hypergraph neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant