CN115941510A

CN115941510A - Large-scale SDN network flow prediction method and system

Info

Publication number: CN115941510A
Application number: CN202211399357.2A
Authority: CN
Inventors: 伍乙生
Original assignee: Zhaoqing Medical College
Current assignee: Zhaoqing Medical College
Priority date: 2022-11-09
Filing date: 2022-11-09
Publication date: 2023-04-07

Abstract

The invention discloses a large-scale SDN network traffic prediction method, comprising the following steps: acquiring historical traffic data of switch ports in a network topology through an SDN controller to obtain a data set; normalizing the data set and dividing the normalized data set into a training set and a test set according to the proportion of 7; constructing an adjacency matrix among different links from the normalized data set, and constructing a correlation matrix among different links by correlation analysis; and constructing and initializing a GCN-GRU-based SDN network traffic prediction model, then inputting the training set, the adjacency matrix and the correlation matrix into the initialized model for training so as to extract spatial and temporal features.

Description

Large-scale SDN network flow prediction method and system

Technical Field

The invention relates to the field of next-generation information technology, and in particular to a large-scale SDN network traffic prediction method and system.

Background

SDN (software-defined networking) has gradually become an emerging field in networking and is a relatively advanced technology. Its main idea is to decouple the control plane from the data plane, both of which originally reside in network switches and routers, thereby truly separating control from forwarding. The SDN controller mainly performs route computation, network control and management, generation and distribution of switch flow tables, collection of the network topology, and the like. Devices in the data layer are only responsible for forwarding data and executing the policies issued by the control layer. Separating forwarding from control logically centralizes control: the SDN controller stores the topology of the whole network, the dynamic forwarding tables, fault states, resource utilization, and so on. This opens up and extends network capabilities, and the centralized controller can integrate, virtualize and uniformly manage the resources of the network. The northbound interface of the control layer provides upper-layer applications with the services and resources they require, which best embodies the on-demand opening of network capabilities.

With the rapid development of communication technology, network traffic is growing explosively, and network traffic prediction has emerged as a means of preventing network congestion and improving the utilization of network resources. Modeling and predicting network traffic reveals its trend in advance, so that a reasonable and effective traffic management strategy can be formulated from the predicted values to improve network quality of service and user experience. Establishing a high-precision network traffic prediction model is therefore of great significance.

In recent years, deep learning models have been widely used for traffic prediction, and improved models such as the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) have rapidly advanced prediction accuracy. Notably, however, most current work on network traffic prediction targets traditional networks, and work on SDN networks is scarce. Moreover, existing network traffic prediction models share a common problem: they consider only the temporal correlation of the time series and ignore the correlation between real communication links, i.e., the spatial features of network traffic. As a result, higher-dimensional features of the traffic are neglected during prediction, joint spatio-temporal features are difficult to extract from the inputs, and link load cannot be predicted accurately.

Therefore, for SDN networks, and especially for large-scale SDN networks with complex link conditions, accurate traffic prediction requires an effective prediction model that can extract both the temporal and the spatial characteristics of the data.

Disclosure of Invention

The invention aims to provide a large-scale SDN network traffic prediction method and system that effectively solve the above technical problems of the prior art.

In order to achieve the above object, an embodiment of the present invention provides a large-scale SDN network traffic prediction method, including:

S1, historical traffic data of switch ports in the network topology are obtained through the SDN controller to form a traffic feature matrix X, which is recorded as the data set shown in formula (1):

X = \begin{bmatrix} x_1^1 & x_2^1 & \cdots & x_M^1 \\ x_1^2 & x_2^2 & \cdots & x_M^2 \\ \vdots & \vdots & \ddots & \vdots \\ x_1^N & x_2^N & \cdots & x_M^N \end{bmatrix} \in \mathbb{R}^{N \times M} \quad (1)

where x_t^i represents the link load feature value of the i-th node at time t (each link between the switches is mapped to one node), N is the number of nodes, and M is the time length;
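As a minimal sketch of how the traffic feature matrix X of formula (1) might be assembled, the Python below assumes the controller has already polled cumulative byte counters for each link at a fixed interval. The polling mechanism, the `counters` layout and the name `build_feature_matrix` are hypothetical and not taken from the patent. Each link becomes one node (one row of X) and each column is one time step.

```python
from typing import Dict, List, Tuple

# Hypothetical link identifier: the pair of switches a link connects.
Link = Tuple[str, str]


def build_feature_matrix(
    counters: Dict[Link, List[int]], interval_s: float
) -> Tuple[List[Link], List[List[float]]]:
    """Build an N x M load matrix X from cumulative byte counters.

    counters maps each link (one graph node) to M + 1 successive cumulative
    byte readings polled every interval_s seconds (assumed input format).
    Row i of X is the load of link i, in bytes per second, over M intervals.
    """
    links = sorted(counters)  # fix the node order i = 0 .. N-1
    X: List[List[float]] = []
    for link in links:
        readings = counters[link]
        # Difference consecutive readings; a negative delta (e.g. after a
        # counter reset) is clamped to 0 rather than reported as negative load.
        X.append([max(b - a, 0) / interval_s for a, b in zip(readings, readings[1:])])
    return links, X


links, X = build_feature_matrix(
    {("s1", "s2"): [0, 100, 300], ("s2", "s3"): [0, 50, 50]}, interval_s=10.0
)
print(links)  # [('s1', 's2'), ('s2', 's3')]
print(X)      # [[10.0, 20.0], [5.0, 0.0]]
```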

S2, the data set X is normalized, and the normalized data set is divided into a training set X1 and a test set X2 according to the proportion of 7; the normalized data are given by formula (2):

\tilde{x}_t^i = \frac{x_t^i - \min(X^i)}{\max(X^i) - \min(X^i)} \quad (2)

where min(X^i) is the minimum value in the data set before normalization and max(X^i) is the maximum value in the data set before normalization;
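The normalization and split of step S2 can be sketched in Python as follows. Two points are assumptions: min(X^i) and max(X^i) are read as the minimum and maximum of node i's own series, and the split is taken chronologically along the time axis so that the test set follows the training set in time. The split fraction is left as a parameter (`train_ratio`); the usage lines use 0.5 only to keep the toy output short.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def min_max_normalize(X: Matrix) -> Matrix:
    """Formula (2), applied to each node's series (one row of X) separately."""
    out: Matrix = []
    for row in X:
        lo, hi = min(row), max(row)
        span = hi - lo
        # A constant series has zero span; map it to zeros instead of dividing by 0.
        out.append([(x - lo) / span if span else 0.0 for x in row])
    return out


def chronological_split(X: Matrix, train_ratio: float) -> Tuple[Matrix, Matrix]:
    """Split every row along the time axis: earlier columns train, later ones test."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must lie in (0, 1)")
    cut = int(round(len(X[0]) * train_ratio))
    return [row[:cut] for row in X], [row[cut:] for row in X]


Xn = min_max_normalize([[0.0, 5.0, 10.0, 5.0], [2.0, 2.0, 2.0, 2.0]])
X1, X2 = chronological_split(Xn, train_ratio=0.5)
print(X1)  # [[0.0, 0.5], [0.0, 0.0]]
print(X2)  # [[1.0, 0.5], [0.0, 0.0]]
```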

S3, an adjacency matrix among different links is constructed from the normalized data set, and a correlation matrix among the different links is constructed by correlation analysis. In the correlation analysis, the correlation coefficient and the correlation degree are calculated by formula (3):

\alpha_{pq}[t] = \frac{\Delta_{\min} + \beta\,\Delta_{\max}}{\lvert x_p[t] - x_q[t] \rvert + \beta\,\Delta_{\max}}, \qquad \lambda_{pq} = \frac{1}{M}\sum_{t=1}^{M} \alpha_{pq}[t] \quad (3)

where the reference traffic sequence is X_p = {x_p[1], x_p[2], …, x_p[M]} and the comparison traffic sequence is X_q = {x_q[1], x_q[2], …, x_q[M]}; α_pq[t] is the correlation coefficient of X_p and X_q at time t; Δmin = min_t |x_p[t] − x_q[t]| and Δmax = max_t |x_p[t] − x_q[t]| are respectively the minimum and maximum of the absolute differences between the two sequences over all corresponding times; β is the resolution coefficient, with value range (0, 1), and the smaller β is, the stronger the discriminating power of the correlation coefficient; the correlation degree λ_pq of X_p and X_q is the average of α_pq[t] over all time steps;
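Formula (3) is a grey relational analysis, and a minimal Python sketch of it follows. The default β = 0.5 is a common choice in grey relational analysis and is an assumption here, as is the rule that two identical sequences (Δmax = 0) receive a correlation degree of 1.

```python
from typing import Sequence


def grey_relational_degree(
    ref: Sequence[float], comp: Sequence[float], beta: float = 0.5
) -> float:
    """Correlation degree lambda_pq of formula (3) between X_p (ref) and X_q (comp)."""
    if len(ref) != len(comp) or not ref:
        raise ValueError("sequences must be non-empty and of equal length")
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    diffs = [abs(a - b) for a, b in zip(ref, comp)]
    d_min, d_max = min(diffs), max(diffs)
    if d_max == 0:
        return 1.0  # identical sequences: treat every alpha_pq[t] as 1 (assumption)
    # alpha_pq[t] for each time step t
    alphas = [(d_min + beta * d_max) / (d + beta * d_max) for d in diffs]
    return sum(alphas) / len(alphas)  # lambda_pq: mean over all time steps


print(grey_relational_degree([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # about 0.7778 (= 7/9)
```

Because Δmin and Δmax are taken per pair of sequences, swapping X_p and X_q gives the same λ_pq.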

S4, a GCN-GRU-based SDN network traffic prediction model is constructed and initialized. The model comprises a dual-channel GCN model for extracting spatial features, a GRU model for extracting temporal features, and a fully connected layer. The dual-channel GCN model comprises a first spatial feature extraction unit and a second spatial feature extraction unit, both of which use a multi-scale graph convolution topology. The GRU model comprises a first to a W-th gated recurrent unit (GRU) layer connected in sequence, where W ranges from 62 to 122;

S5, the training set X1, the adjacency matrix and the correlation matrix are input into the initialized GCN-GRU-based SDN network traffic prediction model for training, so as to extract spatial and temporal features and obtain a trained GCN-GRU-based SDN network traffic prediction model; this step specifically comprises:

S51, inputting the training set X1 and the adjacency matrix into the first spatial feature extraction unit to obtain a first spatial correlation feature matrix Z_A;

S52, inputting the training set X1 and the correlation matrix into the second spatial feature extraction unit to obtain a second spatial correlation feature matrix Z_B;

S53, obtaining, from the first spatial correlation feature matrix Z_A and the second spatial correlation feature matrix Z_B, the spatial correlation feature matrix Z output by the dual-channel GCN model according to formula (4):

Z = Z_A \,\|\, Z_B \quad (4)

where "‖" denotes matrix concatenation;

S54, inputting the spatial correlation feature matrix Z output by the dual-channel GCN model into the GRU model to obtain a network traffic matrix H carrying the spatio-temporal features of the network traffic;

S55, passing the network traffic matrix H through the fully connected layer to obtain the predicted load of each link;
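The concatenation in step S53 and the readout in step S55 can be sketched as below, with each matrix stored as a list of per-node rows. Concatenating along the feature axis (node by node) and sharing one fully connected layer across all nodes are assumptions, and the function names are hypothetical; the GRU between the two steps is omitted.

```python
from typing import List

Matrix = List[List[float]]


def concat_channels(z_a: Matrix, z_b: Matrix) -> Matrix:
    """Formula (4): place node i's second-channel features after its first-channel ones."""
    if len(z_a) != len(z_b):
        raise ValueError("both channels must cover the same N nodes")
    return [ra + rb for ra, rb in zip(z_a, z_b)]


def fc_readout(h: Matrix, w: List[float], b: float) -> List[float]:
    """Fully connected layer shared by all nodes: hidden vector -> predicted load."""
    return [sum(hv * wv for hv, wv in zip(row, w)) + b for row in h]


print(concat_channels([[1.0], [2.0]], [[3.0], [4.0]]))         # [[1.0, 3.0], [2.0, 4.0]]
print(fc_readout([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.25], 1.0))  # [2.0, 3.5]
```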

S6, iteratively training the GCN-GRU-based SDN network traffic prediction model obtained in step S5 with the back-propagation algorithm to obtain optimal model parameters;

S7, inputting the test set X2 into the GCN-GRU-based SDN network traffic prediction model iteratively trained in step S6 and evaluating the model with evaluation indices; if the evaluation indices of the model do not meet the preset evaluation indices, changing the value of M and then re-executing steps S54 to S55 until the evaluation indices of the trained model meet the preset evaluation indices;

S8, performing large-scale SDN network traffic prediction with the GCN-GRU-based SDN network traffic prediction model trained, tested and evaluated in steps S5 to S7.

As an improvement of the foregoing solution, the step S3 specifically includes:

S31, constructing a network topology graph G = (V, E, A) according to the link connection attributes of the SDN network, where V is the set of nodes, E is the set of edges between pairs of nodes, and A is the adjacency matrix:

A = [a_{pq}] \in \mathbb{R}^{N \times N}, \qquad a_{pq} = \begin{cases} 1, & \text{nodes } p \text{ and } q \text{ are connected} \\ 0, & \text{otherwise} \end{cases}

where a_pq describes the connection between any two nodes p and q in the network topology graph: a_pq = 1 indicates that nodes p and q are connected, and a_pq = 0 indicates that they are not;
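Because each link is mapped to a node in step S1, one plausible reading of "connected" in step S31 is that two link-nodes are adjacent when their links share a switch, i.e. A is the adjacency matrix of the line graph of the switch topology. The sketch below builds A under that assumption; the patent does not spell out the rule, and `link_adjacency` is a hypothetical name.

```python
from typing import List, Tuple


def link_adjacency(links: List[Tuple[str, str]]) -> List[List[int]]:
    """a_pq = 1 if link-nodes p and q share a switch, else 0 (diagonal left at 0)."""
    n = len(links)
    A = [[0] * n for _ in range(n)]
    for p in range(n):
        for q in range(p + 1, n):
            if set(links[p]) & set(links[q]):  # common endpoint switch
                A[p][q] = A[q][p] = 1
    return A


print(link_adjacency([("s1", "s2"), ("s2", "s3"), ("s4", "s5")]))
# [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```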

S32, replacing each nonzero element a_pq of the adjacency matrix A constructed in step S31 with the corresponding correlation degree λ_pq computed by formula (3), thereby obtaining the correlation matrix B.
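Step S32 only reweights existing edges, so B has the same sparsity pattern as A. In the sketch below, `degree(p, q)` is a hypothetical callback that returns λ_pq (for example, computed with formula (3) on the normalized series of nodes p and q); the λ values in the usage lines are made up for illustration.

```python
from typing import Callable, List


def correlation_matrix(
    A: List[List[int]], degree: Callable[[int, int], float]
) -> List[List[float]]:
    """Step S32: keep A's zero pattern and replace each nonzero a_pq by lambda_pq."""
    return [
        # degree() is only called where a_pq is nonzero.
        [degree(p, q) if a_pq else 0.0 for q, a_pq in enumerate(row)]
        for p, row in enumerate(A)
    ]


A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
lam = {(0, 1): 0.8, (1, 0): 0.8, (1, 2): 0.6, (2, 1): 0.6}  # hypothetical lambda_pq values
print(correlation_matrix(A, lambda p, q: lam[(p, q)]))
# [[0.0, 0.8, 0.0], [0.8, 0.0, 0.6], [0.0, 0.6, 0.0]]
```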

As an improvement of the above solution, the first spatial feature extraction unit obtains the first spatial correlation feature matrix by learning the 1st to K-th powers of the adjacency matrix, and the second spatial feature extraction unit obtains the second spatial correlation feature matrix by learning the 1st to K-th powers of the correlation matrix. The two matrices are built from the multi-scale terms A'^k σ(X1θ) and B'^k σ(X1θ), k = 0, 1, …, K, respectively, where θ is a trainable weight matrix for learning the feature information of the link nodes, σ is the ReLU nonlinear activation function, and X1 is the training set.

This yields multi-scale neighborhood features for each node: A'^1 σ(X1θ) and B'^1 σ(X1θ) acquire feature information from the 1st-order neighborhood of each node, and A'^K σ(X1θ) and B'^K σ(X1θ) acquire feature information from its K-th-order neighborhood; the zero-order terms A'^0 σ(X1θ) = σ(X1θ) and B'^0 σ(X1θ) = σ(X1θ) retain more of each node's own feature information, and together the terms give each node more neighborhood information;
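The multi-scale terms can be sketched in pure Python as below. Only the individual terms A'^k σ(X1θ) come from the text; stacking them per node by concatenation (as in MixHop-style graph convolutions) is an assumption, since the combining formula is not reproduced here. The same function serves the second channel by passing B' instead of A'; all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in m]


def multi_scale_features(a_norm: Matrix, x: Matrix, theta: Matrix, k: int) -> Matrix:
    """Stack A'^j sigma(X theta) for j = 0..K side by side for every node.

    Concatenation is an assumed way of combining the K + 1 scales.
    """
    h = relu(matmul(x, theta))  # j = 0: sigma(X theta), the node's own features
    scales = [h]
    for _ in range(k):
        h = matmul(a_norm, h)   # one more hop of neighborhood aggregation
        scales.append(h)
    return [[v for s in scales for v in s[i]] for i in range(len(x))]


a_norm = [[0.5, 0.5], [0.5, 0.5]]
print(multi_scale_features(a_norm, [[-1.0], [3.0]], [[1.0]], k=2))
# [[0.0, 1.5, 1.5], [3.0, 1.5, 1.5]]
```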

A' is the adjacency matrix obtained by normalizing the adjacency matrix A according to formula (5):

A' = \tilde{D}_A^{-1/2}\,(A + I)\,\tilde{D}_A^{-1/2} \quad (5)

where I is the identity matrix, and \tilde{D}_A is a diagonal matrix whose off-diagonal elements are 0 and whose diagonal element in each row equals the sum of the elements in the corresponding row of A + I; A' is the normalized adjacency matrix;

B' is the correlation matrix obtained by normalizing the correlation matrix B according to formula (6):

B' = \tilde{D}_B^{-1/2}\,(B + I)\,\tilde{D}_B^{-1/2} \quad (6)

where I is the identity matrix, and \tilde{D}_B is a diagonal matrix whose off-diagonal elements are 0 and whose diagonal element in each row equals the sum of the elements in the corresponding row of B + I; B' is the normalized correlation matrix.
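Formulas (5) and (6) apply the same operation to A and to B. The sketch below assumes the symmetric normalization D̃^{-1/2}(M + I)D̃^{-1/2} used in standard GCNs; the function name is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalize_with_self_loops(m: Matrix) -> Matrix:
    """Return D^-1/2 (M + I) D^-1/2, where D holds the row sums of M + I."""
    n = len(m)
    m_tilde = [[m[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Assuming non-negative entries (true for A, and for B since every lambda_pq > 0),
    # the self-loop makes every row sum at least 1, so the square root is safe.
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in m_tilde]
    return [
        [d_inv_sqrt[i] * m_tilde[i][j] * d_inv_sqrt[j] for j in range(n)]
        for i in range(n)
    ]


print(normalize_with_self_loops([[0, 1], [1, 0]]))  # every entry is approximately 0.5
```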

As an improvement of the above solution, in step S54 the GRU model controls the information passed on through a reset gate and an update gate; the calculation is given by formula (7):

\Gamma_\mu = \sigma\left(W_\mu [H_{t-1}, \hat{X}_t] + b_\mu\right)
\Gamma_r = \sigma\left(W_r [H_{t-1}, \hat{X}_t] + b_r\right)
\tilde{C}_t = \tanh\left(W_c [\Gamma_r \odot H_{t-1}, \hat{X}_t] + b_c\right)
H_t = \Gamma_\mu \odot H_{t-1} + (1 - \Gamma_\mu) \odot \tilde{C}_t \quad (7)

where H_{t-1} is the output state at time t−1; X̂_t is the output of the dual-channel GCN model corresponding to the network traffic features X_t at time t; [·, ·] denotes concatenation and ⊙ element-wise multiplication; Γ_r is the reset gate, which controls how much information from the previous moment is written into the current state (the smaller the reset gate, the less information from the previous moment is written); Γ_μ is the update gate, which controls the degree to which the previous state information is carried into the current state (the larger the update gate, the more previous state information is carried in); C̃_t is the memory content stored at time t; H_t is the output state at time t; σ denotes the activation function; W_μ, W_r, W_c are weights and b_μ, b_r, b_c are bias terms. The output states at all times form the network traffic matrix H = {H_1, ..., H_t, ...}.
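The gate equations of formula (7) can be sketched for one node as below. Using the sigmoid for the gates and tanh for the candidate memory is the usual GRU convention and an assumption here, as is sharing one set of weights across all nodes; the parameter names (`W_mu`, `b_mu`, and so on) are hypothetical. With all weights and biases at zero, both gates equal 0.5 and the candidate is 0, so the state is simply halved, as the usage line shows.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def _sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def _affine(w: Matrix, x: Vector, b: Vector) -> Vector:
    """w @ x + b for a weight matrix stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def gru_step(h_prev: Vector, x_t: Vector, p: Dict[str, list]) -> Vector:
    """One step of formula (7) for a single node.

    p holds W_mu, W_r, W_c (each hidden x (hidden + input)) and b_mu, b_r, b_c.
    """
    hx = h_prev + x_t  # [H_{t-1}, X_t]: list concatenation
    update = [_sigmoid(v) for v in _affine(p["W_mu"], hx, p["b_mu"])]
    reset = [_sigmoid(v) for v in _affine(p["W_r"], hx, p["b_r"])]
    gated = [r * h for r, h in zip(reset, h_prev)] + x_t  # [Gamma_r * H_{t-1}, X_t]
    cand = [math.tanh(v) for v in _affine(p["W_c"], gated, p["b_c"])]
    # A larger update gate carries more of the previous state into H_t.
    return [u * h + (1.0 - u) * c for u, h, c in zip(update, h_prev, cand)]


zeros = {name: [[0.0] * 3 for _ in range(2)] for name in ("W_mu", "W_r", "W_c")}
zeros.update({name: [0.0, 0.0] for name in ("b_mu", "b_r", "b_c")})
print(gru_step([1.0, -2.0], [0.7], zeros))  # [0.5, -1.0]
```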

As an improvement of the above scheme, the step S6 specifically includes:

S61, calculating the deviation between the output value x̂_{t+1} of the GCN-GRU-based SDN network traffic prediction model and the actual value x_{t+1}, where the prediction x̂_{t+1} is produced by the model from the load feature values of all nodes from time t−τ to time t, x_{t−τ} denotes the load feature value of all nodes at time t−τ, and W_θ denotes the weights of the GCN-GRU-based SDN network traffic prediction model;

S62, comparing the deviation with the precision ε given by link-load resource routing scheduling:

if the deviation satisfies the precision ε, stopping training to obtain the trained GCN-GRU-based SDN network traffic prediction model;

if the deviation does not satisfy the precision ε, calculating the partial derivative of the deviation with respect to W_θ and updating the weights accordingly to W_θ',

then returning to step S61 until the deviation reaches the precision ε or the model converges, where W_θ' denotes the updated weight parameters.
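The stopping rule of S61/S62 can be illustrated with a deliberately tiny stand-in model. In the sketch below, a single weight w predicts x_{t+1} = w·x_t, the deviation is taken as the mean squared error, and the update is plain gradient descent with learning rate `lr`. The model, the choice of mean squared error, `lr` and `max_iters` are all assumptions and not the patent's GCN-GRU; what carries over is the loop structure: compute the deviation, compare it with ε, update by the partial derivative, repeat.

```python
from typing import List, Tuple


def _deviation(w: float, pairs: List[Tuple[float, float]]) -> float:
    """Mean squared deviation between predictions w * x_t and actual x_{t+1}."""
    return sum((w * x - y) ** 2 for x, y in pairs) / len(pairs)


def train_until_precision(
    series: List[float], eps: float, lr: float = 0.1, max_iters: int = 10_000
) -> Tuple[float, float]:
    """Toy version of S61/S62: fit x_hat[t+1] = w * x[t] by gradient descent.

    Stops once the deviation is <= eps, or after max_iters updates.
    Returns (w, deviation).
    """
    if len(series) < 2:
        raise ValueError("need at least two time steps")
    pairs = list(zip(series, series[1:]))
    w = 0.0
    for _ in range(max_iters):
        dev = _deviation(w, pairs)  # S61: compute the deviation
        if dev <= eps:              # S62: precision reached, stop training
            return w, dev
        # Partial derivative of the deviation with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in pairs) / len(pairs)
        w -= lr * grad              # weight update, then back to S61
    return w, _deviation(w, pairs)


w, dev = train_until_precision([0.1, 0.2, 0.4, 0.8], eps=1e-6)
print(round(w, 2), dev <= 1e-6)  # 2.0 True
```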

The embodiment of the present invention further provides a large-scale SDN network traffic prediction system, including:

a data set acquisition module, configured to acquire historical traffic data of switch ports in the network topology through the SDN controller, obtain a traffic feature matrix X, and record it as the data set shown in formula (1);

the normalization processing module is configured to normalize the data set X and to divide the normalized data set into a training set X1 and a test set X2 according to the proportion of 7, the normalized data being given by formula (2), where min(X^i) is the minimum value in the data set before normalization and max(X^i) is the maximum value in the data set before normalization;

the adjacency matrix construction module is configured to construct an adjacency matrix among different links from the normalized data set and to construct a correlation matrix among the different links by correlation analysis, in which the correlation coefficient and the correlation degree are calculated by formula (3), where X_p = {x_p[1], …, x_p[M]} is the reference traffic sequence, X_q = {x_q[1], …, x_q[M]} is the comparison traffic sequence, α_pq[t] is their correlation coefficient at time t, Δmin and Δmax are respectively the minimum and maximum of the absolute differences |x_p[t] − x_q[t]| over all corresponding times, β is the resolution coefficient with value range (0, 1) (the smaller β is, the stronger the discriminating power of the correlation coefficient), and the correlation degree λ_pq of X_p and X_q is the average of α_pq[t] over all time steps;

the traffic prediction model construction module is configured to construct and initialize the GCN-GRU-based SDN network traffic prediction model, which comprises a dual-channel GCN model for extracting spatial features, a GRU model for extracting temporal features, and a fully connected layer; the dual-channel GCN model comprises a first spatial feature extraction unit and a second spatial feature extraction unit, both of which use a multi-scale graph convolution topology; the GRU model comprises a first to a W-th gated recurrent unit (GRU) layer connected in sequence, where W ranges from 62 to 122;

a training module, configured to input the training set X1, the adjacency matrix, and the association matrix into an initialized SDN network traffic prediction model based on the GCN-GRU for training, so as to extract spatial features and temporal features, and obtain a trained SDN network traffic prediction model based on the GCN-GRU, where a training process of the training module includes:

(1) inputting the training set X1 and the adjacency matrix into the first spatial feature extraction unit to obtain a first spatial correlation feature matrix Z_A;

(2) inputting the training set X1 and the correlation matrix into the second spatial feature extraction unit to obtain a second spatial correlation feature matrix Z_B;

(3) obtaining, from Z_A and Z_B, the spatial correlation feature matrix Z = Z_A ‖ Z_B output by the dual-channel GCN model according to formula (4), where "‖" denotes matrix concatenation;

(4) inputting the spatial correlation feature matrix Z output by the dual-channel GCN model into the GRU model to obtain a network traffic matrix H carrying the spatio-temporal features of the network traffic; and

(5) passing the network traffic matrix H through the fully connected layer to obtain the predicted load of each link;

the iterative training module is configured to iteratively train the trained GCN-GRU-based SDN network traffic prediction model with the back-propagation algorithm to obtain optimal model parameters;

the test evaluation module is configured to input the test set X2 into the GCN-GRU-based SDN network traffic prediction model after iterative learning by the iterative training module and to evaluate the model with evaluation indices; if the evaluation indices of the model do not meet the preset evaluation indices, the value of M is changed and the training module re-executes steps (4) and (5) until the evaluation indices of the trained model meet the preset evaluation indices;

the GCN-GRU-based SDN network traffic prediction model trained, tested and evaluated by the training module, the iterative training module and the test evaluation module can then be used for large-scale SDN network traffic prediction.

As an improvement of the above scheme, the adjacency matrix construction module specifically includes:

an adjacency matrix construction unit, configured to construct a network topology graph G = (V, E, A) according to the link connection attributes of the SDN network, where V is the set of nodes, E is the set of edges between pairs of nodes, and A = [a_pq] ∈ R^{N×N} is the adjacency matrix,

in which a_pq describes the connection between any two nodes p and q in the network topology graph: a_pq = 1 indicates that nodes p and q are connected, and a_pq = 0 indicates that they are not;

a correlation matrix construction unit, configured to replace each nonzero element a_pq of the adjacency matrix A with the corresponding correlation degree λ_pq computed by formula (3), thereby obtaining the correlation matrix B.

As an improvement of the above solution, the first spatial feature extraction unit obtains the first spatial correlation feature matrix by learning the 1st to K-th powers of the adjacency matrix, and the second spatial feature extraction unit obtains the second spatial correlation feature matrix by learning the 1st to K-th powers of the correlation matrix. The two matrices are built from the multi-scale terms A'^k σ(X1θ) and B'^k σ(X1θ), k = 0, 1, …, K, respectively,

where θ is a trainable weight matrix for learning the feature information of the link nodes, and σ is the ReLU nonlinear activation function, ReLU(x) = max{0, x}. ReLU activation increases the nonlinearity between the layers of the neural network, reduces the interdependence of parameters, alleviates over-fitting, and improves the generalization ability of the model.

X1 is the training set.

This yields multi-scale neighborhood features for each node: A'^1 σ(X1θ) and B'^1 σ(X1θ) acquire feature information from the 1st-order neighborhood of each node, and A'^K σ(X1θ) and B'^K σ(X1θ) acquire feature information from its K-th-order neighborhood; the zero-order terms A'^0 σ(X1θ) = σ(X1θ) and B'^0 σ(X1θ) = σ(X1θ) retain more of each node's own feature information, and together the terms give each node more neighborhood information;

where A' and B' are the normalized adjacency matrix and the normalized correlation matrix, given by formulas (5) and (6):

A' = \tilde{D}_A^{-1/2}\,(A + I)\,\tilde{D}_A^{-1/2} \quad (5)

B' = \tilde{D}_B^{-1/2}\,(B + I)\,\tilde{D}_B^{-1/2} \quad (6)

in which I is the identity matrix, and \tilde{D}_A (respectively \tilde{D}_B) is a diagonal matrix whose off-diagonal elements are 0 and whose diagonal element in each row equals the sum of the elements in the corresponding row of A + I (respectively B + I); B' is the normalized correlation matrix.

As an improvement of the above solution, in the training module the GRU model controls the information passed on through a reset gate and an update gate, calculated as in formula (7), where H_{t-1} is the output state at time t−1; X̂_t is the output of the dual-channel GCN model corresponding to the network traffic features X_t at time t; Γ_r is the reset gate, which controls how much information from the previous moment is written into the current state (the smaller the reset gate, the less information from the previous moment is written); and Γ_μ is the update gate, which controls the degree to which the previous state information is carried into the current state (the larger the update gate, the more previous state information is carried in).

As an improvement of the above solution, the working process of the iterative training module includes:

calculating the deviation between the output value of the GCN-GRU-based SDN network traffic prediction model and the actual value x_{t+1};

comparing the deviation with the precision ε given by link-load resource routing scheduling; if the precision ε is satisfied, stopping training; otherwise, computing the partial derivative of the deviation with respect to W_θ and updating the weights to W_θ';

returning to the deviation calculation until the deviation reaches the precision ε or the model converges, where W_θ' denotes the updated weight parameters.

Compared with the prior art, the large-scale SDN network traffic prediction method and system provided by the embodiments of the invention fully consider the spatio-temporal characteristics of large-scale SDN networks: they capture not only the temporal features of network traffic but also the spatial features of the complex topology. The method can therefore effectively predict the spatio-temporal variation patterns of SDN network traffic with high prediction accuracy, improving SDN network traffic prediction.

Drawings

To illustrate the technical solutions of the invention more clearly, the drawings used in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the invention, and a person skilled in the art may derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a large-scale SDN network traffic prediction method according to an embodiment of the present invention.

Fig. 2 is a detailed flowchart of step S5 of the large-scale SDN network traffic prediction method shown in fig. 1 according to an embodiment of the present invention.

Fig. 3 is a detailed flowchart of step S3 of the large-scale SDN network traffic prediction method shown in fig. 1 according to an embodiment of the present invention.

Fig. 4 is a block diagram of a large-scale SDN network traffic prediction system according to an embodiment of the present invention.

Fig. 5 is a detailed structural block diagram of the adjacency matrix construction module of the large-scale SDN network traffic prediction system shown in fig. 4 according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the invention. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort shall fall within the protection scope of the invention.

Referring to fig. 1, an embodiment of the present invention provides a large-scale SDN network traffic prediction method, including steps S1 to S8:

S1, historical traffic data of switch ports in the network topology are obtained through the SDN controller to form a traffic feature matrix X, which is recorded as the data set shown in formula (1), X = [x_t^i] ∈ R^{N×M},

where x_t^i represents the link load feature value of the i-th node at time t (each link between the switches is mapped to one node), N is the number of nodes, and M is the time length;

S2, the data set X is normalized, and the normalized data set is divided into a training set X1 and a test set X2 according to the proportion of 7. The normalized data are given by formula (2), x̃_t^i = (x_t^i − min(X^i)) / (max(X^i) − min(X^i)), where min(X^i) and max(X^i) are the minimum and maximum values in the data set before normalization.

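S3, an adjacency matrix among different links is constructed from the normalized data set, and a correlation matrix among the different links is constructed by correlation analysis. In the correlation analysis, the correlation coefficient and the correlation degree are calculated by formula (3):

\alpha_{pq}[t] = \frac{\Delta_{\min} + \beta\,\Delta_{\max}}{\lvert x_p[t] - x_q[t] \rvert + \beta\,\Delta_{\max}}, \qquad \lambda_{pq} = \frac{1}{M}\sum_{t=1}^{M} \alpha_{pq}[t] \quad (3)

where the reference traffic sequence is X_p = {x_p[1], …, x_p[M]}, the comparison traffic sequence is X_q = {x_q[1], …, x_q[M]}, and Δmin and Δmax are respectively the minimum and maximum of |x_p[t] − x_q[t]| over all corresponding times; β ∈ (0, 1) is the resolution coefficient, and λ_pq is the correlation degree of X_p and X_q.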

S4, the GCN-GRU-based SDN network traffic prediction model is constructed and initialized.

The model comprises a dual-channel GCN model for extracting spatial features, a GRU model for extracting temporal features, and a fully connected layer. The dual-channel GCN model comprises a first spatial feature extraction unit and a second spatial feature extraction unit, both of which use a multi-scale graph convolution topology, which allows each link node to acquire more neighborhood information. The GRU model comprises a first to a W-th gated recurrent unit (GRU) layer connected in sequence, where W ranges from 62 to 122.

S5, the training set X1, the adjacency matrix and the correlation matrix are input into the initialized GCN-GRU-based SDN network traffic prediction model for training, so as to extract spatial and temporal features and obtain the trained GCN-GRU-based SDN network traffic prediction model.

Referring to fig. 2, the step S5 specifically includes:

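S51, inputting the training set X1 and the adjacency matrix into the first spatial feature extraction unit to obtain a first spatial correlation feature matrix Z_A;

S52, inputting the training set X1 and the correlation matrix into the second spatial feature extraction unit to obtain a second spatial correlation feature matrix Z_B;

S53, obtaining, from Z_A and Z_B, the spatial correlation feature matrix Z = Z_A ‖ Z_B output by the dual-channel GCN model according to formula (4), where "‖" denotes matrix concatenation;

S54, inputting the spatial correlation feature matrix Z output by the dual-channel GCN model into the GRU model to obtain a network traffic matrix H carrying the spatio-temporal features of the network traffic;

S55, passing the network traffic matrix H through the fully connected layer to obtain the predicted load of each link.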

S6, the GCN-GRU-based SDN network traffic prediction model trained in step S5 is iteratively trained with the back-propagation algorithm to obtain optimal model parameters.

In step S7, the evaluation indices may specifically be the mean absolute error (MAE), the root mean square error (RMSE) and the coefficient of determination R², which are used to evaluate the prediction results and verify the prediction accuracy. These evaluation indices are well known to those skilled in the art and are not described further here.
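The three indices named for step S7 can be computed as in the Python sketch below, using their standard definitions. Pooling all links and time steps into one flat list before scoring is an assumption, and `evaluate` is a hypothetical name.

```python
import math
from typing import Dict, Sequence


def evaluate(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE and R^2 of predicted link loads against actual loads."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)
    mean = sum(y_true) / n
    sst = sum((t - mean) ** 2 for t in y_true)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sse / n),
        # R^2 is undefined when the actual values are constant.
        "R2": 1.0 - sse / sst if sst else float("nan"),
    }


print(evaluate([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
# MAE = 1/3, RMSE = sqrt(1/3), R2 = 0.5
```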

S8, large-scale SDN network traffic prediction is performed with the GCN-GRU-based SDN network traffic prediction model trained, tested and evaluated in steps S5 to S7.

Further, as shown in fig. 3, step S3 specifically includes:

S31, constructing a network topology graph G = (V, E, A) according to the link connection attributes of the SDN network, where V is the set of nodes, E is the set of edges between pairs of nodes, and A = [a_pq] ∈ R^{N×N} is the adjacency matrix,

in which a_pq describes the connection between any two nodes p and q in the network topology graph: a_pq = 1 indicates that nodes p and q are connected, and a_pq = 0 indicates that they are not;

S32, replacing each nonzero element a_pq of the adjacency matrix A constructed in step S31 with the corresponding correlation degree λ_pq computed by formula (3), thereby obtaining the correlation matrix B.

It can be understood that the adjacency matrix A in the embodiments of the invention is set according to the connectivity between nodes. This way of determining the adjacency matrix of a traffic network is reasonable to a certain extent, since connected nodes are considered more strongly correlated than unconnected ones. However, each target node has several connected nodes, and each of them influences the target node differently; that is, the spatial correlation (i.e., the correlation coefficient) between a target node and each of its adjacent nodes differs. To address this, the invention analyzes the influence between different nodes with the correlation coefficient and correlation degree of formula (3) and constructs the correlation matrix between different links from this analysis, which describes the spatial relationships of the SDN network better. Correspondingly, the dual-channel GCN model, comprising the first and second spatial feature extraction units, obtains the first spatial correlation features (from the adjacency matrix) and the second spatial correlation features (from the correlation matrix) separately and then fuses them, so that the spatial features of the link nodes are extracted more comprehensively.

Further, the first spatial feature extraction unit obtains a first spatial correlation feature matrix by learning the 1st to Kth powers of the adjacency matrix, and the second spatial feature extraction unit obtains a second spatial correlation feature matrix by learning the 1st to Kth powers of the correlation matrix.

The first spatial correlation feature matrix and the second spatial correlation feature matrix are respectively as follows:

which obtain multi-scale neighborhood features for each node, where A'^1 σ(X1θ) and B'^1 σ(X1θ) gather feature information from the 1st-order neighborhood of each node, and A'^K σ(X1θ) and B'^K σ(X1θ) gather feature information from the Kth-order neighborhood of each node; A'^0 σ(X1θ) = σ(X1θ) and B'^0 σ(X1θ) = σ(X1θ) retain more of each node's own feature information, so that each node acquires richer neighborhood information;

where I is the identity matrix; adding I to A adds a self-loop to each node in the network, so that part of each node's own feature information is retained when its features are updated. The degree matrix of A + I is a diagonal matrix whose off-diagonal elements are 0 and whose diagonal element in each row equals the sum of the elements in the corresponding row of A + I; A' is the normalized adjacency matrix;

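where I is the identity matrix, and the degree matrix of B + I is a diagonal matrix whose off-diagonal elements are 0 and whose diagonal element in each row equals the sum of the elements in the corresponding row of B + I; B' is the normalized correlation matrix.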

It can be understood that normalizing the adjacency matrix and the correlation matrix effectively improves the prediction accuracy and the convergence speed of the model.
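The normalization and the multi-scale neighborhood features described above can be sketched as follows. The normalization formulas are not reproduced in this text, so the sketch assumes the standard symmetric GCN normalization D^(-1/2)(M + I)D^(-1/2), where D is the diagonal row-sum matrix of M + I. It also assumes that σ is the sigmoid function and that the 0th- to Kth-hop features are concatenated along the feature axis. The function names are hypothetical; the same two functions apply unchanged to A (giving A') and to B (giving B').

```python
import numpy as np


def normalize_with_self_loops(M):
    """Assumed normalization: D^(-1/2) (M + I) D^(-1/2),
    where D is diagonal with D_ii = sum of row i of (M + I)."""
    M_hat = M + np.eye(M.shape[0])            # add a self-loop to every node
    d_inv_sqrt = 1.0 / np.sqrt(M_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * M_hat * d_inv_sqrt[None, :]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def multi_scale_features(M_norm, X, theta, K):
    """Concatenate M'^0 σ(Xθ), M'^1 σ(Xθ), ..., M'^K σ(Xθ) column-wise."""
    H = sigmoid(X @ theta)                    # k = 0: the node's own features
    blocks = [H]
    for _ in range(K):
        H = M_norm @ H                        # one more hop: M'^k σ(Xθ)
        blocks.append(H)
    return np.concatenate(blocks, axis=1)


A = np.array([[0.0, 1.0], [1.0, 0.0]])
A_norm = normalize_with_self_loops(A)         # every entry is about 0.5 here
X1 = np.ones((2, 2))                          # 2 nodes, 2 input features (toy data)
theta = np.zeros((2, 3))                      # hypothetical learnable weights
Z_A = multi_scale_features(A_norm, X1, theta, K=2)   # shape (2, 3 * (K + 1))
```

Computing M'^k σ(Xθ) by repeated multiplication avoids forming the dense matrix powers explicitly, which matters for large topologies.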

Further, in step S54, the GRU model controls the information passed on by means of a reset gate and an update gate; the calculation is given by formula (7):

where H_{t-1} is the output state at time t-1;

is the output of the dual-channel GCN model corresponding to the network traffic feature X_t at time t; Γ_r is the reset gate, which controls how much information from the previous moment is written into the current state: the smaller the reset gate, the less information from the previous moment is written; Γ_μ is the update gate, which controls the degree to which the state information of the previous moment is carried into the current state: the larger the update gate, the more state information from the previous moment is carried over;

is the memory unit holding the content stored at time t; H_t is the output state at time t; σ denotes the activation function; W_μ, W_r and W_c are weights, and b_μ, b_r and b_c are bias terms; the output states at all times form the network traffic matrix H = {H₁, ..., H_t, ...}.
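Formula (7) is not reproduced in this text. The following sketch assumes the common GRU form that matches the gate descriptions above: the update gate Γ_μ weights the previous state H_{t-1}, and (1 - Γ_μ) weights the memory content. The concatenation-based weight layout, the sigmoid and tanh choices, and the names `gru_step` and `run_gru` are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gru_step(g_t, h_prev, W_mu, W_r, W_c, b_mu, b_r, b_c):
    """One GRU step (assumed form of formula (7)).

    g_t:    dual-channel GCN output for X_t, shape (N, F)
    h_prev: previous output state H_{t-1}, shape (N, D)
    W_*:    weights of shape (F + D, D); b_*: biases of shape (D,)
    """
    gh = np.concatenate([g_t, h_prev], axis=1)
    gamma_mu = sigmoid(gh @ W_mu + b_mu)   # update gate Γ_μ
    gamma_r = sigmoid(gh @ W_r + b_r)      # reset gate Γ_r: smaller -> less of H_{t-1} written
    c_t = np.tanh(np.concatenate([g_t, gamma_r * h_prev], axis=1) @ W_c + b_c)  # memory content
    # Larger update gate -> more of the previous state is carried into H_t.
    return gamma_mu * h_prev + (1.0 - gamma_mu) * c_t


def run_gru(g_seq, h0, params):
    """Apply gru_step over time; the stacked states form H = {H_1, ..., H_t, ...}."""
    h, states = h0, []
    for g_t in g_seq:
        h = gru_step(g_t, h, *params)
        states.append(h)
    return np.stack(states)


N, F, D = 3, 2, 4
params = (np.zeros((F + D, D)),) * 3 + (np.zeros(D),) * 3   # toy all-zero weights
H = run_gru(np.zeros((2, N, F)), np.ones((N, D)), params)
```

With all-zero weights both gates equal 0.5 and the memory content is 0, so each step simply halves the state; this makes the gating arithmetic easy to check by hand.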

Further, in this embodiment, step S6, in which the GCN-GRU-based SDN network traffic prediction model trained in step S5 is iteratively trained with the back-propagation algorithm, includes the following steps:

S61, calculating the deviation between the output value of the GCN-GRU-based SDN network traffic prediction model and the actual value x_{t+1}:

Wherein x is _t-τ Representing the characteristic value of the load of all nodes at time t-tau, W _θ Representing weights of an SDN network flow prediction model based on GCN-GRU;

if the deviation does not meet the precision ε, calculating the partial derivative of the deviation with respect to W_θ:

updating the weights:

and returning to step S61 until the deviation reaches the precision ε or the model converges, where W_θ' denotes the updated weight parameters and the gradient term is the partial derivative of the deviation with respect to W_θ.
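The iterative update of step S6 can be illustrated with plain gradient descent. Because the deviation formula and the update rule are not reproduced in this text, the sketch assumes a half mean-squared-error deviation, a fixed learning rate `lr`, and a toy linear model standing in for the GCN-GRU weights W_θ. In the real model the gradient would come from back-propagation through the whole network; `train_until_precision` is a hypothetical name.

```python
import numpy as np


def train_until_precision(X_prev, x_next, lr=0.1, eps=1e-4, max_iter=10_000):
    """Gradient-descent sketch of step S6 with a toy linear model.

    X_prev: inputs standing in for x_{t-τ}, shape (samples, features)
    x_next: targets x_{t+1}, shape (samples,)
    """
    n, f = X_prev.shape
    W = np.zeros(f)                        # stand-in for W_θ
    loss = np.inf
    for _ in range(max_iter):
        err = X_prev @ W - x_next
        loss = 0.5 * np.mean(err ** 2)     # assumed deviation: half mean squared error
        if loss < eps:                     # precision ε reached
            break
        grad = X_prev.T @ err / n          # partial derivative of the deviation w.r.t. W
        W = W - lr * grad                  # W' = W - lr * gradient
    return W, loss


X_prev = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x_next = X_prev @ np.array([2.0, -1.0])   # toy targets from known weights
W, loss = train_until_precision(X_prev, x_next)
```

The loop mirrors the described control flow: compute the deviation, stop once it is within ε, otherwise take the gradient, update the weights and repeat.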

Referring to fig. 4, an embodiment of the present invention further provides a large-scale SDN network traffic prediction system, including:

a data set obtaining module 401, configured to obtain historical traffic data of a switch port in a network topology through an SDN controller, to obtain a traffic feature matrix X, and record the traffic feature matrix as a data set shown in formula (1):

where each element represents the link load feature value of the i-th node at time t; each link between the switches is mapped to a node, N is the number of nodes, and M is the time length.

A normalization processing module 402, configured to perform normalization processing on the data set X, and divide the data set after the normalization processing into a training set X1 and a test set X2 according to a ratio of 7; the normalized data set is shown in equation (2):

wherein, min (X) ⁱ ) Is the minimum value in the data set before normalization, max (X) ⁱ ) Is the maximum value in the data set before normalization processing.

An adjacency matrix construction module 403, configured to construct an adjacency matrix between different links according to the normalized data set, and to construct a correlation matrix between different links according to correlation analysis; in the correlation analysis, the correlation coefficient and the correlation degree are calculated by formula (3):

where the reference traffic sequence is denoted X^p and the comparison traffic sequence X^q; α_pq[t] is the correlation coefficient between the reference traffic sequence X^p and the comparison traffic sequence X^q at time t; the minimum and maximum terms in formula (3) are, respectively, the minimum and the maximum of the absolute differences between X^p and X^q over all corresponding moments; β is the resolution coefficient, with value range (0, 1), and the smaller β is, the more strongly the correlation coefficients are distinguished; the correlation degree λ_pq between the reference traffic sequence X^p and the comparison traffic sequence X^q is the average of the correlation coefficients α_pq[t] over the whole time interval.
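The description of formula (3) matches the standard grey relational analysis coefficient. Assuming that form, α_pq[t] = (Δmin + β·Δmax) / (|x^p[t] - x^q[t]| + β·Δmax), where Δmin and Δmax are the minimum and maximum absolute differences over all moments, and λ_pq is the mean of α_pq[t]. The default β = 0.5 is a common choice in grey relational analysis and is an assumption here, as are the function names and the handling of identical sequences.

```python
import numpy as np


def grey_relational(x_ref, x_cmp, beta=0.5):
    """Return (alpha, lam): coefficients alpha_pq[t] and degree lambda_pq."""
    diff = np.abs(np.asarray(x_ref, dtype=float) - np.asarray(x_cmp, dtype=float))
    d_min, d_max = diff.min(), diff.max()
    if d_max == 0.0:                       # identical sequences: treat as perfectly related
        alpha = np.ones_like(diff)
    else:
        alpha = (d_min + beta * d_max) / (diff + beta * d_max)
    return alpha, float(alpha.mean())


def correlation_degrees(Xn, beta=0.5):
    """N x N matrix of lambda_pq for every pair of node series (rows of Xn)."""
    N = Xn.shape[0]
    lam = np.ones((N, N))
    for p in range(N):
        for q in range(N):
            if p != q:
                lam[p, q] = grey_relational(Xn[p], Xn[q], beta)[1]
    return lam


alpha, lam_pq = grey_relational([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
lam = correlation_degrees(np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 5.0]]))
```

Here the absolute differences are [0, 0, 2], so α = [1, 1, 1/3] and λ_pq = 7/9. Lowering β shrinks the coefficient at the largest difference further, which is the "stronger distinguishability" the text attributes to small β.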

A traffic prediction model construction module 404, configured to construct and initialize a GCN-GRU-based SDN network traffic prediction model; the SDN network traffic prediction model comprises a dual-channel GCN model for extracting spatial features, a GRU model for extracting temporal features, and a fully connected layer. The dual-channel GCN model comprises a first spatial feature extraction unit and a second spatial feature extraction unit, both of which use a multi-scale graph convolution topology; the GRU model comprises a first to a W-th gated recurrent unit (GRU) layer connected in sequence, where W ranges from 62 to 122.

A training module 405, configured to input the training set X1, the adjacency matrix and the correlation matrix into the initialized GCN-GRU-based SDN network traffic prediction model for training, so as to extract spatial and temporal features and obtain a trained GCN-GRU-based SDN network traffic prediction model, where the training process of the training module includes the steps of:

(3) Fusing the first spatial correlation feature matrix and the second spatial correlation feature matrix:

where "|" denotes the concatenation of the matrices;

(5) Passing the network traffic matrix H through the fully connected layer to obtain the predicted load of each link.
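Steps (3) and (5) bracket the temporal model: the two channel outputs are concatenated before the GRU, and a fully connected layer maps the GRU output to per-link loads. Assuming feature-axis concatenation and a single linear readout with hypothetical parameters `W_fc` and `b_fc`, these two steps can be sketched as:

```python
import numpy as np


def fuse_channels(Z_adj, Z_corr):
    """Step (3) sketch: Z = Z_adj | Z_corr, concatenated along the feature axis."""
    if Z_adj.shape[0] != Z_corr.shape[0]:
        raise ValueError("both channels must have one row per link-node")
    return np.concatenate([Z_adj, Z_corr], axis=1)


def fc_readout(H_t, W_fc, b_fc):
    """Step (5) sketch: map each node's GRU state to its predicted link load."""
    return H_t @ W_fc + b_fc


Z = fuse_channels(np.ones((4, 3)), np.zeros((4, 5)))         # 4 link-nodes
y_hat = fc_readout(np.ones((4, 8)), np.ones((8, 1)), np.zeros(1))
```

Concatenation keeps the adjacency-based and correlation-based features side by side, so the downstream layers can learn how much to rely on each channel.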

An iterative training module 406, configured to iteratively train the trained GCN-GRU-based SDN network traffic prediction model with the back-propagation algorithm to obtain the optimal model parameters.

A test evaluation module 407, configured to input the test set X2 into the GCN-GRU-based SDN network traffic prediction model after iterative learning by the iterative training module, and to evaluate the model using the evaluation index; if the evaluation index of the model does not meet the preset evaluation index, the value of M is changed and the training module continues to perform steps (4) and (5), until the evaluation index of the trained model meets the preset evaluation index.
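The text does not name the evaluation index. As an assumption, the sketch below uses RMSE and MAE, two common regression metrics for traffic prediction, together with a hypothetical `meets_preset` check against preset thresholds.

```python
import numpy as np


def rmse(y_true, y_pred):
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(y_true, y_pred):
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))


def meets_preset(y_true, y_pred, max_rmse, max_mae):
    """Hypothetical check of the evaluation index against preset thresholds."""
    return rmse(y_true, y_pred) <= max_rmse and mae(y_true, y_pred) <= max_mae


ok = meets_preset([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], max_rmse=2.0, max_mae=1.0)
```

When such a check fails, the described procedure changes M and continues training until the preset index is met.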

After being trained, iteratively trained and evaluated by the training module 405, the iterative training module 406 and the test evaluation module 407, the GCN-GRU-based SDN network traffic prediction model can be used for large-scale SDN network traffic prediction.

Referring to fig. 5, the adjacency matrix building module 403 specifically includes:

an adjacency matrix construction unit 4031, configured to construct a network topology graph G = (V, E, A) according to the link connection attributes of the SDN network, where V is the set of nodes, E is the set of edges between pairs of nodes, and A is the adjacency matrix:

where a_pq describes the connection between any two nodes p and q in the network topology graph: a_pq = 1 means that nodes p and q are connected, and a_pq = 0 means that they are not connected;

a correlation matrix construction unit 4032, configured to replace each non-zero element a_pq of the adjacency matrix A with the corresponding correlation degree λ_pq calculated by formula (3), thereby obtaining the correlation matrix B.

Further, the first spatial feature extraction unit obtains a first spatial correlation feature matrix by learning the 1st to Kth powers of the adjacency matrix.

The first spatial correlation feature matrix and the second spatial correlation feature matrix are respectively as follows:

which obtain multi-scale neighborhood features for each node, where A'^1 σ(X1θ) and B'^1 σ(X1θ) gather feature information from the 1st-order neighborhood of each node, and A'^K σ(X1θ) and B'^K σ(X1θ) gather feature information from the Kth-order neighborhood of each node; A'^0 σ(X1θ) = σ(X1θ) and B'^0 σ(X1θ) = σ(X1θ) retain more of each node's own feature information, so that each node acquires richer neighborhood information;

where I is the identity matrix, and the degree matrix of A + I is a diagonal matrix whose off-diagonal elements are 0 and whose diagonal element in each row equals the sum of the elements in the corresponding row of A + I;

where I is the identity matrix, and the degree matrix of B + I is a diagonal matrix whose off-diagonal elements are 0 and whose diagonal element in each row equals the sum of the elements in the corresponding row of B + I.

Further, in the training module 405, the GRU model controls the information passed on by means of a reset gate and an update gate; the calculation is given by formula (7):

where H_{t-1} is the output state at time t-1;

is the output of the dual-channel GCN model corresponding to the network traffic feature X_t at time t; Γ_r is the reset gate, which controls how much information from the previous moment is written into the current state: the smaller the reset gate, the less information from the previous moment is written; Γ_μ is the update gate, which controls the degree to which the state information of the previous moment is carried into the current state: the larger the update gate, the more state information from the previous moment is carried over;

Further, the operation process of the iterative training module 406 includes:

calculating the deviation between the output value of the GCN-GRU-based SDN network traffic prediction model and the actual value x_{t+1};

updating the weights:

where the gradient term is the partial derivative of the deviation with respect to the model weights W_θ.

In summary, the large-scale SDN network traffic prediction method and system provided by the embodiments of the present invention fully consider the spatio-temporal characteristics of large-scale SDN networks: they capture not only the temporal characteristics of network traffic but also the spatial characteristics of the complex topology. The method can therefore effectively predict the spatio-temporal variation characteristics and patterns of SDN network traffic with high prediction precision, improving large-scale SDN network traffic prediction.

While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A large-scale SDN network flow prediction method is characterized by comprising the following steps:

S1, acquiring historical traffic data of switch ports in the network topology through an SDN controller to obtain a traffic feature matrix X, and recording the traffic feature matrix as the data set shown in formula (1):

where each element represents the link load feature value of the i-th node at time t; each link between the switches is mapped to a node, N is the number of nodes, and M is the time length;

S2, normalizing the data set X, and dividing the normalized data set into a training set X1 and a test set X2 according to the proportion of 7; the normalized data set is shown in formula (2):

where the reference traffic sequence is denoted X^p and the comparison traffic sequence X^q; α_pq[t] is the correlation coefficient between the reference traffic sequence X^p and the comparison traffic sequence X^q at time t; the minimum and maximum terms in formula (3) are, respectively, the minimum and the maximum of the absolute differences between X^p and X^q over all corresponding moments; β is the resolution coefficient, with value range (0, 1), and the smaller β is, the more strongly the correlation coefficients are distinguished; the correlation degree λ_pq between the reference traffic sequence X^p and the comparison traffic sequence X^q is the average of the correlation coefficients α_pq[t] over the whole time interval;

S4, constructing and initializing a GCN-GRU-based SDN network traffic prediction model; the SDN network traffic prediction model comprises a dual-channel GCN model for extracting spatial features, a GRU model for extracting temporal features and a fully connected layer; the dual-channel GCN model comprises a first spatial feature extraction unit and a second spatial feature extraction unit, both of which use a multi-scale graph convolution topology; the GRU model comprises a first to a W-th gated recurrent unit (GRU) layer connected in sequence, where W ranges from 62 to 122;

S53, fusing the first spatial correlation feature matrix and the second spatial correlation feature matrix:

where "|" denotes the concatenation of the matrices;

S55, passing the network traffic matrix H through the fully connected layer to obtain the predicted load of each link;

S7, inputting the test set X2 into the GCN-GRU-based SDN network traffic prediction model after the iterative learning of step S6, and evaluating the model using an evaluation index; if the evaluation index of the model does not meet the preset evaluation index, changing the value of M and continuing to execute steps S54-S55, until the evaluation index of the trained model meets the preset evaluation index;

2. The large-scale SDN network traffic prediction method according to claim 1, wherein the step S3 specifically includes:

3. The large-scale SDN network traffic prediction method of claim 2, wherein the first spatial feature extraction unit obtains a first spatial correlation feature matrix by learning the 1st to Kth powers of the adjacency matrix;

the first spatial correlation feature matrix and the second spatial correlation feature matrix are respectively as follows:

which obtain multi-scale neighborhood features for each node, where A'^1 σ(X1θ) and B'^1 σ(X1θ) gather feature information from the 1st-order neighborhood of each node, and A'^K σ(X1θ) and B'^K σ(X1θ) gather feature information from the Kth-order neighborhood of each node; A'^0 σ(X1θ) = σ(X1θ) and B'^0 σ(X1θ) = σ(X1θ) retain more of each node's own feature information, so that each node acquires richer neighborhood information;

where I is the identity matrix, and the degree matrix of A + I is a diagonal matrix whose off-diagonal elements are 0 and whose diagonal element in each row equals the sum of the elements in the corresponding row of A + I;

B' is the correlation matrix obtained by normalizing the correlation matrix B according to formula (6):

where I is the identity matrix, and the degree matrix of B + I is a diagonal matrix whose off-diagonal elements are 0 and whose diagonal element in each row equals the sum of the elements in the corresponding row of B + I.

4. The large-scale SDN network traffic prediction method according to claim 1, wherein, in step S54, the GRU model controls the information passed on by means of a reset gate and an update gate; the calculation is given by formula (7):


where H_{t-1} is the output state at time t-1;

is the output of the dual-channel GCN model corresponding to the network traffic feature X_t at time t; Γ_r is the reset gate, which controls how much information from the previous moment is written into the current state: the smaller the reset gate, the less information from the previous moment is written; Γ_μ is the update gate, which controls the degree to which the state information of the previous moment is carried into the current state: the larger the update gate, the more state information from the previous moment is carried over;

is the memory unit holding the content stored at time t; H_t is the output state at time t; σ denotes the activation function; W_μ, W_r and W_c are weights, and b_μ, b_r and b_c are bias terms; the output states at all times form the network traffic matrix H = {H₁, ..., H_t, ...}.

5. The large-scale SDN network traffic prediction method according to claim 1, wherein the step S6 specifically includes:

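calculating the deviation between the output value of the GCN-GRU-based SDN network traffic prediction model and the actual value x_{t+1};

updating the weights:

where the gradient term is the partial derivative of the deviation with respect to the model weights W_θ.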

6. A large-scale SDN network traffic prediction system, comprising:

the normalization processing module is used for performing normalization processing on the data set X and dividing the data set after the normalization processing into a training set X1 and a testing set X2 according to the proportion of 7; the normalized data set is shown in formula (2):

the adjacency matrix construction module is used for constructing adjacency matrixes among different links according to the data set after normalization processing and constructing a correlation matrix among the different links according to correlation analysis; in the correlation analysis, the calculation formula of the correlation coefficient and the correlation is shown in formula (3):

where the reference traffic sequence is denoted X^p and the comparison traffic sequence X^q; the minimum and maximum terms in formula (3) are, respectively, the minimum and the maximum of the absolute differences between X^p and X^q over all corresponding moments; β is the resolution coefficient, with value range (0, 1), and the smaller β is, the more strongly the correlation coefficients are distinguished; the correlation degree λ_pq between the reference traffic sequence X^p and the comparison traffic sequence X^q is the average of the correlation coefficients α_pq[t] over the whole time interval;

the traffic prediction model construction module is used for constructing and initializing a GCN-GRU-based SDN network traffic prediction model; the SDN network traffic prediction model comprises a dual-channel GCN model for extracting spatial features, a GRU model for extracting temporal features and a fully connected layer; the dual-channel GCN model comprises a first spatial feature extraction unit and a second spatial feature extraction unit, both of which use a multi-scale graph convolution topology; the GRU model comprises a first to a W-th gated recurrent unit (GRU) layer connected in sequence, where W ranges from 62 to 122;

the training module is used for inputting the training set X1, the adjacency matrix and the correlation matrix into the initialized GCN-GRU-based SDN network traffic prediction model for training, so as to extract spatial and temporal features and obtain a trained GCN-GRU-based SDN network traffic prediction model, and the training process of the training module comprises the following steps:

(3) Fusing the first spatial correlation feature matrix and the second spatial correlation feature matrix:

where "|" denotes the concatenation of the matrices;

(5) Passing the network traffic matrix H through the fully connected layer to obtain the predicted load of each link;

the iterative training module is used for iteratively training the trained GCN-GRU-based SDN network traffic prediction model with the back-propagation algorithm to obtain the optimal model parameters;

the test evaluation module is used for inputting the test set X2 into the GCN-GRU-based SDN network traffic prediction model after iterative learning by the iterative training module and evaluating the model using an evaluation index; if the evaluation index of the model does not meet the preset evaluation index, the value of M is changed and the training module continues to execute steps (4) and (5), until the evaluation index of the trained model meets the preset evaluation index;

after being trained, iteratively trained and evaluated by the training module, the iterative training module and the test evaluation module, the GCN-GRU-based SDN network traffic prediction model can be used for large-scale SDN network traffic prediction.

7. The large-scale SDN network traffic prediction system of claim 6, wherein the adjacency matrix construction module specifically comprises:

where a_pq describes the connection between any two nodes p and q in the network topology graph: a_pq = 1 means that nodes p and q are connected, and a_pq = 0 means that they are not connected;

8. The large-scale SDN network traffic prediction system of claim 7, wherein the first spatial feature extraction unit obtains the first spatial correlation feature matrix by learning the 1st to Kth powers of the adjacency matrix;

the first spatial correlation feature matrix and the second spatial correlation feature matrix are respectively as follows:

where I is the identity matrix, and the degree matrix of A + I is a diagonal matrix whose off-diagonal elements are 0 and whose diagonal element in each row equals the sum of the elements in the corresponding row of A + I;

where I is the identity matrix, and the degree matrix of B + I is a diagonal matrix whose off-diagonal elements are 0 and whose diagonal element in each row equals the sum of the elements in the corresponding row of B + I.

9. The large-scale SDN network traffic prediction system of claim 1, wherein, in the training module, the GRU model controls the information passed on by means of a reset gate and an update gate; the calculation is given by formula (7):

where H_{t-1} is the output state at time t-1;

is the output of the dual-channel GCN model corresponding to the network traffic feature X_t at time t; Γ_r is the reset gate, which controls how much information from the previous moment is written into the current state: the smaller the reset gate, the less information from the previous moment is written; Γ_μ is the update gate, which controls the degree to which the state information of the previous moment is carried into the current state: the larger the update gate, the more state information from the previous moment is carried over;

10. The large-scale SDN network traffic prediction system of claim 6, wherein the iterative training module comprises:

calculating the deviation between the output value of the GCN-GRU-based SDN network traffic prediction model and the actual value x_{t+1};

updating the weights:

where the gradient term is the partial derivative of the deviation with respect to the model weights W_θ.