CN112564945B

CN112564945B - IP network flow estimation method based on time sequence prior and sparse representation

Info

Publication number: CN112564945B
Application number: CN202011318745.4A
Authority: CN
Inventors: 王传栋; 张永
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2020-11-23
Filing date: 2020-11-23
Publication date: 2023-03-24
Anticipated expiration: 2040-11-23
Also published as: CN112564945A

Abstract

The invention discloses an IP network flow estimation method based on time sequence prior and sparse representation. First, the traffic values transmitted between all source-destination node pairs in a network are collected to construct an incomplete traffic matrix. Next, the spatio-temporal correlation present in the incomplete traffic matrix is modeled using sparse representation theory and regularization to form a traffic matrix estimation model. The alternating direction method of multipliers (ADMM) then converts the complex original problem into several sub-problems that are easy to solve, and a locally optimal solution of the original problem is found by iteratively computing the globally optimal solutions of the sub-problems. Finally, the complete traffic matrix is estimated. The method exploits the spatial correlation of the traffic matrix while also taking into account the temporal correlation of adjacent network nodes, and provides theoretical support for optimizing traffic estimation methods.

Description

IP network flow estimation method based on time sequence prior and sparse representation

Technical Field

The invention relates to a network traffic estimation method, in particular to an IP network traffic estimation method based on time sequence prior and sparse representation.

Background

A traffic matrix (TM) is a common form of network-wide traffic data. It records the traffic values transmitted between all source-destination (OD) node pairs of a measured network and is widely used in traffic engineering, network-wide anomaly detection, and other applications. However, because the traffic matrix must capture the global state of the network traffic, directly measuring every entry is too costly and is almost infeasible in practice. Estimating the traffic matrix from indirect observations reduces the cost and overhead of direct measurement and has become an active research area.

Many effective methods have been applied to traffic matrix estimation, but their estimation accuracy is insufficient because they do not exploit the inherent spatio-temporal correlation of the traffic matrix.

Disclosure of Invention

Object of the invention: to address the shortcomings of the prior art, the invention provides an IP network flow estimation method based on time sequence prior and sparse representation that improves the accuracy of traffic matrix estimation.

Technical solution: the IP network flow estimation method based on time sequence prior and sparse representation according to the invention comprises the following steps:

S1, collecting the traffic values transmitted between all source-destination nodes in a network to construct an incomplete traffic matrix;

S2, establishing a traffic matrix estimation model for the spatio-temporal correlation present in the incomplete traffic matrix, using sparse representation theory and regularization;

S3, converting the original traffic matrix estimation problem into several sub-problems that are easy to solve by the alternating direction method of multipliers (ADMM);

S4, iteratively computing the globally optimal solutions of the sub-problems to find a locally optimal solution of the original problem, and estimating the complete traffic matrix.

The incomplete traffic matrix in step S1 is constructed as follows:

Assuming that one day contains T time intervals and that there are N OD pairs in total, the traffic matrix can be represented as a T × N matrix

$$M = [m_{ij}]_{T \times N},$$

where m_ij denotes the traffic value of the j-th OD pair in the i-th time interval, "#" denotes a known traffic value, and "?" denotes a missing traffic value. In the traffic matrix, each column is one sample, namely the traffic values of one OD pair over all time intervals of the day, and each row holds the traffic values of all OD pairs in one time interval.
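
By way of illustration only, the construction of the incomplete matrix can be sketched in Python as follows. The function name `build_incomplete_matrix`, the record layout `(interval, od_pair, value)`, and the use of NaN to stand for the "?" entries are assumptions made for this example and are not taken from the patent.

```python
import numpy as np

def build_incomplete_matrix(records, T, N):
    """Build a T x N traffic matrix from (interval, od_pair, value) records.

    Rows are time intervals and columns are OD pairs. Entries that were
    never measured stay NaN (the "?" entries); `mask` marks the known
    index set Omega.
    """
    M = np.full((T, N), np.nan)
    for i, j, value in records:
        M[i, j] = value
    mask = ~np.isnan(M)  # True where (i, j) belongs to Omega
    return M, mask

# Hypothetical measurements: 3 time intervals, 2 OD pairs, 4 known values.
records = [(0, 0, 5.0), (0, 1, 2.0), (1, 0, 6.0), (2, 1, 3.0)]
M, mask = build_incomplete_matrix(records, T=3, N=2)
```

Keeping the mask separate from the values lets later steps apply the projection onto Ω without relying on a sentinel value inside the data.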

The step S2 includes the following steps:

First, a traffic estimation model based on sparse representation theory is established according to the spatial correlation present in the incomplete traffic matrix. Its expression is as follows:

where M is the known incomplete traffic matrix, X is the complete traffic matrix to be solved, W is the sparse representation coefficient matrix to be solved, Ω is the set of indices of the known traffic values in the traffic matrix, and λ₁ is an adjustable parameter. P_Ω(·) is a projection operator: when the element index (i, j) ∈ Ω, it returns the sample element at the corresponding position.

Because the element values of the traffic matrix are non-negative, X ≥ 0. To avoid the trivial solution, the diagonal elements of the sparse representation matrix W are constrained to be 0, that is, diag(W) = 0.
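
The expression of this model is not reproduced in the text. One reading that is consistent with the constraints stated in claim 1 and with the later reference to the F norm is sketched below. The squared self-representation term, the absence of a scaling factor on it, and the zero-filling convention in the projection operator are assumptions and are not confirmed by the source.

```latex
\min_{X,\,W}\ \lVert X - XW \rVert_F^2 + \lambda_1 \lVert W \rVert_1
\quad \text{s.t.}\quad X \ge 0,\ \operatorname{diag}(W) = 0,\ P_\Omega(M) = P_\Omega(X),
\qquad
[P_\Omega(X)]_{ij} =
\begin{cases}
x_{ij}, & (i,j) \in \Omega, \\
0, & \text{otherwise.}
\end{cases}
```

Under this reading X and M are T × N and W is N × N, so that each column of X is approximated by a sparse combination of the other columns.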

Then, for the temporal correlation present in the incomplete traffic matrix, a traffic estimation model based on time sequence prior and sparse representation is established by extending the traffic estimation model based on sparse representation theory. Its expression is as follows:

where λ₂ is an adjustable parameter and R is a Toeplitz(0, 1, −1) matrix.
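
A minimal sketch of the temporal term is given below. It assumes that Toeplitz(0, 1, −1) denotes a matrix with ones on the main diagonal and negative ones on the first upper diagonal, and that R has T − 1 rows so that every row of RX is the difference between two consecutive time intervals. Both the convention and the function names are assumptions made for illustration.

```python
import numpy as np

def temporal_difference_matrix(T):
    """(T-1) x T matrix with 1 on the main diagonal and -1 on the first
    upper diagonal, so that (R @ X)[i] = X[i] - X[i + 1]."""
    return np.eye(T - 1, T) - np.eye(T - 1, T, k=1)

def temporal_penalty(X):
    """Entrywise l1 norm of R @ X: total absolute change between
    consecutive time intervals, summed over all OD pairs."""
    R = temporal_difference_matrix(X.shape[0])
    return np.abs(R @ X).sum()

# Rows are time intervals, columns are OD pairs.
X = np.array([[1.0, 4.0],
              [1.0, 6.0],
              [3.0, 6.0]])
R = temporal_difference_matrix(3)
penalty = temporal_penalty(X)  # |1-1| + |4-6| + |1-3| + |6-6| = 4
```

A small penalty value means each OD pair changes little from one interval to the next, which is the temporal stability the prior is meant to encourage.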

In step S3, the sub-problems are those of the traffic matrix X, the sparse representation coefficient matrix W, and the error variable C, where C represents the error between the incomplete traffic matrix M and the complete traffic matrix X outside the set Ω during the iteration process.

Step S4 comprises the following specific steps:

S41, to make the problem easier to solve, introducing an error variable C and rewriting the traffic estimation model in the following form:

s.t. X ≥ 0, diag(W) = 0, M = X + C, P_Ω(C) = 0

S42, moving the constraint terms into the objective function by defining indicator functions g(X) and f(C), so that the optimization problem is converted into an equivalent penalty function form, where the indicator functions are expressed as follows:

The penalty function is expressed as follows:

S43, obtaining from the penalty function the following expressions for the X, W, C and β to be solved:

where ρ is a fixed parameter, preferably 1.1 or 1.2; β is the parameter of the penalty function; β_k and β_{k−1} are the values of β in the k-th and (k−1)-th iterations, respectively; β_max is a fixed parameter representing the maximum value of β, preferably 10⁶; and ‖·‖_F is the Frobenius norm. The above expressions are solved by alternating optimization, and the traffic matrix X obtained when the preset maximum number of iterations is reached is the estimated complete traffic matrix. The maximum number of iterations is a constant no greater than 50, and its specific value is determined according to experimental results.
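
The growth of the penalty parameter over the iterations can be illustrated with the short sketch below. The starting value `beta0` and the function name are hypothetical, while ρ, β_max and the cap of 50 iterations follow the preferred values stated above.

```python
def penalty_schedule(beta0, rho=1.1, beta_max=1e6, max_iter=50):
    """Return the penalty values used over max_iter iterations.

    beta0 (the starting value) is a hypothetical choice; each later value
    follows beta_k = min(rho * beta_{k-1}, beta_max).
    """
    betas = [beta0]
    for _ in range(max_iter - 1):
        betas.append(min(rho * betas[-1], beta_max))
    return betas

# With rho = 1.2 the schedule starting at 1e5 saturates at beta_max.
betas = penalty_schedule(beta0=1e5, rho=1.2)
```

The geometric growth tightens the coupling M = X + C as the iterations proceed, and the cap β_max keeps the sub-problems numerically well behaved.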

In solving for the sparse representation matrix W, each column is solved separately, and the solution of each column is treated as a LASSO problem. Let W_i denote the i-th column of W, X_i the i-th column of X, and X_{−i} the matrix obtained by removing the i-th column from X. To satisfy the constraint diag(W) = 0, the i-th column of X does not participate in the calculation when W_i is solved. The expression for W_i is as follows:
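
The expression for W_i is not reproduced in the text. A standard LASSO form that matches the description is sketched below; the squared ℓ2 data term without a scaling factor is an assumption, and w is a placeholder name for the N − 1 free entries of W_i (the i-th entry being fixed at 0).

```latex
W_i \;=\; \arg\min_{w \in \mathbb{R}^{N-1}}\ \lVert X_i - X_{-i}\, w \rVert_2^2 + \lambda_1 \lVert w \rVert_1
```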

Beneficial effects: compared with the prior art, the invention has the following notable advantages. It effectively overcomes the inaccurate estimation of the traditional KNN estimation algorithm, and its advantages are more pronounced in the high-dimensional case. It exploits the spatial correlation of the traffic matrix while also taking into account the temporal correlation of adjacent network nodes, so the estimated traffic matrix is more accurate, and it provides theoretical support for optimizing traffic estimation methods.

Drawings

FIG. 1 is a schematic view of a model structure according to the present invention;

FIG. 2 is a flow chart of the method of the present invention.

Detailed Description

The technical scheme of the invention is further explained by combining the attached drawings.

As shown in Fig. 1, the basic idea of the IP network flow estimation method based on time sequence prior and sparse representation is to collect the traffic values at t time instants in the network, construct an incomplete traffic matrix from the known traffic values, build a time sequence prior and sparse representation model based on temporal and spatial correlation, solve the model by the alternating direction method of multipliers, and estimate the complete traffic matrix. The specific steps are as follows:

Step 1), constructing an incomplete traffic matrix M:

The traffic values transmitted between all source-destination (OD) nodes in the network are collected to obtain an incomplete traffic matrix M. In the traffic matrix M, each column is one sample, namely the traffic values of one OD pair over all time intervals of a day, and each row holds the traffic values of all OD pairs in one time interval. If one day contains T time intervals and there are N OD pairs in total, the traffic matrix can be represented as a T × N matrix

$$M = [m_{ij}]_{T \times N},$$

where m_ij denotes the traffic value of the j-th OD pair in the i-th time interval, "#" denotes a known traffic value, and "?" denotes a missing traffic value. Because of the communication behavior of users, the traffic values of adjacent nodes in the traffic matrix are correlated, that is, spatially correlated. Because a large number of values are missing, the traffic matrix is a sparse matrix.

Step 2), establishing a flow estimation model based on a sparse representation theory:

In a sparse traffic matrix, each sample can be represented as a linear combination of the other samples: the closer another sample is to the given sample, the higher its weight coefficient, and the farther away it is, the lower its weight coefficient, which may be close to 0. The traffic matrix estimation problem can therefore be addressed with sparse representation theory. Denoting the weight coefficient matrix by W, the traffic matrix estimation problem can be modeled as:

where M is the known incomplete traffic matrix, X is the complete traffic matrix to be solved, W is the sparse representation coefficient matrix to be solved, Ω is the set of indices of the known traffic values in the traffic matrix, and λ₁ is an adjustable parameter. P_Ω(·) is a projection operator: when the element index (i, j) ∈ Ω, it returns the sample element at the corresponding position.

Since the element values of the traffic matrix are non-negative, X ≥ 0. To avoid the trivial solution, the diagonal elements of the sparse representation matrix W are constrained to be 0, that is, diag(W) = 0.
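
The projection operator and the three constraints can be checked numerically with a sketch such as the following. The function names, the boolean `mask` encoding of Ω, and the convention that P_Ω sets entries outside Ω to zero are assumptions made for illustration.

```python
import numpy as np

def project_omega(A, mask):
    """P_Omega(A): keep entries whose index is in Omega, zero the rest."""
    return np.where(mask, A, 0.0)

def is_feasible(M, X, W, mask, tol=1e-9):
    """Check X >= 0, diag(W) = 0 and P_Omega(M) = P_Omega(X)."""
    nonneg = bool(np.all(X >= -tol))
    zero_diag = bool(np.allclose(np.diag(W), 0.0, atol=tol))
    fits_known = bool(np.allclose(project_omega(M, mask),
                                  project_omega(X, mask), atol=tol))
    return nonneg and zero_diag and fits_known

# Hypothetical 2 x 2 example: NaN marks the missing ("?") entries of M.
M_demo = np.array([[5.0, np.nan],
                   [np.nan, 3.0]])
mask_demo = ~np.isnan(M_demo)
W_demo = np.array([[0.0, 0.5],
                   [0.5, 0.0]])
X_good = np.array([[5.0, 1.0],
                   [2.0, 3.0]])   # agrees with M on Omega
X_bad = np.array([[4.0, 1.0],
                  [2.0, 3.0]])    # changes a known value
ok_good = is_feasible(M_demo, X_good, W_demo, mask_demo)
ok_bad = is_feasible(M_demo, X_bad, W_demo, mask_demo)
```

Only the entries of X outside Ω are free, which is why the estimate can differ from M exactly where M has no measurement.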

Step 3), establishing a flow estimation model based on time sequence prior and sparse representation:

Traffic data is ordered in time. For each OD flow, its time-series characteristics can be described in terms of temporal correlation. To characterize the temporal correlation of the traffic matrix, the term ‖RX‖₁ is minimized in the objective function, which keeps the elements of the traffic matrix temporally stable along the time dimension and thus better describes the temporal correlation in the traffic matrix. The flow estimation model based on time sequence prior and sparse representation is then:

where R is a Toeplitz(0, 1, −1) matrix.
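
The model expression is not reproduced in the text. A form consistent with the constraints of claim 1 and with the term ‖RX‖₁ described above is sketched below. The squared Frobenius data term, the (T − 1) × T shape of R, and the reading of Toeplitz(0, 1, −1) as ones on the main diagonal with negative ones on the first upper diagonal are assumptions.

```latex
\min_{X,\,W}\ \lVert X - XW \rVert_F^2 + \lambda_1 \lVert W \rVert_1 + \lambda_2 \lVert RX \rVert_1
\quad \text{s.t.}\quad X \ge 0,\ \operatorname{diag}(W) = 0,\ P_\Omega(M) = P_\Omega(X),
\qquad
R =
\begin{bmatrix}
1 & -1 & 0 & \cdots & 0 \\
0 & 1 & -1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & 1 & -1
\end{bmatrix}
```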

Step 4), using the Alternating Direction Method of Multipliers (ADMM), the relatively complex problem of solving the model established in step 3) is converted into three sub-problems that are easy to solve, namely those of the traffic matrix X, the sparse representation matrix W, and the error variable C in the iteration process. A locally optimal solution of the original problem is found by iteratively computing the globally optimal solutions of the sub-problems, and the complete traffic matrix X is finally estimated.

Further, the specific steps of the step 4) are as follows:

Step 4.1), to make the optimization problem easier to solve, an error variable C is first introduced and the model is rewritten as:

s.t. X ≥ 0, diag(W) = 0, M = X + C, P_Ω(C) = 0 (5)

Step 4.2), to solve the optimization problem of step 4.1), two indicator functions are defined as follows:

The purpose of g(X) and f(C) is to move the constraint terms into the objective function so that the variables satisfy the constraints. The above optimization problem can thus be converted into an equivalent penalty function form:
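
The indicator functions and the penalty form are not reproduced in the text. The usual definitions that match the constraints X ≥ 0 and P_Ω(C) = 0 are sketched below; the quadratic penalty on M = X + C with weight β/2 and the form of the data term are assumptions.

```latex
g(X) =
\begin{cases}
0, & X \ge 0, \\
+\infty, & \text{otherwise,}
\end{cases}
\qquad
f(C) =
\begin{cases}
0, & P_\Omega(C) = 0, \\
+\infty, & \text{otherwise,}
\end{cases}

\min_{X,\,W,\,C}\ \lVert X - XW \rVert_F^2 + \lambda_1 \lVert W \rVert_1 + \lambda_2 \lVert RX \rVert_1
+ g(X) + f(C) + \frac{\beta}{2} \lVert M - X - C \rVert_F^2
\quad \text{s.t.}\quad \operatorname{diag}(W) = 0
```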

step 4.3), alternately optimizing and solving:

Step 4.3.1), to solve for X efficiently, auxiliary variables D and S are introduced with D = X and S = RX, so that the sub-problem can be converted into the following equivalent constrained optimization problem (for simplicity, and without affecting understanding, W^{k−1} is abbreviated as W and β_k as β):

The Lagrangian function corresponding to this sub-problem can be defined as:

If we let

then the corresponding optimization steps are:

(1) Update X:

(2) Update D:

(3) Update S:

(4)-(5) Update U₁ and U₂:

(6) Update μ_t:

μ_t = min(ρ μ_{t−1}, μ_max) (16)
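
The update formulas for D, S, U₁ and U₂ are not reproduced in the text. The sketch below shows what these steps look like under one common convention, which is an assumption: the augmented Lagrangian contains ⟨U₁, X − D⟩ + (μ/2)‖X − D‖_F² and ⟨U₂, RX − S⟩ + (μ/2)‖RX − S‖_F², D carries the non-negativity constraint, and S carries the term λ₂‖S‖₁. The X update is omitted because it depends on the exact form of the data term. The function names and the default `mu_max` are hypothetical.

```python
import numpy as np

def soft_threshold(V, tau):
    """Entrywise proximal operator of tau * ||.||_1."""
    return np.sign(V) * np.maximum(np.abs(V) - tau, 0.0)

def update_auxiliary(X, R, U1, U2, mu, lam2, rho=1.1, mu_max=1e6):
    """One pass over steps (2)-(6) under the convention stated above."""
    RX = R @ X
    D = np.maximum(X + U1 / mu, 0.0)             # projection onto D >= 0
    S = soft_threshold(RX + U2 / mu, lam2 / mu)  # prox of lam2 * ||S||_1
    U1 = U1 + mu * (X - D)                       # multiplier for X = D
    U2 = U2 + mu * (RX - S)                      # multiplier for RX = S
    mu = min(rho * mu, mu_max)                   # formula (16)
    return D, S, U1, U2, mu

# Hypothetical 2 x 2 iterate with one temporal difference row.
X_it = np.array([[1.0, -2.0],
                 [3.0, 4.0]])
R_it = np.array([[1.0, -1.0]])
D, S, U1, U2, mu = update_auxiliary(
    X_it, R_it, np.zeros((2, 2)), np.zeros((1, 2)), mu=1.0, lam2=0.5)
```

Splitting X into D and S is what makes each step cheap: the non-negativity constraint becomes an entrywise clamp and the ℓ1 term becomes an entrywise shrinkage.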

Step 4.3.2), solving W:

It can be seen that the columns of the matrix W are independent of one another, so the problem can be separated by column, and each resulting sub-problem can be treated as a LASSO problem. Let W_i denote the i-th column of W, X_i the i-th column of X, and X_{−i} the matrix obtained by removing the i-th column from X. To satisfy the constraint diag(W) = 0, the i-th column of X does not participate in the calculation when W_i is solved.
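
A minimal sketch of the column-wise solution is given below. The patent does not state which LASSO solver is used; ISTA (iterative soft-thresholding) is swapped in here purely for illustration, and the objective ‖X_i − X_{−i} w‖₂² + λ₁‖w‖₁, the function names, and the iteration count are assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    """Entrywise proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_ista(A, b, lam, n_iter=500):
    """Minimize ||b - A w||_2^2 + lam * ||w||_1 by ISTA."""
    w = np.zeros(A.shape[1])
    L = 2.0 * np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient
    if L == 0.0:
        return w
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ w - b)
        w = soft_threshold(w - grad / L, lam / L)
    return w

def update_W(X, lam, n_iter=500):
    """Solve one LASSO problem per column; the diagonal of W stays 0."""
    N = X.shape[1]
    W = np.zeros((N, N))
    for i in range(N):
        others = [j for j in range(N) if j != i]  # column i is excluded
        W[others, i] = lasso_ista(X[:, others], X[:, i], lam, n_iter)
    return W

# Hypothetical data: 6 time intervals, 4 OD pairs.
rng = np.random.default_rng(0)
X_demo = rng.random((6, 4))
W_demo = update_W(X_demo, lam=0.1)
```

Because column i of X is left out of its own regression, the entry W[i, i] is never written and diag(W) = 0 holds by construction.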

Step 4.3.3), solving C:

Step 4.3.4), updating β:

β_k = min(ρ β_{k−1}, β_max) (20)
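
The formula for the C update of step 4.3.3) is not reproduced in the text. If the C sub-problem is the minimization of ‖M − X − C‖_F² subject to P_Ω(C) = 0, which is an assumption, it has the closed-form solution sketched below. The function name, the boolean mask for Ω, and the zero-filling of the missing entries of M are also assumptions.

```python
import numpy as np

def update_C(M, X, mask):
    """Minimizer of ||M - X - C||_F^2 subject to P_Omega(C) = 0.

    C is zero on Omega and equals M - X elsewhere, with the missing
    entries of M treated as 0.
    """
    M_filled = np.where(mask, M, 0.0)
    return np.where(mask, 0.0, M_filled - X)

# Hypothetical 2 x 2 example: NaN marks the missing entries of M.
M_demo = np.array([[1.0, np.nan],
                   [np.nan, 4.0]])
mask_demo = ~np.isnan(M_demo)
X_demo = np.array([[1.5, 2.0],
                   [3.0, 3.5]])
C_demo = update_C(M_demo, X_demo, mask_demo)
```

In this reading C absorbs the mismatch between M and X only where no measurement exists, so the penalty term pulls X toward M on the known entries alone.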

Step 4.4), when the maximum number of iterations is reached, the solution process ends and the estimated complete traffic matrix X is obtained.

The algorithm steps of the traffic estimation based on time sequence prior and sparse representation are summarized in Fig. 2.

In the field of traffic estimation, existing algorithms for estimating missing values each have advantages and disadvantages. The IP network flow estimation method based on time sequence prior and sparse representation makes effective use of time sequence prior information and sparse representation theory, mines the inherent spatio-temporal correlation of the traffic matrix, and improves the accuracy of traffic matrix estimation.

Claims

1. An IP network flow estimation method based on time sequence prior and sparse representation is characterized by comprising the following steps:

S2, establishing a traffic matrix estimation model for the spatio-temporal correlation present in an incomplete traffic matrix, using sparse representation theory and regularization;

S4, iteratively computing the globally optimal solutions of the sub-problems to find a locally optimal solution of the original problem, and estimating a complete traffic matrix;

wherein m_ij denotes the traffic value of the j-th OD pair in the i-th time interval, "#" denotes a known traffic value, and "?" denotes a missing traffic value; in the traffic matrix, each column is one sample, namely the traffic values of one OD pair over all time intervals of the day, and each row holds the traffic values of all OD pairs in one time interval;

step S2 comprises the following steps:

first, a traffic matrix estimation model based on sparse representation theory is established, whose expression is as follows:

s.t. X ≥ 0, diag(W) = 0, P_Ω(M) = P_Ω(X)

wherein M is the known incomplete traffic matrix, X is the complete traffic matrix to be solved, W is the sparse representation coefficient matrix to be solved, Ω is the set of indices of the known traffic values in the traffic matrix, and λ₁ is an adjustable parameter; P_Ω(·) is a projection operator which, when the element index (i, j) ∈ Ω, returns the sample element at the corresponding position:

because the element values of the traffic matrix are non-negative, X ≥ 0, and to avoid the trivial solution, the diagonal elements of the sparse representation coefficient matrix W are constrained to be 0, that is, diag(W) = 0;

then, a traffic matrix estimation model based on time sequence prior and sparse representation is established, whose expression is as follows:

s.t. X ≥ 0, diag(W) = 0, P_Ω(M) = P_Ω(X)

wherein λ₂ is an adjustable parameter and R is a Toeplitz(0, 1, −1) matrix;

in step S3, the sub-problems are those of the traffic matrix X, the sparse representation coefficient matrix W, and the error variable C, the error variable C representing the error between the incomplete traffic matrix M and the complete traffic matrix X outside the set Ω during the iteration process;

step S4 comprises the following specific steps:

S41, to make the problem easier to solve, introducing an error variable C and rewriting the traffic matrix estimation model in the following form:

s.t. X ≥ 0, diag(W) = 0, M = X + C, P_Ω(C) = 0

the penalty function is expressed as follows:

wherein ρ is a fixed parameter, β is the parameter of the penalty function, β_k and β_{k−1} are the values of β in the k-th and (k−1)-th iterations, respectively, β_max is a fixed parameter representing the maximum value of β, and ‖·‖_F is the Frobenius norm; the above expression is solved by alternating optimization, and the traffic matrix X obtained when the preset maximum number of iterations is reached is the estimated complete traffic matrix.

2. The IP network flow estimation method based on time sequence prior and sparse representation according to claim 1, characterized in that, in solving for the sparse representation coefficient matrix W, each column is solved separately, and the solution of each column is treated as a LASSO problem; letting W_i denote the i-th column of W, X_i the i-th column of X, and X_{−i} the matrix obtained by removing the i-th column from X, in order to satisfy the constraint diag(W) = 0, the i-th column of X does not participate in the calculation when W_i is solved, and the expression for W_i is as follows: