CN106685686B - Network topology estimation method based on simulated annealing - Google Patents

Network topology estimation method based on simulated annealing

Info

Publication number
CN106685686B
CN106685686B (granted publication); CN106685686A (application publication); application number CN201610817914.6A
Authority
CN
China
Prior art keywords
node
tree
network
topology
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610817914.6A
Other languages
Chinese (zh)
Other versions
CN106685686A
Inventor
费高雷
何俊武
胡光岷
蒋晴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201610817914.6A priority Critical patent/CN106685686B/en
Publication of CN106685686A publication Critical patent/CN106685686A/en
Application granted granted Critical
Publication of CN106685686B publication Critical patent/CN106685686B/en


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/12: Discovery or management of network topologies
    • H04L 41/14: Network analysis or design

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a network topology estimation method based on simulated annealing, which comprises the following steps: S1, performing network tomography probing on a network whose topology is known; S2, processing the probing results with a wavelet packet decomposition algorithm to obtain clustering features; S3, selecting features with a simulated annealing algorithm; and S4, using the selected features as the features for estimating all other unknown network topologies, and estimating the network topology with agglomerative hierarchical clustering. The invention selects the clustering features with a stochastic optimization algorithm, namely simulated annealing, so that the features most beneficial to topology estimation are retained and the features that increase the topology estimation error are excluded; in traditional network-tomography topology detection, no such feature selection is performed before the topology is estimated with agglomerative hierarchical clustering.

Description

Network topology estimation method based on simulated annealing
Technical Field
The invention relates to a network topology estimation method based on simulated annealing.
Background
With the rapid development of network technologies such as Internet communication, people's daily life is ever more closely tied to the network. While the network brings great convenience to users, its scale and complexity keep growing, which makes it difficult to guarantee quality of service; as a result, users and network regulators pay increasing attention to the performance characteristics of the network. However, most existing network measurement systems assume that the network topology is known, whereas real networks change frequently; if the topology cannot be inferred accurately, the network cannot be supervised accurately. To this end, researchers have proposed methods that measure the network through network topology estimation. Network topology estimation has become an important component of modern network management systems and plays an important role in the scientific development of communication networks.
Network topology estimation refers to the process of searching for and capturing certain elements in a network, finding the interrelations among those elements, and then displaying the relations with an appropriate topology structure. Depending on which elements are of interest, network topologies can be divided into physical topologies and logical topologies: the physical topology represents the connection relationships between the physical devices in the network, while the logical topology describes how traffic is transmitted through the network. In this work, network topology estimation means measuring the network of interest and inferring its logical topology.
Network tomography is a relatively new technique for detecting the logical topology of the Internet and is a cross-domain application of tomographic imaging. Based purely on end-to-end measurements, it obtains information about the network interior that cannot be observed directly, and it assumes that the routing nodes inside the probed network return no information to the observer. Probe packets are sent and received between controllable nodes at the edge of the network under test; because the probes traverse the network, the measurements reflect its internal characteristics, from which the internal structure of the network can be inferred.
Currently, network tomography mainly consists of two parts: the first is the collection of probe data, which studies how to collect useful information about the network interior; the second is statistical inference, which extracts the information and rules of the network interior from the collected data.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a network topology estimation method based on simulated annealing. It selects clustering features with a stochastic optimization algorithm, namely simulated annealing, so that the features most beneficial to topology estimation are retained and the features that increase the topology estimation error are excluded.
The purpose of the invention is realized by the following technical scheme: the network topology estimation method based on simulated annealing comprises the following steps:
S1, performing network tomography probing on a network whose topology is known;
S2, processing the probing results with a wavelet packet decomposition algorithm to obtain clustering features;
S3, selecting features with a simulated annealing algorithm;
S4, using the selected features as the features for estimating all other unknown network topologies, and estimating the network topology with agglomerative hierarchical clustering.
Further, the specific implementation of step S2 is as follows: each iteration of the wavelet packet decomposition splits the previous signal into a low-frequency part and a high-frequency part, where S is the original signal, A denotes approximation filtering and is regarded as the low-frequency part of the previous-level signal, and D denotes detail filtering and is regarded as the high-frequency part of the previous-level signal.
Let y[n] be the signal to be decomposed, g[n] the low-pass filter and h[n] the high-pass filter, satisfying h[n] = (-1)^n · g[1-n]; g[n] corresponds to the scaling function of the wavelet transform and is called the scaling vector, and h[n] corresponds to the wavelet function ψ(t).
Taking the scaling vector of the db4 wavelet as g[n], the computation proceeds as follows: the signal is low-pass and high-pass filtered and then down-sampled, giving the low-frequency and high-frequency wavelet packet decomposition coefficients respectively:
Y_{1,0}[n] = Σ_k y[k] · g[2n - k]
Y_{1,1}[n] = Σ_k y[k] · h[2n - k]
In the same way, Y_{1,0}[n] and Y_{1,1}[n] are processed again to obtain the 4 groups of wavelet packet decomposition coefficients of the second level, and so on for any number of levels; the recurrence is:
Y_{i+1,2j}[n] = Σ_k Y_{i,j}[k] · g[2n - k]
Y_{i+1,2j+1}[n] = Σ_k Y_{i,j}[k] · h[2n - k]
Using a 4-level wavelet packet decomposition, the original signal X_i of each destination node is processed into 16 groups of wavelet packet decomposition coefficients, denoted Y_{i,j}, where i = 0, 1, ..., n-1 is the destination node index, j = 0, 1, ..., 15 is the coefficient group index, and X_i and Y_{i,j} are all vectors.
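As an illustration of step S2, the following is a minimal NumPy sketch of the 4-level wavelet packet decomposition described above; the function names, the use of PyWavelets only to obtain the db4 filter taps, and the plain convolve-and-downsample boundary handling are illustrative assumptions rather than details fixed by the patent.

```python
import numpy as np
import pywt  # used only to fetch the db4 filter taps

def wp_split(signal, g, h):
    """One wavelet-packet split: filter with g (low-pass) and h (high-pass), then downsample by 2."""
    low = np.convolve(signal, g)[::2]
    high = np.convolve(signal, h)[::2]
    return low, high

def wavelet_packet_features(x, levels=4, wavelet="db4"):
    """Return the 2**levels groups of wavelet packet coefficients of x (16 groups for 4 levels)."""
    w = pywt.Wavelet(wavelet)
    g, h = np.array(w.dec_lo), np.array(w.dec_hi)   # decomposition low/high-pass filters
    bands = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            next_bands.extend(wp_split(band, g, h))  # each band splits into low and high
        bands = next_bands
    return bands  # Y_{i,0}, ..., Y_{i,15} for one destination node

# usage: one list of 16 coefficient groups per destination node's delay curve
# delay_curves = [...]  # list of 1-D arrays, one per destination node
# features = [wavelet_packet_features(x) for x in delay_curves]
```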
Further, step S3 comprises the following sub-steps:
S31, several groups are selected at random from the 16 groups of wavelet packet decomposition coefficients obtained in S2 as the features for the first estimation; the selection is represented by a feature-selection state vector F = (f_1, f_2, ..., f_16), where f_1, f_2, ..., f_16 ∈ {0, 1}, a value of 0 meaning that the feature is not selected and 1 that it is selected.
Let R be the result set used to store key-value pairs formed by a feature-selection result and its error rate; initially R = {}. The initial temperature is T = T_0 = 2m, where m is the number of destination nodes.
The upper limit L of iterations at one temperature is set to 20; the iteration counter l, the count t of consecutively rejected new solutions, and the cooling count n are all initialized to 0.
S32, topology estimation based on agglomerative hierarchical clustering is performed on the samples according to F, the error rate e(F) of the estimated topology is computed with the tree edit distance, and R is updated as R = R ∪ {F : e(F)}.
S33, if l < L, go to step S34; otherwise jump to step S39.
S34, the selection state of the i-th feature of F is flipped while the other features are kept unchanged, giving a new solution F'_i = (f'_{i,1}, f'_{i,2}, ..., f'_{i,16}) with
f'_{i,j} = 1 - f_j if j = i, and f'_{i,j} = f_j otherwise.
Letting i run through all values yields 16 new solutions F'_1, F'_2, ..., F'_16.
S35, for each new solution that is not already in R, the same procedure as in step S32 is applied to obtain e(F'_i), and R is updated as R = R ∪ {F'_i : e(F'_i)}; let e* = min(e(F'_i)) = e(F'_m) with corresponding solution F'_m.
S36, let l = l + 1 and Δe = e* - e(F); the new solution F'_m is accepted with probability p according to the Metropolis criterion, where
p = 1 if Δe < 0, and p = exp(-Δe / T) otherwise.
If the new solution is accepted, set F = F'_m, e(F) = e*, t = 0 and go to step S33; otherwise set t = t + 1 and go to S37.
S37, if l < L, go to step S38; otherwise jump to step S39.
S38, the selection state of every feature of F is flipped independently with probability 50%, giving a new solution F'_m; the same procedure as in step S32 is applied to obtain e* = e(F'_m), R is updated as R = R ∪ {F'_m : e(F'_m)}, and the algorithm jumps to step S36.
S39, let n = n + 1, lower the temperature T according to the cooling schedule, and reset l = 0; if t < 10 and n < 10, jump to step S33; otherwise select the best solution F* from R as the result of the feature selection, and the algorithm ends.
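The annealing loop of steps S31 to S39 can be paraphrased in Python as follows. This is a sketch, not the patented implementation: the estimate_error callback (step S32, topology estimation plus tree-edit error rate) is assumed to be supplied by the caller, and the geometric cooling factor 0.9 is an assumption because the exact cooling schedule is given only as a figure in the original.

```python
import math
import random

def anneal_feature_selection(estimate_error, n_features=16, m_dest=8,
                             L=20, max_rejects=10, max_coolings=10):
    """Simulated-annealing feature selection over binary state vectors (sketch of S31-S39)."""
    F = tuple(random.randint(0, 1) for _ in range(n_features))   # S31: random initial selection
    if not any(F):
        F = (1,) * n_features                                    # ensure at least one feature is selected
    T = 2 * m_dest                                                # initial temperature T0 = 2m
    R = {F: estimate_error(F)}                                    # result set {selection: error rate}
    t = n = 0                                                     # rejected-solution and cooling counters
    while t < max_rejects and n < max_coolings:
        for _ in range(L):                                        # S33-S38: L iterations per temperature
            # S34: flip each single feature in turn and evaluate the 16 neighbours
            neighbours = [F[:i] + (1 - F[i],) + F[i + 1:] for i in range(n_features)]
            for Fp in neighbours:
                if Fp not in R:
                    R[Fp] = estimate_error(Fp)                    # S35
            best = min(neighbours, key=lambda Fp: R[Fp])
            delta = R[best] - R[F]
            if delta < 0 or random.random() < math.exp(-delta / T):   # S36: Metropolis criterion
                F, t = best, 0
            else:
                t += 1
                # S38: escape move, flip every feature independently with probability 0.5
                Fp = tuple(1 - f if random.random() < 0.5 else f for f in F)
                if Fp not in R:
                    R[Fp] = estimate_error(Fp)
                if R[Fp] - R[F] < 0 or random.random() < math.exp(-(R[Fp] - R[F]) / T):
                    F, t = Fp, 0
        n, T = n + 1, T * 0.9                                     # S39: cool (schedule assumed geometric)
    return min(R, key=R.get)                                      # best selection found in R
```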
Further, the topology estimation based on agglomerative hierarchical clustering in step S32 is implemented as follows:
S321, the distance between clustering samples is defined and a distance matrix is obtained. For each destination node i, let the selected features be Y_{i,k_1}, Y_{i,k_2}, ..., Y_{i,k_n}, where 0 ≤ k_1 < k_2 < ... < k_n ≤ 15 and k_1, k_2, ..., k_n ∈ N; these features are concatenated into a new vector Y_i = (Y_{i,k_1}, Y_{i,k_2}, ..., Y_{i,k_n}).
For any two destination nodes i, j, the correlation coefficient between their feature vectors Y_i and Y_j is computed as
ρ_{i,j} = Σ_n (Y_i[n] - E[Y_i]) · (Y_j[n] - E[Y_j]) / sqrt( Σ_n (Y_i[n] - E[Y_i])^2 · Σ_n (Y_j[n] - E[Y_j])^2 )
where E[Y_i] and E[Y_j] denote the mathematical expectations of Y_i and Y_j, i.e. their means:
E[Y_i] = (1/N) · Σ_n Y_i[n].
The correlation distance between nodes i, j is defined as
d_{i,j} = 1 - |ρ_{i,j}|.
The stronger the correlation between the feature vectors of two destination nodes i, j, the closer the absolute value of the correlation coefficient |ρ_{i,j}| is to 1 and the closer the correlation distance d_{i,j} is to 0.
The correlation distance matrix over all destination nodes is then
D = (d_{i,j}), i, j = 1, 2, ..., n,
with d_{i,j} = d_{j,i} (i, j = 1, 2, ..., n) and d_{1,1} = d_{2,2} = ... = d_{n,n} = 0; the matrix D is therefore symmetric with zero diagonal.
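The correlation-distance matrix D of step S321 reduces to a few lines of NumPy; the only assumption here is that the concatenated feature vectors of all destination nodes are stacked row-wise into one 2-D array.

```python
import numpy as np

def correlation_distance_matrix(Y):
    """Y: (n_nodes, feature_len) array of concatenated wavelet-packet features.
    Returns D with D[i, j] = 1 - |corr(Y_i, Y_j)|, symmetric with zero diagonal."""
    rho = np.corrcoef(Y)            # Pearson correlation between every pair of rows
    D = 1.0 - np.abs(rho)
    np.fill_diagonal(D, 0.0)        # remove tiny numerical residue on the diagonal
    return D
```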
S322, hierarchical clustering is performed to obtain the topology, as follows:
S3221, at initialization every sample forms its own class, n classes in total, denoted G_1, G_2, ..., G_n; they form the n leaf nodes of the clustering tree, denoted N_1, N_2, ..., N_n. Each node is given a weight w_1, w_2, ..., w_n with w_1 = w_2 = ... = w_n = 0. The inter-class distance matrix is G = D, so that the distance between any two classes G_i, G_j is g_{i,j} = d_{i,j}. The clustering round counter is initialized to l = 1.
S3222, the minimum distance g_{s,t} = min_{i≠j}(g_{i,j}) in G is found, the two corresponding classes G_s, G_t are merged into a new class G_{n+l}, and a new node N_{n+l} is constructed as the parent of N_s, N_t with weight w_{n+l} = g_{s,t}.
S3223, the distance from the new class G_{n+l} to every other class is computed; the distance between two classes G_p, G_q is the average of the pairwise sample distances:
g_{p,q} = (1 / (n_p · n_q)) · Σ_{x_i ∈ G_p} Σ_{x_j ∈ G_q} d_{i,j},
S3224, where n_p, n_q are the numbers of samples in classes G_p and G_q respectively;
S3225, the inter-class distance matrix G is updated: the rows and columns associated with G_s, G_t are removed, and a row and column for the new class G_{n+l} are appended whose entries are the distances from the new class to the other classes;
S3226, the clustering round counter is updated, l = l + 1;
S3227, steps S3222 to S3225 are repeated until only one class G_{2n-1} remains.
Further, during the clustering of step S322 two classes are merged into a new class in every round until only one class remains, producing a weighted binary tree with 2n-1 nodes. The nodes of this binary tree are then merged according to the following rule: a threshold t = 0.02 is set, and the nodes from N_{n+1} up to N_{2n-2} are examined; if a node N_i and its parent node N_j have weights that differ by less than the threshold, i.e.
w_j - w_i < t,
then N_i and N_j are merged, that is, the parent of every child of N_i is changed to N_j and N_i is deleted; the resulting tree is taken as the estimated network logical topology.
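A sketch of step S322 plus the merging rule above, using SciPy's average-linkage clustering in place of the hand-written loop; the choice of scipy.cluster.hierarchy and the pruning test w_parent - w_child < t are implementation assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def estimate_logical_topology(D, t=0.02):
    """Average-linkage clustering on the correlation-distance matrix D,
    followed by collapsing parent/child merges whose weights differ by less than t.
    Returns (tree, root): tree maps node id -> (weight, list of child ids); leaves are 0..n-1."""
    n = D.shape[0]
    Z = linkage(squareform(D, checks=False), method="average")   # S3221-S3227
    tree = {i: (0.0, []) for i in range(n)}                      # leaf nodes with weight 0
    for k, (a, b, w, _) in enumerate(Z):
        tree[n + k] = (float(w), [int(a), int(b)])               # internal node N_{n+k}
    root = n + len(Z) - 1

    def collapse(node):
        weight, kids = tree[node]
        changed = True
        while changed:                                           # keep promoting until stable
            changed = False
            new_kids = []
            for c in kids:
                cw, ckids = tree[c]
                if ckids and weight - cw < t:                    # internal child merged at almost the same height
                    new_kids.extend(ckids)                       # re-parent its children to `node`, drop c
                    changed = True
                else:
                    new_kids.append(c)
            kids = new_kids
        tree[node] = (weight, kids)
        for c in kids:
            collapse(c)

    collapse(root)
    return tree, root
```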
Further, the error rate of the topology estimation result in step S32 is computed with the tree edit distance as follows.
Three tree editing operations are defined:
Relabel a node: the labels of the leaf nodes of the tree topology are defined to be their respective numbers, and the labels of all other nodes are empty; the cost of changing a node label from a to b is set to r(a → b) = 2m, where m is the number of destination nodes in the probing process, i.e. the number of leaves of the tree.
Delete a node: a non-root node v of the tree T is deleted; let its parent be v', then the parent of every child of v becomes v'. The cost of deleting a node is set to r(a → Λ) = 1, where a is the label of the deleted node and Λ denotes the empty node.
Insert a node: a new node v is added as a child of a node v', and some of the children of v' become children of v. The cost of inserting a node is set to r(Λ → a) = 1, where Λ denotes the empty node and a is the label of the inserted node.
Let E be a sequence of editing operations e_1, e_2, ..., e_n that transforms tree T_1 into tree T_2, and denote the total cost of transforming T_1 into T_2 by r(E) = r(e_1) + r(e_2) + ... + r(e_n). The tree edit distance is min(r(E)), i.e. the minimum total cost of transforming T_1 into T_2.
To compute the tree edit distance, the problem is converted into finding the maximum matching subtree between the two trees T_1 and T_2: let the common subtree be M and the matching relationship between T_1 and T_2 be (M, T_1, T_2); let N_1 be the set of nodes that belong to T_1 but not to M, let N_2 be the set of nodes that belong to T_2 but not to M, and denote the label of node i in T_1 by l_1(i) and the label of node i in T_2 by l_2(i). Then
r((M, T_1, T_2)) = Σ_{(i,j) ∈ M, l_1(i) ≠ l_2(j)} r(l_1(i) → l_2(j)) + Σ_{i ∈ N_1} r(l_1(i) → Λ) + Σ_{j ∈ N_2} r(Λ → l_2(j)),
and the minimum of r((M, T_1, T_2)) over all matchings, found with a dynamic programming algorithm, is the tree edit distance.
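To make the cost decomposition r((M, T1, T2)) concrete, the sketch below evaluates the cost of one given matching M; the dynamic program that minimizes over all matchings is not reproduced, and representing a matching as a list of (node of T1, node of T2) pairs is an illustrative assumption.

```python
def matching_cost(M, labels1, labels2, m_dest):
    """Cost of a matching M between trees T1 and T2 (sketch of r((M, T1, T2))).

    M        : list of (i, j) pairs, node i of T1 matched to node j of T2
    labels1  : dict node -> label for T1 (leaf number, or None for internal nodes)
    labels2  : dict node -> label for T2
    m_dest   : number of destination nodes (relabel cost is 2 * m_dest)
    """
    relabel_cost, indel_cost = 2 * m_dest, 1
    matched1 = {i for i, _ in M}
    matched2 = {j for _, j in M}
    cost = sum(relabel_cost for i, j in M if labels1[i] != labels2[j])   # relabel matched pairs
    cost += sum(indel_cost for i in labels1 if i not in matched1)        # delete nodes only in T1
    cost += sum(indel_cost for j in labels2 if j not in matched2)        # insert nodes only in T2
    return cost
```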
The beneficial effects of the invention are:
1. The invention selects the clustering features with a stochastic optimization algorithm, namely simulated annealing, so that the features most beneficial to topology estimation are retained and the features that increase the topology estimation error are excluded; in the traditional network-tomography topology detection method, no such feature selection is performed before the topology is estimated with agglomerative hierarchical clustering.
2. In the feature selection with the simulated annealing algorithm, the invention uses two methods of generating new solutions: the first quickly finds a local optimum within a certain range, and the second can jump out of that range to search for optima in other ranges.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of a wavelet packet decomposition process;
fig. 3 is a flowchart of wavelet packet decomposition computation.
Detailed Description
The invention uses wavelet packet decomposition to process the original input signals, namely the delay-variation curves from the source node to each destination node, and obtains the time-frequency characteristics of each signal in different frequency bands. By comparing the correlation between the wavelet packet coefficient vectors of different paths, a characterization of the shared path length between different destination nodes is obtained, from which the network logical topology is derived by agglomerative hierarchical clustering.
In principle, the more completely the variation of the transmission delay is captured in every time-frequency band, the more clearly the similarity of the paths from the source node to different destination nodes can be characterized. In practice, however, not all wavelet packet decomposition coefficients help to obtain a correct result; some of them are noise. For example, the low-frequency DC component is affected by the actual physical link length, and when the non-shared path accounts for a large fraction of that length it severely distorts the judgement of the shared path length. To eliminate the negative influence of such features on the topology estimation and to improve its accuracy as much as possible, feature selection must be performed on the coefficients. Besides improving the accuracy of the topology estimation, reducing the number of features also reduces the subsequent amount of computation.
To evaluate whether the selected features can estimate the topology accurately, a network with known topology is required as a reference; a fairly accurate topology can be obtained with tools such as traceroute. After this reference topology has been obtained, tomography probing is performed on the same network, the topology estimated from the tomography data is compared with the reference topology, and the feature set that makes the two topologies most similar is selected. This feature set is then used in later practical applications to probe networks with unknown topology and estimate their topologies.
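A sketch of this workflow, wiring together the helpers introduced in the earlier sketches (wavelet_packet_features, anneal_feature_selection, correlation_distance_matrix, estimate_logical_topology); those names, and the tree_error_rate callback that compares an estimated tree with the reference topology, are assumptions made for illustration only.

```python
import numpy as np

def select_features_on_reference(delay_curves, reference_tree, tree_error_rate, m_dest):
    """Run steps S1-S3 on a network with known (reference) topology; return the chosen feature mask."""
    coeffs = [wavelet_packet_features(x) for x in delay_curves]          # S2: 16 groups per destination

    def estimate_error(mask):                                            # S32: topology + tree-edit error
        if not any(mask):
            return float("inf")                                          # empty selection: worst case
        Y = np.array([np.concatenate([c[j] for j in range(16) if mask[j]]) for c in coeffs])
        D = correlation_distance_matrix(Y)
        tree, root = estimate_logical_topology(D)
        return tree_error_rate(tree, root, reference_tree)

    return anneal_feature_selection(estimate_error, m_dest=m_dest)       # S3

def estimate_unknown_topology(delay_curves, mask):
    """Step S4: apply the selected feature mask to a network with unknown topology."""
    coeffs = [wavelet_packet_features(x) for x in delay_curves]
    Y = np.array([np.concatenate([c[j] for j in range(16) if mask[j]]) for c in coeffs])
    return estimate_logical_topology(correlation_distance_matrix(Y))
```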
The technical scheme of the invention is further explained below with reference to the attached drawings.
As shown in fig. 1, the network topology estimation method based on simulated annealing includes the following steps:
S1, performing network tomography probing on a network whose topology is known;
S2, in a real network environment the traffic is highly bursty, so the delay curves contain many medium- and high-frequency components; to capture the time-frequency characteristics of these high-frequency components more finely, the invention processes the probing results with a wavelet packet decomposition algorithm to obtain the clustering features. The specific implementation is as follows: as shown in fig. 2, each iteration of the wavelet packet decomposition splits the previous signal into a low-frequency part and a high-frequency part, and the more iterations, the finer the frequency bands; S is the original signal, A denotes approximation filtering and is regarded as the low-frequency part of the previous-level signal, and D denotes detail filtering and is regarded as the high-frequency part of the previous-level signal.
The computation flow of the wavelet packet decomposition is shown in fig. 3, where y[n] is the signal to be decomposed, g[n] is the low-pass filter and h[n] is the high-pass filter, satisfying h[n] = (-1)^n · g[1-n]; g[n] corresponds to the scaling function of the wavelet transform and is called the scaling vector, and h[n] corresponds to the wavelet function ψ(t).
Taking the scaling vector of the db4 wavelet as g[n], the computation proceeds as follows: the signal is low-pass and high-pass filtered and then down-sampled, giving the low-frequency and high-frequency wavelet packet decomposition coefficients respectively:
Y_{1,0}[n] = Σ_k y[k] · g[2n - k]
Y_{1,1}[n] = Σ_k y[k] · h[2n - k]
In the same way, Y_{1,0}[n] and Y_{1,1}[n] are processed again to obtain the 4 groups of wavelet packet decomposition coefficients of the second level, and so on for any number of levels; the recurrence is:
Y_{i+1,2j}[n] = Σ_k Y_{i,j}[k] · g[2n - k]
Y_{i+1,2j+1}[n] = Σ_k Y_{i,j}[k] · h[2n - k]
Using a 4-level wavelet packet decomposition, the original signal X_i of each destination node is processed into 16 groups of wavelet packet decomposition coefficients, denoted Y_{i,j}, where i = 0, 1, ..., n-1 is the destination node index, j = 0, 1, ..., 15 is the coefficient group index, and X_i and Y_{i,j} are all vectors. Since not all 16 groups of coefficients reflect the correlation between nodes correctly, feature selection is applied to them to obtain the several groups with the best effect.
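If the PyWavelets library is available, the sixteen fourth-level coefficient groups can also be obtained from its built-in wavelet packet transform instead of the manual recursion sketched earlier; the mode='symmetric' boundary extension is an assumption, since the patent does not specify how signal borders are handled.

```python
import pywt

def wavelet_packet_features_pywt(x, wavelet="db4", level=4):
    """Fourth-level wavelet packet coefficients of one delay curve, as 2**level = 16 arrays."""
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, mode="symmetric", maxlevel=level)
    return [node.data for node in wp.get_level(level, order="natural")]
```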
S3, features are selected with a simulated annealing algorithm, and whether a feature set is kept is judged from the error between the estimated topology and the known topology; this comprises the following sub-steps:
S31, several groups are selected at random from the 16 groups of wavelet packet decomposition coefficients obtained in S2 as the features for the first estimation; the selection is represented by a feature-selection state vector F = (f_1, f_2, ..., f_16), where f_1, f_2, ..., f_16 ∈ {0, 1}, a value of 0 meaning that the feature is not selected and 1 that it is selected.
Let R be the result set used to store key-value pairs formed by a feature-selection result and its error rate; initially R = {}. The initial temperature is T = T_0 = 2m, where m is the number of destination nodes.
The upper limit L of iterations at one temperature is set to 20; the iteration counter l, the count t of consecutively rejected new solutions, and the cooling count n are all initialized to 0.
S32, topology estimation based on agglomerative hierarchical clustering is performed on the samples according to F, the error rate e(F) of the estimated topology is computed with the tree edit distance, and R is updated as R = R ∪ {F : e(F)}.
The topology estimation based on agglomerative hierarchical clustering is implemented as follows:
S321, the distance between clustering samples is defined and a distance matrix is obtained. For each destination node i, let the selected features be Y_{i,k_1}, Y_{i,k_2}, ..., Y_{i,k_n}, where 0 ≤ k_1 < k_2 < ... < k_n ≤ 15 and k_1, k_2, ..., k_n ∈ N; these features are concatenated into a new vector Y_i = (Y_{i,k_1}, Y_{i,k_2}, ..., Y_{i,k_n}).
For any two destination nodes i, j, the correlation coefficient between their feature vectors Y_i and Y_j is computed as
ρ_{i,j} = Σ_n (Y_i[n] - E[Y_i]) · (Y_j[n] - E[Y_j]) / sqrt( Σ_n (Y_i[n] - E[Y_i])^2 · Σ_n (Y_j[n] - E[Y_j])^2 )
where E[Y_i] and E[Y_j] denote the mathematical expectations of Y_i and Y_j, i.e. their means:
E[Y_i] = (1/N) · Σ_n Y_i[n].
The correlation distance between nodes i, j is defined as
d_{i,j} = 1 - |ρ_{i,j}|.
The stronger the correlation between the feature vectors of two destination nodes i, j, the closer the absolute value of the correlation coefficient |ρ_{i,j}| is to 1 and the closer the correlation distance d_{i,j} is to 0.
The correlation distance matrix over all destination nodes is then
D = (d_{i,j}), i, j = 1, 2, ..., n,
with d_{i,j} = d_{j,i} (i, j = 1, 2, ..., n) and d_{1,1} = d_{2,2} = ... = d_{n,n} = 0; the matrix D is therefore symmetric with zero diagonal.
S322, hierarchical clustering is performed to obtain the topology, as follows:
S3221, at initialization every sample forms its own class, n classes in total, denoted G_1, G_2, ..., G_n; they form the n leaf nodes of the clustering tree, denoted N_1, N_2, ..., N_n. Each node is given a weight w_1, w_2, ..., w_n with w_1 = w_2 = ... = w_n = 0. The inter-class distance matrix is G = D, so that the distance between any two classes G_i, G_j is g_{i,j} = d_{i,j}. The clustering round counter is initialized to l = 1.
S3222, the minimum distance g_{s,t} = min_{i≠j}(g_{i,j}) in G is found, the two corresponding classes G_s, G_t are merged into a new class G_{n+l}, and a new node N_{n+l} is constructed as the parent of N_s, N_t with weight w_{n+l} = g_{s,t}.
S3223, the distance from the new class G_{n+l} to every other class is computed; the distance between two classes G_p, G_q is the average of the pairwise sample distances:
g_{p,q} = (1 / (n_p · n_q)) · Σ_{x_i ∈ G_p} Σ_{x_j ∈ G_q} d_{i,j},
S3224, where n_p, n_q are the numbers of samples in classes G_p and G_q respectively;
S3225, the inter-class distance matrix G is updated: the rows and columns associated with G_s, G_t are removed, and a row and column for the new class G_{n+l} are appended whose entries are the distances from the new class to the other classes;
S3226, the clustering round counter is updated, l = l + 1;
S3227, steps S3222 to S3225 are repeated until only one class G_{2n-1} remains.
During the clustering, two classes are merged into a new class in every round until only one class remains, producing a weighted binary tree with 2n-1 nodes. The nodes of this binary tree are then merged according to the following rule: a threshold t = 0.02 is set, and the nodes from N_{n+1} up to N_{2n-2} are examined; if a node N_i and its parent node N_j have weights that differ by less than the threshold, i.e.
w_j - w_i < t,
then N_i and N_j are merged, that is, the parent of every child of N_i is changed to N_j and N_i is deleted; the resulting tree is taken as the estimated network logical topology.
The error rate of the topology estimation result is computed with the tree edit distance as follows.
Three tree editing operations are defined:
Relabel a node: the labels of the leaf nodes of the tree topology are defined to be their respective numbers, and the labels of all other nodes are empty; the cost of changing a node label from a to b is set to r(a → b) = 2m, where m is the number of destination nodes in the probing process, i.e. the number of leaves of the tree.
Delete a node: a non-root node v of the tree T is deleted; let its parent be v', then the parent of every child of v becomes v'. The cost of deleting a node is set to r(a → Λ) = 1, where a is the label of the deleted node and Λ denotes the empty node.
Insert a node: a new node v is added as a child of a node v', and some of the children of v' become children of v. The cost of inserting a node is set to r(Λ → a) = 1, where Λ denotes the empty node and a is the label of the inserted node.
Let E be a sequence of editing operations e_1, e_2, ..., e_n that transforms tree T_1 into tree T_2, and denote the total cost of transforming T_1 into T_2 by r(E) = r(e_1) + r(e_2) + ... + r(e_n). The tree edit distance is min(r(E)), i.e. the minimum total cost of transforming T_1 into T_2.
To compute the tree edit distance, the problem is converted into finding the maximum matching subtree between the two trees T_1 and T_2: let the common subtree be M and the matching relationship between T_1 and T_2 be (M, T_1, T_2); let N_1 be the set of nodes that belong to T_1 but not to M, let N_2 be the set of nodes that belong to T_2 but not to M, and denote the label of node i in T_1 by l_1(i) and the label of node i in T_2 by l_2(i). Then
r((M, T_1, T_2)) = Σ_{(i,j) ∈ M, l_1(i) ≠ l_2(j)} r(l_1(i) → l_2(j)) + Σ_{i ∈ N_1} r(l_1(i) → Λ) + Σ_{j ∈ N_2} r(Λ → l_2(j)),
and the minimum of r((M, T_1, T_2)) over all matchings, found with a dynamic programming algorithm, is the tree edit distance.
S33, when L is less than L, executing step S34, otherwise jumping to step S39;
s34, changing the selection state of the ith feature in the F, and keeping the selection states of other features unchanged to obtain a new group of solutions F'i=(f′i,1,f′i,2,...,f′i,16) Which satisfies:
Figure DEST_PATH_GDA0001248192080000102
let iAll values were taken over to obtain 16 new solutions: f'1,F′2,...,F′16
S35, judging whether the new solution exists in R or not, and executing the method same as the step S32 on the solution which does not exist in R to obtain e (F'i) Update R ═ R ∪ { F'i:e(F′i) Get e*=min(e(F′i))=e(F′m) And (b) corresponding F'm
S36, let l be l +1, calculate Δ e be e*-e (F) accepting a new solution F 'with probability p according to the metropolis criterion'mP is calculated as follows:
Figure DEST_PATH_GDA0001248192080000103
if new solution is received, let F ═ F'm,e(F)=e*When t is 0, go to step S33; otherwise, carrying out S37 when t is t + 1;
s37, when L is less than L, executing step S38, otherwise jumping to step S39;
s38, changing the existing selection state with a probability of 50% for each feature in F, obtaining a new set of solutions F'mThe same method as step S32 is performed to obtain e*=e(F′m) Update R ═ R ∪ { F'm:e(F′m) Jumping to step S36;
s39, let n be n +1,
Figure DEST_PATH_GDA0001248192080000104
l is 0; when t is less than 10 and n is less than 10, jumping to step S33; otherwise, selecting the optimal solution F from R*As a result of the feature selection, the algorithm ends.
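The two ways of generating new solutions, the single-feature flip of step S34 and the 50%-per-feature random flip of step S38, can be written as two small helpers; representing a solution as a tuple of 0/1 values is an assumption carried over from the earlier sketches.

```python
import random

def single_flip_neighbours(F):
    """S34: flip exactly one feature at a time, yielding the neighbours F'_1 ... F'_16."""
    return [F[:i] + (1 - F[i],) + F[i + 1:] for i in range(len(F))]

def random_flip_neighbour(F, p_flip=0.5):
    """S38: flip every feature independently with probability 0.5 to escape a local optimum."""
    return tuple(1 - f if random.random() < p_flip else f for f in F)
```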
S4, the selected features are used as the features for estimating all other unknown network topologies, and the network topology is estimated with the same agglomerative hierarchical clustering method as in step S32.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, and that the scope of the invention is not limited to these specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from its spirit, and these changes and combinations remain within the scope of the invention.

Claims (2)

1. A network topology estimation method based on simulated annealing, characterized by comprising the following steps:
S1, performing network tomography probing on a network whose topology is known;
S2, processing the probing results with a wavelet packet decomposition algorithm to obtain clustering features; the specific implementation is as follows: each iteration of the wavelet packet decomposition splits the previous signal into a low-frequency part and a high-frequency part, where S is the original signal, A denotes approximation filtering and is regarded as the low-frequency part of the previous-level signal, and D denotes detail filtering and is regarded as the high-frequency part of the previous-level signal;
y[n] is the signal to be decomposed, g[n] is the low-pass filter and h[n] is the high-pass filter, satisfying h[n] = (-1)^n · g[1-n]; g[n] corresponds to the scaling function of the wavelet transform and is called the scaling vector, and h[n] corresponds to the wavelet function ψ(t);
taking the scaling vector of the db4 wavelet as g[n], the computation proceeds as follows: the signal is low-pass and high-pass filtered and then down-sampled, giving the low-frequency and high-frequency wavelet packet decomposition coefficients respectively:
Y_{1,0}[n] = Σ_k y[k] · g[2n - k]
Y_{1,1}[n] = Σ_k y[k] · h[2n - k]
in the same way, Y_{1,0}[n] and Y_{1,1}[n] are processed again to obtain the 4 groups of wavelet packet decomposition coefficients of the second level, and so on for any number of levels; the recurrence is:
Y_{i+1,2j}[n] = Σ_k Y_{i,j}[k] · g[2n - k]
Y_{i+1,2j+1}[n] = Σ_k Y_{i,j}[k] · h[2n - k]
using a 4-level wavelet packet decomposition, the original signal X_i of each destination node is processed into 16 groups of wavelet packet decomposition coefficients, denoted Y_{i,j}, where i = 0, 1, ..., n-1 is the destination node index, j = 0, 1, ..., 15 is the coefficient group index, and X_i and Y_{i,j} are all vectors;
S3, selecting features with a simulated annealing algorithm, comprising the following sub-steps:
S31, selecting several groups at random from the 16 groups of wavelet packet decomposition coefficients obtained in S2 as the features for the first estimation; the selection is represented by a feature-selection state vector F = (f_1, f_2, ..., f_16), where f_1, f_2, ..., f_16 ∈ {0, 1}, a value of 0 meaning that the feature is not selected and 1 that it is selected;
letting R be the result set used to store key-value pairs formed by a feature-selection result and its error rate, with R = {} initially; the initial temperature is T = T_0 = 2m, where m is the number of destination nodes;
setting the upper limit L of iterations at one temperature to 20, and initializing the iteration counter l = 0, the count of consecutively rejected new solutions t = 0, and the cooling count n = 0;
S32, performing topology estimation based on agglomerative hierarchical clustering on the samples according to F, computing the error rate e(F) of the estimated topology with the tree edit distance, and updating R = R ∪ {F : e(F)};
the topology estimation based on agglomerative hierarchical clustering is implemented as follows:
S321, the distance between clustering samples is defined and a distance matrix is obtained; for each destination node i, the selected features are Y_{i,k_1}, Y_{i,k_2}, ..., Y_{i,k_n}, where 0 ≤ k_1 < k_2 < ... < k_n ≤ 15 and k_1, k_2, ..., k_n ∈ N, and these features are concatenated into a new vector Y_i = (Y_{i,k_1}, Y_{i,k_2}, ..., Y_{i,k_n});
for any two destination nodes i, j, the correlation coefficient between their feature vectors Y_i and Y_j is computed as
ρ_{i,j} = Σ_n (Y_i[n] - E[Y_i]) · (Y_j[n] - E[Y_j]) / sqrt( Σ_n (Y_i[n] - E[Y_i])^2 · Σ_n (Y_j[n] - E[Y_j])^2 )
where E[Y_i] and E[Y_j] denote the mathematical expectations of Y_i and Y_j, i.e. their means:
E[Y_i] = (1/N) · Σ_n Y_i[n];
the correlation distance between nodes i, j is defined as
d_{i,j} = 1 - |ρ_{i,j}|;
the stronger the correlation between the feature vectors of two destination nodes i, j, the closer the absolute value of the correlation coefficient |ρ_{i,j}| is to 1 and the closer the correlation distance d_{i,j} is to 0;
the correlation distance matrix over all destination nodes is then
D = (d_{i,j}), i, j = 1, 2, ..., n,
with d_{i,j} = d_{j,i} (i, j = 1, 2, ..., n) and d_{1,1} = d_{2,2} = ... = d_{n,n} = 0, so the matrix D is symmetric with zero diagonal;
S322, hierarchical clustering is performed to obtain the topology, as follows:
S3221, at initialization every sample forms its own class, n classes in total, denoted G_1, G_2, ..., G_n; they form the n leaf nodes of the clustering tree, denoted N_1, N_2, ..., N_n; each node is given a weight w_1, w_2, ..., w_n with w_1 = w_2 = ... = w_n = 0; the inter-class distance matrix is G = D, so that the distance between any two classes G_i, G_j is g_{i,j} = d_{i,j}; the clustering round counter is initialized to l = 1;
S3222, the minimum distance g_{s,t} = min_{i≠j}(g_{i,j}) in G is found, the two corresponding classes G_s, G_t are merged into a new class G_{n+l}, and a new node N_{n+l} is constructed as the parent of N_s, N_t with weight w_{n+l} = g_{s,t};
S3223, the distance from the new class G_{n+l} to every other class is computed; the distance between two classes G_p, G_q is the average of the pairwise sample distances:
g_{p,q} = (1 / (n_p · n_q)) · Σ_{x_i ∈ G_p} Σ_{x_j ∈ G_q} d_{i,j},
S3224, where n_p, n_q are the numbers of samples in classes G_p and G_q respectively;
S3225, the inter-class distance matrix G is updated: the rows and columns associated with G_s, G_t are removed, and a row and column for the new class G_{n+l} are appended whose entries are the distances from the new class to the other classes;
S3226, the clustering round counter is updated, l = l + 1;
S3227, steps S3222 to S3225 are repeated until only one class G_{2n-1} remains;
the error rate of the topology estimation result is computed with the tree edit distance as follows:
three tree editing operations are defined:
relabel a node: the labels of the leaf nodes of the tree topology are defined to be their respective numbers, and the labels of all other nodes are empty; the cost of changing a node label from a to b is set to r(a → b) = 2m, where m is the number of destination nodes in the probing process, i.e. the number of leaves of the tree;
delete a node: a non-root node v of the tree T is deleted; let its parent be v', then the parent of every child of v becomes v'; the cost of deleting a node is set to r(a → Λ) = 1, where a is the label of the deleted node and Λ denotes the empty node;
insert a node: a new node v is added as a child of a node v', and some of the children of v' become children of v; the cost of inserting a node is set to r(Λ → a) = 1, where Λ denotes the empty node and a is the label of the inserted node;
let E be a sequence of editing operations e_1, e_2, ..., e_n that transforms tree T_1 into tree T_2, and denote the total cost of transforming T_1 into T_2 by r(E) = r(e_1) + r(e_2) + ... + r(e_n); the tree edit distance is min(r(E)), i.e. the minimum total cost of transforming T_1 into T_2;
to compute the tree edit distance, the problem is converted into finding the maximum matching subtree between the two trees T_1 and T_2: let the common subtree be M and the matching relationship between T_1 and T_2 be (M, T_1, T_2); let N_1 be the set of nodes that belong to T_1 but not to M, let N_2 be the set of nodes that belong to T_2 but not to M, and denote the label of node i in T_1 by l_1(i) and the label of node i in T_2 by l_2(i); then
r((M, T_1, T_2)) = Σ_{(i,j) ∈ M, l_1(i) ≠ l_2(j)} r(l_1(i) → l_2(j)) + Σ_{i ∈ N_1} r(l_1(i) → Λ) + Σ_{j ∈ N_2} r(Λ → l_2(j)),
and the minimum of r((M, T_1, T_2)) over all matchings, found with a dynamic programming algorithm, is the tree edit distance;
S33, if l < L, go to step S34; otherwise jump to step S39;
S34, the selection state of the i-th feature of F is flipped while the other features are kept unchanged, giving a new solution F'_i = (f'_{i,1}, f'_{i,2}, ..., f'_{i,16}) with
f'_{i,j} = 1 - f_j if j = i, and f'_{i,j} = f_j otherwise;
letting i run through all values yields 16 new solutions F'_1, F'_2, ..., F'_16;
S35, for each new solution that is not already in R, the same procedure as in step S32 is applied to obtain e(F'_i), and R is updated as R = R ∪ {F'_i : e(F'_i)}; let e* = min(e(F'_i)) = e(F'_m) with corresponding solution F'_m;
S36, let l = l + 1 and Δe = e* - e(F); the new solution F'_m is accepted with probability p according to the Metropolis criterion, where
p = 1 if Δe < 0, and p = exp(-Δe / T) otherwise;
if the new solution is accepted, set F = F'_m, e(F) = e*, t = 0 and go to step S33; otherwise set t = t + 1 and go to S37;
S37, if l < L, go to step S38; otherwise jump to step S39;
S38, the selection state of every feature of F is flipped independently with probability 50%, giving a new solution F'_m; the same procedure as in step S32 is applied to obtain e* = e(F'_m), R is updated as R = R ∪ {F'_m : e(F'_m)}, and the algorithm jumps to step S36;
S39, let n = n + 1, lower the temperature T according to the cooling schedule, and reset l = 0; if t < 10 and n < 10, jump to step S33; otherwise select the best solution F* from R as the result of the feature selection, and the algorithm ends;
S4, using the selected features as the features for estimating all other unknown network topologies, and estimating the network topology with agglomerative hierarchical clustering.
2. The simulated-annealing-based network topology estimation method according to claim 1, characterized in that in the clustering process of step S322 two classes are merged into a new class in every round until only one class remains, producing a weighted binary tree with 2n-1 nodes; the nodes of this binary tree are merged according to the following rule: a threshold t = 0.02 is set, and the nodes from N_{n+1} up to N_{2n-2} are examined; if a node N_i and its parent node N_j have weights that differ by less than the threshold, i.e.
w_j - w_i < t,
then N_i and N_j are merged, that is, the parent of every child of N_i is changed to N_j and N_i is deleted; the resulting tree is taken as the estimated network logical topology.
CN201610817914.6A 2016-09-12 2016-09-12 Network topology estimation method based on simulated annealing Active CN106685686B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610817914.6A CN106685686B (en) 2016-09-12 2016-09-12 Network topology estimation method based on simulated annealing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610817914.6A CN106685686B (en) 2016-09-12 2016-09-12 Network topology estimation method based on simulated annealing

Publications (2)

Publication Number Publication Date
CN106685686A CN106685686A (en) 2017-05-17
CN106685686B true CN106685686B (en) 2020-09-18

Family

ID=58840010

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610817914.6A Active CN106685686B (en) 2016-09-12 2016-09-12 Network topology estimation method based on simulated annealing

Country Status (1)

Country Link
CN (1) CN106685686B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107426207B (en) * 2017-07-21 2019-09-27 哈尔滨工程大学 A kind of network intrusions method for detecting abnormality based on SA-iForest
CN111478807B (en) * 2020-04-02 2023-03-24 山东省计算中心(国家超级计算济南中心) Construction method of minimum feedback node set of directed multilayer network

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015161872A1 (en) * 2014-04-23 2015-10-29 Telefonaktiebolaget L M Ericsson (Publ) Network tomography through selection of probing paths

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101540061B (en) * 2009-04-10 2011-06-22 西北工业大学 Topological and ordering matching method for disordered images based on simulated annealing
CN102711206B (en) * 2012-05-14 2014-08-06 南京邮电大学 Simulated annealing-based wireless sensor network (WSN) hierarchical routing method
CN103281256B (en) * 2013-04-26 2016-05-25 北京邮电大学 The end-to-end path packet loss detection method of chromatography Network Based
CN103678917B (en) * 2013-12-13 2016-11-23 杭州易和网络有限公司 A kind of real-time arrival time Forecasting Methodology of public transport based on simulated annealing

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015161872A1 (en) * 2014-04-23 2015-10-29 Telefonaktiebolaget L M Ericsson (Publ) Network tomography through selection of probing paths

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Classification feature selection for transient stability prediction based on PMU measurement data"; 汪马翔; China Masters' Theses Full-text Database (Engineering Science and Technology II); 2007-05-15 (No. 05); pp. C042-151 *
"Research on non-stationary network topology estimation methods based on wavelet transform"; 王鑫; China Masters' Theses Full-text Database (Information Science and Technology); 2016-02-15 (No. 02); pp. I139-12 *

Also Published As

Publication number Publication date
CN106685686A (en) 2017-05-17

Similar Documents

Publication Publication Date Title
WO2021089013A1 (en) Spatial graph convolutional network training method, electronic device and storage medium
US20220309334A1 (en) Graph neural networks for datasets with heterophily
CN115455471A (en) Federal recommendation method, device, equipment and storage medium for improving privacy and robustness
CN106685686B (en) Network topology estimation method based on simulated annealing
Zhu et al. Portal nodes screening for large scale social networks
Bautista et al. L γ-PageRank for semi-supervised learning
Bi et al. Large-scale network traffic prediction with LSTM and temporal convolutional networks
Ghavipour et al. A streaming sampling algorithm for social activity networks using fixed structure learning automata
Hinder et al. Concept Drift Segmentation via Kolmogorov-Trees.
Fish et al. Entropic regression with neurologically motivated applications
Ren et al. Causal discovery with flow-based conditional density estimation
Liu et al. Distributed recursive filtering for time-varying systems with dynamic bias over sensor networks: Tackling packet disorders
Cai et al. Weighted message passing and minimum energy flow for heterogeneous stochastic block models with side information
CN115359297B (en) Classification method, system, electronic equipment and medium based on higher-order brain network
Dash DECPNN: A hybrid stock predictor model using Differential Evolution and Chebyshev Polynomial neural network
CN115908419A (en) Unsupervised hyperspectral image change detection method for optimizing pseudo label by using Bayesian network
Ma et al. Fast Monte Carlo dropout and error correction for radio transmitter classification
Khan et al. Stitching algorithm: A network performance analysis tool for dynamic mobile networks
Sun et al. Reinforced contrastive graph neural networks (RCGNN) for anomaly detection
Kovács et al. Optimistic search: Change point estimation for large-scale data via adaptive logarithmic queries
CN117808125B (en) Model aggregation method, device, equipment, federal learning system and storage medium
CN112884067B (en) Hop count matrix recovery method based on decision tree classifier
Mohammadi et al. High-Dimensional Bayesian Structure Learning in Gaussian Graphical Models using Marginal Pseudo-Likelihood
Radaelli et al. Parameter estimation for quantum jump unraveling
Le et al. VEAD: Variance profile Exploitation for Anomaly Detection in real-time IoT data streaming

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant