CN112688813B - Routing node importance ordering method and system based on routing characteristics

Info

Publication number: CN112688813B
Application number: CN202011545499.6A
Authority: CN
Inventors: 陶致远; 刘粉林; 刘琰; 罗向阳; 魏亮
Original assignee: Information Engineering University of PLA Strategic Support Force
Current assignee: Information Engineering University of PLA Strategic Support Force
Priority date: 2020-12-24
Filing date: 2020-12-24
Publication date: 2022-07-15
Anticipated expiration: 2040-12-24
Also published as: CN112688813A

Abstract

The invention belongs to the technical field of routing node positioning, and in particular relates to a routing node importance ranking method and system based on routing characteristics, used for finding backbone nodes in a target network area. The method comprises the following steps: acquiring path information between nodes in the target network area through network measurement, and constructing a network topology; pruning the network topology based on the stability of the paths between its nodes to obtain a topology that contains only stable paths; counting the number of stable paths between each pair of adjacent nodes in the pruned topology, and weighting each corresponding edge according to the count; and ranking the routing nodes of the target network by weighted degree centrality. The invention effectively reduces the amount of data to be processed, clearly distinguishes the roles that edges play in actual communication, and facilitates accurate positioning by discovering more backbone nodes.

Description

Routing node importance ordering method and system based on routing characteristics

Technical Field

The invention belongs to the technical field of routing node positioning, and particularly relates to a routing node importance ranking method and system based on routing characteristics.

Background

The expansion of network scale puts unprecedented pressure on network operation and maintenance. Because resources are limited, operation and maintenance personnel pay more attention to the important nodes in the network when guaranteeing service quality. Node importance ranking for the Internet grades routing nodes by importance, helping operation and maintenance personnel optimize their strategies, improve efficiency and prevent catastrophic failures. In addition, node importance ranking can provide a reference for optimizing existing network protocols and help the network recover more quickly and efficiently after a node fails. Existing node importance ranking work already achieves excellent results on static theoretical network models, but when applied to the actual Internet it cannot reflect the true importance of nodes accurately and in real time. Ranking the importance of Internet nodes efficiently and accurately is therefore of great practical significance. Traditional node ranking algorithms such as Degree Centrality (DC), Clustering Coefficient, k-shell decomposition, Closeness Centrality (CC) and Betweenness Centrality (BC) have played a major role in existing research, but these methods mostly capture macroscopic statistical properties of a graph and cannot be tailored to a specific network; some of them also have a high computational cost and cannot be applied to real large-scale networks.

In recent years, node ranking research has branched from the traditional methods in two directions: one improves the computational efficiency of the traditional methods and reduces their cost so that they can be applied to large-scale networks; the other designs node ranking algorithms around the characteristics of the network itself. The first direction holds that some ranking indexes of the traditional methods are still applicable to real networks but are too computationally complex for large-scale ones, and therefore proposes a series of algorithms that reduce the computational cost of those indexes; because betweenness centrality has a high computational complexity and is both representative and of practical significance, most of these methods optimize the efficiency of betweenness centrality. The second direction proposes a series of ranking indexes based on concrete characteristics of the actual network. In the Internet field, for example, traffic load centrality combines network traffic with betweenness centrality, which on its own considers only shortest paths. Building on betweenness centrality, these methods rank nodes using characteristics such as paths, traffic and protocols in the network, but they still do not make full use of the routing characteristics of the network.

Disclosure of Invention

Therefore, the invention provides a routing node importance sorting method and a system based on routing characteristics, which can effectively reduce the processed data volume, can obviously distinguish the functions of the edges in the network in actual communication, and is convenient for accurate positioning to discover more backbone nodes.

According to the design scheme provided by the invention, the routing node importance ordering method based on the routing characteristics is used for searching the backbone nodes in the target network area and comprises the following contents:

acquiring path information among nodes in a target network area through network measurement, and constructing a network topology;

based on the stability of the path between the network topology nodes, pruning the network topology to obtain a topology network structure only containing stable paths;

counting the number of stable paths between each adjacent node in the topological network structure, and weighting each corresponding edge in the topological structure according to the counting result; and sequencing the routing nodes of the target network according to the weighted centrality.

As the method for sorting the importance of the routing nodes based on the routing characteristics, a plurality of detection sources are arranged inside and outside a target network area to detect IP nodes in the target network area so as to acquire the path information among the nodes.

As the routing node importance ordering method based on routing characteristics, further, a target IP set is obtained according to an IP address segment allocated to a target network area in an IP address database; and carrying out multiple rounds of continuous network measurement on the target IP set by utilizing a plurality of probe sources to acquire path information for constructing the network topology.

As the routing node importance ordering method based on routing characteristics, further, the IP address segments of the target network area are divided by /24 prefix, and one or more IP addresses are selected under each /24 prefix to form the target IP set.

As the routing node importance ordering method based on routing characteristics, further, in the multiple rounds of continuous network measurement, the number of newly added nodes and the number of newly added paths are counted for each round of detection; if both counts are lower than the set thresholds for several consecutive rounds, the detection is ended; otherwise, a new round of detection is carried out.

As the routing node importance ordering method based on routing characteristics, further, from the set of paths measured between two nodes, the path that occurs most often in the set is taken as the stable path between the two nodes, and the other paths in the set are taken as standby paths; when pruning the network topology, the standby paths are deleted and the stable paths are retained.

In the method for ranking the importance of the routing nodes based on the routing characteristics, the number of stable paths passing through one edge is used as the weight of the edge in the topological network structure, and each edge in the topological structure is weighted.

Further, based on the above method, the present invention further provides a routing node importance ranking system based on routing characteristics, for searching backbone nodes in a target network region, including: a topology construction module, a topology pruning module and a node sequencing module, wherein,

the topology construction module is used for acquiring the path information among the nodes of the target network area through network measurement and constructing network topology;

the topology pruning module is used for pruning the network topology based on the stability of the path between the network topology nodes to obtain a topology network structure only containing a stable path;

the node sequencing module is used for counting the number of stable paths between each adjacent node in the topological network structure and weighting each corresponding edge in the topological structure according to the counting result; and sequencing the routing nodes of the target network according to the weighted centrality.

The invention has the beneficial effects that:

aiming at the problems that in the existing node importance ordering research, the existence of a stable path between actual network nodes is not considered, the importance of internet nodes is difficult to accurately evaluate and the like, a preliminary network topology is constructed by acquiring path information between target area nodes through network measurement, and an alternative path is removed from the obtained network topology based on the routing characteristic that the stable path often exists between the network nodes, so that the network topology only containing the stable path is obtained; weighting sides by counting the number of stable paths passing through adjacent nodes, and sequencing the importance of the nodes of the target network by using the centrality of the weighting degree; the method can effectively find out important routing nodes in the target area network, and obviously improve the accuracy of node sequencing. And further, the reasonable effectiveness of the scheme is verified through experimental data, so that the method has a good application prospect.

Description of the drawings:

FIG. 1 is a schematic diagram of an importance ranking method of routing nodes based on routing characteristics in an embodiment;

FIG. 2 is a schematic diagram of an algorithm principle of an importance ranking of routing nodes based on routing characteristics in the embodiment;

FIG. 3 is a schematic diagram of a network topology pruning flow in an embodiment;

FIG. 4 is a schematic flow chart of calculating edge weights based on paths in the embodiment;

FIG. 5 is a schematic diagram of the stable path ratio statistics of the target IPs in the embodiment;

FIG. 6 is a schematic diagram comparing an unweighted topology with a weighted topology in the embodiment;

FIG. 7 is a comparison of the effects of deleting IP2 and IP3 in the examples;

FIG. 8 is a schematic cross-ISP communication under a layered network architecture in an embodiment;

FIG. 9 is a comparison of the accuracy of the pre-and post-pruning sorting in the examples;

FIG. 10 is a diagram illustrating an RWDC ranking result of a routing node importance ranking algorithm in an embodiment;

FIG. 11 is a comparative illustration of the results of the RWDC and the prior DC, BC, RBC algorithms in the example;

FIG. 12 is a schematic diagram of the statistics of the number of backbone nodes among the top-ranked results in the embodiment.

The specific implementation mode is as follows:

in order to make the objects, technical solutions and advantages of the present invention clearer and more obvious, the present invention is further described in detail below with reference to the accompanying drawings and technical solutions.

Due to the existence of strategies such as redundant routing, load balancing and the like in the internet, the deviation between a path in actual network communication and a theoretical path is large, so that the existing node importance ranking method is not accurate enough when being used for the actual internet. To this end, an embodiment of the present invention, as shown in fig. 1, provides a routing node importance ranking method based on routing characteristics, which is used for searching a backbone node in a target network area, and includes the following contents:

S101, acquiring path information among nodes in a target network area through network measurement, and constructing a network topology;

S102, based on the stability of the paths between the network topology nodes, pruning the network topology to obtain a topology network structure containing only stable paths;

S103, counting the number of stable paths between adjacent nodes in the topology network structure, and weighting the corresponding edges in the topology structure according to the count; and ranking the routing nodes of the target network according to the weighted centrality.

Acquiring path information among nodes in a target area through network measurement to construct a primary network topology, and removing alternative paths from the obtained network topology based on the routing characteristic that stable paths often exist among the network nodes to obtain the network topology only containing the stable paths; weighting is carried out on the edges by counting the number of stable paths passing through adjacent nodes, and the importance of the nodes of the target network is sequenced by utilizing the centrality of the weighting degree; the method can effectively find out the important routing nodes in the target area network, and obviously improve the accuracy of node sequencing.

As the method for ranking importance of routing nodes based on routing characteristics in the embodiment of the present invention, further, a plurality of probing sources are arranged inside and outside the target network area to probe IP nodes in the target network area to obtain inter-node path information.

When the target IPs are probed from only a single detection source, the detection result is prone to spatial bias and is subject to chance. Thus, n_I detection sources located inside the target area A and n_O detection sources located outside the target area can be selected to form the detection source set P_V.

As a method for sorting importance of routing nodes based on routing characteristics in the embodiment of the present invention, a target IP set is further obtained according to the IP address segments allocated to the target network area in an IP address database, and multiple rounds of continuous network measurement are carried out on the target IP set using a plurality of probe sources to acquire the path information for constructing the network topology. Furthermore, the IP address segments of the target network area are divided by /24 prefix, and one or more IP addresses are selected under each /24 prefix to form the target IP set.

Routing rules in the Internet are difficult to obtain directly, but because routing rules determine that a stable path exists between two nodes in the network, the stable paths can be recovered through network measurement. The IP address segments S_A of the target city can be screened from six IP address libraries (Maxmind, Quova, IP2location, Whois, IPIP, IPPlus and IPcn); the screening method is to retain the IP segments that appear in at least three of the IP address libraries. Existing network topology measurement methods fall into two categories: probing all IPs of the target area, and probing IPs sampled by network segment. Probing all IPs yields the most complete network topology, but the time cost is high, and because the network changes dynamically, probing over a long time span cannot truly reflect the overall state of the target area network. Sampling by network segment relies on the observation that IPs in the same network segment often have similar routing policy settings, so probing one or a few IP addresses in a segment can stand in for probing all of them and improve probing efficiency. Moreover, for node importance ranking of the Internet, routing nodes inside a /24 prefix are of low importance relative to the whole area.
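The sampling by /24 prefix described above can be sketched with Python's standard `ipaddress` module. This is only an illustration: the function name `sample_targets`, the `per_prefix` and `seed` parameters, the random choice of addresses and the IPv4-only handling are assumptions, and the example segment comes from a documentation address range rather than from the patent.

```python
import ipaddress
import random


def sample_targets(segments, per_prefix=1, seed=0):
    """Pick `per_prefix` host addresses from every /24 inside each IPv4 segment."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    targets = []
    for segment in segments:
        network = ipaddress.ip_network(segment, strict=False)
        if network.prefixlen >= 24:
            blocks = [network]  # already a /24 or narrower
        else:
            blocks = network.subnets(new_prefix=24)
        for block in blocks:
            hosts = list(block.hosts())
            picked = rng.sample(hosts, min(per_prefix, len(hosts)))
            targets.extend(str(ip) for ip in picked)
    return targets


# Hypothetical segment (documentation range), standing in for one entry of S_A.
target_set = sample_targets(["198.51.100.0/22"], per_prefix=1)
print(len(target_set))  # a /22 holds four /24 prefixes, so 4 addresses
```

Raising `per_prefix` corresponds to selecting "one or more" addresses under each /24 prefix.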

Referring to FIG. 2, the IP address segments S_A assigned to the target area A are acquired from the IP address database D; in each IP address segment, IPs are selected by /24 prefix to form the target IP set P_T. The detection source set P_V is used to carry out multiple rounds of continuous, high-frequency network measurement on the target IP set P_T, acquiring network topology information such as paths and time delays. From the detection results, the node set P_A located in the target area is extracted; alias resolution is performed to obtain the node set P_C; and edges are weighted according to the path distribution to construct the weighted network topology graph G of target area A. The main paths R_M in the network are found according to the route forwarding rules; the network topology graph G is pruned based on the main paths and the edge weights in the graph are updated, giving the topology graph G_M that retains only the main paths.

The specific algorithm is given as Algorithm 1 in the following table. First, network measurement is carried out on the target area to obtain path information among all nodes in the target area, and a preliminary topology is constructed (see lines 1-4); then the topology is pruned by removing the alternative paths, leaving a topology that retains only the stable paths (see lines 5-12); next, the number of stable paths passing through the edge between each pair of adjacent nodes is counted and the edges in the graph are weighted accordingly (see lines 13-16); finally, the nodes are ranked by importance according to their routing weighted degree (see lines 17-19).

The routing weighted degree centrality reflects the role a node plays in network communication, and its computational complexity is O(mn). The routing weighted degree centrality of node P_i is calculated as follows:

where N_i is the set of all neighbor nodes of node P_i, R_G is the set of all actual communication paths in the graph, card denotes the number of elements in a set, and w(e_ij) is the weight of edge e_ij; δ(r, i, j) = 1 if path r passes through nodes P_i and P_j, and δ(r, i, j) = 0 otherwise.
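The symbols defined above suggest a formula of roughly the following shape. It is a reading assembled from those definitions, not the typeset formula of the patent: in particular, the name RWDC is borrowed from the figure captions, and the normalization by card(R_G) is an assumption based only on the fact that card and R_G are both defined.

```latex
w(e_{ij}) = \sum_{r \in R_G} \delta(r, i, j),
\qquad
\mathrm{RWDC}(P_i) = \frac{\sum_{P_j \in N_i} w(e_{ij})}{\operatorname{card}(R_G)}
```

Under this reading, a node scores highly when many actual communication paths enter or leave it through its incident edges, and dividing by the number of paths only rescales the scores without changing the ranking.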

As a method for ranking importance of routing nodes based on routing characteristics in the embodiment of the present invention, further, in the multiple rounds of continuous network measurement, the number of newly added nodes and the number of newly added paths are counted for each round of detection; if both counts are lower than the set thresholds for several consecutive rounds, the detection is ended; otherwise, a new round of detection is performed.

The selected detection sources P_V carry out multi-source, continuous, high-frequency network probing of the target IP set P_T. Multi-source means that several detection sources located inside and outside the target city probe the target IP set at the same time. Continuous means that a probing termination condition is set and the target IP set is probed continuously until that condition is reached. High-frequency means that the probing interval is set to h hours, so that a new round of probing of the target city starts every h hours. Because probing is carried out in different time periods and under different delay conditions, the paths taken by communication between nodes under different network conditions can be observed, which yields more accurate topology information for the target city. The probing termination condition may be set as follows: in each round q of network probing, the number of newly added nodes and the number of newly added paths detected in that round are counted; if both counts stay below the preset thresholds K_n and K_r, respectively, for r consecutive rounds, probing ends; otherwise, a new round of probing is performed.
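The termination condition can be expressed as a small predicate over the per-round counts. This is a minimal sketch: the function name `should_stop` and the list-based bookkeeping are assumptions, while `k_n`, `k_r` and `r` correspond to the thresholds K_n, K_r and the number of consecutive rounds in the text. The counts in the usage lines are invented for illustration.

```python
def should_stop(new_nodes_per_round, new_paths_per_round, k_n, k_r, r):
    """Return True once the last `r` rounds all fall below both thresholds.

    The two lists hold, per probing round, the number of newly added nodes
    and newly added paths. `r` is assumed to be at least 1.
    """
    if len(new_nodes_per_round) < r or len(new_paths_per_round) < r:
        return False  # not enough rounds observed yet
    recent = zip(new_nodes_per_round[-r:], new_paths_per_round[-r:])
    return all(nodes < k_n and paths < k_r for nodes, paths in recent)


# Hypothetical per-round counts, not measurements from the patent.
print(should_stop([50, 4, 3], [200, 9, 7], k_n=5, k_r=10, r=3))        # False
print(should_stop([50, 4, 3, 1], [200, 9, 7, 2], k_n=5, k_r=10, r=3))  # True
```

Requiring both counts to stay low for several consecutive rounds avoids stopping on a single quiet round.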

As a routing node importance ranking method based on routing characteristics in the embodiment of the present invention, further, from the set of paths measured between two nodes, the path that occurs most often in the set is taken as the stable path between the two nodes, and the other paths in the set are taken as standby paths; when pruning the network topology, the standby paths are deleted and the stable paths are retained.

Analysis of the probing results for the target area network shows that communication between nodes in the Internet follows a stable path. On this basis, a topology pruning method based on stable paths is used to improve the efficiency and accuracy of node importance ranking.

A stable path between nodes in a network refers to the path most commonly traversed by communication between nodes. There are a large number of routers in the internet, and the routing rules configured by these routers are difficult to directly obtain, but the routing rules determine that a stable path exists for communication in the network. Through extensive probing and statistical analysis, those stable paths can be found, and the remaining paths are referred to as backup paths. And pruning the network topological graph G of the target area, deleting all standby paths and only reserving stable paths.

Among the paths between two nodes P_i and P_j, the path that occurs most often is regarded as the stable path between P_i and P_j. The set of paths between nodes is denoted as R:

$$R = \bigcup_{P_i, P_j \in P_A} R(i,j) \qquad (2)$$

where R(i, j) represents the set of probing paths between nodes P_i and P_j, and P_A represents the set of nodes of the target area. Denoting the number of occurrences of a path r as N_r, the set of stable paths R_M(i, j) from P_i to P_j is:

$$R_M(i,j) = \arg\max_{r \in R(i,j)} N_r \qquad (3)$$

The ratio of the number of occurrences of the stable path from P_i to P_j to the total number of probing paths that reach the target node P_j is called the stable path ratio f_M(i, j) from P_i to P_j:

$$f_M(i,j) = \frac{\max_{r \in R(i,j)} N_r}{\sum_{r \in R(i,j)} N_r} \qquad (4)$$

In the network topology of the target area, the stable path set R_M is retained and all other paths are deleted, giving the pruned topology graph G_M = (V_M, E_M), where V_M is the set of nodes extracted from R_M and E_M is the set of edges extracted from R_M.

As shown in FIG. 3, 432 probing paths from IP₁ (211.149.219.168) to the target IP₁₁ (202.97.19.46) can be extracted from the probing results, and they comprise 2 distinct paths. Path 1 is IP₁-IP₂-IP₃-IP₅-IP₆-IP₇-IP₈-IP₁₁, with 1 occurrence; path 2 is IP₁-IP₂-IP₄-IP₅-IP₉-IP₁₀-IP₁₁, with 431 occurrences. Path 2 occurs most often and accounts for 99.77% of the total. Hence, path 2 is called the stable path to the target IP₁₁ (202.97.19.46), the stable path ratio to IP₁₁ is 0.9977, and when the network topology is pruned, path 1 is deleted and only path 2 is retained. If two paths occur the same number of times, both are considered stable paths to that IP.
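The FIG. 3 example can be reproduced with a short counting routine. This is an illustrative sketch, not the patent's Algorithm 1: the function name `stable_paths` and the representation of a path as a tuple of hop labels are assumptions. The input must contain at least one probe.

```python
from collections import Counter


def stable_paths(observed_paths):
    """Return (stable paths, stable path ratio) for one source/target pair.

    `observed_paths` holds one tuple of hops per probe. Paths tied for the
    highest count are all kept, as the text prescribes for equal counts.
    """
    counts = Counter(observed_paths)
    top = max(counts.values())
    stable = [path for path, n in counts.items() if n == top]
    return stable, top / len(observed_paths)


path_1 = ("IP1", "IP2", "IP3", "IP5", "IP6", "IP7", "IP8", "IP11")
path_2 = ("IP1", "IP2", "IP4", "IP5", "IP9", "IP10", "IP11")
probes = [path_1] + [path_2] * 431  # the 432 probing paths of FIG. 3

stable, ratio = stable_paths(probes)
print(stable == [path_2], round(ratio, 4))  # True 0.9977
```

Pruning then amounts to keeping only the returned paths for every source/target pair and collecting their nodes and edges.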

As the method for ranking importance of routing nodes based on routing characteristics in the embodiment of the present invention, further, in the topology network structure, the number of stable paths passing through an edge is used as the weight of the edge, and each edge in the topology structure is weighted.

Existing node importance ranking methods, such as degree centrality, betweenness centrality and routing betweenness centrality, usually rank nodes on a fixed network topology model and treat the edges between all node pairs as equivalent. In an actual network, however, edges between different node pairs play different roles and affect the nodes they connect differently, so they need to be given different weights. The number of stable paths through an edge is used as the weight of the edge. The weight of edge e_i,j is calculated as follows:

$$w(e_{i,j}) = \sum_{r \in R_M} \delta(r, i, j) \qquad (5)$$

where e_i,j is the edge between adjacent nodes P_i and P_j, R_M is the set of main paths in actual communication, and P_r is the set of all nodes on path r; δ(r, i, j) = 1 when path r in R_M passes through edge e_i,j (so that P_i and P_j both belong to P_r), and δ(r, i, j) = 0 otherwise.

As shown in FIG. 4, in actual communication there are three main paths in total that pass through the edge (IP4-IP5):

IP1-IP2-IP4-IP5-IP9-IP10-IP11,

IP12-IP13-IP4-IP5-IP6-IP7-IP8-IP11,

IP1-IP2-IP4-IP5-IP6-IP7-IP8-IP11。

Therefore, the weight of the edge (IP4-IP5) is w(e_4,5) = 3.
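The edge weighting of FIG. 4 can be checked with a few lines of Python. The sketch assumes undirected edges (each edge is stored as a `frozenset` of its two endpoints) and uses a hypothetical helper name `edge_weights`; only the three main paths listed above are included, so weights of edges other than (IP4-IP5) are partial.

```python
from collections import Counter


def edge_weights(main_paths):
    """Count, for every undirected edge, the main paths that traverse it."""
    weights = Counter()
    for path in main_paths:
        # A set ensures a path contributes at most once per edge.
        edges = {frozenset(pair) for pair in zip(path, path[1:])}
        weights.update(edges)
    return weights


main_paths = [
    ["IP1", "IP2", "IP4", "IP5", "IP9", "IP10", "IP11"],
    ["IP12", "IP13", "IP4", "IP5", "IP6", "IP7", "IP8", "IP11"],
    ["IP1", "IP2", "IP4", "IP5", "IP6", "IP7", "IP8", "IP11"],
]
weights = edge_weights(main_paths)
print(weights[frozenset(("IP4", "IP5"))])  # 3
```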

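Further, based on the foregoing method, an embodiment of the present invention also provides a routing node importance ranking system based on routing characteristics, configured to search for backbone nodes in a target network area. The system includes a topology construction module, a topology pruning module and a node sequencing module, wherein the topology construction module acquires path information among the nodes of the target network area through network measurement and constructs the network topology; the topology pruning module prunes the network topology based on the stability of the paths between its nodes to obtain a topology containing only stable paths; and the node sequencing module counts the number of stable paths between adjacent nodes in the pruned topology, weights each corresponding edge according to the count, and ranks the routing nodes of the target network by weighted centrality.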

Existing node importance ranking studies will typically add all existing nodes and edges to the network topology. The scheme analyzes the measurement result and then verifies a routing characteristic: a stable path exists for communications between nodes in the network. Therefore, the topological graph can be pruned based on the stable paths, and only the stable paths in actual communication are reserved in the finally constructed network topological graph.

In this scheme, 28,987,966 detection results and 168,594 detection results are obtained from the probing of the three cities during the first 40 days. Stable paths are then searched for each responding target IP in the three cities and the stable path ratios are counted. The result is shown in FIG. 5, where the abscissa is the index of the target IP and the ordinate is its stable path ratio. As the figure shows, the stable path ratios of the target IPs are almost all above 0.6, and for most target IPs the ratio is 1. This illustrates that a stable path does generally exist for communication between two different nodes in the Internet.

In addition, the scheme counts, among the 5,484,648, 14,977,107 and 8,526,211 responsive probing results of the three target cities, the proportion of paths that are stable paths, as shown in Table 1:

TABLE 1 proportion of the Stable Path in the Probe results

As the table shows, the proportion of probing results that follow a stable path is 83.1%, 86.1% and 85.5% for Zhengzhou, Hangzhou and Chengdu, respectively, while the number of stable paths is only 47% of the total number of paths. This result further demonstrates the existence of stable paths.

Due to limited network resources, network operation and maintenance personnel need to perform hierarchical management on the routing nodes to ensure the service quality. During network operation, nodes in a stable path should have higher priority than nodes in an unstable path. Therefore, by the network topology pruning method based on the stable path, the node sorting scale can be reduced, the node sorting efficiency is improved, and a more accurate node sorting result can be obtained.

Taking the data of the first 40 days of the Chengdu area as an example, the network scale and the node sequencing accuracy before and after pruning are compared, as shown in tables 2 and 3.

TABLE 2 comparison of network size before and after pruning

TABLE 3 Comparison of ranking accuracy before and after pruning

As can be seen from Tables 2 and 3, pruning removes about 3% of the nodes, 52% of the edges and 50% of the paths, which greatly reduces the scale of node ranking and improves its efficiency. On this basis, the top 10 of the ranking after pruning contains 10 national-level backbone nodes, an improvement of 30% over the network before pruning. Pruning the network topology based on stable paths therefore clearly improves ranking accuracy and greatly reduces the time required for ranking.

In existing node importance ranking work, the object of study is usually a fixed network topology model, and the edges between node pairs in the topology are generally considered equally important; that is, an edge has only two states, present or absent. In this scheme, different weights are given to the edges in the topology graph according to the number of stable paths that pass through each edge in actual network communication.

Because of routing rules, the number of paths passing through the edges between different node pairs varies greatly, so the roles these edges play differ greatly, and so does their influence on the nodes they connect. Therefore, giving different weights to the edges between different node pairs based on the paths in actual communication, and then measuring node importance in combination with the edge weights, yields a more accurate node importance ranking. The necessity of constructing a weighted topology is illustrated by the example in FIG. 6.

Assume that the paths existing in communications from IP1 to IP8, IP9, and IP10 are:

IP9:[IP1-IP2-IP4-IP6-IP9],[IP1-IP2-IP3-IP6-IP9],[IP1-IP2-IP3-IP9]

IP10:[IP1-IP2-IP3-IP10],[IP1-IP2-IP3-IP7-IP10],[IP1-IP2-IP3-IP6-IP10]

IP11:[IP1-IP2-IP5-IP8],[IP1-IP2-IP5-IP8-IP11]

An unweighted topology can be constructed from the above paths, as shown in FIG. 6(a). The method proposed in this scheme instead constructs a weighted topology graph by assigning each edge a weight equal to the number of paths passing through it, as shown in FIG. 6(b).

In the unweighted topology of FIG. 6(a), the degrees of all nodes are calculated and the results are sorted as shown in the following table:

TABLE 4 Node ranking by degree in the unweighted graph

In the weighted topology of FIG. 6(b), the weighted degree is calculated by taking the edge weights into account, and the result is shown in the following table:

TABLE 5 Node ranking by weighted degree in the weighted graph

As can be seen from Tables 4 and 5, if the edges in the graph are not weighted, the degree of IP3 is greater than that of IP2, and IP3 ranks higher than IP2. After the edges are weighted, the weighted degree of IP2 is greater than that of IP3, so IP2 ranks higher than IP3. In actual network communication, IP3 can only be reached via IP2. The importance of the two nodes is evaluated with the node deletion method; the network topologies after deleting IP2 and IP3 respectively are shown in FIG. 7. Clearly, after IP3 is deleted the remaining nodes in the network can still be interconnected, whereas after IP2 is deleted many nodes can no longer communicate normally. IP2 therefore actually plays a more important role in the network than IP3, and the weighted network topology gives the more accurate node ranking.
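The comparison between IP2 and IP3 can be recomputed directly from the eight example paths listed for FIG. 6. The following sketch treats edges as undirected and counts every listed path, as the example does; the helper name `components_without` is hypothetical, and the printed numbers come from these paths rather than from Tables 4 and 5, which are not reproduced in the text.

```python
from collections import defaultdict

# The eight example paths listed for FIG. 6.
paths = [
    "IP1-IP2-IP4-IP6-IP9", "IP1-IP2-IP3-IP6-IP9", "IP1-IP2-IP3-IP9",
    "IP1-IP2-IP3-IP10", "IP1-IP2-IP3-IP7-IP10", "IP1-IP2-IP3-IP6-IP10",
    "IP1-IP2-IP5-IP8", "IP1-IP2-IP5-IP8-IP11",
]

weight = defaultdict(int)  # undirected edge -> number of paths through it
for p in paths:
    hops = p.split("-")
    for a, b in zip(hops, hops[1:]):
        weight[frozenset((a, b))] += 1

degree = defaultdict(int)    # unweighted degree: number of incident edges
weighted = defaultdict(int)  # weighted degree: sum of incident edge weights
for edge, w in weight.items():
    for node in edge:
        degree[node] += 1
        weighted[node] += w


def components_without(removed):
    """Count connected components after deleting one node (node deletion method)."""
    adj = defaultdict(set)
    for edge in weight:
        if removed not in edge:
            a, b = tuple(edge)
            adj[a].add(b)
            adj[b].add(a)
    seen, count = set(), 0
    for start in set(degree) - {removed}:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:  # depth-first search over the remaining nodes
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(adj[n] - seen)
    return count


print(degree["IP2"], degree["IP3"])      # 4 5
print(weighted["IP2"], weighted["IP3"])  # 16 10
print(components_without("IP3"), components_without("IP2"))  # 1 3
```

Deleting IP3 leaves a single connected component, while deleting IP2 splits the remaining nodes into three, which matches the ranking given by the weighted degree rather than the plain degree.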

To verify the validity of the scheme, it is further explained below with reference to experimental data:

If the actual communication paths between all node pairs in the target area could be obtained, the most accurate node ranking result would follow, but this would require a probe program to be deployed at every node in the target network, which is clearly infeasible. The scheme therefore selects several probing sources to perform continuous, high-frequency sampling probes of the IPs of the target area, approximating the path conditions of the real network, and ranks the nodes of the target area by importance.

The experimental setup for the data acquisition phase is shown in table 6:

TABLE 6 Data acquisition phase experimental setup

To ensure the accuracy of the node importance ranking, a network topology of the target area that is relatively stable over a period of time must first be acquired. Considering practical conditions, three cities are selected as target cities: Chengdu in Sichuan Province, Hangzhou in Zhejiang Province, and Zhengzhou in Henan Province. The IP address segments of each target city are then screened from 6 IP address databases, and the IP segments that appear in at least three of the databases are retained to form the IP segment set S_A of the target city.
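The screening step can be sketched as a simple vote across databases. This is a simplified illustration: it assumes each database has already been reduced to a list of segment strings for the target city, that identical segments are written identically, and that the threshold is "at least three"; the database keys and segments below are placeholders, not experimental data.

```python
from collections import Counter


def select_segments(databases, min_sources=3):
    """Keep the segments that at least `min_sources` databases list for the city."""
    counts = Counter()
    for segments in databases.values():
        counts.update(set(segments))  # each database votes once per segment
    return {segment for segment, n in counts.items() if n >= min_sources}


# Placeholder input: six databases, documentation-range prefixes only.
databases = {
    "db1": ["192.0.2.0/24", "198.51.100.0/24"],
    "db2": ["192.0.2.0/24", "203.0.113.0/24"],
    "db3": ["192.0.2.0/24", "198.51.100.0/24"],
    "db4": ["198.51.100.0/24"],
    "db5": [],
    "db6": ["203.0.113.0/24"],
}

s_a = select_segments(databases)
print(sorted(s_a))
```

Real address databases describe overlapping ranges of different sizes, so an actual implementation would need to normalise the ranges before voting; that step is omitted here.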

The target network is probed efficiently by sampling according to network segment. Existing network topology measurement methods fall into two categories: probing all IPs of the target area, and probing IPs sampled by network segment. Probing all IPs yields the most complete network topology, but the time cost is high, and because the network changes dynamically, a probe spanning a long time cannot faithfully reflect the overall state of the target area network. Since IPs in the same network segment tend to share routing policy and geographical location and often belong to the same organization [12,13,14], probing one or a few IP addresses in each network segment instead of all IPs greatly improves probing efficiency.

Moreover, when ranking the importance of nodes in the Internet, routing nodes inside a /24 prefix are less important than nodes serving the entire region. The scheme therefore probes IPs selected by network segment and divides the IPs of the target area by /24 prefix. In addition to extracting one IP address from each IP segment appearing in the IP address databases, at least one IP address is selected under each /24 prefix, forming the probing target IP set P_T. Three probing sources located in Chengdu, Hangzhou and Zhengzhou form the probing source set P_V. The target IP set P_T is probed continuously and at high frequency with a period of 2 hours, to obtain the paths that communication between nodes takes under different network conditions and thus more accurate topology information for the target city.
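A minimal sketch of the /24-based target selection is given below, using the Python standard-library `ipaddress` module. It assumes IPv4 segments in CIDR notation and picks the first usable host address of each /24; the text only requires at least one address per /24 prefix, so the choice of address and the function name `pick_targets` are assumptions.

```python
import ipaddress
from itertools import islice


def pick_targets(segments, per_prefix=1):
    """Select `per_prefix` addresses under every /24 covered by the segments."""
    targets = []
    seen = set()
    for segment in segments:
        net = ipaddress.ip_network(segment, strict=False)
        if net.prefixlen < 24:
            # Larger segments are split into their /24 prefixes.
            prefixes = net.subnets(new_prefix=24)
        else:
            # /24 or smaller segments map to the /24 that contains them.
            prefixes = [ipaddress.ip_network(f"{net.network_address}/24", strict=False)]
        for prefix in prefixes:
            if prefix in seen:
                continue
            seen.add(prefix)
            targets.extend(str(ip) for ip in islice(prefix.hosts(), per_prefix))
    return targets


print(pick_targets(["10.0.0.0/22", "192.0.2.128/25"]))
```

Selecting a responsive address rather than the first one would likely improve the response rate, but that requires prior knowledge of live hosts and is outside this sketch.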

Because the Chinese Internet is a special case of a hierarchical architecture, communication between network operators that have not implemented direct interconnection must be forwarded via exchange nodes deployed in specific cities, as shown in fig. 8. Within city X, only devices belonging to the same ISP can communicate directly, as indicated by the thick dashed line between A and C in the figure; devices belonging to different ISPs must forward data via a NAP in another city, as shown by the thick solid line between A and B. As the figure shows, cross-operator communication can bring nodes outside the city into the measured paths, which introduces interfering data and reduces the accuracy of the node ranking. Taking the first 40 days of data collected in the Chengdu area as an example, when all IPs in the Chengdu area are probed, 486 of the 9,336 routing nodes obtained are out-of-city nodes, more than 5% of all nodes, which interferes considerably with the ranking; when the data of only one operator (such as Chengdu Telecom) are probed, none of the 5,484 routing nodes is outside the city, and the ranking result is more accurate. Therefore, when ranking node importance for the target cities, target IPs of the same operator are selected for measurement, and the China Telecom data of the three target cities are used in the experiments.

Scamper, a tool developed by CAIDA and widely used in the field of network measurement, is used for IP interface-level topology measurement. The IP address segments of the three target cities, Zhengzhou, Hangzhou and Chengdu, are obtained by screening 6 IP address databases published in November 2019 (IPIP, Whois, IPPlus, IP2location, Maxmind and IPcn); there are 6,174 segments in total, containing 12,748,117 IPs. With the IP selection method of this scheme, the target IP set constructed for Zhengzhou, Hangzhou and Chengdu contains 60,337 target IPs. The number of IP segments, the total number of IPs, and the number of target IPs extracted for the three cities are shown in table 7.

TABLE 7 statistics of IP Address count for target City

In 2019 and 2020, full IP probing and segment-sampled IP probing were carried out for Zhengzhou, Hangzhou and Chengdu. Full IP probing produced 43,475,703 probe results, of which 9,301,149 received a response, from 3,274,157 responding IPs. For segment-sampled probing, the probing period t_D was set to 2 hours, giving 12 cycles per day; a total of 1,284 cycles were performed, producing 232,418,124 probe results, of which 79,432,347 received a response, from 28,681 responding IPs. The specific data are shown in table 8.

TABLE 8 Network probe statistics
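The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below reads "responded" as the number of probes that received a reply, which is an interpretation of the text; the variable names are illustrative.

```python
PERIOD_HOURS = 2
TOTAL_CYCLES = 1_284

cycles_per_day = 24 // PERIOD_HOURS            # 12 cycles per day
probing_days = TOTAL_CYCLES // cycles_per_day  # 107 days of sampled probing

# Share of probes that received a response in each campaign.
full_response_rate = 9_301_149 / 43_475_703
sampled_response_rate = 79_432_347 / 232_418_124

print(cycles_per_day, probing_days)
print(f"{full_response_rate:.1%} {sampled_response_rate:.1%}")
```

Under this reading, about 21.4% of the full-probing results and about 34.2% of the sampled-probing results received a response.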

Meanwhile, the results of the sampling detection and the full detection adopted by the scheme are compared with the data provided by the CAIDA and the IPIP, and the results are shown in the following table 9:

TABLE 9 comparison of detection results for different datasets

In Table 8, N_C is the number of /24 prefix network segments covered by the probing results, and "node" is the number of routing nodes obtained by probing.

The table shows that the sampling method adopted in this scheme covers more than 98% of the /24 prefix network segments and more than 91% of the nodes found by full probing. The data provided by CAIDA cover less than 3% of the /24 prefix network segments and 8% of the nodes, and the commercial database provided by IPIP covers 82% and 59% respectively. This demonstrates that the probing method and the IP selection algorithm adopted in this scheme are reasonable, and that topology probing can be sped up significantly while still obtaining the topology required for node importance ranking.

After the topology data of a target city are obtained, a weighted network topology graph is constructed and pruned based on main paths, the Route Weighting Degree Centrality (RWDC) of each node is computed, and node importance ranking experiments are carried out. Three experiments are performed: a comparison of node rankings before and after pruning, a comparison of the proposed algorithm with existing algorithms, and a comparison of node rankings under different probing durations. The experimental results are verified against existing databases.

The network size and ranking accuracy before and after pruning were compared and the results are shown in table 10 and fig. 9:

TABLE 10 comparison of network sizes before and after pruning

As can be seen from table 10 and fig. 9, pruning reduces the number of nodes by about 7%, edges by 80%, and paths by 80%, which greatly reduces the scale of the ranking problem and improves ranking efficiency. Across the 15 comparison groups for the 3 cities, the ranking after pruning (RWDC) performs better in 10 groups, or 66.7%. Furthermore, in every comparison group RWDC finds at least as many national-level backbone nodes as the pre-pruning index (WDC). It can be concluded that the proposed topology pruning method can significantly improve the accuracy of the ranking results and reduce the time cost.
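The pruning and weighting evaluated above can be sketched as follows, under stated assumptions: a node pair is taken to be the first and last hop of a measured path, the most frequently observed path of each pair is kept as its stable path, ties are broken by first occurrence, and the score is the unnormalised weighted degree on the pruned topology. Any normalisation used by RWDC is not reproduced here, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def stable_paths(observations):
    """Keep, for each (source, destination) pair, the most frequently seen path."""
    by_pair = defaultdict(Counter)
    for path in observations:
        by_pair[(path[0], path[-1])][tuple(path)] += 1
    return [counter.most_common(1)[0][0] for counter in by_pair.values()]


def weighted_degree_after_pruning(observations):
    """Sum, per node, the stable paths crossing each of its incident edges."""
    score = Counter()
    for path in stable_paths(observations):
        # Each stable path adds 1 to the weight of every edge it traverses,
        # and therefore 1 to the weighted degree of both endpoints.
        for u, v in zip(path, path[1:]):
            score[u] += 1
            score[v] += 1
    return score


# Hypothetical measurements: S->D seen via A three times and via B once.
observations = [["S", "A", "D"]] * 3 + [["S", "B", "D"]] + [["S", "A", "E"]] * 2

scores = weighted_degree_after_pruning(observations)
print(scores.most_common())
```

In this toy input the backup path through B is discarded, so B disappears from the pruned topology while A, which carries both stable paths, receives the highest score.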

To verify the effectiveness of the proposed algorithm, the results of RWDC are compared with those obtained using degree centrality, betweenness centrality and routing betweenness centrality. The node importance ranking produced by the proposed algorithm is shown in fig. 10; the RWDC values of the nodes are clearly stratified, which indicates that RWDC discriminates well between nodes.

The comparison of the proposed RWDC with degree centrality (DC), betweenness centrality (BC) and routing betweenness centrality (RBC) is shown in fig. 11. The rankings produced by the four methods differ greatly, and nodes ranked near the top by DC, BC or RBC do not necessarily rank highly under the proposed algorithm.

To verify the accuracy of the results, the rankings are checked against existing public databases. As table 11 shows, the node rankings produced by the four indexes differ considerably, and the highest-ranked nodes under DC, BC and RBC are not always top-ranked under RWDC. To compare the accuracy of the four indexes, the number of backbone nodes among the top-ranked results is counted. The results of the comparison are shown in table 12 and fig. 12:

TABLE 11 comparison of the accuracy of the indexes

Taking the top 10 node IPs under each index in the Chengdu area and their verification results as an example:

TABLE 12 Comparison of node ranking results for RWDC, DC, BC, and RBC

Here, B_C denotes a national-level telecommunications backbone network node that can be looked up in an existing database, B_P denotes a provincial-level telecommunications backbone network node, and x denotes an ordinary node.

As can be seen from the above table, of the top ten nodes ranked by the algorithm of the present application, 90% are marked in the database as national-level backbone network nodes and 10% as provincial-level backbone network nodes. None of the top ten nodes ranked by degree centrality is a national-level or provincial-level backbone network node. Of the top ten nodes ranked by betweenness centrality, 10% are national-level and 40% are provincial-level backbone network nodes. The top ten nodes ranked by routing betweenness centrality contain no national-level backbone nodes; all are provincial-level backbone network nodes. The experimental results show that RWDC obtains better ranking results than DC, BC and RBC, because the algorithm of this scheme starts from the routing rules and captures the network characteristics of the actual Internet.

Further, the rankings obtained from the probe data of 40 days (360 rounds) and of 107 days (1,284 rounds) were compared, and the results for both are shown in Table 13.

TABLE 13 Path number Change, node number Change, edge number Change

As can be seen from the table, the numbers of nodes, edges, and paths in the 107-day data are 1.12, 2.41, and 2.49 times those in the 40-day data, respectively. With such a large difference in probing duration, the measured network topology changes considerably. However, in the rankings produced by the proposed algorithm, the top-ranked nodes are substantially the same: only 1 of the top 10 nodes differs. This is because the actual network remains largely stable even as it changes, and backbone nodes are unlikely to change. The algorithm is based on an experimental approximation of the actual situation, and it can obtain better results than conventional node ranking algorithms, which are based on a static topology model.
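The stability check described above amounts to comparing the top-ranked sets of two rankings. A minimal sketch follows, assuming each ranking is given as a mapping from node to score and that "differ" means set difference; the scores below are invented placeholders, not experimental data.

```python
def top_k(scores, k=10):
    """Return the k highest-scoring nodes; ties are broken by node name."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [node for node, _ in ranked[:k]]


def changed_in_top_k(scores_short, scores_long, k=10):
    """Count nodes in the top k of the first ranking that are absent from the second."""
    return len(set(top_k(scores_short, k)) - set(top_k(scores_long, k)))


# Placeholder scores for a short and a long probing campaign.
short_run = {"n1": 50, "n2": 40, "n3": 30, "n4": 5}
long_run = {"n1": 120, "n2": 90, "n4": 70, "n3": 20}

print(changed_in_top_k(short_run, long_run, k=3))
```

A rank-correlation measure could be used instead when the order within the top k also matters; the text only reports how many of the top 10 nodes differ.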

Unless specifically stated otherwise, the relative steps, numerical expressions, and values of the components and steps set forth in these embodiments do not limit the scope of the present invention.

Based on the foregoing system, an embodiment of the present invention further provides a server, including: one or more processors; a storage device to store one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the system as described above.

Based on the above system, the embodiment of the present invention further provides a computer readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the above system.

The device provided by the embodiment of the present invention has the same implementation principle and the same technical effects as those of the foregoing system embodiment, and for the sake of brief description, reference may be made to corresponding contents in the foregoing system embodiment where no part of the embodiment of the device is mentioned.

It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working process of the system and the apparatus described above may refer to the corresponding process in the foregoing system embodiment, and details are not described herein again.

In all examples shown and described herein, any particular value should be construed as merely exemplary, and not as a limitation, and thus other examples of example embodiments may have different values.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in software functional units and sold or used as a stand-alone product, may be stored in a non-transitory computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present invention or a part thereof which contributes to the prior art in essence can be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the system according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A routing node importance ordering method based on routing characteristics is used for searching backbone nodes in a target network area, and is characterized by comprising the following contents:

based on the stability of the paths between the nodes of the network topology, pruning the network topology to obtain a topology network structure only containing stable paths, wherein according to the path set between the two nodes, the path with the most occurrence times in the set is used as the stable paths of the two nodes during network measurement, and other paths in the set are used as standby paths;

2. The method as claimed in claim 1, wherein the path information between nodes is obtained by probing IP nodes in the target network region with a plurality of probing sources arranged inside and outside the target network region.

3. The routing node importance ranking method based on routing characteristics of claim 1 or 2, wherein the target IP set is obtained according to the IP address segment allocated to the target network area in the IP address database; and carrying out multiple rounds of continuous network measurement on the target IP set by utilizing a plurality of probe sources to acquire the path information for constructing the network topology.

4. The routing node importance ranking method based on routing characteristics of claim 3, wherein the IP address segments of the target network area are divided by /24 prefixes, and one or more IP addresses are selected under each /24 prefix to form a target IP set.

5. The method as claimed in claim 3, wherein in the multiple rounds of continuous network measurement, the number of newly added nodes and newly added path data are counted for each round of detection, and if both of the consecutive rounds of detection are lower than a set threshold, the current round of detection is terminated, otherwise, a new round of detection is performed.

6. The method of claim 1, wherein a backup path is deleted and a stable path is retained during pruning of the network topology.

7. The method as claimed in claim 1, wherein the number of stable paths passing through an edge in the topology network structure is used as the weight of the edge to weight each edge in the topology structure.

8. A routing node importance ranking system based on routing characteristics, for finding backbone nodes in a target network region, comprising: a topology construction module, a topology pruning module and a node sequencing module, wherein,

the topology construction module is used for acquiring path information among nodes in a target network area through network measurement and constructing network topology;

the topology pruning module is used for pruning the network topology based on the stability of the paths between the network topology nodes to obtain a topology network structure only containing stable paths, wherein according to the path set between the two nodes, the path with the most occurrence times in the set is used as the stable paths of the two nodes during network measurement, and other paths in the set are used as standby paths;

9. A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, performs the method of any of claims 1 to 7.

10. A computer device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor, the processor executing the machine executable instructions to perform the method of any one of claims 1 to 7.