CN108848203B - Network boundary identification method and system - Google Patents

Publication number
CN108848203B
Authority
CN
China
Prior art keywords
address
gateway
data
information
traceroute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810666649.5A
Other languages
Chinese (zh)
Other versions
CN108848203A (en)
Inventor
张宇
朱金玉
曾良伟
张宏莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN201810666649.5A
Publication of CN108848203A
Application granted
Publication of CN108848203B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/74 Address processing for routing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/66 Arrangements for connecting between networks having differing types of switching systems, e.g. gateways

Abstract

A method and system for identifying China's network boundary, relating to the technical field of network resource mapping. The invention aims to identify China's national gateway IP addresses: for each path that crosses the national gateway, it determines precisely which IP address in the path is the gateway IP address. The method comprises: a data marking process, which obtains a Traceroute data set labeled with IP address attributes; a data screening process, which outputs the screened, labeled Traceroute data stored separately by destination IP address; a gateway extraction process, which extracts a candidate set and selects a gateway IP address according to each attribute of each node; and a gateway verification process, which verifies the gateway using a minimum-delay verification method or a data cross-validation method. The invention combines traceroute information with the attributes of the IP addresses in each path; these sources constrain and reinforce one another, allowing the national gateway IP addresses to be identified comprehensively.

Description

Network boundary identification method and system
Technical Field
The invention relates to the technical field of network resource mapping, and in particular to the identification of China's national border gateways.
Background
The reliability of certain points in the Internet, such as national boundary points, is particularly important; if such points suffer a DDoS or similar attack, the consequences could be severe. Identifying China's gateway IP addresses is the basis for studying link congestion between China and other countries. It allows China's network topology to be constructed and modeled more accurately, so that network resilience and robustness can be analyzed and attacks on links, or link breaks, can be guarded against. It also allows IP-to-location mapping and the countries traversed by a traceroute path to be determined more accurately, which offers guidance for analyzing network connectivity and diagnosing networks between countries.
However, no relevant research has addressed the identification of national boundaries; most existing work focuses on identifying AS boundaries. The two problems share common challenges: IP-to-location mapping and IP-to-AS mapping are imperfect, so heuristic methods are required to infer boundary information. National boundary identification must combine traceroute probing information from measurement sources at different locations, yet the network layer of the TCP/IP architecture has no concept of geographic location or boundary, and sampling bias in the measurements affects the accuracy of the inference. The information obtainable for an IP address, such as geographic location information, AS information, WHOIS (IP address owner lookup) information and PING (network diagnostic tool) delay information, also contains some error.
Disclosure of Invention
The invention aims to identify China's gateway IP addresses: for each path that crosses the national gateway, it identifies the gateway IP address and determines precisely which IP address in the path is the gateway IP address.
The technical solution adopted by the invention to solve the above technical problem is as follows:
A method for identifying China's network boundary comprises the following steps:
(1) Data marking process: obtaining a Traceroute data set labeled with IP address attributes;
(2) Data screening process: the input is the Traceroute data set labeled with IP address attributes; the output is the screened, labeled Traceroute data, stored separately by destination IP address; the process is as follows:
(2.1) first, separating the measurement data by destination IP address and storing each destination's data in its own file;
(2.2) then deleting Traceroute data that does not reach the destination IP address;
(2.3) deleting the data of measurement points whose measurements are erroneous;
(2.4) deleting any Traceroute data whose autonomous system path contains a loop;
(3) Gateway extraction process:
(3.1) extracting a candidate set: traversing each group of labeled Traceroute data; the candidate set starts at the first node for which any label of its IP address carries China information and ends at the node for which all labels carry China information; in addition, the two hops before and the two hops after the candidate set are added to the candidate set to prevent misjudgment; the candidate nodes are labeled with the multi-monitoring-point Ping results;
(3.2) selecting a gateway IP address according to each attribute of each node;
(4) Gateway verification process: performing gateway verification using a minimum-delay verification method or a Traceroute data cross-validation method.
Further, the data marking process specifically comprises:
(1.1) extracting the IP address of each hop from the Traceroute source data and adding it to a set of unique IP addresses;
(1.2) pinging each target IP address from multiple monitoring points to obtain information on the monitoring point closest to the target;
the input is a target IP address, and the output is the geographic information of the monitoring point with the smallest Ping delay to the target IP address, together with the packet loss rate; the process is as follows:
initiating a task to ping the target IP address;
obtaining the delay measured by each monitoring point when pinging the target IP address;
calculating the packet loss rate,
Figure GDA0003028911390000021
selecting the monitoring point with the smallest delay and converting it into geographic information;
(1.3) querying IP address attributes:
obtaining the AS number and country information corresponding to the IP address by querying a BGP database;
querying the reverse domain name (RDNS) information corresponding to the IP address with a reverse domain name lookup command;
querying the country, region and city information corresponding to the IP address in a commercial database;
querying the organization information corresponding to the IP address with a whois lookup command;
(1.4) labeling the IP addresses in each Traceroute record with these attributes and storing each hop in the following form: hop | IP_addr | RTT | reverse domain name | AS number | BGP country | geographic information | WHOIS | monitoring point packet loss rate
where hop is the hop count at which the IP address is traversed; IP_addr is the IP address; RTT is the round-trip delay between the current IP address and the source IP address; AS number is the autonomous system number; BGP country is the country information carried in the BGP data; and WHOIS is the organization information queried with the WHOIS command;
Further, in step (3.2), a gateway IP address is selected according to each attribute of each node as follows:
(3.2.1) RTT method: traversing the candidate set, and if the difference between the current hop's delay and the previous hop's delay exceeds a threshold, selecting the current hop's IP address as the gateway IP address chosen by this method;
(3.2.2) RDNS method: traversing the candidate set and selecting the gateway IP address according to information in the IP address's reverse domain name, including gateway markers (gw), the adjacent next autonomous system (as4837, china-unicom) and the adjacent country (china);
(3.2.3) BGP method: traversing the candidate set, finding the first IP address whose BGP country attribute is China, and selecting that hop's IP address as the gateway IP address;
(3.2.4) GEO method: traversing the candidate set, finding the first IP address whose geographic information is China, and selecting that hop's IP address as the gateway IP address;
(3.2.5) WHOIS method: because the WHOIS organization information of an IP address contains China feature information, traversing the candidate set and selecting the first IP address whose organization information points to the target country, China, as the gateway IP address;
(3.2.6) Multipoint Ping time delay method: traversing the candidate set, finding the first IP address whose nearest monitoring point is in China, and selecting that hop's IP address as the gateway IP address;
the weight given to the gateway IP address selected by each method is shown in the following table:
Method | RTT method | RDNS method | BGP method | GEO method | WHOIS method | Multipoint Ping time delay method
Weight value | 2.0 | 1.5 | 1.0 | 1.0 | 1.0 | 0.0, 3.0 or 7.0
The IP address with the largest weight sum is taken as the gateway IP address of the labeled Traceroute data.
Further, the weights given to the gateway IP addresses selected by the methods are determined as follows:
RTT is relative delay information; under ideal conditions it most directly reflects the distance between the routers corresponding to the IP addresses, but it is strongly affected by unavoidable conditions such as network congestion, and its weight is set to 1.5;
RDNS is attribute information of the IP address that contains geographic location and similar information, and the reverse domain name information is partially updated when the IP address changes; many geolocation databases also use it as an important basis, so its accuracy is high, and its weight is set to 2.0;
BGP, GEO and WHOIS information is assigned to IP addresses by IP prefix and cannot capture situations such as address space shared between ASes or between routers, so the weight of each is set to 1.0;
the weight of the multi-monitoring-point Ping time delay method when determining the gateway IP address depends on the case: (1) when 70% to 100% of the monitoring points return a delay result, the weight is 7.0; (2) when only 30% to 70% of the monitoring points return a delay result, with the 70% endpoint excluded, the weight is 3.0; (3) when only 0% to 30% of the monitoring points return a delay result, with the 30% endpoint excluded, the weight is 0.0.
Further, gateway verification using the minimum-delay verification method or the Traceroute data cross-validation method is performed as follows:
(4.1) Minimum-delay verification: the gateway IP address selected for each group of labeled Traceroute data is verified by calculating the distance between the city of the first-hop IP address and the city of the gateway IP address, and from it the minimum round-trip delay between the two cities;
if the delay difference between these two IP addresses in the labeled Traceroute data is smaller than the minimum round-trip delay, the geographic location of the IP address cannot be in the corresponding city, and the IP address is not the gateway IP address of that labeled Traceroute data;
the set of IP addresses that fail minimum-delay verification is found, and the cause of each failure (incomplete RDNS or WHOIS features, erroneous data from a particular monitoring point, and so on) is analyzed from the labeled Traceroute data; the method then returns to the data screening step to correct the screening method, or to gateway extraction step 3.2 to correct the gateway IP address selection method, after which data screening and gateway extraction are performed again; these steps are repeated until no IP address fails verification;
correcting the gateway IP address selection method means correcting the RDNS and/or WHOIS method of step 3.2, specifically:
in the RDNS method, updating the information indicating a national boundary contained in reverse domain names;
in the WHOIS method, updating the China feature information contained in WHOIS organization information;
(4.2) Traceroute data cross-validation: the candidate sets in the Traceroute data of all target IP addresses are merged, and a topology graph is drawn with the gateway nodes in the center; foreign and domestic IP addresses are arranged from left to right, so that, with the gateway IP addresses at the center, a node to the right of a gateway that points to the gateway node or to the left side shows that the Traceroute data or the identified gateway IP address is in error.
A system for identifying China's network boundary, the system comprising:
(1) a data marking module, configured to obtain a Traceroute data set labeled with IP address attributes;
(2) a data screening module, configured to:
(2.1) first separate the measurement data by destination IP address and store each destination's data in its own file;
(2.2) then delete Traceroute data that does not reach the destination IP address;
(2.3) delete the data of measurement points whose measurements are erroneous;
(2.4) delete any Traceroute data whose autonomous system path contains a loop;
(3) a gateway extraction module, configured to:
(3.1) extract a candidate set: traverse each group of labeled Traceroute data; the candidate set starts at the first node for which any label of its IP address carries China information and ends at the node for which all labels carry China information; in addition, the two hops before and the two hops after the candidate set are added to the candidate set to prevent misjudgment; the candidate nodes are labeled with the multi-monitoring-point Ping results;
(3.2) select a gateway IP address according to each attribute of each node;
(4) a gateway verification module, configured to perform gateway verification using a minimum-delay verification method or a Traceroute data cross-validation method.
The invention has the following beneficial effects:
The invention combines traceroute information with the attributes of the IP addresses in each path; these sources constrain and reinforce one another, allowing the national gateway IP addresses to be identified comprehensively. China's gateway IP addresses are the key nodes for network communication between China and the rest of the world: each is the first hop at which a foreign node enters the Chinese network and the last hop at which a Chinese node leaves it. Traceroute data measured toward Chinese target IP addresses from probing points around the world has the following characteristics: 1. each path should contain a China gateway IP address; 2. the delay from foreign nodes to the China gateway IP address is large; 3. the routers around the China gateway IP address should lie at the boundary between foreign and Chinese geographic locations. Based on these characteristics, the invention uses the IP address information in each traceroute path, such as the delay between adjacent IP addresses in the path, the geographic location information, the location of the corresponding autonomous system, the WHOIS location information and the location identified by multipoint Ping, to jointly identify China's gateway IP addresses. Because multiple IP address attributes are used to determine the China gateway IP address, accuracy is higher than when only a single attribute is used.
Drawings
Fig. 1 is a schematic diagram of the overall architecture of the present invention, and fig. 2 is a topological diagram drawn by Traceroute data cross validation.
Detailed Description
Embodiment 1: as shown in Figs. 1 and 2, the method or system for identifying China's network boundary in this embodiment is implemented as follows:
To achieve this, the system is divided into four modules:
(1) Data marking
(2) Data screening
(3) Gateway extraction
(4) Gateway verification
The input is the Traceroute data set from probing points to target IP address nodes, and the output is a set of gateway IP addresses. For the input Traceroute data set, the IP addresses in each record are first labeled with their attributes, erroneous Traceroute data is then removed, the gateway IP address is extracted from the remaining data by voting across the attributes, and finally the accuracy of the gateway IP addresses is verified.
Module (1) is implemented as follows. (1.1) The IP address of each hop is extracted from the Traceroute source data and added to a set of unique IP addresses. (1.2) Each target IP address is pinged from multiple monitoring points to obtain information on the monitoring point closest to the target.
Figure GDA0003028911390000051
Figure GDA0003028911390000061
(1.3) Querying IP address attributes:
BGP: the AS number and country information corresponding to the IP address are obtained by querying a BGP database.
RDNS: the reverse domain name information corresponding to the IP address is queried with a reverse domain name lookup command.
GEO: the country, region and city information corresponding to the IP address is queried in a commercial database.
WHOIS: the organization information (org, netname and so on) corresponding to the IP address is queried with a whois lookup command.
(1.4) The IP addresses in each Traceroute record are labeled with these attributes, and each hop is stored in the following form: hop | IP_addr | RTT | reverse domain name | AS number | BGP country | geographic information | WHOIS | monitoring point packet loss rate
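The per-hop record format of step (1.4) can be read back into a structured object. The following is a minimal sketch, assuming the nine pipe-separated fields appear exactly in the order listed and that the final field (monitoring point and packet loss rate) is kept as raw text; the names `Hop` and `parse_hop` are hypothetical and not part of the patent.

```python
from dataclasses import dataclass
from typing import Optional

# hop | IP_addr | RTT | reverse domain name | AS number | BGP country |
# geographic information | WHOIS | monitoring point packet loss rate
FIELD_COUNT = 9


@dataclass
class Hop:
    hop: int                  # hop count at which the address was traversed
    ip_addr: str
    rtt_ms: Optional[float]   # None when no RTT was recorded for the hop
    rdns: str
    as_number: str
    bgp_country: str
    geo: str
    whois: str
    monitor_info: str         # assumption: last field kept as unparsed text


def parse_hop(line: str) -> Hop:
    """Parse one labeled hop record; raises ValueError on a malformed line."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(parts)}")
    rtt = float(parts[2]) if parts[2] else None
    return Hop(int(parts[0]), parts[1], rtt, *parts[3:])


# Only the IP address and RTT below come from the worked example in the text;
# every other field value is a made-up placeholder.
sample = "7 | 202.97.94.113 | 338.677 | | AS-PLACEHOLDER | CN | China | ORG-PLACEHOLDER | monitor-placeholder"
record = parse_hop(sample)
```

An empty field, such as a missing reverse domain name, is preserved as an empty string so that the later selection methods can treat it as "no information" rather than failing.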
Module (2) is implemented as follows. (2.1) First, the measurement data is separated by destination IP address and each destination's data is stored in its own file. (2.2) Traceroute data that does not reach the destination IP address is then deleted, since it may interfere with the final identification of the gateway IP address. (2.3) If the data from a measurement point is erroneous (for example, an intranet IP address appears mid-path, or the fourth hop is invariably the destination IP address), that measurement point's data is deleted. (2.4) Any Traceroute data whose autonomous system path contains a loop (such as AS1 -> AS2 -> AS1 -> AS3) is deleted.
Figure GDA0003028911390000062
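The screening rules (2.2) to (2.4) can be sketched as three predicates over one traceroute path. This is an illustrative sketch only: it assumes each hop is an `(ip, as_number)` pair, treats "intranet" as the RFC 1918 private ranges, checks only intermediate hops for private addresses, and applies the filter per path rather than per measurement point. The function names are hypothetical.

```python
import ipaddress
from typing import List, Optional, Tuple

HopPair = Tuple[str, Optional[str]]  # (IP address, AS number or None if unknown)

# Assumption: "intranet IP address" means the RFC 1918 private ranges.
PRIVATE_NETS = [ipaddress.ip_network(n)
                for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def reaches_destination(hops: List[HopPair], dest_ip: str) -> bool:
    """Rule (2.2): the last hop must be the destination IP address."""
    return bool(hops) and hops[-1][0] == dest_ip


def has_private_hop(hops: List[HopPair]) -> bool:
    """Rule (2.3), one error case: a private address appears mid-path."""
    return any(ipaddress.ip_address(ip) in net
               for ip, _ in hops[1:-1] for net in PRIVATE_NETS)


def has_as_loop(hops: List[HopPair]) -> bool:
    """Rule (2.4): an AS reappears after the path has left it."""
    as_path: List[str] = []
    for _, asn in hops:
        if asn is not None and (not as_path or as_path[-1] != asn):
            as_path.append(asn)  # collapse consecutive hops in the same AS
    return len(as_path) != len(set(as_path))


def keep_trace(hops: List[HopPair], dest_ip: str) -> bool:
    """A path survives screening only if it passes all three rules."""
    return (reaches_destination(hops, dest_ip)
            and not has_private_hop(hops)
            and not has_as_loop(hops))
```

For instance, a path whose AS sequence is AS1, AS2, AS1, AS3 is rejected by `has_as_loop`, while AS1, AS1, AS2 is accepted because consecutive hops inside one AS do not form a loop.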
Module (3) is implemented as follows. (3.1) A candidate set is extracted. Traceroute data runs from a monitoring point to a target node; the candidate set starts at the first hop for which any label of its IP address carries China information and ends at the node for which all labels carry China information. In addition, the two hops before and the two hops after the candidate set are added to it to prevent misjudgment. The candidate nodes are labeled with the multi-monitoring-point Ping results.
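One possible reading of the candidate-set rule in step (3.1) is sketched below. It assumes each hop has been reduced to a mapping from attribute name to a boolean "indicates China" flag, and it falls back to the last hop when no hop has all labels set; both the representation and the fallback are assumptions, and `candidate_window` is a hypothetical name.

```python
from typing import Dict, Sequence


def candidate_window(labels: Sequence[Dict[str, bool]], pad: int = 2) -> range:
    """Return the hop indices forming the candidate set.

    labels[i] maps an attribute name (e.g. "bgp", "geo", "whois") to True when
    that attribute of hop i indicates China. The window runs from the first hop
    with any China label to the first later hop whose labels are all China,
    then is widened by `pad` hops on each side, clipped to the path.
    """
    start = next((i for i, l in enumerate(labels) if any(l.values())), None)
    if start is None:
        return range(0)  # the path never shows any China information
    end = next((i for i in range(start, len(labels)) if all(labels[i].values())),
               len(labels) - 1)  # assumption: fall back to the final hop
    return range(max(0, start - pad), min(len(labels), end + pad + 1))
```

With eight hops where hop 3 is the first to carry a China label and hop 4 is the first where every label is China, the window covers hops 1 through 6.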
Then, (3.2) a gateway IP address is selected by each attribute:
(3.2.1) RTT method: when traceroute measures from a measurement point to a destination IP address in another country, the hop that enters the destination country's national gateway shows a markedly longer delay. The candidate set is traversed, and if the delay of the current hop differs greatly from that of the previous hop, the current hop's IP address is taken as the gateway IP address selected by this method.
(3.2.2) RDNS method: the candidate set is traversed, and the gateway IP address is selected according to information in the IP address's reverse domain name, such as gateway markers (gw), the adjacent next autonomous system (as4837, china-unicom) and the adjacent country (cn).
(3.2.3) BGP method: the candidate set is traversed to find the first IP address whose BGP country attribute is China, and that hop's IP address is selected as the gateway IP address.
(3.2.4) GEO method: the candidate set is traversed to find the first IP address whose geographic information is China, and that hop's IP address is selected as the gateway IP address.
(3.2.5) WHOIS method: because the WHOIS organization information of an IP address contains China feature information, the candidate set is traversed and the first IP address whose organization information points to the target country, China, is selected as the gateway IP address.
(3.2.6) Multipoint Ping time delay method: the candidate set is traversed to find the first IP address whose nearest monitoring point is in China, and that hop's IP address is selected as the gateway IP address.
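Two of the selection rules above reduce to simple scans over the candidate set. The sketch below illustrates the RTT method and the shared "first hop labeled China" pattern of the BGP and GEO methods. The threshold value is not given in the text, so `threshold_ms` is a caller-supplied assumption, as is the choice to return the first qualifying hop; the function names are hypothetical.

```python
from typing import List, Optional


def rtt_method(rtts_ms: List[float], threshold_ms: float) -> Optional[int]:
    """Index of the first hop whose RTT exceeds the previous hop's RTT by more
    than threshold_ms, or None if no such jump exists."""
    for i in range(1, len(rtts_ms)):
        if rtts_ms[i] - rtts_ms[i - 1] > threshold_ms:
            return i
    return None


def first_china(countries: List[Optional[str]], china: str = "CN") -> Optional[int]:
    """Index of the first hop whose country label equals `china` (the pattern
    used by the BGP and GEO methods), or None if there is none."""
    for i, country in enumerate(countries):
        if country == china:
            return i
    return None
```

With the two delays quoted in the worked example, 128.470 ms followed by 338.677 ms, and an illustrative threshold of 100 ms, `rtt_method` selects the second hop.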
The weight given to the gateway IP address selected by each method is shown in the following table:
Method | RTT method | RDNS method | BGP method | GEO method | WHOIS method | Multipoint Ping time delay method
Weight value | 2.0 | 1.5 | 1.0 | 1.0 | 1.0 | 0.0 / 3.0 / 7.0
RTT is relative delay information; ideally it most directly reflects the distance between the routers corresponding to the IP addresses, but it is strongly affected by unavoidable conditions such as network congestion, and its weight is set to 1.5.
RDNS is attribute information of the IP address that implies geographic location and similar information, and the reverse domain name information is partially updated when the IP address changes. Many geolocation databases also use it as an important basis, so its accuracy is high, and its weight is set to 2.0.
BGP, GEO and WHOIS information is assigned to IP addresses by IP prefix and cannot capture situations such as address space shared between ASes or between routers, so the weight of each is set to 1.0.
The weight of the multi-monitoring-point Ping time delay method when determining the gateway IP address depends on the case: (1) when most monitoring points return a delay result (70% to 100%), the weight is 7.0; (2) when some monitoring points do not return a delay result (30% to 70% return one), the weight is 3.0; (3) when most monitoring points do not return a delay result (0% to 30% return one), the weight is 0.0.
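The three-tier weight of the multipoint Ping time delay method can be written as a small function. This sketch assumes the tiers are closed at the lower end and open at the upper end for the two lower tiers (70% itself gives 7.0 and 30% itself gives 3.0), as stated in the disclosure; the name `ping_weight` is hypothetical.

```python
def ping_weight(responding: int, total: int) -> float:
    """Weight of the multipoint Ping method from the share of monitoring
    points that returned a delay result.

    Integer arithmetic is used so that the 30% and 70% boundaries are exact.
    """
    if total <= 0 or not 0 <= responding <= total:
        raise ValueError("need 0 <= responding <= total and total > 0")
    if responding * 10 >= total * 7:   # 70% to 100%
        return 7.0
    if responding * 10 >= total * 3:   # 30% up to, but excluding, 70%
        return 3.0
    return 0.0                         # below 30%
```

For example, 7 of 10 monitoring points responding yields 7.0, while 69 of 100 yields 3.0.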
The algorithm for extracting the gateway IP address from Traceroute data using each attribute is as follows.
Figure GDA0003028911390000071
Figure GDA0003028911390000081
Module (4) is implemented as follows:
(4.1) Minimum RTT verification: the gateway IP address selected for each Traceroute record is verified by calculating the distance between the area of the first-hop IP address and the area of the gateway IP address, and from it the minimum round-trip delay between the two cities. If the delay difference between the two IP addresses in the data is smaller than this minimum round-trip delay, the geographic location of the IP address cannot be in the corresponding area, and the IP address is not the gateway IP address of that area.
Figure GDA0003028911390000082
Figure GDA0003028911390000091
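The minimum RTT check can be illustrated with a great-circle distance and a propagation-speed bound. The text does not state how the distance or the minimum delay is computed, so this sketch assumes the haversine formula on city coordinates and a signal speed of two thirds of the speed of light in vacuum (a common approximation for optical fiber). All names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: signals travel at about 2/3 of c in fiber, expressed in km per ms.
FIBER_KM_PER_MS = 299_792.458 * (2.0 / 3.0) / 1000.0


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over the given one-way distance."""
    return 2.0 * distance_km / FIBER_KM_PER_MS


def passes_min_delay(first_hop_rtt_ms: float, gateway_rtt_ms: float,
                     distance_km: float) -> bool:
    """False when the observed delay difference is physically too small for
    the claimed distance, i.e. the gateway candidate fails verification."""
    return (gateway_rtt_ms - first_hop_rtt_ms) >= min_rtt_ms(distance_km)
```

Under these assumptions a separation of 1000 km implies a minimum round trip of roughly 10 ms, so a candidate whose delay exceeds the first hop's by only 4 ms would fail the check.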
This method yields the set of IP addresses that fail verification. The cause of each failure (incomplete RDNS or WHOIS features, erroneous data from a particular monitoring point, and so on) is analyzed from the source data, the data screening and voting algorithms are corrected accordingly, and the gateway IP address set is extracted from the data again for verification.
(4.2) Traceroute data cross-validation: the candidate sets in the Traceroute data of all target IP addresses are merged, and a topology graph is drawn with the gateway nodes in the center. Foreign and domestic IP addresses are arranged from left to right, so that, with the gateway IP addresses at the center, a node to the right of a gateway that points to the gateway node or to the left side shows that the Traceroute data or the identified gateway IP address is in error.
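The cross-validation rule amounts to looking for directed edges that run from the domestic side back toward the gateway or the foreign side. The sketch below assumes each node has already been assigned one of the labels "foreign", "gateway" or "domestic", and that edges are consecutive hop pairs from the merged candidate sets; the label strings and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Edge = Tuple[str, str]


def edges_from_paths(paths: Iterable[Sequence[str]]) -> Set[Edge]:
    """Merge several hop sequences into a set of directed edges."""
    return {(a, b) for path in paths for a, b in zip(path, path[1:])}


def backward_edges(edges: Iterable[Edge], side: Dict[str, str]) -> List[Edge]:
    """Edges from a domestic (right-hand) node to a gateway or foreign node.

    Any such edge signals that either the Traceroute data or the chosen
    gateway IP address is wrong.
    """
    return sorted((src, dst) for src, dst in edges
                  if side.get(src) == "domestic"
                  and side.get(dst) in ("gateway", "foreign"))
```

An empty result means every path crosses the selected gateways in the expected foreign-to-domestic direction; any returned edge identifies a path worth re-examining.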
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.
Example:
To describe the invention more concretely, the technical solution is explained in detail below by way of example:
The system is tested on a series of Traceroute source data measured by monitoring points, in the following format:
Figure GDA0003028911390000092
The data contains the monitoring point, the destination IP address and the Traceroute node information. The IP address of each hop is extracted, and each attribute is queried in turn: RTT information is already contained in the source data and needs no query; the AS number and country information are obtained by querying a BGP database; reverse domain name information is queried with the "dig -x IP" command; country, region and city information is queried in the commercial database IP2Location; organization information (org, netname and so on) is queried with the "whois IP" command; and the monitoring point with the smallest delay is obtained by pinging the same target IP address from multiple monitoring points. The measured results are as follows:
Figure GDA0003028911390000101
The IP addresses in each Traceroute record are then labeled with their attributes (hop 0 holds the destination IP address information):
Figure GDA0003028911390000102
First, the measurement data is separated by destination IP address and stored in separate files. The destination IP address appears in the measurement source information at the head of each group of Traceroute data:
traceroute from 'https://www.netip.de/' to 1.192.0.1
Traceroute data that does not reach the destination IP address is then deleted, since it may interfere with the final identification of the gateway IP address.
Figure GDA0003028911390000111
If the data from a measurement point is erroneous (for example, an intranet IP address appears mid-path, or the fourth hop is invariably the destination IP address), that measurement point's data is deleted.
Figure GDA0003028911390000112
Any Traceroute data whose autonomous system path contains a loop (e.g. AS1 -> AS2 -> AS1 -> AS3) is deleted.
Figure GDA0003028911390000113
Take extracting the gateway IP address in the following Traceroute data as an example:
Figure GDA0003028911390000114
First, a candidate set is selected; the extracted candidate set A is shown below:
Figure GDA0003028911390000115
RTT method: in candidate set A, the delay at 202.97.94.113 is 338.677 ms while the delay at the previous hop is 128.470 ms; the change is large, so this IP address is taken as the gateway IP address.
RDNS method: no gateway IP address can be found by this method in candidate set A, so another candidate set B is used as an example:
Figure GDA0003028911390000121
in the candidate set B, the reverse domain name of 87.245.236.150 has "GW-chinatelect" information, and its next hop IP address 202.97.58.69 is used as the gateway IP address.
The BGP method comprises the following steps: as can be seen from candidate set a, 202.97.50.25 is the IP address of the first BGP attribute being CN, and this IP address is used as the gateway IP address.
The GEO method comprises the following steps: as can be seen from candidate set a, 202.97.94.113 is the IP address of country China corresponding to the first geographic information, and this IP address is used as the gateway IP address.
WHOIS method: as can be seen from the candidate set a, the china organization information of the IP address 202.97.50.25 includes the china information for the first time, and the IP address is used as the gateway IP address.
The multipoint Ping time delay method comprises the following steps: from candidate set a, it can be seen that 202.97.94.113 is the IP address of the country corresponding to the first nearest monitored point, and this IP address is used as the gateway IP address.
In candidate set A, the summed weights of the IP addresses are:
202.97.94.113: 10.0 (RTT 2.0 + GEO 1.0 + multipoint Ping 7.0)
202.97.50.25: 2.0 (BGP 1.0 + WHOIS 1.0)
Therefore, 202.97.94.113 is selected as the gateway IP address of this Traceroute data.
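The weighted selection can be sketched as below, using the weights given in the claims (RTT 2.0, RDNS 1.5, BGP/GEO/WHOIS 1.0 each, multipoint Ping 0.0, 3.0, or 7.0 depending on the share of monitoring points that returned a delay). The function names, the `"PING"` key, and the response ratio of 0.9 are assumptions for illustration; the source only implies that the Ping weight was 7.0 in this example.

```python
from collections import defaultdict

# Fixed weights per method, as listed in the claims.
WEIGHTS = {"RTT": 2.0, "RDNS": 1.5, "BGP": 1.0, "GEO": 1.0, "WHOIS": 1.0}


def ping_weight(response_ratio):
    """Weight of the multipoint Ping method from the fraction of monitors that answered."""
    if response_ratio >= 0.7:
        return 7.0
    if response_ratio >= 0.3:
        return 3.0
    return 0.0


def select_gateway(votes, ping_response_ratio):
    """Sum the weights per candidate IP and return (best_ip, totals).

    votes: dict mapping method name to the IP it selected, or None if it found nothing.
    """
    totals = defaultdict(float)
    for method, ip in votes.items():
        if ip is None:
            continue
        weight = ping_weight(ping_response_ratio) if method == "PING" else WEIGHTS[method]
        totals[ip] += weight
    if not totals:
        return None, {}
    return max(totals, key=totals.get), dict(totals)


# Votes for candidate set A as described in the text; RDNS found no gateway.
votes_a = {
    "RTT": "202.97.94.113",
    "RDNS": None,
    "BGP": "202.97.50.25",
    "GEO": "202.97.94.113",
    "WHOIS": "202.97.50.25",
    "PING": "202.97.94.113",
}
best_ip, totals_a = select_gateway(votes_a, ping_response_ratio=0.9)
```

With these inputs the totals reproduce the values in the text: 10.0 for 202.97.94.113 and 2.0 for 202.97.50.25.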
An example in which minimum RTT verification detects a wrongly selected gateway IP address is as follows:
Figure GDA0003028911390000122
The selected gateway IP address is 202.97.62.53, but its delay differs from that of the first hop by only 4 ms, which is less than the calculated minimum delay, so this gateway IP address may have been selected incorrectly.
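The check can be sketched as follows. The source does not say how the minimum round-trip delay between two cities is computed; this sketch assumes a great-circle distance and a signal speed in fiber of about 200 km per millisecond (roughly two thirds of the speed of light). All names and the 1000 km distance are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBER_KM_PER_MS = 200.0  # assumed propagation speed, about 2/3 of c


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_rtt_ms(distance_km):
    """Lower bound on the round-trip delay over the given one-way distance."""
    return 2 * distance_km / FIBER_KM_PER_MS


def gateway_plausible(first_hop_rtt_ms, gateway_rtt_ms, distance_km):
    """Return False if the observed delay difference is below the physical minimum."""
    return (gateway_rtt_ms - first_hop_rtt_ms) >= min_rtt_ms(distance_km)
```

For instance, if the two cities were 1000 km apart, the minimum round trip under these assumptions would be 10 ms, so a 4 ms difference like the one in the example would fail verification.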
The topology map drawn for Traceroute data cross-validation is shown in Figure 2.
The labeled column in the middle contains the selected gateway IP addresses; foreign IP addresses are on the left and domestic IP addresses are on the right. It can be seen that the last selected gateway IP address is pointed to by some IP addresses on its right, which indicates a problem with either the Traceroute data or the gateway IP address selected by the method.
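The cross-validation idea can be sketched without drawing the graph. This sketch assumes that, within each trace, hops before the selected gateway are on the foreign side and hops after it are on the domestic side, and it flags any edge from a domestic node to a gateway node or a foreign node. The function name and the node labels are hypothetical.

```python
def cross_validate(traces):
    """Return the set of suspicious directed edges (src, dst).

    traces: list of (hop_ips, gateway_ip), where gateway_ip must occur in hop_ips.
    """
    gateways = {gw for _, gw in traces}
    foreign, domestic, edges = set(), set(), set()
    for hops, gw in traces:
        idx = hops.index(gw)  # raises ValueError if the gateway is not in the trace
        foreign.update(hops[:idx])
        domestic.update(hops[idx + 1:])
        edges.update(zip(hops, hops[1:]))
    # A gateway of one trace is never counted as a plain domestic or foreign node.
    foreign -= gateways
    domestic -= gateways
    return {(s, d) for s, d in edges if s in domestic and (d in gateways or d in foreign)}


# D1 is domestic in the first trace but precedes gateway G2 in the second.
example_traces = [
    (["F1", "G1", "D1", "D2"], "G1"),
    (["F2", "D1", "G2", "D3"], "G2"),
]
suspicious = cross_validate(example_traces)
```

A flagged edge does not say which side is wrong; as the text notes, either the Traceroute data or the selected gateway may be at fault, so each hit needs manual inspection.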

Claims (4)

1. A method for identifying network boundaries, the method comprising:
(1) data marking process: obtaining a Traceroute data set with IP address attribute marks; the specific process is as follows:
(1.1) extracting the IP address in each hop of data from the Traceroute source data, and adding the IP address to a set of unique IP addresses;
(1.2) pinging each target IP address from multiple monitoring points to obtain information on the monitoring point closest to the target;
the input is a target IP address, and the output is the geographic information of the monitoring point with the minimum Ping delay to the target IP address, together with the packet loss rate, wherein the process comprises the following steps:
initiating a task to Ping the target IP address;
obtaining the delay of each monitoring point's Ping to the target IP address;
calculating the packet loss rate,
Figure FDA0003045556290000011
selecting the monitoring point with the minimum delay and converting it into geographic information;
(1.3) querying the IP address attributes:
obtaining the AS number and country information corresponding to the IP address by querying a BGP database;
querying the reverse domain name information corresponding to the IP address with a reverse domain name query command;
querying the country, region and city information corresponding to the IP address in a commercial database;
querying the organization information corresponding to the IP address with a WHOIS query command;
(1.4) marking the attributes for the IP address in each Traceroute data, and storing the data of each hop in the following form: hop | IP_addr | RTT | reverse domain name | AS number | BGP country | geographic information | WHOIS | monitoring point packet loss rate, wherein hop represents the hop count of the IP address currently passed through; IP_addr represents the IP address; RTT represents the round-trip delay between the current IP address and the source IP address; AS number represents the autonomous system number; BGP country represents the country information carried in the BGP data; WHOIS represents the organization information queried with the WHOIS command;
(2) data screening process: the input is the Traceroute data set with IP address attribute marks; the output is the screened, marked Traceroute data, stored separately by target IP address; the process is as follows:
(2.1) first, separating the measurement data by destination IP address and storing the data for each destination in its own file;
(2.2) then deleting Traceroute data that do not reach the destination IP address;
(2.3) deleting the data of measurement points whose data are erroneous;
(2.4) deleting the Traceroute data if its autonomous system path contains a loop;
(3) the gateway extraction process comprises the following steps:
(3.1) extracting a candidate set: traversing each group of marked Traceroute data; if any attribute mark of the IP address of the current node carries China-related information, all nodes from the current node up to the node at which all attribute marks carry China-related information are taken as the candidate set; in addition, the two hops before and the two hops after the candidate set are added to the candidate set to prevent misjudgment; the Ping results of the multiple monitoring points are marked;
(3.2) selecting respective gateway IP addresses according to the attributes of each node; the specific process is as follows:
(3.2.1) RTT method: traversing the candidate set, and if the difference between the delay of the current hop and the delay of the previous hop is larger than a threshold value, selecting the current hop IP address as the gateway IP address selected by the method;
(3.2.2) RDNS method: traversing the candidate set, and selecting a gateway IP address according to the reverse domain name of the IP address, wherein the reverse domain name comprises the gateway, the adjacent next autonomous system and the adjacent country information;
(3.2.3) BGP method: traversing the candidate set, finding the first IP address whose BGP country attribute is China, and selecting the current hop IP address as the gateway IP address;
(3.2.4) GEO method: traversing the candidate set, finding the first IP address whose geographic information is China, and selecting the current hop IP address as the gateway IP address;
(3.2.5) WHOIS method: since the WHOIS organization information of an IP address contains China-specific information, traversing the candidate set and selecting the first IP address whose WHOIS information points to China, the target country, as the gateway IP address;
(3.2.6) multipoint Ping delay method: traversing the candidate set, finding the first IP address whose nearest monitoring point is in China, and selecting the current hop IP address as the gateway IP address;
the weights of the gateway IP addresses obtained by each method are shown in the following table:
Method | RTT method | RDNS method | BGP method | GEO method | WHOIS method | Multipoint Ping delay method
Weight | 2.0 | 1.5 | 1.0 | 1.0 | 1.0 | 0.0, 3.0, or 7.0
taking the IP address with the maximum weight sum as the gateway IP address of the marked Traceroute data;
(4) gateway verification process: performing gateway verification by a minimum delay verification method or a Traceroute data cross-validation method.
2. The method according to claim 1, wherein the process of obtaining the gateway IP address weight by the methods comprises:
RTT is relative delay information, and its weight is set to 2.0;
RDNS is attribute information of the IP address that contains geographic location information, and the reverse domain name information can be updated when the IP address changes; geolocation databases also use this information as a basis; its weight is set to 1.5;
BGP/GEO/WHOIS information is assigned to IP addresses by IP prefix, so the shared address space between an AS and a route cannot be identified, and the weight is set to 1.0;
the weight of the multi-monitoring-point Ping delay method is determined case by case: (1) when 70% to 100% of the monitoring points return a delay result, the weight is 7.0; (2) when 30% to 70% of the monitoring points return a delay result, excluding the 70% endpoint, the weight is 3.0; (3) when 0% to 30% of the monitoring points return a delay result, excluding the 30% endpoint, the weight is 0.0.
3. The method for identifying network boundaries according to claim 2, wherein a minimum delay verification method or a Traceroute data cross-validation method is adopted for gateway verification, and specifically:
(4.1) minimum latency validation: verifying the gateway IP address selected by each group of marked Traceroute data, and calculating the distance between the city where the first hop IP address is located and the city where the gateway IP address is located, so as to calculate the minimum round-trip delay between the cities;
if the delay difference between the two IP addresses in the marked Traceroute data is smaller than the minimum round-trip delay, the geographic position corresponding to the IP address is shown not to be in the corresponding city, and the IP address is not the gateway IP address of the marked Traceroute data;
finding the set of IP addresses that fail minimum delay verification, analyzing the reason for the failure from the marked Traceroute data, returning to the data screening step to correct the data screening method or returning to gateway extraction step 3.2 to correct the gateway IP address selection method, and then performing data screening and gateway extraction again; the steps are repeated until no IP address fails verification;
the method for correcting the gateway IP address selection refers to a method for correcting the RDNS and/or WHOIS in the step 3.2, and specifically comprises the following steps:
in the RDNS method, information indicating a country boundary contained in a reverse domain name is updated;
in the WHOIS method, Chinese characteristic information contained in WHOIS organization information is updated;
(4.2) Traceroute data cross-validation: combining the candidate sets in the Traceroute data of each target IP address, placing the gateway nodes in the center, and drawing a topological graph; foreign and domestic IP addresses are arranged from left to right in the topological graph; since the gateway IP address is at the center, when a node among the domestic IP addresses on the right of the gateway IP address points to the gateway node and the foreign IP address, the Traceroute data or the selected gateway IP address is shown to be in error.
4. A system for identifying network boundaries, the system comprising:
(1) the data marking module is used for acquiring a Traceroute data set with an IP address attribute mark; the specific process is as follows:
(1.1) extracting an IP address in each hop of data from Traceroute source data, and adding the IP address into a non-repeated IP address set;
(1.2) pinging each target IP address from multiple monitoring points to obtain information on the monitoring point closest to the target;
the input is a target IP address, and the output is the geographic information of the monitoring point with the minimum Ping delay to the target IP address, together with the packet loss rate, wherein the process comprises the following steps:
initiating a task to Ping the target IP address;
obtaining the delay of each monitoring point's Ping to the target IP address;
calculating the packet loss rate,
Figure FDA0003045556290000041
selecting the monitoring point with the minimum delay and converting it into geographic information;
(1.3) querying the IP address attributes:
obtaining the AS number and country information corresponding to the IP address by querying a BGP database;
querying the reverse domain name information corresponding to the IP address with a reverse domain name query command;
querying the country, region and city information corresponding to the IP address in a commercial database;
querying the organization information corresponding to the IP address with a WHOIS query command;
(1.4) marking the attributes for the IP address in each Traceroute data, and storing the data of each hop in the following form: hop | IP_addr | RTT | reverse domain name | AS number | BGP country | geographic information | WHOIS | monitoring point packet loss rate
hop represents the hop count of the IP address currently passed through; IP_addr represents the IP address; RTT represents the round-trip delay between the current IP address and the source IP address; AS number represents the autonomous system number; BGP country represents the country information carried in the BGP data; WHOIS represents the organization information queried with the WHOIS command;
(2) a data screening module, configured to:
(2.1) first, separate the measurement data by destination IP address and store the data for each destination in its own file;
(2.2) then delete Traceroute data that do not reach the destination IP address;
(2.3) delete the data of measurement points whose data are erroneous;
(2.4) delete the Traceroute data if its autonomous system path contains a loop;
(3) a gateway extraction module, configured for:
(3.1) extracting a candidate set: traversing each group of marked Traceroute data; if any attribute mark of the IP address of the current node carries China-related information, all nodes from the current node up to the node at which all attribute marks carry China-related information are taken as the candidate set; in addition, the two hops before and the two hops after the candidate set are added to the candidate set to prevent misjudgment; the Ping results of the multiple monitoring points are marked;
(3.2) selecting respective gateway IP addresses according to the attributes of each node; the specific process is as follows:
(3.2.1) RTT method: traversing the candidate set, and if the difference between the delay of the current hop and the delay of the previous hop is larger than a threshold value, selecting the current hop IP address as the gateway IP address selected by the method;
(3.2.2) RDNS method: traversing the candidate set, and selecting a gateway IP address according to the reverse domain name of the IP address, wherein the reverse domain name comprises the gateway, the adjacent next autonomous system and the adjacent country information;
(3.2.3) BGP method: traversing the candidate set, finding the first IP address whose BGP country attribute is China, and selecting the current hop IP address as the gateway IP address;
(3.2.4) GEO method: traversing the candidate set, finding the first IP address whose geographic information is China, and selecting the current hop IP address as the gateway IP address;
(3.2.5) WHOIS method: since the WHOIS organization information of an IP address contains China-specific information, traversing the candidate set and selecting the first IP address whose WHOIS information points to China, the target country, as the gateway IP address;
(3.2.6) multipoint Ping delay method: traversing the candidate set, finding the first IP address whose nearest monitoring point is in China, and selecting the current hop IP address as the gateway IP address;
the weights of the gateway IP addresses obtained by each method are shown in the following table:
Method | RTT method | RDNS method | BGP method | GEO method | WHOIS method | Multipoint Ping delay method
Weight | 2.0 | 1.5 | 1.0 | 1.0 | 1.0 | 0.0, 3.0, or 7.0
taking the IP address with the maximum weight sum as the gateway IP address of the marked Traceroute data;
(4) a gateway verification module, configured for
performing gateway verification by a minimum delay verification method or a Traceroute data cross-validation method.
CN201810666649.5A 2018-06-25 2018-06-25 Network boundary identification method and system Active CN108848203B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810666649.5A CN108848203B (en) 2018-06-25 2018-06-25 Network boundary identification method and system

Publications (2)

Publication Number Publication Date
CN108848203A CN108848203A (en) 2018-11-20
CN108848203B true CN108848203B (en) 2021-06-18

Family

ID=64203481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810666649.5A Active CN108848203B (en) 2018-06-25 2018-06-25 Network boundary identification method and system

Country Status (1)

Country Link
CN (1) CN108848203B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111181910B (en) * 2019-08-12 2021-10-08 腾讯科技(深圳)有限公司 Protection method and related device for distributed denial of service attack
CN111865698B (en) * 2020-07-30 2023-10-17 中国电子信息产业集团有限公司第六研究所 Geographic information-based self-control domain-level Internet topology visualization method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101547114A (en) * 2008-03-25 2009-09-30 中国科学院计算技术研究所 Topology processing system and method in autonomous system
CN102204166A (en) * 2011-05-19 2011-09-28 华为技术有限公司 Method for detecting qos, mcs, mp, and system
CN104168341A (en) * 2014-08-15 2014-11-26 北京百度网讯科技有限公司 IP address locating method and CDN dispatching method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
bdrmap: Inference of Borders Between IP Networks; Matthew Luckie, Amogh Dhamdhere; ACM; 20161116; full text *
MAP-IT: Multipass Accurate Passive Inferences from Traceroute; Alexander Marder, Jonathan M. Smith; ACM; 20161116; full text *
Mapping Peering Interconnections to a Facility; Vasileios Giotsas, Georgios Smaragdakis; ACM; 20151204; full text *

Also Published As

Publication number Publication date
CN108848203A (en) 2018-11-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zhang Yu

Inventor after: Zhu Jinyu

Inventor after: Zeng Liangwei

Inventor after: Zhang Hongli

Inventor before: Zhang Yu

Inventor before: Zeng Liangwei

Inventor before: Zhang Hongli

GR01 Patent grant