CN113824810A - Target-driven IP address geographic position inference method - Google Patents

Info

Publication number
CN113824810A
Authority
CN
China
Prior art keywords
node
target
address
anchor
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110964934.7A
Other languages
Chinese (zh)
Inventor
温胜昔
季宇凯
王占丰
陈潇霆
陈嘉欣
马潇霄
张一杭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Lexbell Information Technology Co ltd
Original Assignee
Nanjing Lexbell Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Lexbell Information Technology Co ltd
Priority to CN202110964934.7A
Publication of CN113824810A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2101/00 Indexing scheme associated with group H04L61/00
    • H04L2101/60 Types of network addresses
    • H04L2101/69 Types of network addresses using geographic information, e.g. room number

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a target-driven method for inferring the geographic position of an IP address. First, anchor nodes are collected based on the target IP; then the anchor nodes are screened and calibrated; finally, the target IP is comprehensively probed to build a target profile and locate it with high precision. The method effectively infers the usage information of an IP address and avoids the inefficiency of blindly collecting anchor nodes over a large range. Because the anchor nodes it collects are more relevant, the accuracy of the comprehensive inference of IP usage information improves further. The usage and location of the IP under test are inferred by comparing the topological routes of the anchor nodes and the IP under test, according to the route-approximation principle and geographic position information.

Description

Target-driven IP address geographic position inference method
Technical Field
The invention relates to the technical field of building-level positioning of target IP addresses, and in particular to a building-level IP positioning method that is target-driven and based on comprehensive IP inference.
Background
Currently, a common approach to IP positioning is to estimate the geographic location of an IP from information such as its network path, reference nodes, the services it hosts, and Whois data. The basic design principle of a positioning algorithm is to keep measurement overhead as low as possible while preserving positioning accuracy; the algorithm should also scale well and require no support from the client. Early positioning algorithms inferred the geographic location of an IP device by querying a DNS server or mining the information implicit in the hostname. In recent years, probability-based positioning algorithms have again become a research focus; they locate an IP by finding the distribution relationship between delay and geographic distance. Because there are many IP positioning algorithms, they can be classified by different criteria, such as whether client support is required or which positioning principle is used. Among existing algorithms, client-based positioning has the highest precision but usually relies on infrastructure such as GPS, cellular base stations, and WiFi access points. Other approaches derive their data from analysis of Whois data, operator data, or network data, whose positioning precision and accuracy cannot be guaranteed, which greatly limits the application range of IP positioning data. Although researchers have proposed many IP positioning algorithms, the lack of a large number of reference nodes prevents them from being deployed widely with highly accurate results.
Disclosure of Invention
To overcome the problems in the prior art, the invention provides a target-driven IP address positioning algorithm that dynamically collects positioning anchor nodes around the IP address to be located and then infers the geographic position of the target IP by comparing path similarity. The method avoids large-scale network-space probing and anchor node collection, which reduces the overhead of network-space measurement while achieving high-precision IP address positioning, with precision reaching the street level or even the building level. To achieve this technical purpose and these technical effects, the invention is realized by the following technical scheme:
a target-driven IP address geographic position inference method comprises the following steps:
step 1: collecting anchor node information: anchor nodes are acquired from the network segment of the target IP;
step 2: calibrating anchor node information: the validity and node type of each anchor node are judged from the device type information of the node;
step 3: inferring the geographic position: the geographic position of the IP to be located is inferred by measuring the similarity between its path and the paths of the anchor nodes.
The invention addresses the problem that IP positioning algorithms lack a large number of reference nodes and therefore cannot be deployed over a wide range with high positioning accuracy. The algorithm builds a set of valid anchor nodes in the network segment of the target IP and classifies target IP addresses by whether they respond to network measurement. For an IP that responds, Traceroute is used to measure the network path to the target node; the N IP addresses closest to the node are then selected by path matching, and the geographic position of the target IP is determined by the centroid method or the nearest-neighbor method. The method achieves high building-level positioning accuracy.
Drawings
FIG. 1 is a flow chart of a positioning algorithm of the present invention;
FIG. 2 is a schematic view of the measurement of the present invention;
Table 1 shows the active IPs and ports in the network segment where the target IP of the present invention is located (probing results for network segment 220.180.112.x, covering government- and enterprise-related units in an urban district of Bozhou, Anhui Province);
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without any inventive step, are within the scope of the present invention.
As shown in FIG. 1, the target-driven algorithm of the present invention for inferring the unit that uses an IP address first determines whether each active IP in the network segment where the target IP is located is a candidate anchor node; it then judges, according to device type, whether each candidate anchor node is a valid anchor node; finally, it measures the network path to the target node through Traceroute, selects the N IP addresses closest to the node by path matching, and determines the geographic position of the target IP by the centroid method or the nearest-neighbor method. The detailed algorithm flow comprises the following steps:
Step one: by probing well-known ports and registered ports, the active IP addresses in the class C (/24) network segment containing the target IP address are found, and location-related network services are then probed. If no valid anchor node exists in segment C where the target IP is located, the anchor node search continues in the two adjacent segments C-1 and C+1 until enough active anchor nodes are found. The open services obtained by probing a device are expressed as:
Pi={P1,P2,P3,P4,…,Pm} (1)
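The segment search in step one can be sketched as follows. This is a minimal illustration, not the patented implementation: the `probe` callable, the `min_nodes` threshold, and the `max_offset` parameter are hypothetical names introduced here, and the actual port probing is left to whatever scanner supplies `probe`.

```python
import ipaddress
from typing import Callable, Dict, List, Set

# Hypothetical probe: takes a /24 and returns {active_ip: set_of_open_ports}.
Probe = Callable[[ipaddress.IPv4Network], Dict[str, Set[int]]]


def candidate_segments(target_ip: str, max_offset: int = 1) -> List[ipaddress.IPv4Network]:
    """Return segment C first, then C-1, C+1, ... up to max_offset segments away."""
    base = ipaddress.ip_network(f"{target_ip}/24", strict=False)
    base_int = int(base.network_address)
    segments = [base]
    for k in range(1, max_offset + 1):
        for sign in (-1, 1):
            start = base_int + sign * k * 256
            if 0 <= start <= 0xFFFFFF00:  # stay inside the IPv4 address space
                segments.append(ipaddress.ip_network(f"{ipaddress.IPv4Address(start)}/24"))
    return segments


def collect_candidates(target_ip: str, probe: Probe, min_nodes: int = 3,
                       max_offset: int = 1) -> Dict[str, Set[int]]:
    """Probe segment C, then its neighbours, until min_nodes active hosts are found."""
    found: Dict[str, Set[int]] = {}
    for segment in candidate_segments(target_ip, max_offset):
        found.update(probe(segment))
        if len(found) >= min_nodes:
            break
    return found
```

With `max_offset=1` the search covers exactly C, C-1, and C+1 as the text describes; a larger value widens the search, which is an extension assumed here rather than stated in the description.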
Step two: whether an IP is a relay node is judged from the position of the node in the network: an IP that never appears as the last hop of a path in the path library is regarded as a relay node; otherwise it is an end node. For all end nodes, the device-type classification produced by the Bayesian decision network is used to judge whether the device is a NAT gateway, a CDN node, or a server; if it is one of these, it is judged to be a valid anchor node.
The probability that device Ei is of type Gi is given by equation (2):
[Equation (2): image not reproduced in this text]
Assuming that the services or ports opened by each network entity are independent, according to Bayesian theory, equation (2) can be expressed as:
P(M|c)=P({m1,m2,…,mn}|c)=P(m1|c)P(m2|c)…P(mn|c) (3)
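Equation (3) is the usual naive Bayes factorisation, so the device type can be chosen as the class with the largest prior times product of per-port likelihoods. The sketch below assumes that reading; the `priors` and `likelihoods` tables, the class labels, and the `floor` value for unseen ports are hypothetical inputs that would have to be estimated from labelled devices.

```python
import math
from typing import Dict, Iterable


def classify_device(open_ports: Iterable[int],
                    priors: Dict[str, float],
                    likelihoods: Dict[str, Dict[int, float]],
                    floor: float = 1e-6) -> str:
    """Return the class c that maximises log P(c) + sum_i log P(m_i | c)."""
    ports = list(open_ports)
    scores: Dict[str, float] = {}
    for cls, prior in priors.items():
        score = math.log(prior)
        for port in ports:
            # Ports never observed for this class get a small floor probability
            # so that the logarithm stays finite (an assumption of this sketch).
            score += math.log(likelihoods[cls].get(port, floor))
        scores[cls] = score
    return max(scores, key=scores.get)
```

Working in log space avoids numerical underflow when many ports are observed; it does not change which class wins.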
The valid reference nodes are then clustered according to the network path similarity of the anchor nodes. If the geographic distance of cluster Ci is larger than the radius of a city, the cluster is a cloud node; otherwise it is an independent host.
distance(Ai,Aj)=distance(Ai,LCAi,j)+distance(Aj,LCAi,j)
distance(Ci)=Max{distance(loc(IPm),loc(IPn))} (4)
where distance(Ai,Aj) is the distance between anchor nodes Ai and Aj, LCAi,j denotes the lowest common ancestor of Ai and Aj on their network paths, distance(Ci) is the geographic extent of cluster Ci, and IPm, IPn ∈ Ci.
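The two distances in equation (4) can be sketched as follows. Several points are assumptions of this sketch rather than statements of the patent: the second term of the path distance is read as the distance from Aj to the common ancestor, LCA is taken to be the last hop shared by both Traceroute paths, path distance is counted in hops, geographic distance uses the haversine formula, and the 50 km city radius is a placeholder.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def common_prefix_len(path_a: Sequence[str], path_b: Sequence[str]) -> int:
    """Number of leading hops shared by two paths from the same vantage point."""
    n = 0
    for hop_a, hop_b in zip(path_a, path_b):
        if hop_a != hop_b:
            break
        n += 1
    return n


def path_distance(path_a: Sequence[str], path_b: Sequence[str]) -> int:
    """Hops from each anchor back to the last common hop, summed."""
    k = common_prefix_len(path_a, path_b)
    return (len(path_a) - k) + (len(path_b) - k)


def haversine_km(p: Coord, q: Coord) -> float:
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def cluster_extent_km(locations: List[Coord]) -> float:
    """Largest pairwise geographic distance inside one cluster."""
    return max((haversine_km(p, q) for p, q in combinations(locations, 2)), default=0.0)


def cluster_kind(locations: List[Coord], city_radius_km: float = 50.0) -> str:
    """Label a cluster by comparing its extent with an assumed city radius."""
    return "cloud node" if cluster_extent_km(locations) > city_radius_km else "independent host"
```

A cluster whose members resolve to widely separated locations despite similar paths is treated as cloud infrastructure, which is why the extent is compared against a city-scale threshold.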
In this example, all active IPs in the network segment 220.180.112.x/24 containing the target IP are probed; the active IPs and the units they belong to are shown in Table 1.
TABLE 1 (table image not reproduced in this text)
Step three: target IP addresses are divided into IPs that respond to measurement and IPs that do not, according to how the target IP address responds to network measurement. An IP that does not respond can be replaced by the last-hop IP address on the measured path; for a node that responds, the fingerprint information of the node is probed. The nature of the target node is judged by the method of step two. If the target node is a cloud computing node, the geographic position of the cloud service provider is obtained by calling a Baidu interface. If it is an independent host, the network path to the target node is measured through Traceroute, the N IP addresses closest to the node are selected by path matching, and the geographic position of the target IP is determined by the centroid method or the nearest-neighbor method.
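The path-matching selection in step three can be sketched as below. The description does not define the similarity measure, so this sketch assumes the length of the shared leading hop sequence; representing a non-responding hop as `None` and the helper names are also assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence


def effective_path(trace: Sequence[Optional[str]]) -> List[Optional[str]]:
    """Drop trailing non-responding hops so a silent target is represented
    by the last hop on its path that did answer."""
    hops = list(trace)
    while hops and hops[-1] is None:
        hops.pop()
    return hops


def shared_prefix(path_a: Sequence[Optional[str]], path_b: Sequence[Optional[str]]) -> int:
    """Count leading hops that are identical and responding in both paths."""
    n = 0
    for hop_a, hop_b in zip(path_a, path_b):
        if hop_a is None or hop_a != hop_b:
            break
        n += 1
    return n


def nearest_anchors(target_trace: Sequence[Optional[str]],
                    anchor_traces: Dict[str, Sequence[Optional[str]]],
                    n: int) -> List[str]:
    """Return the n anchors whose paths share the longest prefix with the target."""
    target = effective_path(target_trace)
    ranked = sorted(anchor_traces,
                    key=lambda ip: shared_prefix(target, effective_path(anchor_traces[ip])),
                    reverse=True)
    return ranked[:n]
```

Ties keep the order in which anchors were supplied, because Python's sort is stable even with `reverse=True`.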
Using the centroid method, the target IP address is represented as:
[Centroid equation: image not reproduced in this text]
wherein, (latp, longp) represents the coordinate of the node to be determined, and (lati, longi) represents the coordinate of the node i.
Using the nearest-neighbor rule, the target IP address is expressed as:
[Nearest-neighbor equation: image not reproduced in this text]
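Because the two equation images are not available in this text, the sketch below shows one common reading of each rule and should not be taken as the patent's exact formulas. It assumes the centroid is the unweighted mean of the N selected anchor coordinates, and that the nearest-neighbor rule returns the coordinates of the anchor with the highest path similarity; both function names are hypothetical.

```python
from typing import Sequence, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def centroid(anchors: Sequence[Coord]) -> Coord:
    """Unweighted mean of anchor coordinates: (lat_p, long_p) from (lat_i, long_i)."""
    if not anchors:
        raise ValueError("at least one anchor is required")
    n = len(anchors)
    lat_p = sum(lat for lat, _ in anchors) / n
    long_p = sum(lon for _, lon in anchors) / n
    return lat_p, long_p


def nearest_neighbour(anchors: Sequence[Coord], similarities: Sequence[float]) -> Coord:
    """Coordinates of the anchor whose path is most similar to the target's."""
    if not anchors or len(anchors) != len(similarities):
        raise ValueError("anchors and similarities must be non-empty and equal in length")
    best = max(range(len(anchors)), key=lambda i: similarities[i])
    return anchors[best]
```

Averaging raw latitude and longitude is reasonable at city scale, where the anchors are close together; it would not be appropriate for anchors spread across the antimeridian or near the poles.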
Finally, Traceroute measurement of the IP addresses shows that the second-to-last hop on the paths of the 3 IP addresses to be inferred is '61.132.186.166', so their topological paths are similar to those of the 5 anchor node IP addresses 220.180.112.229, 220.180.112.232, 220.180.112.248, 220.180.112.249 and 220.180.112.149.
The data show that, of all the websites in segment C where the IP under test is located, 31 belong to government- and enterprise-related units and 3 are of other types, so government- and enterprise-related websites account for 91% of the total. It can therefore be inferred that the IPs of this network segment are mainly used by government- and enterprise-related units. The topological path approximation principle further verifies that the 3 IP addresses are used by government and enterprise units, and their real geographic position is, with high probability, the Bozhou city data center.
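The 91% figure follows from the two counts given in the text, 31 out of 31 + 3 = 34 websites. A small sketch of that majority calculation is shown here; the function name and category labels are hypothetical.

```python
from typing import Dict, Tuple


def dominant_category(counts: Dict[str, int]) -> Tuple[str, float]:
    """Return the most frequent category and its share of all observations."""
    total = sum(counts.values())
    label = max(counts, key=counts.get)
    return label, counts[label] / total


label, share = dominant_category({"government/enterprise": 31, "other": 3})
print(label, f"{share:.0%}")  # prints: government/enterprise 91%
```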
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims (4)

1. A target-driven IP address geographic location inference method, characterized by: the method comprises the following steps:
step 1: collecting anchor node information: probing the network segment where the target IP is located for active IP addresses and, if an active IP exists, acquiring its fingerprint characteristics and judging whether it is an anchor node;
step 2: calibrating the anchor nodes: judging, according to whether it has a definite geographic position, whether each candidate anchor node is a valid anchor node, and determining the corresponding server node type;
step 3: inferring the geographic position: inferring the geographic position of the different types of target nodes by measuring the similarity between the path of the IP to be located and the paths of the anchor nodes, wherein the geographic position is given by the centroid method or by the geographic position of the nearest anchor node.
2. The target-driven IP address geographic location inference method of claim 1, wherein: in step 1, by probing well-known ports (Well Known Ports) and registered ports (Registered Ports), the active IP addresses in the class C (/24) network segment containing the target IP address are found, and location-related network services are then probed; if no valid anchor node exists in segment C where the target IP is located, the anchor node search continues in the two adjacent segments C-1 and C+1 until enough active anchor nodes are found,
wherein the open services obtained by probing a device are expressed as:
Pi={P1,P2,P3,P4,…,Pm} (1).
3. The target-driven IP address geographic location inference method of claim 1, wherein: in step 2, whether an IP is a relay node is judged from the position of the node in the network: an IP that never appears as the last hop of a path in the path library is regarded as a relay node, and otherwise the IP is an end node; for all end nodes, judging from the device-type classification produced by the Bayesian decision network whether each is a NAT gateway, a CDN node, or a server, and judging an end node to be a valid anchor node if it is a server node;
wherein the probability that device Ei is of type Gi is given by equation (2):
[Equation (2): image not reproduced in this text]
assuming that the services or ports opened by each network entity are independent, according to Bayesian theory, equation (2) can be expressed as:
P(M|c)=P({m1,m2,…,mn}|c)=P(m1|c) P(m2|c) … P(mn|c) (3)
clustering the valid reference nodes according to the network path similarity of the anchor nodes; if the geographic distance of cluster Ci is greater than the radius of a city, the cluster is a cloud node, and otherwise it is an independent host;
distance(Ai,Aj)=distance(Ai,LCAi,j)+distance(Aj,LCAi,j)
distance(Ci)=Max{distance(loc(IPm),loc(IPn))} (4)
wherein distance(Ai,Aj) is the distance between anchor nodes Ai and Aj, LCAi,j denotes the lowest common ancestor of Ai and Aj on their network paths, distance(Ci) is the geographic extent of cluster Ci, and IPm, IPn ∈ Ci.
4. The target-driven IP address geographic location inference method of claim 1, wherein: in step 3, target IP addresses are divided into IPs that respond to measurement and IPs that do not, according to how the target IP address responds to network measurement; an IP that does not respond can be replaced by the last-hop IP address on the measured path, and for a node that responds, the fingerprint information of the node is probed; the nature of the target node is judged by the method of step 2; if the target node is a cloud computing node, the geographic position of the cloud service provider is obtained by calling a Baidu interface; if the target node is an independent host, the judgment is made by network topology proximity: the network path to the target node is first measured through Traceroute, the N IP addresses nearest to the node are then selected by path matching, and the geographic position of the target IP is determined by the centroid method or the nearest-neighbor method; using the centroid method, the target IP address is expressed as:
[Centroid equation: image not reproduced in this text]
using the nearest-neighbor rule, the target IP address is expressed as:
[Nearest-neighbor equation: image not reproduced in this text]
wherein (latp, longp) represents the coordinates of the node to be located, and (lati, longi) represents the coordinates of node i.
CN202110964934.7A 2021-08-23 2021-08-23 Target-driven IP address geographic position inference method Pending CN113824810A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110964934.7A CN113824810A (en) 2021-08-23 2021-08-23 Target-driven IP address geographic position inference method

Publications (1)

Publication Number Publication Date
CN113824810A true CN113824810A (en) 2021-12-21

Family

ID=78913386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110964934.7A Pending CN113824810A (en) 2021-08-23 2021-08-23 Target-driven IP address geographic position inference method

Country Status (1)

Country Link
CN (1) CN113824810A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090279452A1 (en) * 2005-10-25 2009-11-12 Nec Corporation Hierarchical mobility management system, access router, anchor node, mobile communication system and route setting method
CN107920115A (en) * 2017-11-17 2018-04-17 南京莱克贝尔信息技术有限公司 A kind of City-level IP localization methods based on time delay and geographical consistency constraint
CN110012128A (en) * 2019-04-12 2019-07-12 中原工学院 Network entity terrestrial reference screening technique based on hop count
CN110300368A (en) * 2019-05-24 2019-10-01 中国人民解放军63880部队 A kind of IP geo-positioning system overall process method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114785719A (en) * 2022-04-13 2022-07-22 北京亚鸿世纪科技发展有限公司 IP region attribution method for forming region fingerprint through ping command
CN114785719B (en) * 2022-04-13 2024-05-10 北京亚鸿世纪科技发展有限公司 IP region attribution method for forming region fingerprint through ping command

Similar Documents

Publication Publication Date Title
US8180887B2 (en) Geolocation mapping of network devices
Katz-Bassett et al. Towards IP geolocation using delay and topology measurements
CN110474843B (en) IP positioning method based on route hop count
US7496663B2 (en) System and method for detecting status changes in a network using virtual coordinate mapping
US7296088B1 (en) System and method for determining the geographic location of internet hosts
US7827279B2 (en) Selecting nodes close to another node in a network using location information for the nodes
US20170005877A1 (en) Data object and networking node locators
Zhao et al. IP Geolocation based on identification routers and local delay distribution similarity
Arif et al. Internet host geolocation using maximum likelihood estimation technique
CN110557286A (en) Method for effectively measuring and constructing IPv6 network topology
CN107920115B (en) City-level IP positioning method based on time delay and geographic consistency constraint
Ziviani et al. Toward a measurement-based geographic location service
CN113824810A (en) Target-driven IP address geographic position inference method
CN111711707B (en) IP address positioning method based on neighbor relation
CN111064817A (en) City-level IP positioning method based on node sorting
Hillmann et al. On the path to high precise ip geolocation: A self-optimizing model
Chen et al. A landmark calibration-based IP geolocation approach
Gueye et al. Leveraging buffering delay estimation for geolocation of Internet hosts
Jain et al. Internet distance prediction using node-pair geography
Xu et al. Netvigator: Scalable network proximity estimation
Hillmann et al. Dragoon: advanced modelling of IP geolocation by use of latency measurements
Xiang et al. No-jump-into-latency in china's internet! toward last-mile hop count based ip geo-localization
Wang et al. Target driven IP Geolocation Algorithm
CN110300368B (en) IP geographical positioning system overall processing method
Komosny et al. Estimation of Internet Node Location by Latency Measurements-The Underestimation Problem

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination