CN113824810A

CN113824810A - Target-driven IP address geographic position inference method

Info

Publication number: CN113824810A
Application number: CN202110964934.7A
Authority: CN
Inventors: 温胜昔; 季宇凯; 王占丰; 陈潇霆; 陈嘉欣; 马潇霄; 张一杭
Original assignee: Nanjing Lexbell Information Technology Co ltd
Current assignee: Nanjing Lexbell Information Technology Co ltd
Priority date: 2021-08-23
Filing date: 2021-08-23
Publication date: 2021-12-21

Abstract

The invention discloses a target-driven IP address geographic position inference method. First, anchor nodes are collected based on the target IP; next, the anchor nodes are screened and calibrated; finally, the target IP is comprehensively probed to build a target profile and perform high-precision positioning. The method can effectively infer usage information for an IP address and avoids the inefficiency of blindly collecting anchor nodes over a large range. Because the anchor nodes it collects are more effective, the accuracy of the comprehensive inference of IP usage information improves further. By comparing the topological routes of the anchor nodes and the IP under test, the method infers the usage and positioning information of the IP under test according to the route approximation principle and geographic position information.

Description

Target-driven IP address geographic position inference method

Technical Field

The invention relates to the technical field of building-level positioning of target IPs, and in particular to a building-level IP positioning method based on target driving and comprehensive IP inference.

Background

Currently, a common approach to IP positioning is to estimate the geographic position of an IP from information such as its network path, reference nodes, hosted service information, and Whois data. The basic design principle of a positioning algorithm is to reduce measurement overhead as far as possible while preserving positioning accuracy, and at the same time to scale well and require no client support. Early positioning algorithms inferred the geographic position of an IP device by querying a DNS server or by mining information implicit in the hostname. In recent years, probability-based positioning algorithms have again become a research hotspot; they locate an IP by finding the distribution relationship between delay and geographic distance. Because there are many IP positioning algorithms, they can be classified by different criteria, such as whether client support is required or which positioning principle is used. Among existing algorithms, client-based positioning has the highest precision, but it usually depends on infrastructure such as GPS, cellular base stations, or WiFi access points. Other positioning data are derived from analysis of Whois data, operator data, or network data, and their precision and accuracy cannot be guaranteed, which greatly limits the application range of IP positioning data. Although researchers have proposed many IP positioning algorithms, the lack of a large number of reference nodes means they cannot be deployed widely to achieve highly accurate results.

Disclosure of Invention

To overcome the problems in the prior art, the invention provides a target-driven IP address positioning algorithm that dynamically collects positioning anchor nodes around the IP address to be located and then infers the geographic position of the target IP by comparing path similarity. The method avoids large-scale network-space probing and anchor node collection, which reduces the overhead of network-space measurement while achieving high-precision positioning of IP addresses; positioning precision can reach street level or even building level. To achieve this technical purpose and these technical effects, the invention is realized by the following technical scheme:

A target-driven IP address geographic position inference method comprises the following steps:

Step 1: anchor node information collection: acquire anchor nodes from the network segment of the target IP;

Step 2: anchor node information calibration: judge the effectiveness and node type of each anchor node according to its device type information;

Step 3: geographic position inference: infer the geographic position of the IP to be located by measuring the similarity between its path and the paths of the anchor nodes.

The invention addresses the problem that IP positioning algorithms lack a large number of reference nodes and therefore cannot be deployed widely to obtain high-accuracy positioning. The algorithm builds a set of effective anchor nodes in the network segment of the target IP and classifies target IPs by whether they respond to network measurement. For an IP that responds, Traceroute measures the network path to the target node; a path matching method then selects the N IP addresses closest to that node, and the geographic position of the target IP is determined by the centroid method or the nearest neighbor method. The method achieves high positioning accuracy at the building level.

Drawings

FIG. 1 is a flow chart of a positioning algorithm of the present invention;

FIG. 2 is a schematic view of the measurement of the present invention;

Table 1 shows the active IPs and ports in the network segment where the target IP is located (probing results for network segment 220.180.112.x, which belongs to government and enterprise units in an urban district of Bozhou, Anhui Province);

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without any inventive step, are within the scope of the present invention.

As shown in FIG. 1, the target-driven IP address usage unit inference algorithm of the invention first determines whether each active IP in the network segment of the target IP is an anchor node; it then judges, according to the device type, whether each candidate anchor node is an effective anchor node; finally, it measures the network path to the target node through Traceroute, selects the N IP addresses closest to that node by a path matching method, and determines the geographic position of the target IP by the centroid method or the nearest neighbor method. The detailed algorithm flow comprises the following steps:

Step one: by probing well-known ports and registered ports, find the active IP addresses in the class C network segment (/24) where the target IP address is located, then probe their location-related network services. If no effective anchor node exists in the C segment of the target IP, extend the anchor node search to the two adjacent network segments C-1 and C+1, and continue until enough active anchor nodes are found. The open services obtained by probing a device are expressed as:

P_i = {P_1, P_2, P_3, P_4, …, P_m}    (1)
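The segment expansion in step one can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `candidate_segments` and `collect_anchors`, the `probe` callback (which stands in for the port and service probing), the `min_anchors` threshold, and the expansion beyond C-1 and C+1 via `max_offset` are all assumptions introduced here.

```python
import ipaddress
from typing import Callable, List

# Hypothetical probe: returns the anchor IPs found in one /24 segment.
Probe = Callable[[ipaddress.IPv4Network], List[str]]


def candidate_segments(target_ip: str, max_offset: int = 2) -> List[ipaddress.IPv4Network]:
    """Return the target's /24 first, then C-1, C+1, C-2, C+2, ... (assumed order)."""
    base = ipaddress.IPv4Network(f"{target_ip}/24", strict=False)
    segments = [base]
    for k in range(1, max_offset + 1):
        for sign in (-1, 1):
            start = int(base.network_address) + sign * k * 256
            if 0 <= start <= 0xFFFFFF00:  # stay inside the IPv4 address space
                segments.append(ipaddress.IPv4Network((start, 24)))
    return segments


def collect_anchors(target_ip: str, probe: Probe,
                    min_anchors: int = 3, max_offset: int = 2) -> List[str]:
    """Probe segments outward from the target's /24 until enough anchors are found."""
    anchors: List[str] = []
    for segment in candidate_segments(target_ip, max_offset):
        anchors.extend(probe(segment))
        if len(anchors) >= min_anchors:
            break
    return anchors
```

Keeping the probing behind a callback keeps the search order separate from how ports and services are actually scanned.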

Step two: judge whether an IP is a relay node according to its position in the network: any IP that is not the last hop of a path in the path library is regarded as a relay node; otherwise it is an end node. For all end nodes, determine from the device-type classification produced by the Bayesian decision network whether the device is a NAT gateway, a CDN node, or a server; if so, the node is judged to be an effective anchor node.
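The relay/end-node split can be illustrated with a short sketch. It assumes the path library is a list of Traceroute paths, each an ordered list of hop IPs, and that an IP counts as an end node if it is the last hop of at least one path; the text does not say how to treat an IP that appears in both positions, so that rule and the name `classify_by_position` are assumptions.

```python
from typing import Dict, List, Set


def classify_by_position(path_library: List[List[str]]) -> Dict[str, str]:
    """Label every IP in the path library as 'relay' or 'end'."""
    # Assumption: being the last hop of any path makes an IP an end node.
    last_hops: Set[str] = {path[-1] for path in path_library if path}
    labels: Dict[str, str] = {}
    for path in path_library:
        for ip in path:
            labels[ip] = "end" if ip in last_hops else "relay"
    return labels
```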

Here, the probability that device Ei is of type Gi is:

Assuming that the services or ports opened by each network entity are independent, then according to Bayesian theory, equation (2) can be expressed as:

P(M|c) = P({m1, m2, …, mn}|c) = P(m1|c) P(m2|c) … P(mn|c)    (3)
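Equation (2) is not reproduced in this text, so the sketch below assumes the standard Bayes form, in which the posterior of class c given the observed services M is proportional to P(c) times the product in equation (3). The function name, the priors, the per-port likelihoods, and the `floor` value used for unseen ports are illustrative assumptions, not values from the patent.

```python
import math
from typing import Dict, Iterable


def classify_device(open_ports: Iterable[int],
                    priors: Dict[str, float],
                    likelihoods: Dict[str, Dict[int, float]],
                    floor: float = 1e-3) -> str:
    """Return the class c maximising P(c) * prod_k P(m_k | c), as in equation (3)."""
    ports = list(open_ports)
    scores: Dict[str, float] = {}
    for cls, prior in priors.items():
        # Work in log space so long products do not underflow.
        log_score = math.log(prior)
        for port in ports:
            # Assumed smoothing: ports never seen for a class get a small floor probability.
            log_score += math.log(likelihoods[cls].get(port, floor))
        scores[cls] = log_score
    return max(scores, key=scores.get)
```

For example, with hypothetical priors `{"server": 0.5, "nat_gateway": 0.3, "cdn_node": 0.2}` and likelihood tables in which ports 22, 80 and 443 are most typical of servers, a device with all three ports open is classified as `"server"`.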

The effective reference nodes are then clustered according to the network path similarity of the anchor nodes. If the geographic distance of cluster Ci is larger than the radius of a city, Ci is a cloud node; otherwise, Ci is an independent host.

distance(A_i, A_j) = distance(A_i, LCA_i,j) + distance(A_j, LCA_i,j)

distance(Ci) = Max{distance(loc(IPm), loc(IPn))}    (4)

where distance(Ai, Aj) is the distance between anchor nodes Ai and Aj, distance(Ci) is the geographic distance of cluster Ci, and IPm, IPn ∈ Ci.
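The two distances can be sketched as follows. The sketch reads LCA as the lowest common ancestor, that is, the last hop shared by two paths measured from the same vantage point (the text does not define the term), counts distance in hops, uses the haversine great-circle formula for geographic distance, and uses a placeholder city radius of 30 km. All names and these choices are assumptions.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def path_distance(path_i: Sequence[str], path_j: Sequence[str]) -> int:
    """Hops from A_i up to the last shared hop plus hops from A_j up to it."""
    common = 0
    for a, b in zip(path_i, path_j):
        if a != b:
            break
        common += 1
    return (len(path_i) - common) + (len(path_j) - common)


def haversine_km(p: Coord, q: Coord) -> float:
    """Great-circle distance between two coordinates, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def cluster_kind(locations: List[Coord], city_radius_km: float = 30.0) -> str:
    """Equation (4): the cluster distance is the largest pairwise distance."""
    diameter = max((haversine_km(p, q) for p, q in combinations(locations, 2)),
                   default=0.0)
    return "cloud" if diameter > city_radius_km else "independent"
```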

In this example, all active IPs in the network segment 220.180.112.x/24 where the target IP is located are probed; the active IPs and their owning units are shown in Table 1.

TABLE 1

Step three: the IP address is divided into an IP responding to the measurement and an IP not responding to the measurement by responding to the situation of the network measurement by the target IP address. The measured IP that does not respond may be replaced by measuring the last hop IP address on the path; fingerprint information of the nodes is detected for the nodes responding to the measurement. Judging the properties of the target node according to the method in the step 2, and if the target node is a cloud computing node, acquiring the geographic position of a cloud service provider by calling a Baidu interface; if the host is an independent host, the network route to the target node is measured through Traceroute, then N IP addresses closest to the node are selected through a route matching method, and the geographical position of the target IP is determined by adopting a centroid method or a nearest neighbor method.

Using the centroid method, the position of the target IP address is expressed as:

where (latp, longp) are the coordinates of the node to be located, and (lati, longi) are the coordinates of node i.
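The centroid formula itself is not reproduced in this text. A minimal sketch, assuming an unweighted arithmetic mean over the N selected anchor coordinates (the patent may use a weighted form), is:

```python
from typing import List, Tuple


def centroid(anchor_coords: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (lat_p, long_p) as the plain average of the (lat_i, long_i) pairs."""
    if not anchor_coords:
        raise ValueError("at least one anchor coordinate is required")
    n = len(anchor_coords)
    lat_p = sum(lat for lat, _ in anchor_coords) / n
    long_p = sum(lon for _, lon in anchor_coords) / n
    return lat_p, long_p
```

Averaging raw latitude and longitude is adequate at city scale, which is the setting here; it would not be appropriate for anchors that straddle the 180° meridian.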

Using the nearest neighbor rule:
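The formula for this rule is also missing from the text. One plausible reading, sketched below as an assumption, is that the target takes the position of the single anchor with the smallest path distance; `distances` and `locations` are hypothetical inputs keyed by anchor IP.

```python
from typing import Dict, Tuple


def nearest_neighbor_location(distances: Dict[str, float],
                              locations: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Return the coordinates of the anchor with the smallest path distance."""
    best_anchor = min(distances, key=distances.get)
    return locations[best_anchor]
```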

Finally, Traceroute measurement of the IP addresses shows that the penultimate hop on the paths of the 3 IP addresses to be inferred is '61.132.186.166', so their paths are similar to the topological paths of the 5 anchor node IP addresses 220.180.112.229, 220.180.112.232, 220.180.112.248, 220.180.112.249 and 220.180.112.149.

The data show that, of all websites in the C segment where the IP under test is located, 31 belong to relevant government and enterprise units and 3 are of other types, so government and enterprise websites account for 91% (31 of 34). The IPs of this network segment can therefore be inferred to be used mainly by the relevant government and enterprise units. The topological path approximation principle further confirms that the 3 IP addresses are used by government and enterprise units, and their real geographic position is, with high probability, the Bozhou city data center.

The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims

1. A target-driven IP address geographic location inference method, characterized in that the method comprises the following steps:

Step 1: anchor node information collection: probe for active IP addresses in the network segment where the target IP is located; if an active IP exists, acquire its fingerprint features and judge whether it is an anchor node;

Step 2: anchor node calibration: judge whether each candidate anchor node is an effective anchor node, and its corresponding server node type, according to whether the anchor node has a definite geographic location;

Step 3: geographic location inference: infer the geographic locations of the different types of target nodes by measuring the similarity between the path of the IP to be located and the paths of the anchor nodes, wherein the geographic location may be given by the centroid method or by the geographic location of the nearest anchor node.

2. The target-driven IP address geographic location inference method of claim 1, wherein: in step 1, by probing well-known ports and registered ports, the active IP addresses of the class C network segment where the target IP address is located are found, and their location-related network services are then probed; if no effective anchor node exists in the C segment where the target IP is located, the anchor node search is extended to the two adjacent network segments C-1 and C+1 until enough active anchor nodes are found,

wherein the open services obtained by device probing are expressed as:

P_i = {P_1, P_2, P_3, P_4, …, P_m}    (1).

3. The target-driven IP address geographic location inference method of claim 1, wherein: in step 2, whether an IP is a relay node is judged according to the position of the node in the network; every IP that is not the last hop of a path in the path library is regarded as a relay node, and otherwise the IP is an end node; for all end nodes, whether each is a NAT gateway, a CDN node, or a server is determined from the device-type classification produced by the Bayesian decision network, and an end node is judged to be an effective anchor node if it is a server node;

wherein the probability that device Ei is of type Gi is:

P(M|c) = P({m1, m2, …, mn}|c) = P(m1|c) P(m2|c) … P(mn|c)    (3)

the effective reference nodes are clustered according to the network path similarity of the anchor nodes; if the geographic distance of cluster Ci is greater than the radius of a city, Ci is a cloud node, and otherwise Ci is an independent host;

distance(A_i, A_j) = distance(A_i, LCA_i,j) + distance(A_j, LCA_i,j)

distance(Ci) = Max{distance(loc(IPm), loc(IPn))}    (4)

4. The target-driven IP address geographic location inference method of claim 1, wherein: in step 3, target IP addresses are divided into those that respond to measurement and those that do not, according to how the target IP address responds to network measurement; an IP that does not respond may be replaced by the last-hop IP address on its measured path, and for a node that responds, the fingerprint information of the node is probed; the nature of the target node is judged by the method of step 2; if the target node is a cloud computing node, the geographic location of the cloud service provider is obtained by calling a Baidu interface; if the target node is an independent host, the judgment is made by network topology proximity: the network path to the target node is first measured through Traceroute, the N IP addresses nearest to the node are then selected by a path matching method, and the geographic location of the target IP is determined by the centroid method or the nearest neighbor method; using the centroid method, the location of the target IP address is expressed as:

using the nearest neighbor rule, the location of the target IP address is expressed as: