CN111985569B - Anonymous node positioning method based on multi-source point clustering idea - Google Patents

Anonymous node positioning method based on multi-source point clustering idea

Publication number
CN111985569B
CN111985569B CN202010851544.4A CN202010851544A CN111985569B CN 111985569 B CN111985569 B CN 111985569B CN 202010851544 A CN202010851544 A CN 202010851544A CN 111985569 B CN111985569 B CN 111985569B
Authority
CN
China
Prior art keywords
anonymous
clustering
node
anonymous node
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010851544.4A
Other languages
Chinese (zh)
Other versions
CN111985569A (en)
Inventor
夏勇
栾吉海
李宁
张兆心
赵东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology Weihai
Original Assignee
Harbin Institute of Technology Weihai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology Weihai
Priority to CN202010851544.4A
Publication of CN111985569A
Application granted
Publication of CN111985569B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/50 Address allocation
    • H04L61/5007 Internet protocol [IP] addresses

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to an anonymous node positioning method based on a multi-source point clustering idea, which aims to reduce the interference of anonymous nodes in IP paths obtained by Traceroute with real network routing nodes. The method comprises the following steps: acquiring domestic IP addresses together with their geographic positions and longitude and latitude; using the ping command to test the IPs for liveness and keeping only the live IP addresses; storing the IPs whose geographic position can be determined in a database; deploying a server near each cluster center obtained by the k-means algorithm and carrying out traceroute detection on the target nodes in the same cluster; acquiring delay curves, extracting their features and performing hierarchical clustering, merging the IP paths obtained by traceroute according to the structure of the resulting hierarchical tree (dendrogram), merging points that may be the same anonymous node, and recording the IP addresses of the previous hop and the next hop; and computing the center of the set formed by the previous-hop and next-hop IPs of the anonymous node pair, using Euclidean distance on their longitude and latitude, and taking that center as the physical position of the anonymous node.

Description

Anonymous node positioning method based on multi-source point clustering idea
Technical Field
The invention relates to the technical field of computers, in particular to an anonymous node positioning method based on a multi-source point clustering idea.
Background
For reasons of security and service characteristics, a large number of application services on the existing Internet operate anonymously, and a normal communication peer cannot determine the position of an information sender from its anonymous identifier. However, illegal information and spam on the network are often sent through anonymous communication, which may undermine network security, so national information regulators or individuals sometimes need to trace anonymous communication back to the source of such information. Because aliases and anonymous nodes interfere with measurement, the result of IP-level topology measurement differs from the real network environment. A method for positioning anonymous nodes is therefore urgently needed in order to reduce the interference of anonymous nodes in IP paths obtained by Traceroute with real network routing nodes.
Disclosure of Invention
The invention provides an anonymous node positioning method based on a multi-source point clustering idea that combines k-means clustering and hierarchical clustering, with the aim of reducing the interference of anonymous nodes in IP paths obtained by Traceroute with real network routing nodes.
The anonymous node positioning method based on the multi-source point clustering idea comprises the following steps:
A. acquiring domestic IP addresses together with their geographic positions and longitude and latitude;
B. using the ping command to test the IPs for liveness and extracting the live IP addresses;
C. storing the IPs whose geographic position can be determined in a database, with additional information attached as a further basis for classification;
D. deploying a server near each cluster center obtained by the k-means algorithm, and carrying out traceroute detection on the target nodes in the same cluster;
E. meanwhile, sending multiple Ping probes from the source point to each target IP to obtain a delay curve, and extracting the features of the delay curve;
F. performing hierarchical clustering on the extracted features to obtain a hierarchical tree (dendrogram), merging the IP paths obtained by traceroute according to the structure of that tree, merging points that are the same anonymous node, and recording the IP addresses of the previous hop and the next hop;
G. computing the center of the set formed by the previous-hop and next-hop IPs of the anonymous node pair, using Euclidean distance on their longitude and latitude, and taking that center as the physical position of the anonymous node.
Preferably, the specific method of step A is: judging whether an IP address is alive with commands such as Ping, and acquiring longitude and latitude information by using IPtoregin.
Preferably, the additional information in step C includes the economic level, the degree of development and the city tier of a city.
Preferably, the specific steps of step C are as follows:
a. collecting the live IP addresses;
b. obtaining the longitude and latitude of each IP's city by using an external API (application program interface), and recording the city;
c. recording the city tiers published on the Internet by using python;
d. using longitude, latitude and city tier as the clustering features;
e. setting the number of clusters of the K-means clustering method to 3, or calculating the optimal K value automatically through an algorithm.
Preferably, the K-means clustering method in step e specifically comprises: selecting K objects from the data as initial cluster centers, and then:
(1) calculating the distance from each object to each cluster center and assigning the object accordingly;
(2) recalculating each cluster center;
(3) repeating steps (1) and (2) until the stopping requirement is met.
Preferably, the specific steps of obtaining the delay curve and its features in step E are as follows:
(A) Pinging a large number of IPs belonging to the same class multiple times within a short period, recording how the delay varies, and drawing a characteristic curve;
(B) extracting the feature values of the curve by wavelet decomposition.
Preferably, step F performs hierarchical clustering on the characteristic curves, then fuses the paths obtained by multiple traceroutes according to the hierarchical clustering result, and merges points that are the same anonymous node, with the following specific steps:
1) performing hierarchical clustering on the feature values obtained by wavelet decomposition, and recording the printed output;
2) selecting two Traceroute paths to be fused and performing anonymous fusion;
3) recording the IP addresses of the previous hop and the next hop of the fused anonymous node set as the basis for positioning the anonymous node.
Preferably, the fusion criteria of step 2) are:
a) merging anonymous nodes with the same parent node and the same child node into one node;
b) merging anonymous nodes that have no parent node but the same child node into one node;
c) merging terminal anonymous nodes that have the same parent node but no child node into one node.
Preferably, the positioning of the anonymous node in step G comprises the following specific steps:
a) when the previous hop and the next hop of the anonymous node set contain two or more known IP addresses, a range is determined by Euclidean distance from the longitude and latitude coordinates of the known IPs, and the center of that range is taken as the physical address of the anonymous node;
b) when the previous hop and the next hop of the anonymous node set contain only one known IP, the positions of that point and of the destination node are averaged directly and the result is taken as the physical address of the anonymous node.
The beneficial effects of the invention are: because aliases and anonymous nodes interfere with measurement, the result of routing-level topology measurement differs from the real network environment, and by positioning the anonymous nodes the invention can restore a network environment that is closer to the real one.
Drawings
FIG. 1 is a schematic flow chart of the operation of the present invention;
FIG. 2 is a schematic heat map of the live IPs detected by the present invention;
FIG. 3 is a schematic diagram of a data listing for hierarchical clustering in accordance with the present invention;
FIG. 4 is a schematic view of data visualization of hierarchical clustering in accordance with the present invention.
Detailed Description
The present invention is further described below with reference to the drawings and examples so that those skilled in the art can easily practice the present invention.
Example 1: the invention provides a method for positioning anonymous nodes. FIG. 1 shows its operation flow, which comprises the following steps:
A. Obtain domestic IP addresses with their cities and longitude and latitude, in the format: IP address, geographic location, longitude and latitude.
B. Use the ping command to test the IPs for liveness and extract the live IP addresses.
C. Store the IPs whose geographic position can be determined in a database; other information can be attached as a further basis for classification.
D. Deploy a server near each cluster center obtained by the k-means algorithm and perform traceroute detection on the target nodes in the same class.
E. Meanwhile, send multiple Ping probes from the source point to each target IP, obtain a delay curve, and extract its features.
F. Perform hierarchical clustering on the extracted features to obtain a hierarchical tree (dendrogram). Merge the IP paths obtained by traceroute according to the structure of that tree. Merge the points that may be the same anonymous node, and record the IP addresses of the previous hop and the next hop.
G. Compute the center of the set formed by the previous-hop and next-hop IPs of the anonymous node pair, using Euclidean distance on their longitude and latitude, and take that center as the physical position of the anonymous node.
The above is the basic flow of the present invention. Each step is described in more detail below:
In step A, the IP addresses and their longitude and latitude are obtained. Because the IP address ranges of China are known, commands such as Ping can be used to judge whether an IP address is alive, and the longitude and latitude can be obtained through APIs such as IPtoregin and evian science and technology.
In step B, the IPs are tested for liveness. Because an IP is not reachable at all times, the IPs already collected are probed again, which safeguards the accuracy of the geographic positions later obtained for anonymous nodes.
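The liveness filter of step B can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes a Unix-like `ping` that accepts `-c` and exits with status 0 when at least one reply arrives, and the names `is_alive`, `filter_alive`, `runner` and `probe` are hypothetical.

```python
import subprocess


def is_alive(ip, count=1, timeout_s=3, runner=subprocess.run):
    """Return True if `ip` appears to answer an ICMP echo request.

    Assumption: a Unix-like `ping` where `-c N` sets the number of probes
    and exit status 0 means at least one reply was received.
    """
    cmd = ["ping", "-c", str(count), ip]
    try:
        result = runner(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s * count,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No answer in time, or ping could not be started: treat as not alive.
        return False
    return result.returncode == 0


def filter_alive(ips, probe=is_alive):
    """Keep only the addresses for which the probe succeeds."""
    return [ip for ip in ips if probe(ip)]
```

Passing the runner and the probe as parameters keeps the sketch testable without network access; a real deployment would probably probe in parallel and repeat the check over time, as the text notes that reachability varies.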
In step C, other information is attached, which may include the economic level, the degree of development, and the city tier. The choice of probing source points is important, but little information is available before they are chosen. A source point should, as far as possible, have many IP nodes around it, so that a topology as complete as possible can be obtained and the positions of anonymous nodes can be inferred more accurately. Economically developed locations have more surrounding IPs, so the level of economic development can be added to the classification features. The specific steps of step C are as follows:
a. The live IP addresses are collected.
b. The longitude and latitude of each IP's city are obtained by using an external API (application program interface), and the city is recorded.
c. The city tiers published on the Internet are recorded using python; for example, Beijing, a first-tier city, is recorded as 1 and Harbin as 2. City tier is considered because, within the same region, using the more economically developed city as the center benefits the subsequent operations and their accuracy.
d. (longitude, latitude, city tier) is taken as the clustering feature.
e. K can be set to 3 in consideration of economic feasibility, or an optimal K value can be calculated automatically by an algorithm.
The K-means clustering method of step e is further explained:
(1) Select K objects from the data as initial cluster centers;
(2) Calculate the distance from each object to each cluster center and assign the object to the nearest one;
(3) Recalculate each cluster center;
(4) Repeat steps (2) and (3) until the stopping requirement is met.
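The four steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the function name `kmeans`, the random initialization and the stopping rule (assignments and centers no longer change) are choices made here, since the text only says to repeat "until the requirements are met".

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (tuples of floats) into k groups; return (centers, labels)."""
    rng = random.Random(seed)
    # (1) Select K objects from the data as initial cluster centers.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # (2) Assign every object to its nearest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # (3) Recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its center
        # (4) Stop once neither the assignments nor the centers change.
        if new_labels == labels and new_centers == centers:
            break
        labels, centers = new_labels, new_centers
    return centers, labels
```

With (longitude, latitude, city tier) as features, the three columns are on different scales, so in practice one would likely rescale them before clustering; the text does not say whether the authors did so.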
In step E, the delay curve and its features are obtained as follows:
(A) Ping a large number of IPs belonging to the same class multiple times within a short period, record how the delay varies, and draw a characteristic curve;
(B) Use wavelet decomposition to extract the feature values of the curve.
Further, for step (A), network conditions fluctuate from one time period to the next, so the multi-source-point intra-class probing must be completed within a short time. This kind of probing has the following advantages:
Because the nodes to be probed are too numerous, a large number of delay curves cannot be guaranteed to be obtained at the same time. Intra-class probing therefore preserves local completeness and spreads the probing load.
Although the source points are selected in advance and the clustering is therefore not based on the network environment, IPs in the same geographic area have similar characteristics, which also facilitates the later hierarchical clustering.
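The feature extraction of step (B) can be illustrated with a small self-contained sketch. It uses a hand-written Haar wavelet transform rather than the pywt package named in Example 2, and it summarizes each band by its energy; both the choice of the Haar wavelet and the choice of band energies as the "feature values" are assumptions, since the text does not specify them. The names `haar_dwt` and `delay_features` are hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input by repeating the last sample
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def delay_features(delays_ms, levels=3):
    """Energy of each detail band, followed by the energy of the final approximation."""
    features = []
    approx = list(delays_ms)
    for _ in range(levels):
        if len(approx) < 2:
            break  # nothing left to decompose
        approx, detail = haar_dwt(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features
```

A perfectly flat delay curve puts all of its energy in the approximation band, while jitter shows up in the detail bands, so targets with similar delay behaviour get similar feature vectors for the subsequent hierarchical clustering.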
Step F is the core of anonymous fusion: hierarchical clustering is performed on the characteristic curves, the paths obtained by multiple traceroutes are fused according to the hierarchical clustering result, and points that may be the same anonymous node are merged. The specific steps are as follows:
1) Perform hierarchical clustering on the feature values obtained by wavelet decomposition, and record the printed output;
2) Select two Traceroute paths to be fused and perform anonymous fusion, using three main fusion criteria:
a) Merge anonymous nodes with the same parent node and the same child node into one node;
b) Merge anonymous nodes that have no parent node but the same child node into one node;
c) Merge terminal anonymous nodes that have the same parent node but no child node into one node.
3) Record the IP addresses of the previous hop and the next hop of the fused anonymous node set as the basis for positioning the anonymous node.
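The three fusion criteria can be sketched as a grouping of anonymous hops by their (previous hop, next hop) pair. This is an illustrative reading, not the patent's code: the path representation (a list of IP strings with `"*"` for an unanswered hop), the function name `fuse_anonymous`, and the decision to leave runs of consecutive anonymous hops unmerged are all assumptions. The caller is expected to pass only paths that the hierarchical clustering has selected for fusion.

```python
ANON = "*"  # how traceroute prints a hop that did not answer


def fuse_anonymous(paths):
    """Group anonymous hops from several paths by their (previous, next) hop.

    Returns a dict mapping (prev_ip, next_ip) to the list of
    (path_index, hop_index) occurrences merged into that one node.
    prev_ip is None when the anonymous hop has no parent (criterion b) and
    next_ip is None when it has no child (criterion c).
    """
    merged = {}
    for p_idx, path in enumerate(paths):
        for h_idx, hop in enumerate(path):
            if hop != ANON:
                continue
            prev_ip = path[h_idx - 1] if h_idx > 0 else None
            next_ip = path[h_idx + 1] if h_idx + 1 < len(path) else None
            # Consecutive anonymous hops, or a hop with no neighbours at all,
            # are not covered by the three criteria; this sketch skips them.
            if prev_ip == ANON or next_ip == ANON:
                continue
            if prev_ip is None and next_ip is None:
                continue
            merged.setdefault((prev_ip, next_ip), []).append((p_idx, h_idx))
    return merged
```

Each key of the returned dictionary is exactly the previous-hop and next-hop record that the last step asks for, so it can be fed directly into the positioning step.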
The positioning of the anonymous node in step G falls into two cases:
a) The previous hop and the next hop of the anonymous node set contain two or more known IP addresses. Euclidean distance is computed from the longitude and latitude coordinates of the known IPs. Because several previous-hop and next-hop IP addresses determine a range and the anonymous node necessarily lies within it, the center of that range can be taken as the physical address of the anonymous node pair.
b) When the previous hop or the next hop has only one known IP, the positions of that point and of the destination node are averaged directly and used as the physical address. This rests on the following two points:
First, accuracy is guaranteed because the IPs measured are within the same class.
Second, the physical positions of the previous hop and the next hop are not too far apart, and this situation generally arises only when the last node is anonymous, so the node's coordinates are located on the basis of the previous hop and the destination node.
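The two cases can be sketched as follows. The text does not define the "center of the range" precisely; this sketch assumes the centroid (coordinate mean) of the known neighbours, which is the point minimizing the sum of squared Euclidean distances to them. The function name `locate_anonymous` and its parameters are hypothetical.

```python
def locate_anonymous(neighbor_coords, destination_coord=None):
    """Estimate the (longitude, latitude) of an anonymous node.

    neighbor_coords: coordinates of the known previous-hop and next-hop IPs.
    destination_coord: coordinates of the probed target, used only in case b).
    """
    if len(neighbor_coords) >= 2:
        # Case a): two or more known neighbours; take the centroid (assumed
        # interpretation of "center of the range").
        pts = list(neighbor_coords)
    elif len(neighbor_coords) == 1 and destination_coord is not None:
        # Case b): one known neighbour; average it with the destination node.
        pts = [neighbor_coords[0], destination_coord]
    else:
        raise ValueError("need at least two reference coordinates")
    lon = sum(p[0] for p in pts) / len(pts)
    lat = sum(p[1] for p in pts) / len(pts)
    return lon, lat
```

Averaging raw longitude and latitude treats the coordinates as a flat plane, which is a reasonable approximation over the city-scale to regional distances discussed here but degrades over very long paths.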
Example 2: this example probes the IPs within a class from only one source point:
Step 0. Preprocessing and k-means clustering. The cluster centers and the IPs within each class are recorded. As shown in FIG. 2, the heat map of a selected portion of live IPs shows that IPs are abundant near Shanghai, so this embodiment places the probe source point in Shanghai only.
Step 1. The system Ping command is called from python, more than 10 Ping operations are performed concurrently on the IPs in the class, and the delay curves are drawn.
Step 2. Wavelet decomposition is performed on each delay curve using python's pywt package to obtain a feature vector of the delay.
Step 3. The system Traceroute command is called from python to obtain the Traceroute paths.
Step 4. The feature values are clustered using a python hierarchical clustering package. This yields the hierarchical clustering and its hierarchical tree (dendrogram), which are used together with the traceroute paths for anonymous fusion; the results are shown in FIGS. 3 and 4.
Step 5. According to the first two columns of the data, the corresponding traceroute paths are selected from the database for fusion. The resulting data format is: { (anonymous node pair), (previous-hop and next-hop IPs) }
Step 6. The physical address of the anonymous node is located using the previous-hop and next-hop information.
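The hierarchical clustering of Step 4 can be illustrated with a small pure-Python sketch. The patent does not name the package or the linkage method it uses; average linkage, the function name `agglomerate` and the row layout (the two merged cluster ids in the first two columns, then distance and size) are assumptions made here so that the "first two columns" mentioned in Step 5 have a concrete counterpart.

```python
import math


def agglomerate(features):
    """Average-linkage agglomerative clustering of feature vectors.

    Returns the merge history as (id_a, id_b, distance, size) rows. Leaves are
    numbered 0..n-1 and every merge creates the next free id, so the rows
    describe a dendrogram from the leaves up to the root.
    """
    clusters = {i: [i] for i in range(len(features))}
    next_id = len(features)
    history = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                # Average pairwise Euclidean distance between the two clusters.
                d = sum(
                    math.dist(features[x], features[y])
                    for x in clusters[a]
                    for y in clusters[b]
                ) / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        history.append((a, b, d, len(clusters[next_id])))
        next_id += 1
    return history
```

Under this layout, the first two entries of an early row identify two targets whose delay features are most alike, and their traceroute paths are the natural first candidates for fusion.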
The above description is only for the purpose of illustrating preferred embodiments of the present invention and is not to be construed as limiting the present invention, and it is apparent to those skilled in the art that various modifications and variations can be made in the present invention. All modifications, equivalents, improvements and the like which come within the scope of the invention as defined by the claims should be understood as falling within the scope of the invention.

Claims (9)

1. An anonymous node positioning method based on a multi-source point clustering idea, characterized by comprising the following steps:
A. acquiring domestic IP addresses together with their geographic positions and longitude and latitude;
B. using the ping command to test the IPs for liveness and extracting the live IP addresses;
C. storing the IPs whose geographic position can be determined in a database, with additional information attached as a further basis for classification;
D. deploying a server near each cluster center obtained by the k-means algorithm, and carrying out traceroute detection on the target nodes in the same cluster;
E. meanwhile, sending multiple Ping probes from the source point to each target IP to obtain a delay curve, and extracting the features of the delay curve;
F. performing hierarchical clustering on the extracted features to obtain a hierarchical tree (dendrogram), merging the IP paths obtained by traceroute according to the structure of that tree, merging points that are the same anonymous node, and recording the IP addresses of the previous hop and the next hop;
G. computing the center of the set formed by the previous-hop and next-hop IPs of the anonymous node pair, using Euclidean distance on their longitude and latitude, and taking that center as the physical position of the anonymous node.
2. The anonymous node positioning method based on the multi-source point clustering idea of claim 1, wherein the specific method of step A is as follows: judging whether an IP address is alive with commands such as Ping, and acquiring longitude and latitude information by using IPtoregin.
3. The anonymous node positioning method based on the multi-source point clustering idea of claim 1, wherein the additional information in step C comprises the economic level, the degree of development and the city tier of a city.
4. The anonymous node positioning method based on the multi-source point clustering idea of claim 3, wherein the specific steps of step C are as follows:
a. collecting the live IP addresses;
b. obtaining the longitude and latitude of each IP's city by using an external API (application program interface), and recording the city;
c. recording the city tiers published on the Internet by using python;
d. using longitude, latitude and city tier as the clustering features;
e. setting the number of clusters of the K-means clustering method to 3, or calculating the optimal K value automatically through an algorithm.
5. The anonymous node positioning method based on the multi-source point clustering idea of claim 4, wherein the K-means clustering method of step e comprises: selecting K objects from the data as initial cluster centers, and then:
(1) calculating the distance from each object to each cluster center and assigning the object accordingly;
(2) recalculating each cluster center;
(3) repeating steps (1) and (2) until the stopping requirement is met.
6. The anonymous node positioning method based on the multi-source point clustering idea of claim 1, wherein the specific steps of obtaining the delay curve and its features in step E are as follows:
(A) Pinging a large number of IPs belonging to the same class multiple times within a short period, recording how the delay varies, and drawing a characteristic curve;
(B) extracting the feature values of the curve by wavelet decomposition.
7. The anonymous node positioning method based on the multi-source point clustering idea of claim 6, wherein step F performs hierarchical clustering on the characteristic curves, then fuses the paths obtained by the corresponding multiple traceroutes according to the hierarchical clustering result, and merges points that are the same anonymous node, with the following specific steps:
1) performing hierarchical clustering on the feature values obtained by wavelet decomposition, and recording the printed output;
2) selecting two Traceroute paths to be fused and performing anonymous fusion;
3) recording the IP addresses of the previous hop and the next hop of the fused anonymous node set as the basis for positioning the anonymous node.
8. The anonymous node positioning method based on the multi-source point clustering idea of claim 7, wherein the fusion criteria of step 2) are:
a) merging anonymous nodes with the same parent node and the same child node into one node;
b) merging anonymous nodes that have no parent node but the same child node into one node;
c) merging terminal anonymous nodes that have the same parent node but no child node into one node.
9. The anonymous node positioning method based on the multi-source point clustering idea of claim 1, wherein the positioning of the anonymous node in step G specifically comprises the following steps:
a) when the previous hop and the next hop of the anonymous node set contain two or more known IP addresses, a range is determined by Euclidean distance from the longitude and latitude coordinates of the known IPs, and the center of that range is taken as the physical address of the anonymous node;
b) when the previous hop and the next hop of the anonymous node set contain only one known IP, the positions of that point and of the destination node are averaged directly and the result is taken as the physical address of the anonymous node.
CN202010851544.4A 2020-08-21 2020-08-21 Anonymous node positioning method based on multi-source point clustering idea Active CN111985569B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010851544.4A CN111985569B (en) 2020-08-21 2020-08-21 Anonymous node positioning method based on multi-source point clustering idea

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010851544.4A CN111985569B (en) 2020-08-21 2020-08-21 Anonymous node positioning method based on multi-source point clustering idea

Publications (2)

Publication Number Publication Date
CN111985569A CN111985569A (en) 2020-11-24
CN111985569B CN111985569B (en) 2022-10-14

Family

ID=73444147

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010851544.4A Active CN111985569B (en) 2020-08-21 2020-08-21 Anonymous node positioning method based on multi-source point clustering idea

Country Status (1)

Country Link
CN (1) CN111985569B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113553541B (en) * 2021-06-04 2023-10-13 扬州大学 Independent path analysis-based source positioning method under independent cascade model
CN113395211B (en) * 2021-06-08 2022-11-18 哈尔滨工业大学(威海) Routing IP positioning optimization method based on clustering idea

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101404664A (en) * 2008-11-05 2009-04-08 湖南大学 Network positioning and optimizing algorithm based on node clustering
CN103218397B (en) * 2013-03-12 2016-03-02 浙江大学 A kind of social networks method for secret protection based on non-directed graph amendment
CN104702667B (en) * 2015-01-30 2018-04-27 武汉大学 A kind of method and device of application service system extension
CN106254153B (en) * 2016-09-19 2019-12-10 腾讯科技(深圳)有限公司 Network anomaly monitoring method and device
CN108011746B (en) * 2017-10-25 2021-06-29 北京知道未来信息技术有限公司 IP-level global Internet topology mapping method based on Traceroute and SNMP protocol
CN110012120A (en) * 2019-03-14 2019-07-12 罗向阳 A kind of IP City-level location algorithm based on PoP network topology

Also Published As

Publication number Publication date
CN111985569A (en) 2020-11-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant