CN114884850A - Method for determining IP address attribution by combining route tracking instruction characteristics with graph calculation analysis - Google Patents

Publication number
CN114884850A
Authority
CN
China
Prior art keywords
node
address
nodes
sample
steep slope
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210375880.5A
Other languages
Chinese (zh)
Inventor
林飞
陈维
易永波
古元
毛华阳
华仲峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Act Technology Development Co ltd
Original Assignee
Beijing Act Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Act Technology Development Co ltd
Priority to CN202210375880.5A
Publication of CN114884850A
Current legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/12 Network monitoring probes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/74 Address processing for routing

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method for determining IP address attribution by combining traceroute instruction characteristics with graph computation analysis, relating to the field of information technology. The method comprises the following steps: 1) dial-testing IP addresses outside the domain with the traceroute instruction to acquire graph features and complete a preliminary training sample; 2) computing border-level routes and refining the training sample; 3) inputting the training sample into a support vector machine and training with a multi-feature configuration to obtain a model that automatically judges IP address attribution. The method requires no large IP library to maintain, avoids large-scale data processing, big-data cluster requirements and data storage burdens, analyzes data in a timely manner, reduces the manual labeling of large-scale IP data in applicable scenarios, and is suitable for rapid regional positioning of IPv6 addresses.

Description

Method for determining IP address attribution by combining route tracking instruction characteristics with graph calculation analysis
Technical Field
The invention relates to the technical field of information.
Background
The route tracking instruction, traceroute, is a tool for detecting the gateways that a packet passes through on its way from the sending host to a target host. traceroute works by sending probe packets that start with the smallest TTL and listening for the ICMP replies from the gateways, thereby tracing the gateways through which packets travel to the target host. By default, the probe packet sent by traceroute is 38 bytes. TTL stands for time to live. ICMP is the Internet Control Message Protocol, a sub-protocol of the TCP/IP protocol suite that is used to pass control messages between IP hosts and routers.
At present, whether an IP address is overseas or domestic is judged through APIs provided by third parties. Such an API issues a query against an IP address library, so the judgment depends on the accuracy and capacity of that library. Many third-party libraries also charge per query, so the cost is high and the accuracy is insufficient. The invention provides a method for determining IP address attribution by combining traceroute instruction characteristics with graph algorithm analysis, which can judge whether an IP address is domestic or overseas using the traceroute instruction at very low cost.
The method records the features produced when the traceroute instruction probes overseas IP addresses to generate a feature set, and applies machine learning to the feature set to form a judgment model. A single, generic, very low-cost traceroute then suffices: its features are input into the judgment model to distinguish domestic IP addresses from overseas IP addresses. The feature set comprises: a delay steep slope feature, a feature of repeatedly passing through border routes, an overall response duration feature, and other auxiliary features.
Explanation of the delay steep slope: each path node in a traceroute has a response time. When the path crosses into another large-area network, the network delay rises markedly; that is, the delay suddenly climbs as if up a steep slope, hence the name delay steep slope. Across the many traceroute cases examined, the delay steep slope on paths leaving the country is very pronounced: it appears not only for IP addresses in geographically distant foreign countries but also for those in nearby foreign countries.
Closeness centrality is a common algorithm in the field of graph algorithms. Its core is to compute a node's average distance to the other nodes; a node with a high closeness centrality score has the shortest distances to the other nodes. In the network data formed by traceroute, only border routes score high on closeness centrality.
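The idea can be illustrated with a minimal Python sketch. It assumes the unnormalized form C(u) = 1 / (sum of hop distances from u to the nodes it can reach) on an unweighted graph; the function names and the toy graph are hypothetical and are not taken from the patent.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def closeness(graph, u):
    """Assumed form: C(u) = 1 / sum of distances from u to reachable nodes."""
    dist = bfs_distances(graph, u)
    total = sum(d for node, d in dist.items() if node != u)
    return 1.0 / total if total else 0.0

# Toy undirected path a - b - c, written as an adjacency dict.
TOY_GRAPH = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(closeness(TOY_GRAPH, "b"))  # the middle node is closest to the others
```

On this toy path the middle node has distance 1 to both neighbours, so its score is higher than that of either end node.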
The betweenness centrality algorithm in the field of graph algorithms finds the nodes through which the shortest paths between pairs of nodes in the graph pass. In a graph built from traceroute runs toward overseas IP addresses, border routes exhibit this betweenness centrality feature.
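As a rough illustration, the sketch below counts, for every ordered pair (s, t), the fraction of shortest paths that pass through each intermediate node. It is a brute-force version suited only to small graphs, not the implementation used by the applicant; the function names and the toy graph (two probe paths sharing one hop) are hypothetical.

```python
from collections import deque

def shortest_path_counts(graph, source):
    """BFS from source: hop distance and number of shortest paths per node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                sigma[nxt] = 0
                queue.append(nxt)
            if dist[nxt] == dist[node] + 1:
                sigma[nxt] += sigma[node]
    return dist, sigma

def betweenness(graph):
    """B(u) = sum over ordered pairs (s, t) of p(u) / p, with u != s and u != t."""
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    info = {s: shortest_path_counts(graph, s) for s in nodes}
    score = {u: 0.0 for u in nodes}
    for s in nodes:
        dist_s, sigma_s = info[s]
        for t in dist_s:
            if t == s:
                continue
            for u in dist_s:
                if u in (s, t):
                    continue
                dist_u, sigma_u = info[u]
                # u lies on a shortest s-t path when the distances add up.
                if t in dist_u and dist_s[u] + dist_u[t] == dist_s[t]:
                    score[u] += sigma_s[u] * sigma_u[t] / sigma_s[t]
    return score

# Two probes (p1, p2) reach two targets (x, y) through one shared hop b.
TOY_GRAPH = {"p1": ["b"], "p2": ["b"], "b": ["x", "y"]}
SCORES = betweenness(TOY_GRAPH)
```

In the toy graph every probe-to-target path crosses the shared hop, so only that hop receives a non-zero score, which mirrors the role the text assigns to border routes.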
SVM stands for support vector machine, a common discrimination method. In the field of machine learning, it is a supervised learning model typically used for pattern recognition, classification and regression analysis. Given a suitable number of samples, SVM learning separates the feature parameters with a hyperplane at the most appropriate and accurate values. The method uses it to find the best duration thresholds for the total delay and the steep slope, and to form the most reasonable combination of the features.
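For orientation, the standard linear SVM decision rule can be written as follows. The notation is the usual textbook one and is an assumption here; the patent does not state which kernel or feature encoding is used.

```latex
% x: feature vector of one traceroute run (e.g. steep-slope flag, total delay)
% w, b: weight vector and offset of the separating hyperplane (learned)
% y_i \in \{-1, +1\}: label of training sample x_i
\begin{aligned}
\text{hyperplane:}\quad & w^{\top} x + b = 0 \\
\text{decision:}\quad & \hat{y}(x) = \operatorname{sign}\!\left(w^{\top} x + b\right) \\
\text{margin constraint:}\quad & y_i \left(w^{\top} x_i + b\right) \ge 1 - \xi_i, \qquad \xi_i \ge 0
\end{aligned}
```

Training chooses w and b so that the margin between the two classes is as wide as possible while the slack terms stay small.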
Disclosure of Invention
In view of the defects of the prior art, the method for determining IP address attribution by combining traceroute instruction characteristics with graph computation analysis comprises the following steps: 1) dial-testing IP addresses outside the domain with the traceroute instruction to acquire graph features and complete a preliminary training sample; 2) computing border-level routes and refining the training sample; 3) inputting the training sample into a support vector machine and training with a multi-feature configuration to obtain a model that automatically judges IP address attribution;
1) Dial-testing IP addresses outside the domain with the traceroute instruction to acquire graph features and complete the preliminary training sample
First, collect 100,000 IP addresses confirmed to be located in foreign countries around the world to obtain a foreign IP address sample set; the foreign IP address set is obtained from public resources;
Second, deploy detection points, which comprise at least a China Telecom public network IP detection point, a China Mobile public network IP detection point and a China Unicom public network IP detection point; the server at each detection point issues traceroute requests against the foreign IP address sample set, and each traceroute run generates a unique number, called the dial-test number of that traceroute run;
Third, the server at each detection point builds a sample graph library from the information returned by the traceroute instruction, as follows: a. each IP address the traceroute passes through is recorded as a node, and the order in which consecutive IP addresses are passed is recorded as a directed path; for example, a traceroute that passes through 30 hops yields 30 nodes and 29 directed paths; b. the data stored for each node in the sample graph library comprise: the node IP address, the response time at the node, the directed path to which the node belongs, and the dial-test number of the traceroute run to which the node belongs; c. if an asterisk appears in the data returned during a traceroute dial test, the asterisk is discarded;
Fourth, store the sample graph library as the preliminary training sample;
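The parsing part of step 1 might look like the sketch below. It assumes the numeric output layout of the common Linux `traceroute -n` command (hop number, IP address, then times in ms); the regular expressions, function names and sample text are illustrative assumptions, and the addresses come from documentation ranges.

```python
import ipaddress
import re

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")   # "<hop number> <rest of line>"
RTT = re.compile(r"(\d+(?:\.\d+)?)\s*ms")     # first "<number> ms" on the line

def parse_traceroute(text):
    """Return [(ip, response_ms), ...] in hop order, skipping asterisk-only hops."""
    hops = []
    for line in text.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header or blank line
        rest = match.group(2)
        ip = None
        for token in rest.split():
            try:
                ipaddress.ip_address(token)  # accepts IPv4 and IPv6 literals
            except ValueError:
                continue
            ip = token
            break
        rtt = RTT.search(rest)
        if ip is None or rtt is None:
            continue  # the hop answered only with asterisks: discard it
        hops.append((ip, float(rtt.group(1))))
    return hops

def directed_paths(hops):
    """Consecutive hop pairs; n nodes give n - 1 directed paths."""
    ips = [ip for ip, _ in hops]
    return list(zip(ips, ips[1:]))

SAMPLE = """traceroute to 203.0.113.9 (203.0.113.9), 30 hops max, 60 byte packets
 1  192.0.2.1  1.2 ms  1.1 ms  1.0 ms
 2  * * *
 3  198.51.100.7  35.4 ms  *  36.0 ms
 4  203.0.113.9  210.5 ms  211.0 ms  209.8 ms
"""
```

The sketch keeps the first reported time per hop; whether the applicant uses the first, the minimum or the average of the reported times is not stated in the text.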
2) Computing border-level routes and refining the training sample
First, input the preliminary training sample into the betweenness centrality algorithm to find the nodes at the center of the information flow; the betweenness centrality algorithm is:
$$B(u) = \sum_{s \neq u \neq t} \frac{p(u)}{p}$$
where u is a node, p is the number of shortest paths between node s and node t, and p(u) is the number of shortest paths between s and t that pass through node u; the larger the value of B(u), the more likely node u is a central node;
Second, take the IP addresses of the 2000 nodes with the largest B(u) values in the preliminary training sample and input them into the closeness centrality algorithm to find the nodes with the strongest information transmission capability; the closeness centrality algorithm is:
$$C(u) = \frac{1}{\sum_{v} d(u, v)}$$
where d(u, v) is the distance between u and v, u is the target node, and v ranges over the nodes other than the target node; the larger the value of C(u), the stronger the information transmission capability of node u;
Third, compute the score M of each node being a border route:
[Formula for M: image not recovered in the source text]
and take the 2000 nodes with the largest M values as border routing nodes to obtain a border routing node library;
Fourth, add the border routing node library to the preliminary training sample to generate the refined training sample;
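The two-stage selection in step 2 can be sketched as below. Because the formula for M survives only as an image placeholder, the way B(u) and C(u) are combined is left as a caller-supplied function; the product used in the example is purely a placeholder assumption, and all names are hypothetical.

```python
def top_k(scores, k):
    """Node IDs with the k largest scores; ties are broken by node ID."""
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]

def select_border_nodes(betweenness, closeness_of, combine,
                        k_candidates=2000, k_border=2000):
    """Stage 1: keep the nodes with the largest B(u).
    Stage 2: score only those candidates with M and keep the largest M.

    combine(b, c) stands in for the patent's formula for M, which is not
    recoverable from the text; it must be supplied by the caller.
    """
    candidates = top_k(betweenness, k_candidates)
    m_scores = {u: combine(betweenness[u], closeness_of(u)) for u in candidates}
    return top_k(m_scores, k_border)

# Toy run with made-up scores and a placeholder M = B * C.
B = {"a": 4.0, "b": 1.0, "c": 0.0}
C = {"a": 0.5, "b": 0.25, "c": 1.0}
BORDER = select_border_nodes(B, C.get, lambda b, c: b * c,
                             k_candidates=2, k_border=1)
```

Filtering by betweenness first keeps the more expensive closeness computation limited to a fixed number of candidates.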
3) Inputting the training sample into the support vector machine and training with a multi-feature configuration to obtain the automatic IP address attribution judgment model
First, judge by dial-test number whether the nodes sharing a dial-test number include a border routing node: the support vector machine outputs 1 when they do and 0 when they do not. When the support vector machine outputs 1, the data stored in the nodes with that dial-test number become steep-slope pending sample data; when it outputs 0, the target IP address is judged to be a domestic IP address;
Second, compute the delay percentage for the steep-slope pending sample data by dividing the response time of the later node by the response time of the preceding node; a delay steep slope is judged to occur when the delay percentage exceeds 200%. Steep-slope pending sample data in which a delay steep slope occurs become response pending sample data; for steep-slope pending sample data without a delay steep slope, the target IP address is judged to be a domestic IP address;
Third, check the response time of the last node in the response pending sample data: when it is greater than or equal to 200 milliseconds, the response pending sample data become comprehensive pending sample data; when it is less than 200 milliseconds, the target IP address of the response pending sample data is judged to be a domestic IP address;
Fourth, using the directed paths, compare the nodes judged to be border routes in the comprehensive pending sample data with the nodes at which a delay steep slope occurs: when the delay steep slope occurs at the node immediately after a border routing node, the target IP of the comprehensive pending sample data is judged to be an overseas IP; when the delay steep slope occurs at the second node after a border routing node, the target IP is likewise judged to be an overseas IP; in all other cases the target IP of the comprehensive pending sample data is judged to be a domestic IP.
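The four checks of step 3 can be read as a short decision cascade, sketched here in Python. Two assumptions are made: the delay percentage is taken as the later node's response time divided by the preceding node's (the reading under which a rise in delay exceeds 200%), and the support vector machine's 0/1 output for the border-route check is replaced by a plain set-membership test. Function and constant names are hypothetical.

```python
STEEP_RATIO = 2.0       # delay percentage above 200 %
MIN_LAST_HOP_MS = 200.0  # last-node response time threshold

def steep_slope_indices(hops):
    """Indices i where hop i takes more than 200 % of hop i-1's response time."""
    return [i for i in range(1, len(hops))
            if hops[i - 1][1] > 0 and hops[i][1] / hops[i - 1][1] > STEEP_RATIO]

def classify(hops, border_nodes):
    """hops: [(ip, response_ms), ...] for one dial-test number, in hop order."""
    # Check 1: the trace must contain a border routing node.
    border_idx = [i for i, (ip, _) in enumerate(hops) if ip in border_nodes]
    if not border_idx:
        return "domestic"
    # Check 2: a delay steep slope must occur somewhere on the path.
    steep = set(steep_slope_indices(hops))
    if not steep:
        return "domestic"
    # Check 3: the last node must respond in 200 ms or more.
    if hops[-1][1] < MIN_LAST_HOP_MS:
        return "domestic"
    # Check 4: the steep slope must sit one or two nodes after a border node.
    for i in border_idx:
        if i + 1 in steep or i + 2 in steep:
            return "overseas"
    return "domestic"

TRACE = [("192.0.2.1", 1.2), ("198.51.100.7", 35.4), ("203.0.113.9", 210.5)]
print(classify(TRACE, {"198.51.100.7"}))
```

Each check only ever moves a trace toward "domestic" until the last one, so a trace is labelled overseas only when all four conditions hold together.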
Advantageous effects
The method requires no large IP library to maintain, avoids large-scale data processing, big-data cluster requirements and data storage burdens, analyzes data in a timely manner, reduces the manual labeling of large-scale IP data in applicable scenarios, and is suitable for rapid regional positioning of IPv6 addresses.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
Example one
Referring to FIG. 1, the method for determining IP address attribution by combining traceroute instruction characteristics with graph computation analysis provided by the invention is implemented in the following steps: S01, dial-testing IP addresses outside the domain with the traceroute instruction to acquire graph features and complete a preliminary training sample; S02, computing border-level routes and refining the training sample; S03, inputting the training sample into a support vector machine and training with a multi-feature configuration to obtain a model that automatically judges IP address attribution;
S01: Dial-testing IP addresses outside the domain with the traceroute instruction to acquire graph features and complete the preliminary training sample
First, collect 100,000 IP addresses confirmed to be located in foreign countries around the world to obtain a foreign IP address sample set; the foreign IP address set is obtained from public resources;
Second, deploy detection points, which comprise at least a China Telecom public network IP detection point, a China Mobile public network IP detection point and a China Unicom public network IP detection point; the server at each detection point issues traceroute requests against the foreign IP address sample set, and each traceroute run generates a unique number, called the dial-test number of that traceroute run;
Third, the server at each detection point builds a sample graph library from the information returned by the traceroute instruction, as follows: a. each IP address the traceroute passes through is recorded as a node, and the order in which consecutive IP addresses are passed is recorded as a directed path; for example, a traceroute that passes through 30 hops yields 30 nodes and 29 directed paths; b. the data stored for each node in the sample graph library comprise: the node IP address, the response time at the node, the directed path to which the node belongs, and the dial-test number of the traceroute run to which the node belongs; c. if an asterisk appears in the data returned during a traceroute dial test, the asterisk is discarded;
Fourth, store the sample graph library as the preliminary training sample;
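One possible in-memory layout for the sample graph library of S01 is sketched below. The class and field names are hypothetical, the "directed path to which the node belongs" is interpreted as the edge arriving at the node (an assumption), and a random UUID stands in for the dial-test number when none is supplied.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional, Tuple

@dataclass
class NodeRecord:
    ip: str                            # node IP address
    response_ms: float                 # response time at the node
    path: Optional[Tuple[str, str]]    # incoming directed path (prev IP, this IP)
    dial_test_id: str                  # dial-test number of the traceroute run

@dataclass
class SampleGraphLibrary:
    records: list = field(default_factory=list)
    edges: set = field(default_factory=set)

    def add_trace(self, hops, dial_test_id=None):
        """Store one traceroute run given as [(ip, response_ms), ...]."""
        dial_test_id = dial_test_id or uuid.uuid4().hex
        previous_ip = None
        for ip, response_ms in hops:
            path = (previous_ip, ip) if previous_ip else None
            if path:
                self.edges.add(path)  # shared hops merge across runs
            self.records.append(NodeRecord(ip, response_ms, path, dial_test_id))
            previous_ip = ip
        return dial_test_id

    def trace(self, dial_test_id):
        """All node records of one traceroute run, in hop order."""
        return [r for r in self.records if r.dial_test_id == dial_test_id]
```

Keeping the edge set separate from the per-run records lets the centrality step work on the merged graph while the later judgment step still looks up each run by its dial-test number.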
S02: Computing border-level routes and refining the training sample
First, input the preliminary training sample into the betweenness centrality algorithm to find the nodes at the center of the information flow; the betweenness centrality algorithm is:
$$B(u) = \sum_{s \neq u \neq t} \frac{p(u)}{p}$$
where u is a node, p is the number of shortest paths between node s and node t, and p(u) is the number of shortest paths between s and t that pass through node u; the larger the value of B(u), the more likely node u is a central node;
Second, take the IP addresses of the 2000 nodes with the largest B(u) values in the preliminary training sample and input them into the closeness centrality algorithm to find the nodes with the strongest information transmission capability; the closeness centrality algorithm is:
$$C(u) = \frac{1}{\sum_{v} d(u, v)}$$
where d(u, v) is the distance between u and v, u is the target node, and v ranges over the nodes other than the target node; the larger the value of C(u), the stronger the information transmission capability of node u;
Third, compute the score M of each node being a border route:
[Formula for M: image not recovered in the source text]
and take the 2000 nodes with the largest M values as border routing nodes to obtain a border routing node library;
Fourth, add the border routing node library to the preliminary training sample to generate the refined training sample;
S03: Inputting the training sample into the support vector machine and training with a multi-feature configuration to obtain the automatic IP address attribution judgment model
First, judge by dial-test number whether the nodes sharing a dial-test number include a border routing node: the support vector machine outputs 1 when they do and 0 when they do not. When the support vector machine outputs 1, the data stored in the nodes with that dial-test number become steep-slope pending sample data; when it outputs 0, the target IP address is judged to be a domestic IP address;
Second, compute the delay percentage for the steep-slope pending sample data by dividing the response time of the later node by the response time of the preceding node; a delay steep slope is judged to occur when the delay percentage exceeds 200%. Steep-slope pending sample data in which a delay steep slope occurs become response pending sample data; for steep-slope pending sample data without a delay steep slope, the target IP address is judged to be a domestic IP address;
Third, check the response time of the last node in the response pending sample data: when it is greater than or equal to 200 milliseconds, the response pending sample data become comprehensive pending sample data; when it is less than 200 milliseconds, the target IP address of the response pending sample data is judged to be a domestic IP address;
Fourth, using the directed paths, compare the nodes judged to be border routes in the comprehensive pending sample data with the nodes at which a delay steep slope occurs: when the delay steep slope occurs at the node immediately after a border routing node, the target IP of the comprehensive pending sample data is judged to be an overseas IP; when the delay steep slope occurs at the second node after a border routing node, the target IP is likewise judged to be an overseas IP; in all other cases the target IP of the comprehensive pending sample data is judged to be a domestic IP.
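A worked example of the S03 arithmetic, using invented response times and assuming the delay percentage is the later node's response time divided by the preceding node's (the reading under which a rise in delay exceeds 200%):

```latex
% Hypothetical five-hop trace; the border routing node is assumed to be hop 3.
\begin{aligned}
t &= (8,\ 12,\ 15,\ 168,\ 236)\ \text{ms} \\
\frac{t_2}{t_1} &= \frac{12}{8} = 150\%, \qquad
\frac{t_3}{t_2} = \frac{15}{12} = 125\%, \\
\frac{t_4}{t_3} &= \frac{168}{15} = 1120\% > 200\%, \qquad
\frac{t_5}{t_4} = \frac{236}{168} \approx 140\% \\
t_5 &= 236\ \text{ms} \ge 200\ \text{ms}
\end{aligned}
```

The only delay steep slope occurs at hop 4, the node immediately after the assumed border routing node at hop 3, and the last node responds in at least 200 ms, so under these assumptions the target would be judged an overseas IP.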

Claims (1)

1. A method for determining IP address attribution by combining traceroute instruction characteristics with graph computation analysis, characterized by comprising the following steps: 1) dial-testing IP addresses outside the domain with the traceroute instruction to acquire graph features and complete a preliminary training sample; 2) computing border-level routes and refining the training sample; 3) inputting the training sample into a support vector machine and training with a multi-feature configuration to obtain a model that automatically judges IP address attribution;
1) Dial-testing IP addresses outside the domain with the traceroute instruction to acquire graph features and complete the preliminary training sample
First, collect 100,000 IP addresses confirmed to be located in foreign countries around the world to obtain a foreign IP address sample set;
Second, deploy a plurality of detection points; the server at each detection point issues traceroute requests against the foreign IP address sample set, and each traceroute run generates a unique number, called the dial-test number of that traceroute run;
Third, the server at each detection point builds a sample graph library from the information returned by the traceroute instruction, as follows: a. each IP address the traceroute passes through is recorded as a node, and the order in which consecutive IP addresses are passed is recorded as a directed path; b. the data stored for each node in the sample graph library comprise: the node IP address, the response time at the node, the directed path to which the node belongs, and the dial-test number of the traceroute run to which the node belongs; c. if an asterisk appears in the data returned during a traceroute dial test, the asterisk is discarded;
Fourth, store the sample graph library as the preliminary training sample;
2) Computing border-level routes and refining the training sample
First, input the preliminary training sample into the betweenness centrality algorithm to find the nodes at the center of the information flow; the betweenness centrality algorithm is:
$$B(u) = \sum_{s \neq u \neq t} \frac{p(u)}{p}$$
where u is a node, p is the number of shortest paths between node s and node t, and p(u) is the number of shortest paths between s and t that pass through node u; the larger the value of B(u), the more likely node u is a central node;
Second, take the IP addresses of the 2000 nodes with the largest B(u) values in the preliminary training sample and input them into the closeness centrality algorithm to find the nodes with the strongest information transmission capability; the closeness centrality algorithm is:
$$C(u) = \frac{1}{\sum_{v} d(u, v)}$$
where d(u, v) is the distance between u and v, u is the target node, and v ranges over the nodes other than the target node; the larger the value of C(u), the stronger the information transmission capability of node u;
Third, compute the score M of each node being a border route:
[Formula for M: image not recovered in the source text]
and take the 2000 nodes with the largest M values as border routing nodes to obtain a border routing node library;
Fourth, add the border routing node library to the preliminary training sample to generate the refined training sample;
3) Inputting the training sample into the support vector machine and training with a multi-feature configuration to obtain the automatic IP address attribution judgment model
First, judge by dial-test number whether the nodes sharing a dial-test number include a border routing node: the support vector machine outputs 1 when they do and 0 when they do not. When the support vector machine outputs 1, the data stored in the nodes with that dial-test number become steep-slope pending sample data; when it outputs 0, the target IP address is judged to be a domestic IP address;
Second, compute the delay percentage for the steep-slope pending sample data by dividing the response time of the later node by the response time of the preceding node; a delay steep slope is judged to occur when the delay percentage exceeds 200%. Steep-slope pending sample data in which a delay steep slope occurs become response pending sample data; for steep-slope pending sample data without a delay steep slope, the target IP address is judged to be a domestic IP address;
Third, check the response time of the last node in the response pending sample data: when it is greater than or equal to 200 milliseconds, the response pending sample data become comprehensive pending sample data; when it is less than 200 milliseconds, the target IP address of the response pending sample data is judged to be a domestic IP address;
Fourth, using the directed paths, compare the nodes judged to be border routes in the comprehensive pending sample data with the nodes at which a delay steep slope occurs: when the delay steep slope occurs at the node immediately after a border routing node, the target IP of the comprehensive pending sample data is judged to be an overseas IP; when the delay steep slope occurs at the second node after a border routing node, the target IP is likewise judged to be an overseas IP; in all other cases the target IP of the comprehensive pending sample data is judged to be a domestic IP.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210375880.5A CN114884850A (en) 2022-04-12 2022-04-12 Method for determining IP address attribution by combining route tracking instruction characteristics with graph calculation analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210375880.5A CN114884850A (en) 2022-04-12 2022-04-12 Method for determining IP address attribution by combining route tracking instruction characteristics with graph calculation analysis

Publications (1)

Publication Number Publication Date
CN114884850A true CN114884850A (en) 2022-08-09

Family

ID=82670190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210375880.5A Pending CN114884850A (en) 2022-04-12 2022-04-12 Method for determining IP address attribution by combining route tracking instruction characteristics with graph calculation analysis

Country Status (1)

Country Link
CN (1) CN114884850A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090066540A1 (en) * 2007-09-07 2009-03-12 Dimitri Marinakis Centralized route calculation for a multi-hop streetlight network
CN101588291A (en) * 2008-05-22 2009-11-25 原创信通电信技术(北京)有限公司 Method for determining packet transmission route in IP telecommunication network system
EP2541838A1 (en) * 2011-06-30 2013-01-02 Quova, Inc. System and method for predicting the geographic location of an internet protocol address
CN105119827A (en) * 2015-07-14 2015-12-02 中国互联网络信息中心 Determination method of router geographic position
US11215467B1 (en) * 2020-08-03 2022-01-04 Kpn Innovations, Llc. Method of and system for path selection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination