CN115529290A - IP street level positioning method and device based on graph neural network - Google Patents
- Publication number
- CN115529290A (application number CN202211144672.0A)
- Authority
- CN
- China
- Prior art keywords
- node
- embedding
- graph
- edge
- address
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention belongs to the technical field of target IP positioning and discloses an IP street-level positioning method and device based on a graph neural network. The method comprises the following steps: first, the raw traceroute measurement data of a computer network is represented as an attributed graph; then, an encoder converts the attributed graph into initial node embeddings; next, the initial node embeddings are refined by modeling the connection information; finally, a decoder maps the refined embeddings to node locations. By taking prior knowledge into account, the invention alleviates the convergence problem of the GNN and improves the accuracy of geographic location prediction. Experiments on different real-world data sets show that the invention outperforms state-of-the-art rule-based and learning-based baselines by 16% to 28% in median error distance on all data sets.
Description
Technical Field
The invention relates to the technical field of target IP positioning, in particular to an IP street level positioning method and device based on a graph neural network.
Background
Target IP street-level positioning builds on IP city-level positioning, or on an already known city of the IP, and further determines a more precise location of the IP address within that city. IP street-level positioning is needed in practice in many fields, such as tracing the source of network attacks and providing location-based services. However, most existing methods are suitable only for ideal network environments. In real networks, their positioning accuracy is low because the relationship between the measurement data and the geographic location does not conform to the assumptions these methods make. Developing an IP street-level positioning technique with higher accuracy in real networks is therefore of considerable value.
Depending on how the "measurement data-geographical location" mapping is depicted, IP street-level location techniques can be divided into rule-based IP street-level location techniques and deep learning-based IP street-level location techniques.
IP street-level positioning techniques based on predefined rules generally assume a strong correlation between measurement data, such as delay or routing path, and geographic distance, and define conversion rules from measurement data to geographic distance according to their respective assumptions. The advantage of such methods is that they are easy to understand. However, their accuracy is generally low because their assumptions often do not hold in most real networks.
Recently, some researchers have attempted to model the mapping between measurement data and geographic location using deep learning. Deep learning is good at learning, from large amounts of data, the nonlinear relationship between various kinds of measurement data and geographic distance, and is therefore better suited to the complexity of real networks. The applicant has previously proposed feeding the delay and routing information of a target network into a multi-layer perceptron (MLP), which processes this information and predicts the geographic location of the target IP. Such methods achieve higher positioning accuracy than techniques based on predefined rules. However, the MLP method cannot model the topology of the computer network, so the positioning accuracy still needs to be improved.
Disclosure of Invention
To address the problem that the positioning accuracy of existing IP street-level positioning techniques still needs to be improved, the invention provides an IP street-level positioning method and device based on a graph neural network, which alleviate the GNN convergence problem by taking prior knowledge into account and improve the accuracy of geographic location prediction.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides an IP street level positioning method based on a graph neural network, which comprises the following steps:
step 1, representing traceroute original measurement data of a computer network into a graph with attributes;
step 2, generating initial graph node embedding of each IP address based on the graph with the attribute through an encoder; the IP address comprises a landmark IP address, and the geographic position corresponding to the landmark IP address is known;
step 3, the geographical position information is injected into the initial graph node embedding by modeling the link information between the computer network nodes;
step 4, mapping, through a decoder, the graph node embeddings processed in step 3 to the geographic locations corresponding to the graph nodes, i.e., the IP addresses;
step 5, comparing the difference between the geographic position obtained by mapping and the actual geographic position, and optimizing the parameters of the model through back propagation and gradient descent;
step 6, predicting, based on the optimized model, the geographic locations of the other IP addresses in the computer network whose geographic locations are unknown.
Further, the step 1 comprises:
step 1.1, constructing a network topology graph:
the topology of the computer network is represented using IP addresses and the links between IP addresses; all IP addresses in the raw traceroute measurement data of the computer network are converted into graph nodes; each node $v_i$ is distinguished by a node ID whose value ranges from 1 to $N_V$, where $N_V$ is the number of all IP addresses in the raw traceroute measurement data;
converting a direct physical link between two IP addresses into a graph edge;
step 1.2, attribute extraction:
each graph node is associated with two node attributes: node delay and node IP address; for each node, the probing host measures the delay multiple times, and the minimum delay is selected as the node delay; finally, the node ID and the node attributes are combined as the initial features of the node;
each edge is associated with an edge delay, a head node IP address and a tail node IP address; the node closer to the probing host is taken as the head node of the edge and the other node as the tail node; the edge delay is calculated by subtracting the node delay of the head node from that of the tail node.
Further, in step 1.1, when the probing host cannot obtain the IP address and delay of a router during traceroute, there are two methods for establishing edges around that router: (i) ignore the router and establish an edge between the two routers immediately before and after it on the routing path; (ii) map the router to an IP address found in other, re-measured routing paths.
Further, the step 3 comprises:
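for node $v_i$ and its neighboring node $v_j$, the message from $v_j$ to $v_i$ is defined as
$$m_{i \leftarrow j} = f_{message}\left(h_{v_j}^{(0)}, h_{e_{ij}}\right)$$
wherein $m_{i \leftarrow j}$ is the message sent from $v_j$ to $v_i$, $f_{message}$ is the message function, $h_{v_j}^{(0)}$ denotes the embedding of the neighboring node, and $h_{e_{ij}}$ denotes the embedding of the edge $e_{ij}$ formed by node $v_i$ and node $v_j$; $f_{message}$ is implemented as:
$$m_{i \leftarrow j} = f_{edge}\left(h_{e_{ij}}\right) h_{v_j}^{(0)} = W_{e_{ij}}\, h_{v_j}^{(0)}$$
wherein the edge network $f_{edge}$ is a two-layer MLP that converts the edge embedding $h_{e_{ij}}$ into a matrix $W_{e_{ij}}$ used to compute the message from the neighboring node embedding $h_{v_j}^{(0)}$; the architecture of $f_{edge}$ is as follows:
$$W_{e_{ij}} = \mathrm{reshape}\left(\mathrm{ReLU}\left(W_2\, \mathrm{ReLU}\left(W_1 h_{e_{ij}}\right)\right)\right)$$
wherein $W_{e_{ij}}$ serves as the weight controlling the impact of the neighboring node embedding $h_{v_j}^{(0)}$ on node $v_i$; $W_1$ and $W_2$ are the weight matrices of the two MLP layers;
by the above formula, the initial edge embedding $h_{e_{ij}}$ is first converted into a temporary embedding and then into a one-dimensional weight embedding, which is reshaped into a two-dimensional weight matrix $W_{e_{ij}}$ to control the influence of neighbor node $v_j$ on node $v_i$; the ReLU activation function is used to capture nonlinear relationships;
$U_{v_i}$ denotes the set of neighboring nodes of $v_i$; the aggregation function $f_{aggregate}$ collects all messages propagated from $U_{v_i}$ to $v_i$, and the update function $f_{update}$ uses the collected messages $m_{i \leftarrow U_{v_i}}$ to update the embedding of $v_i$; the aggregate and update functions of $v_i$ are defined as:
$$m_{i \leftarrow U_{v_i}} = f_{aggregate}\left(\left\{ m_{i \leftarrow j} : v_j \in U_{v_i} \right\}\right), \qquad h_{v_i}^{(1)} = f_{update}\left(h_{v_i}^{(0)}, m_{i \leftarrow U_{v_i}}\right)$$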
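the update function is implemented as follows:
$$h_{v_i}^{(1)} = \mathrm{ReLU}\left(h_{v_i}^{(0)} + m_{i \leftarrow U_{v_i}}\right)$$
that is, node $v_i$ is updated by summing its embedding before the update, $h_{v_i}^{(0)}$, and the messages aggregated from its neighbors, $m_{i \leftarrow U_{v_i}}$, and ReLU is used for nonlinear feature transformation.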
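Further, in step 5, the following loss function is adopted:
$$Loss = \frac{1}{\left|N(v)_{train}\right|} \sum_{v \in N(v)_{train}} \left\| loc_{train}^{(v)} - \widehat{loc}_{train}^{(v)} \right\|_2^2 + \lambda \left\| \Theta \right\|_2^2$$
wherein $N(v)_{train}$ represents all the nodes in the training data set, $loc_{train}$ and $\widehat{loc}_{train}$ respectively represent the transformed real geographic locations of the IPs in the training set and the transformed geographic locations predicted for them by the decoder, $\Theta$ represents the trainable model parameters, and $\lambda$ controls the strength of the $L_2$ regularization.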
Another aspect of the present invention provides an IP street-level positioning apparatus based on a graph neural network, including:
the preprocessing module is used for representing traceroute original measurement data of a computer network into a graph with attributes;
the encoding module is used for generating initial graph node embedding of each IP address based on the attributed graph through an encoder; the IP address comprises a landmark IP address, and the geographic position corresponding to the landmark IP address is known;
the message transmission module is used for injecting the geographical position information into the initial graph node embedding by modeling the link information between the computer network nodes;
the decoding module is used for mapping, through a decoder, the graph node embeddings processed by the message passing module to the geographic locations corresponding to the graph nodes, i.e., the IP addresses;
the model training optimization module is used for comparing the difference between the geographic position obtained by mapping and the actual geographic position and optimizing the parameters of the model through back propagation and gradient descent;
and the geographic position prediction module is used for predicting the geographic position of the IP address of other unknown geographic positions in the computer network based on the optimized model.
Further, the preprocessing module is specifically configured to:
step 1.1, constructing a network topology graph:
the topology of the computer network is represented using IP addresses and the links between IP addresses; all IP addresses in the raw traceroute measurement data of the computer network are converted into graph nodes; each node $v_i$ is distinguished by a node ID whose value ranges from 1 to $N_V$, where $N_V$ is the number of all IP addresses in the raw traceroute measurement data;
converting a direct physical link between two IP addresses into a graph edge;
step 1.2, attribute extraction:
each graph node is associated with two node attributes: node delay and node IP address; for each node, the probing host measures the delay multiple times, and the minimum delay is selected as the node delay; finally, the node ID and the node attributes are combined as the initial features of the node;
each edge is associated with an edge delay, a head node IP address and a tail node IP address; the node closer to the probing host is taken as the head node of the edge and the other node as the tail node; the edge delay is calculated by subtracting the node delay of the head node from that of the tail node.
Further, in step 1.1, when the probing host cannot obtain the IP address and delay of a router during traceroute, there are two methods for establishing edges around that router: (i) ignore the router and establish an edge between the two routers immediately before and after it on the routing path; (ii) map the router to an IP address found in other, re-measured routing paths.
Further, the message passing module is specifically configured to:
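for node $v_i$ and its neighboring node $v_j$, the message from $v_j$ to $v_i$ is defined as
$$m_{i \leftarrow j} = f_{message}\left(h_{v_j}^{(0)}, h_{e_{ij}}\right)$$
wherein $m_{i \leftarrow j}$ is the message sent from $v_j$ to $v_i$, $f_{message}$ is the message function, $h_{v_j}^{(0)}$ denotes the embedding of the neighboring node, and $h_{e_{ij}}$ denotes the embedding of the edge $e_{ij}$ formed by node $v_i$ and node $v_j$; $f_{message}$ is implemented as:
$$m_{i \leftarrow j} = f_{edge}\left(h_{e_{ij}}\right) h_{v_j}^{(0)} = W_{e_{ij}}\, h_{v_j}^{(0)}$$
wherein the edge network $f_{edge}$ is a two-layer MLP that converts the edge embedding $h_{e_{ij}}$ into a matrix $W_{e_{ij}}$ used to compute the message from the neighboring node embedding $h_{v_j}^{(0)}$; the architecture of $f_{edge}$ is as follows:
$$W_{e_{ij}} = \mathrm{reshape}\left(\mathrm{ReLU}\left(W_2\, \mathrm{ReLU}\left(W_1 h_{e_{ij}}\right)\right)\right)$$
wherein $W_{e_{ij}}$ serves as the weight controlling the impact of the neighboring node embedding $h_{v_j}^{(0)}$ on node $v_i$; $W_1$ and $W_2$ are the weight matrices of the two MLP layers;
by the above formula, the initial edge embedding $h_{e_{ij}}$ is first converted into a temporary embedding and then into a one-dimensional weight embedding, which is reshaped into a two-dimensional weight matrix $W_{e_{ij}}$ to control the influence of neighbor node $v_j$ on node $v_i$; the ReLU activation function is used to capture nonlinear relationships;
$U_{v_i}$ denotes the set of neighboring nodes of $v_i$; the aggregation function $f_{aggregate}$ collects all messages propagated from $U_{v_i}$ to $v_i$, and the update function $f_{update}$ uses the collected messages $m_{i \leftarrow U_{v_i}}$ to update the embedding of $v_i$; the aggregate and update functions of $v_i$ are defined as:
$$m_{i \leftarrow U_{v_i}} = f_{aggregate}\left(\left\{ m_{i \leftarrow j} : v_j \in U_{v_i} \right\}\right), \qquad h_{v_i}^{(1)} = f_{update}\left(h_{v_i}^{(0)}, m_{i \leftarrow U_{v_i}}\right)$$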
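the update function is implemented as follows:
$$h_{v_i}^{(1)} = \mathrm{ReLU}\left(h_{v_i}^{(0)} + m_{i \leftarrow U_{v_i}}\right)$$
that is, node $v_i$ is updated by summing its embedding before the update, $h_{v_i}^{(0)}$, and the messages aggregated from its neighbors, $m_{i \leftarrow U_{v_i}}$, and ReLU is used for nonlinear feature transformation.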
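Further, in the model training optimization module, the following loss function is adopted:
$$Loss = \frac{1}{\left|N(v)_{train}\right|} \sum_{v \in N(v)_{train}} \left\| loc_{train}^{(v)} - \widehat{loc}_{train}^{(v)} \right\|_2^2 + \lambda \left\| \Theta \right\|_2^2$$
wherein $N(v)_{train}$ represents all the nodes in the training data set, $loc_{train}$ and $\widehat{loc}_{train}$ respectively represent the transformed real geographic locations of the IPs in the training set and the transformed geographic locations predicted for them by the decoder, $\Theta$ represents the trainable model parameters, and $\lambda$ controls the strength of the $L_2$ regularization.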
Compared with the prior art, the invention has the following beneficial effects:
the invention uses a graph neural network to improve the generalization ability of IP positioning. First, the raw traceroute measurement data of a computer network is represented as an attributed graph; then, an encoder converts the attributed graph into initial node embeddings; next, the initial node embeddings are refined by modeling the connection information; finally, a decoder maps the refined embeddings to node locations. By taking prior knowledge into account, the invention alleviates the convergence problem of the graph neural network and thereby improves the accuracy of geographic location prediction. Experiments on different real-world data sets show that the invention outperforms state-of-the-art rule-based and learning-based baselines by 16% to 28% in median error distance on all data sets.
Drawings
Fig. 1 is a flowchart of an IP street-level positioning method based on a graph neural network according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of mapping a computer network to an attributed graph according to an embodiment of the present invention;
Fig. 3 is a framework diagram of an IP street-level positioning method based on a graph neural network according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an IP street-level positioning device based on a graph neural network according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings and specific embodiments:
as shown in Fig. 1, an IP street-level positioning method based on a graph neural network (GNN-Geo) includes:
step S101, representing traceroute original measurement data of a computer network into a graph with attributes;
step S102, generating initial graph node embedding of each IP address based on the attributed graph through an encoder; the IP address comprises a landmark IP address, and the geographic position corresponding to the landmark IP address is known;
step S103, injecting the geographical position information into the initial graph node embedding by modeling the link information between the computer network nodes;
step S104, mapping, through a decoder, the graph node embeddings processed in step S103 to the geographic locations corresponding to the graph nodes, i.e., the IP addresses;
step S105, comparing the difference between the geographic position obtained by mapping and the actual geographic position, and optimizing the parameters of the model through back propagation and gradient descent;
step S106, predicting, based on the optimized model, the geographic locations of the other IP addresses in the computer network whose geographic locations are unknown.
Specifically, the invention addresses IP street-level positioning in the complex environment of the real Internet. From a deep learning perspective, IP positioning is converted into a graph node attribute prediction problem, which is solved with a graph neural network architecture to obtain the location of the target IP.
1. Transformation of the IP positioning problem
The invention reformulates the measurement-based IP street-level positioning problem, within the field of graph deep learning, as an attribute prediction problem on the nodes of an attributed graph.
As shown in Fig. 1, the present invention uses IP addresses and the links between IP addresses to represent the topology of a computer network. A link means a direct physical link between two IP addresses. The location of each IP address is represented as a pair (latitude, longitude). In addition, each IP address is associated with measurable attributes, such as the delay from the probing source to the IP address, and each link is also associated with measurable attributes, such as the delay of that link. All IP addresses can be divided into four groups: probing host IP addresses, landmark IP addresses, target IP addresses, and router IP addresses. A probing host is generally a host or server controlled by the researchers and equipped with network measurement software; both the geographic location and the IP address of a landmark are known; a target is a network device whose geographic location is unknown, whose IP address is known, and whose location must be estimated by a positioning algorithm; a router is an intermediate router discovered while the probing host collects measurement data for the landmarks and target IPs. A computer network can thus be mapped to an attributed graph $G = \{V, E, X\}$, where a node $v$ represents an IP address, an edge $e$ represents a link, and $x$ represents the measurable attributes of nodes and edges (e.g., delay). Suppose that the geographic location $l_v$ of a node $v$ is expressed by its longitude and latitude, and that the node sets of the landmarks and the target hosts are $V_l$ and $V_t$ respectively. Measurement-based IP positioning can then be formalized as finding a prediction function $f$ that predicts the target locations:
$$\hat{l}_{V_t} = f\left(l_{V_l}, V, E, X\right)$$
where $\hat{l}_{V_t}$ is the estimated location of the target IPs, and the inputs are the geographic locations of the landmarks together with the nodes, edges, and measurement attributes of the graph $G$. Finding the prediction function $f$ is a typical graph node attribute prediction problem.
To solve this graph node attribute prediction problem, the attributed graph must be modeled accurately; that is, the relationship between the measurement data $G$ and the real locations $\{l_v, v \in V_t\}$ of the target IPs must be described accurately, so as to obtain node feature representations that carry signals related to geographic location, from which the correct geographic location is finally estimated.
2. GNN-Geo method framework
The GNN-Geo method framework is shown in Fig. 3 and consists of four components: a preprocessing layer, an encoder, a message passing (MP) layer, and a decoder. The preprocessing layer maps the raw measurement data of the computer network to a graph representation $G = (V, E, X_V, X_E)$. The encoder generates the initial feature embeddings of $G$; the MP layer refines the initial feature embeddings using graph signals and outputs refined embeddings for all nodes $V$; finally, the decoder maps the refined node embeddings to locations. Each component is described below, followed by the optimization details. It is worth mentioning that different GNN-based IP geolocation methods can be derived from the GNN-Geo framework by changing some details of each component.
1. Preprocessing layer: the measurement data in this patent is mainly traceroute data from the probing host to the landmarks and targets. The task of the preprocessing layer is to convert the raw traceroute data into the encoder's input graph $G = (V, E, X_V, X_E)$. It comprises two subtasks: (i) constructing the graph $G$ from graph nodes $V$ and links $E$; (ii) extracting the node attributes $X_V$ and the link attributes $X_E$.
(1) Constructing a network topology map
Nodes: the topology of the computer network is represented using IP addresses and the links between IP addresses. All IP addresses in the raw traceroute data are converted into graph nodes. Each node $v_i$ is distinguished by a node ID whose value ranges from 1 to $N_V$, where $N_V$ is the number of all IP addresses in the traceroute data.
Edges: if the probing host finds a direct physical link between two IP addresses, this link is converted into a graph edge $e_j$. During traceroute, the probing host may be unable to obtain the IP addresses and delays of some routers, which are referred to as "anonymous routers". There are two ways to establish edges around an anonymous router: (i) ignore the anonymous router and establish an edge between the two routers immediately before and after it on the routing path; (ii) map the anonymous router to an IP address found in other, re-measured routing paths.
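Option (i) above can be sketched in a few lines of Python. This is only an illustration under assumed conventions: a traceroute path is taken to be an ordered list of hop IP addresses in which `None` marks an anonymous router, and the function name `path_edges` is hypothetical rather than part of the patent.

```python
from typing import List, Optional, Set, Tuple


def path_edges(path: List[Optional[str]]) -> Set[Tuple[str, str]]:
    """Return directed (head, tail) edges for one traceroute path.

    `path` lists hop IP addresses in order of increasing distance from the
    probing host; None marks an anonymous router (assumed encoding).
    """
    # Option (i): drop anonymous hops, so their two neighbours become adjacent.
    hops = [ip for ip in path if ip is not None]
    # Link each remaining hop to the next one, skipping repeated addresses.
    return {(head, tail) for head, tail in zip(hops, hops[1:]) if head != tail}


# The second hop did not answer, so an edge joins the first and third hops.
edges = path_edges(["10.0.0.1", None, "10.0.2.1", "10.0.3.9"])
```

Option (ii) would instead need a second, re-measured path to supply the missing address, which this sketch does not attempt.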
(2) Attribute extraction
Node attributes: in this patent, each graph node is associated with two node attributes: node delay and node IP address. The node delay is the direct delay from the probing host to the node. For each node, the probing host measures the delay multiple times. The smallest value is chosen as the node delay because it contains the least congestion and is closest to the true propagation delay. Finally, the node ID and the node attributes together serve as the initial features of the node. $\mathbf{v}_i$ denotes the initial features of node $v_i$, and $\mathbf{V}$ denotes the initial features of all nodes.
Edge attributes: each edge may be associated with several features, such as the edge delay, the head node IP address, and the tail node IP address. Here, the node closer to the probing host is referred to as the head node of the edge, and the other node is referred to as the tail node. The edge delay is calculated by subtracting the node delay of the head node from that of the tail node. The edge delay is kept even if it is negative.
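The attribute extraction described above amounts to a minimum over repeated measurements and a subtraction per edge. The following Python sketch illustrates it; the helper names and the sample delays (in milliseconds) are made up for illustration and are not taken from the patent.

```python
from typing import Dict, Iterable, List, Tuple


def node_delays(samples: Dict[str, List[float]]) -> Dict[str, float]:
    """Keep the minimum of the repeated delay measurements for each IP."""
    return {ip: min(rtts) for ip, rtts in samples.items()}


def edge_delays(edges: Iterable[Tuple[str, str]],
                delay: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Edge delay = tail node delay - head node delay; negatives are kept."""
    return {(head, tail): delay[tail] - delay[head] for head, tail in edges}


# Hypothetical repeated measurements from the probing host, in milliseconds.
samples = {
    "10.0.0.1": [1.9, 1.2, 1.5],
    "10.0.2.1": [4.0, 3.7],
    "10.0.3.9": [3.5, 3.6],
}
delay = node_delays(samples)
# Node IDs run from 1 to N_V (here assigned in sorted order, an assumption).
node_id = {ip: i for i, ip in enumerate(sorted(samples), start=1)}
e_delay = edge_delays([("10.0.0.1", "10.0.2.1"), ("10.0.2.1", "10.0.3.9")], delay)
```

In this example the second edge gets a slightly negative delay, which is retained as the text prescribes rather than clipped to zero.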
2. Encoder
The purpose of the encoder is to form initial low-dimensional embeddings of graph nodes and edges for the MP layer. The embeddings of graph nodes and edges are generated from the initial node features $\mathbf{V}$ and the initial edge features $\mathbf{E}$ produced by the preprocessing layer.
Each non-zero feature in $\mathbf{V}$ and $\mathbf{E}$ is associated with an embedding vector, which yields a set of low-dimensional embeddings for graph nodes and for graph edges respectively. The embeddings belonging to a node (or edge) are combined into one vector that describes the node (or edge). In particular, the low-dimensional embeddings of node $v_i$ and edge $e_j$ are:
$$h_{v_i}^{(0)} = Q_v^{\top} \mathbf{v}_i, \qquad h_{e_j} = Q_e^{\top} \mathbf{e}_j$$
where $\mathbf{v}_i$ and $\mathbf{e}_j$ are the initial feature vectors of node $v_i$ and edge $e_j$, $Q_v \in \mathbb{R}^{N_v \times G}$ is the embedding matrix of all nodes $V$, and $Q_e \in \mathbb{R}^{N_e \times K}$ is the embedding matrix of all edges $E$. $N_v$ is the number of node features and $G$ the node embedding size; $N_e$ is the number of edge features and $K$ the edge embedding size. The encoder then feeds $h_V^{(0)}$ and $h_E$ into the next component, the MP layer, for embedding refinement.
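One way to read the encoder is as a lookup into an embedding matrix weighted by the non-zero features, i.e. a product of the transposed embedding matrix with the feature vector. The Python sketch below follows that reading as an assumption; the random initialization, the feature layout, and the names `make_embedding_matrix` and `encode` are illustrative only.

```python
import random
from typing import List, Sequence


def make_embedding_matrix(n_features: int, size: int, seed: int = 0) -> List[List[float]]:
    """Randomly initialised embedding matrix with one row per feature."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(size)] for _ in range(n_features)]


def encode(features: Sequence[float], Q: List[List[float]]) -> List[float]:
    """Compute h = Q^T x: sum the rows of Q selected by non-zero features."""
    h = [0.0] * len(Q[0])
    for value, row in zip(features, Q):
        if value != 0.0:
            for k, q in enumerate(row):
                h[k] += value * q
    return h


Q_v = make_embedding_matrix(n_features=4, size=3)  # N_v = 4 features, G = 3
x_v = [0.0, 1.0, 0.0, 0.5]                         # hypothetical node feature vector
h_v = encode(x_v, Q_v)                             # initial node embedding of size G
```

In a trained model the rows of the embedding matrix would be learned together with the other parameters; here they are fixed random values.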
3. Message passing (MP) layer
This is the core component of the GNN-Geo framework. Its input is the initial low-dimensional embeddings of the graph nodes, $h_V^{(0)}$, and of the edges, $h_E$. Its purpose is to enrich the initial graph node embeddings by explicitly modeling the edges between nodes together with the node and edge attributes. The final graph node embeddings are sent to the decoder for location estimation. The MP layer consists of message, aggregation, and update functions.
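Message function: for node $v_i$ and its neighboring node $v_j$, the message from $v_j$ to $v_i$ is defined as
$$m_{i \leftarrow j} = f_{message}\left(h_{v_j}^{(0)}, h_{e_{ij}}\right)$$
where $m_{i \leftarrow j}$ is the message sent from $v_j$ to $v_i$ and $f_{message}$ is the message function. The inputs of $f_{message}$ are: (i) the neighboring node embedding $h_{v_j}^{(0)}$; (ii) the embedding $h_{e_{ij}}$ of the edge $e_{ij}$ between the two nodes $v_i$ and $v_j$. This embodiment implements $f_{message}$ as:
$$m_{i \leftarrow j} = f_{edge}\left(h_{e_{ij}}\right) h_{v_j}^{(0)} = W_{e_{ij}}\, h_{v_j}^{(0)}$$
where the edge network $f_{edge}$ is a two-layer MLP that converts the edge embedding $h_{e_{ij}}$ into a matrix $W_{e_{ij}}$ used to compute the message of the neighboring node. The architecture of $f_{edge}$ is as follows:
$$W_{e_{ij}} = \mathrm{reshape}\left(\mathrm{ReLU}\left(W_2\, \mathrm{ReLU}\left(W_1 h_{e_{ij}}\right)\right)\right)$$
where $W_{e_{ij}}$ serves as the weight controlling the impact of the neighbor embedding $h_{v_j}^{(0)}$ on node $v_i$, and $W_1$ and $W_2$ are the weight matrices of the two MLP layers. They first convert the initial edge embedding $h_{e_{ij}}$ (size $K$) into a temporary embedding (size $2K$) and then into a one-dimensional weight embedding. The one-dimensional weight embedding is reshaped into a two-dimensional weight matrix $W_{e_{ij}}$ (size $G \times G$) to control the influence of neighbor node $v_j$ on node $v_i$. The ReLU activation function is used to capture nonlinear relationships.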
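The edge-conditioned message can be traced with a tiny numeric example. The sketch below assumes the form described in the text (two ReLU layers without bias, followed by a reshape to a G × G matrix); whether the original formula includes bias terms or a final ReLU is not recoverable from the text, so treat those details, the toy weights, and the function names as assumptions.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def matvec(W: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def f_edge(h_e: Vector, W1: Matrix, W2: Matrix, G: int) -> Matrix:
    """Two-layer MLP: edge embedding (K) -> hidden (2K) -> flat (G*G) -> G x G."""
    hidden = relu(matvec(W1, h_e))
    flat = relu(matvec(W2, hidden))
    return [flat[r * G:(r + 1) * G] for r in range(G)]  # reshape row by row


def f_message(h_vj: Vector, h_e: Vector, W1: Matrix, W2: Matrix) -> Vector:
    """Message m_{i<-j}: the neighbour embedding transformed by W_e."""
    return matvec(f_edge(h_e, W1, W2, len(h_vj)), h_vj)


# Toy sizes: K = 1 (edge embedding), G = 2 (node embedding).
W1 = [[1.0], [-1.0]]                                      # 2K x K
W2 = [[0.5, 0.0], [0.0, 1.0], [-1.0, 0.0], [1.0, 1.0]]    # G*G x 2K
message = f_message(h_vj=[3.0, 4.0], h_e=[2.0], W1=W1, W2=W2)
```

With these toy weights the edge network produces the diagonal matrix [[1, 0], [0, 2]], so the neighbor embedding [3, 4] becomes the message [3, 8]: the edge embedding alone decides how each component of the neighbor is scaled.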
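Aggregation and update functions: $U_{v_i}$ denotes the set of neighboring nodes of $v_i$. The aggregation function $f_{aggregate}$ collects all messages propagated from $U_{v_i}$ to $v_i$, and the update function $f_{update}$ uses the collected messages $m_{i \leftarrow U_{v_i}}$ to update the embedding of $v_i$. The aggregate and update functions of $v_i$ are defined as:
$$m_{i \leftarrow U_{v_i}} = f_{aggregate}\left(\left\{ m_{i \leftarrow j} : v_j \in U_{v_i} \right\}\right), \qquad h_{v_i}^{(1)} = f_{update}\left(h_{v_i}^{(0)}, m_{i \leftarrow U_{v_i}}\right)$$
where $h_{v_i}^{(1)}$ denotes the updated embedding of node $v_i$ and $h_{v_i}^{(0)}$ is its initial node embedding from the encoder. The aggregation function $f_{aggregate}$ may be a simple symmetric function such as Mean, Max or Sum. The update function is implemented as follows:
$$h_{v_i}^{(1)} = \mathrm{ReLU}\left(h_{v_i}^{(0)} + m_{i \leftarrow U_{v_i}}\right)$$
That is, node $v_i$ is updated by summing its previous embedding $h_{v_i}^{(0)}$ and the messages $m_{i \leftarrow U_{v_i}}$ it aggregates from its neighbors, and ReLU is used for nonlinear feature transformation.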
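The aggregation and update step can be illustrated as follows. The sketch offers Sum, Mean and Max as the symmetric aggregators named in the text and reads the update as ReLU applied to the sum of the previous embedding and the aggregated message; that reading, like the function names, is an assumption rather than the patent's exact formula.

```python
from typing import List, Sequence

Vector = List[float]


def aggregate(messages: Sequence[Vector], how: str = "sum") -> Vector:
    """Combine neighbour messages with a symmetric (order-independent) function."""
    columns = list(zip(*messages))  # one tuple per embedding dimension
    if how == "sum":
        return [sum(c) for c in columns]
    if how == "mean":
        return [sum(c) / len(c) for c in columns]
    if how == "max":
        return [max(c) for c in columns]
    raise ValueError(f"unknown aggregator: {how}")


def update(h_prev: Vector, m: Vector) -> Vector:
    """Assumed update rule: ReLU(previous embedding + aggregated message)."""
    return [max(0.0, a + b) for a, b in zip(h_prev, m)]


messages = [[3.0, 8.0], [-5.0, 1.0]]   # messages from two neighbours of v_i
m_sum = aggregate(messages, "sum")
h_new = update([1.0, -4.0], m_sum)     # refined embedding of v_i
```

Because the aggregator is symmetric, reordering the neighbors leaves the result unchanged, which is what makes the layer well defined on a graph.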
4. Decoder
The goal of the decoder is to estimate the location from the refined node embeddings $h_V^{(1)}$. Two values (latitude and longitude) must be predicted for every node. The decoder is implemented as follows:
$$\widehat{loc}_V = \mathrm{Sigmoid}\left(\mathrm{BatchNorm}\left(W_{loc}\, h_V^{(1)} + b_{loc}\right)\right)$$
where $\widehat{loc}_V$ is the matrix of estimated (transformed) locations of all nodes $V$, and Sigmoid is an activation function whose output lies in $[0, 1]$. If the input value is too large, Sigmoid may saturate, which makes learning more difficult. BatchNorm refers to batch normalization. $W_{loc}$ is the weight matrix of the fully connected layer in the decoder, and $b_{loc}$ is the bias of that layer. To alleviate the saturation problem of the Sigmoid function and to prevent overfitting, this embodiment employs batch normalization. GNN-Geo is then trained by comparing the transformed real geographic locations of the training set IPs, $loc_{train}$, with the transformed predicted geographic locations of the training set IPs, $\widehat{loc}_{train}$. After training, the geographic location $\hat{l}$ of a target IP is obtained from its transformed geographic location $\widehat{loc}$ by applying the inverse of the two "0-1 scalers".
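The decoder and the "0-1 scaler" round trip can be sketched as below. The ordering used here (fully connected layer, then batch normalization, then Sigmoid) is an assumption consistent with the stated aim of keeping the Sigmoid input small; the batch normalization is a simplified version without learned scale and shift, and all weights and coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_scaler(values: Sequence[float]) -> Tuple[float, float]:
    """Return (min, max) of the training coordinates for 0-1 scaling."""
    return min(values), max(values)


def scale(value: float, bounds: Tuple[float, float]) -> float:
    lo, hi = bounds
    return (value - lo) / (hi - lo)


def unscale(value: float, bounds: Tuple[float, float]) -> float:
    lo, hi = bounds
    return value * (hi - lo) + lo


def batch_norm(column: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Normalise one output dimension over the batch (no learned scale/shift)."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    return [(x - mean) / math.sqrt(var + eps) for x in column]


def decode(H, W_loc, b_loc) -> List[List[float]]:
    """Map refined embeddings (one row per node) to scaled (lat, lon) in (0, 1)."""
    Z = [[sum(w * h for w, h in zip(row, h_v)) + b for row, b in zip(W_loc, b_loc)]
         for h_v in H]
    cols = [batch_norm(col) for col in zip(*Z)]  # normalise each output dimension
    return [[1.0 / (1.0 + math.exp(-z)) for z in node] for node in zip(*cols)]


H = [[0.2, 1.0], [1.5, -0.3], [0.7, 0.4]]   # refined embeddings of three nodes
pred = decode(H, W_loc=[[1.0, 0.5], [-0.5, 2.0]], b_loc=[0.1, -0.2])

# One scaler per coordinate, fitted on hypothetical landmark locations.
lat_bounds = fit_scaler([34.70, 34.75, 34.80])
lon_bounds = fit_scaler([113.60, 113.65, 113.70])
locations = [(unscale(p[0], lat_bounds), unscale(p[1], lon_bounds)) for p in pred]
```

Since the Sigmoid output stays inside (0, 1), every predicted location falls within the bounding box of the training coordinates after the inverse scaling.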
5. Model training
To learn the model parameters, the mean squared error (MSE) loss between $loc_{train}$ and $\widehat{loc}_{train}$ is used. In this patent, the $L_2$-regularized MSE loss is optimized as follows:
$$Loss = \frac{1}{\left|N(v)_{train}\right|} \sum_{v \in N(v)_{train}} \left\| loc_{train}^{(v)} - \widehat{loc}_{train}^{(v)} \right\|_2^2 + \lambda \left\| \Theta \right\|_2^2$$
where $N(v)_{train}$ represents all nodes in the training data set and $\Theta$ contains all trainable model parameters of GNN-Geo. Deep learning methods often suffer from overfitting. In addition to batch normalization in the decoder, $L_2$ regularization is used to prevent overfitting. Note that batch normalization is used only in training and must be disabled during testing. $\lambda$ controls the strength of the $L_2$ regularization. The prediction model is optimized with the Adam optimizer, which updates the model parameters using the gradient of the loss function. Optimization stops once the loss is minimized, and the resulting model can then be applied to locate the other IPs in the network whose geographic locations are unknown.
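The regularized objective is simple enough to check by hand. The sketch below computes an MSE term averaged over training nodes plus an L2 penalty; averaging over nodes (rather than summing) and flattening the parameters into one list are assumptions made for illustration, and the function name `mse_l2_loss` is hypothetical.

```python
from typing import Sequence


def mse_l2_loss(pred: Sequence[Sequence[float]],
                true: Sequence[Sequence[float]],
                params: Sequence[float],
                lam: float) -> float:
    """Mean squared location error over nodes plus lam * ||params||^2."""
    squared = sum((p - t) ** 2
                  for p_node, t_node in zip(pred, true)
                  for p, t in zip(p_node, t_node))
    mse = squared / len(pred)
    penalty = lam * sum(w * w for w in params)
    return mse + penalty


# Scaled (lat, lon) pairs in [0, 1] for two training nodes (made-up values).
pred = [[0.5, 0.5], [0.0, 1.0]]
true = [[0.5, 0.0], [0.0, 1.0]]
loss = mse_l2_loss(pred, true, params=[1.0, -2.0], lam=0.01)
```

Here the error term is 0.25 / 2 = 0.125 and the penalty is 0.01 × 5 = 0.05, giving a loss of 0.175; a larger `lam` shifts the balance toward small weights.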
To verify the effect of the present invention, the following experiment was performed:
Table 1. Comparison of the GNN-Geo method with baseline methods (unit: km)
Note: the bold row indicates the performance of GNN-Geo, the best value of each metric among the baselines is underlined, and the italicized row marks the best baseline method in terms of mean error distance.
As shown in Table 1, the mean, median, and maximum error distances of the GNN-Geo method are lower than those of all baseline methods on all three regional data sets; that is, GNN-Geo achieves higher positioning accuracy than the baselines on all three metrics. The best baseline method also differs across the three regions. Compared with prior methods, GNN-Geo therefore shows better generalization capability under different network environments.
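The three metrics compared in Table 1 can be computed as in the sketch below. The text does not state which distance formula was used, so the haversine great-circle distance and the mean Earth radius of 6371 km are assumptions of this example, as are the function names.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def error_summary(true_locs, pred_locs):
    """Mean, median, and maximum error distance over (lat, lon) pairs."""
    errors = [haversine_km(*t, *p) for t, p in zip(true_locs, pred_locs)]
    return {"mean": statistics.mean(errors),
            "median": statistics.median(errors),
            "max": max(errors)}
```

Calling `error_summary(true_locs, pred_locs)` with two equally long lists of `(lat, lon)` tuples returns the three values in kilometres.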
On the basis of the above embodiments, as shown in Fig. 4, another aspect of the present invention provides an IP street-level positioning device based on a graph neural network, including:
a preprocessing module, configured to represent the raw traceroute measurement data of a computer network as an attributed graph;
an encoding module, configured to generate, through an encoder, an initial graph node embedding for each IP address based on the attributed graph; the IP addresses include landmark IP addresses, whose corresponding geographic locations are known;
a message passing module, configured to inject geographic location information into the initial graph node embeddings by modeling the link information between computer network nodes;
a decoding module, configured to map, through a decoder, the graph node embeddings processed by the message passing module to the geographic locations of the corresponding graph nodes, i.e., IP addresses;
a model training optimization module, configured to compare the mapped geographic locations with the actual geographic locations and to optimize the model parameters through back propagation and gradient descent;
and a geographic location prediction module, configured to predict, based on the optimized model, the geographic locations of other IP addresses in the computer network whose geographic locations are unknown.
Further, the preprocessing module is specifically configured to perform:
step 1.1, constructing a network topology graph:
the topology of a computer network is represented by IP addresses and the links between them; all IP addresses in the raw traceroute measurement data of the computer network are converted into graph nodes, each node $v_i$ being identified by a node ID that takes a value from 1 to $N_V$, where $N_V$ is the number of IP addresses in the raw traceroute measurement data;
a direct physical link between two IP addresses is converted into a graph edge;
step 1.2, attribute extraction:
each graph node is associated with two node attributes: node delay and node IP address; for each node, the probe host measures the delay multiple times, and the minimum delay is selected as the node delay; finally, the node ID and the node attributes are combined as the initial features of the node;
each edge is associated with an edge delay, a head node IP address, and a tail node IP address; the node closer to the probe host is taken as the head node of the edge and the other node as the tail node, and the edge delay is calculated by subtracting the node delay of the head node from that of the tail node.
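Steps 1.1 and 1.2 can be sketched as follows. This is a minimal illustration, assuming each traceroute path is given as a list of `(ip, [delay_ms, ...])` hops ordered from the probe host outward; the input layout, the function name, and the example addresses are hypothetical.

```python
def build_attributed_graph(paths):
    """Return node IDs (1..N_V), minimum node delays, and directed edge delays."""
    node_id = {}     # IP address -> node ID
    node_delay = {}  # IP address -> minimum measured delay
    for path in paths:
        for ip, delays in path:
            if ip not in node_id:
                node_id[ip] = len(node_id) + 1
            best = min(delays)
            node_delay[ip] = min(best, node_delay.get(ip, best))
    edges = {}       # (head IP, tail IP) -> edge delay
    for path in paths:
        for (head, _), (tail, _) in zip(path, path[1:]):
            # Head is the hop closer to the probe host; delay = tail - head.
            edges[(head, tail)] = node_delay[tail] - node_delay[head]
    return node_id, node_delay, edges


paths = [
    [("10.0.0.1", [1.2, 1.0, 1.5]), ("10.0.1.1", [3.4, 3.1]), ("10.0.2.9", [5.0, 4.8])],
    [("10.0.0.1", [0.9]), ("10.0.3.1", [2.5, 2.7])],
]
node_id, node_delay, edges = build_attributed_graph(paths)
```

Here `10.0.0.1` appears in both paths, so it receives a single node ID and its node delay is the minimum over all of its measurements.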
Further, in step 1.1, when the probe host cannot obtain the IP address and delay of a router during the traceroute process, there are two methods for establishing edges around that router: (i) ignore the router and establish an edge between the routers immediately before and after it on the routing path; (ii) map the router to an IP address found in another, re-measured routing path.
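The two strategies for unresponsive routers can be sketched as below, with `None` marking a hop whose IP address could not be obtained. The text does not specify how a router is matched to a re-measured path, so the rule used for method (ii) (same hop position and identical preceding hop) is an assumption of this example, as are both function names.

```python
def edges_skipping_anonymous(path):
    """Method (i): bridge over unresponsive hops with a direct edge."""
    known = [ip for ip in path if ip is not None]
    return list(zip(known, known[1:]))


def fill_from_remeasurement(path, remeasured):
    """Method (ii): take the IP seen at the same hop position in a re-measured
    path, provided the preceding hop is identical (assumed matching rule)."""
    filled = list(path)
    for k, ip in enumerate(filled):
        if (ip is None and 0 < k < len(remeasured)
                and remeasured[k] is not None
                and remeasured[k - 1] == filled[k - 1]):
            filled[k] = remeasured[k]
    return filled


path = ["10.0.0.1", None, "10.0.2.1", "10.0.3.1"]
bridged = edges_skipping_anonymous(path)
filled = fill_from_remeasurement(path, ["10.0.0.1", "10.0.1.7", "10.0.2.1", "10.0.3.1"])
```

Method (i) keeps the graph connected at the cost of merging two physical links into one edge, while method (ii) preserves the hop when a later measurement reveals it.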
Further, the message passing module is specifically configured as follows:
for a node $v_i$ and its neighboring node $v_j$, the message from $v_j$ to $v_i$ is defined as
$$m_{i \leftarrow j} = f_{message}\big(h_j^{(0)}, e_{ij}\big)$$
where $m_{i \leftarrow j}$ is the message sent from $v_j$ to $v_i$, $f_{message}$ is the message function, $h_j^{(0)}$ denotes the embedding of the neighboring node, and $e_{ij}$ denotes the edge formed by node $v_i$ and node $v_j$; $f_{message}$ is implemented as follows:
$$m_{i \leftarrow j} = f_{edge}\big(h_{e_{ij}}\big)\, h_j^{(0)}$$
where the edge network $f_{edge}$ is a two-layer MLP that converts the edge embedding $h_{e_{ij}}$ into a matrix $\Theta_{ij}$, which is applied to the neighboring node embedding $h_j^{(0)}$; the architecture of $f_{edge}$ is as follows:
$$\Theta_{ij} = f_{edge}\big(h_{e_{ij}}\big) = \mathrm{reshape}\Big(W_{edge}^{(2)}\, \mathrm{ReLU}\big(W_{edge}^{(1)}\, h_{e_{ij}}\big)\Big)$$
where $\Theta_{ij}$ serves as the weight that controls the influence of the neighboring node embedding $h_j^{(0)}$ on node $v_i$; $W_{edge}^{(1)}$ and $W_{edge}^{(2)}$ are the weight matrices of the two MLP layers;
with the above formula, the initial edge embedding $h_{e_{ij}}$ is first converted into a temporary one-dimensional weight embedding, which is then reshaped into a two-dimensional weight matrix $\Theta_{ij}$ that controls the influence of the neighboring node $v_j$ on node $v_i$; the ReLU activation function is used to capture nonlinear relationships;
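$U_{v_i}$ denotes the set of neighboring nodes of $v_i$; the aggregation function $f_{aggregate}$ collects all messages propagated from $U_{v_i}$ to $v_i$, and the update function $f_{update}$ uses the collected messages $m_i$ to update the embedding of $v_i$; the aggregate-update function of $v_i$ is defined as:
$$h_i^{(1)} = f_{update}\Big(h_i^{(0)},\ f_{aggregate}\big(\{\, m_{i \leftarrow j} \mid v_j \in U_{v_i} \,\}\big)\Big)$$
the update function is implemented as follows:
$$h_i^{(1)} = \mathrm{ReLU}\big(h_i^{(0)} + m_i\big)$$
that is, the embedding of node $v_i$ is updated by summing the embedding before the update, $h_i^{(0)}$, and the messages $m_i$ aggregated from its neighbors, and ReLU performs the nonlinear feature transformation.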
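One round of the edge-conditioned message passing described above can be sketched in plain Python. This is an illustration under assumptions rather than the patented implementation: bias terms are omitted, messages are aggregated by summation, the update adds the aggregated message to the previous embedding before ReLU, and all names and toy weights are hypothetical.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def edge_network(e_ij, W1, W2, d):
    """Two-layer MLP: edge embedding -> flat vector of length d*d -> d x d matrix."""
    hidden = relu(matvec(W1, e_ij))
    flat = matvec(W2, hidden)
    return [flat[r * d:(r + 1) * d] for r in range(d)]


def message_passing_step(h, edges, W1, W2):
    """h: {node: embedding}; edges: {(j, i): edge embedding} for messages j -> i."""
    d = len(next(iter(h.values())))
    agg = {i: [0.0] * d for i in h}
    for (j, i), e_ij in edges.items():
        theta = edge_network(e_ij, W1, W2, d)
        m = matvec(theta, h[j])                      # message m_{i<-j}
        agg[i] = [a + x for a, x in zip(agg[i], m)]  # sum aggregation (assumed)
    return {i: relu([a + b for a, b in zip(h[i], agg[i])]) for i in h}


# Toy graph: node 2 sends one message to node 1 over an edge with a 1-d embedding.
W1 = [[1.0], [0.5]]                                     # edge dim 1 -> hidden dim 2
W2 = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]   # hidden dim 2 -> d*d = 4
h = {1: [1.0, 1.0], 2: [0.5, -1.0]}
new_h = message_passing_step(h, {(2, 1): [2.0]}, W1, W2)
```

In this toy case the edge network yields the matrix [[2, 0], [0, 1]], so the message from node 2 is [1.0, -1.0] and node 1 is updated to ReLU([2.0, 0.0]).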
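Further, the model training optimization module adopts the following loss function:
$$\mathcal{L} = \frac{1}{|N(v)_{train}|} \sum_{v_i \in N(v)_{train}} \left\| loc'_i - \widehat{loc}'_i \right\|_2^2 + \lambda \left\| \Theta \right\|_2^2$$
where $N(v)_{train}$ denotes all nodes in the training data set, $loc'_i$ and $\widehat{loc}'_i$ respectively denote the real geographic location and the predicted geographic location (after transformation) of an IP in the training set, $\Theta$ denotes the trainable model parameters, and $\lambda$ controls the strength of the $L_2$ regularization.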
In summary, the present invention uses a graph neural network to improve the generalization capability of IP positioning. First, the raw traceroute measurement data of a computer network is represented as an attributed graph; the attributed graph is then converted into initial node embeddings by an encoder; next, the initial node embeddings are refined by modeling the connection information; finally, a decoder maps the refined embeddings to node locations. By taking prior knowledge into account, the method alleviates the convergence problem of the graph neural network and thereby improves the accuracy of geographic location prediction. Experiments on different real-world data sets show that the present invention outperforms state-of-the-art rule-based and learning-based baselines by 16% to 28% in median error distance across all data sets.
The above describes only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and these modifications and improvements shall also fall within the protection scope of the present invention.
Claims (10)
1. An IP street-level positioning method based on a graph neural network, characterized by comprising the following steps:
step 1, representing the raw traceroute measurement data of a computer network as an attributed graph;
step 2, generating, through an encoder, an initial graph node embedding for each IP address based on the attributed graph; the IP addresses include landmark IP addresses, whose corresponding geographic locations are known;
step 3, injecting geographic location information into the initial graph node embeddings by modeling the link information between computer network nodes;
step 4, mapping, through a decoder, the graph node embeddings processed in step 3 to the geographic locations of the corresponding graph nodes, i.e., IP addresses;
step 5, comparing the mapped geographic locations with the actual geographic locations, and optimizing the model parameters through back propagation and gradient descent;
step 6, predicting, based on the optimized model, the geographic locations of other IP addresses in the computer network whose geographic locations are unknown.
2. The IP street-level positioning method based on a graph neural network according to claim 1, wherein step 1 comprises:
step 1.1, constructing a network topology graph:
the topology of a computer network is represented by IP addresses and the links between them; all IP addresses in the raw traceroute measurement data of the computer network are converted into graph nodes, each node $v_i$ being identified by a node ID that takes a value from 1 to $N_V$, where $N_V$ is the number of IP addresses in the raw traceroute measurement data;
a direct physical link between two IP addresses is converted into a graph edge;
step 1.2, attribute extraction:
each graph node is associated with two node attributes: node delay and node IP address; for each node, the probe host measures the delay multiple times, and the minimum delay is selected as the node delay; finally, the node ID and the node attributes are combined as the initial features of the node;
each edge is associated with an edge delay, a head node IP address, and a tail node IP address; the node closer to the probe host is taken as the head node of the edge and the other node as the tail node, and the edge delay is calculated by subtracting the node delay of the head node from that of the tail node.
3. The IP street-level positioning method based on a graph neural network according to claim 2, wherein in step 1.1, when the probe host cannot obtain the IP address and delay of a router during the traceroute process, there are two methods for establishing edges around that router: (i) ignore the router and establish an edge between the routers immediately before and after it on the routing path; (ii) map the router to an IP address found in another, re-measured routing path.
4. The IP street-level positioning method based on a graph neural network according to claim 1, wherein step 3 comprises:
for a node $v_i$ and its neighboring node $v_j$, the message from $v_j$ to $v_i$ is defined as
$$m_{i \leftarrow j} = f_{message}\big(h_j^{(0)}, e_{ij}\big)$$
where $m_{i \leftarrow j}$ is the message sent from $v_j$ to $v_i$, $f_{message}$ is the message function, $h_j^{(0)}$ denotes the embedding of the neighboring node, and $e_{ij}$ denotes the edge formed by node $v_i$ and node $v_j$; $f_{message}$ is implemented as follows:
$$m_{i \leftarrow j} = f_{edge}\big(h_{e_{ij}}\big)\, h_j^{(0)}$$
where the edge network $f_{edge}$ is a two-layer MLP that converts the edge embedding $h_{e_{ij}}$ into a matrix $\Theta_{ij}$, which is applied to the neighboring node embedding $h_j^{(0)}$; the architecture of $f_{edge}$ is as follows:
$$\Theta_{ij} = f_{edge}\big(h_{e_{ij}}\big) = \mathrm{reshape}\Big(W_{edge}^{(2)}\, \mathrm{ReLU}\big(W_{edge}^{(1)}\, h_{e_{ij}}\big)\Big)$$
where $\Theta_{ij}$ serves as the weight that controls the influence of the neighboring node embedding $h_j^{(0)}$ on node $v_i$; $W_{edge}^{(1)}$ and $W_{edge}^{(2)}$ are the weight matrices of the two MLP layers;
with the above formula, the initial edge embedding $h_{e_{ij}}$ is first converted into a temporary one-dimensional weight embedding, which is then reshaped into a two-dimensional weight matrix $\Theta_{ij}$ that controls the influence of the neighboring node $v_j$ on node $v_i$; the ReLU activation function is used to capture nonlinear relationships;
$U_{v_i}$ denotes the set of neighboring nodes of $v_i$; the aggregation function $f_{aggregate}$ collects all messages propagated from $U_{v_i}$ to $v_i$, and the update function $f_{update}$ uses the collected messages $m_i$ to update the embedding of $v_i$; the aggregate-update function of $v_i$ is defined as:
$$h_i^{(1)} = f_{update}\Big(h_i^{(0)},\ f_{aggregate}\big(\{\, m_{i \leftarrow j} \mid v_j \in U_{v_i} \,\}\big)\Big)$$
where $h_i^{(0)}$ is the initial node embedding produced by the encoder and $h_i^{(1)}$ is the embedding of node $v_i$ after the update;
the update function is implemented as follows:
$$h_i^{(1)} = \mathrm{ReLU}\big(h_i^{(0)} + m_i\big)$$
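5. The IP street-level positioning method based on a graph neural network according to claim 1, wherein in step 5, the following loss function is adopted:
$$\mathcal{L} = \frac{1}{|N(v)_{train}|} \sum_{v_i \in N(v)_{train}} \left\| loc'_i - \widehat{loc}'_i \right\|_2^2 + \lambda \left\| \Theta \right\|_2^2$$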
6. An IP street-level positioning device based on a graph neural network, comprising:
a preprocessing module, configured to represent the raw traceroute measurement data of a computer network as an attributed graph;
an encoding module, configured to generate, through an encoder, an initial graph node embedding for each IP address based on the attributed graph; the IP addresses include landmark IP addresses, whose corresponding geographic locations are known;
a message passing module, configured to inject geographic location information into the initial graph node embeddings by modeling the link information between computer network nodes;
a decoding module, configured to map, through a decoder, the graph node embeddings processed by the message passing module to the geographic locations of the corresponding graph nodes, i.e., IP addresses;
a model training optimization module, configured to compare the mapped geographic locations with the actual geographic locations and to optimize the model parameters through back propagation and gradient descent;
and a geographic location prediction module, configured to predict, based on the optimized model, the geographic locations of other IP addresses in the computer network whose geographic locations are unknown.
7. The IP street-level positioning device based on a graph neural network according to claim 6, wherein the preprocessing module is specifically configured to perform:
step 1.1, constructing a network topology graph:
the topology of a computer network is represented by IP addresses and the links between them; all IP addresses in the raw traceroute measurement data of the computer network are converted into graph nodes, each node $v_i$ being identified by a node ID that takes a value from 1 to $N_V$, where $N_V$ is the number of IP addresses in the raw traceroute measurement data;
a direct physical link between two IP addresses is converted into a graph edge;
step 1.2, attribute extraction:
each graph node is associated with two node attributes: node delay and node IP address; for each node, the probe host measures the delay multiple times, and the minimum delay is selected as the node delay; finally, the node ID and the node attributes are combined as the initial features of the node;
each edge is associated with an edge delay, a head node IP address, and a tail node IP address; the node closer to the probe host is taken as the head node of the edge and the other node as the tail node, and the edge delay is calculated by subtracting the node delay of the head node from that of the tail node.
8. The IP street-level positioning device based on a graph neural network according to claim 7, wherein in step 1.1, when the probe host cannot obtain the IP address and delay of a router during the traceroute process, there are two methods for establishing edges around that router: (i) ignore the router and establish an edge between the routers immediately before and after it on the routing path; (ii) map the router to an IP address found in another, re-measured routing path.
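9. The IP street-level positioning device based on a graph neural network according to claim 6, wherein the message passing module is specifically configured as follows:
for a node $v_i$ and its neighboring node $v_j$, the message from $v_j$ to $v_i$ is defined as
$$m_{i \leftarrow j} = f_{message}\big(h_j^{(0)}, e_{ij}\big)$$
where $m_{i \leftarrow j}$ is the message sent from $v_j$ to $v_i$, $f_{message}$ is the message function, $h_j^{(0)}$ denotes the embedding of the neighboring node, and $e_{ij}$ denotes the edge formed by node $v_i$ and node $v_j$; $f_{message}$ is implemented as follows:
$$m_{i \leftarrow j} = f_{edge}\big(h_{e_{ij}}\big)\, h_j^{(0)}$$
where the edge network $f_{edge}$ is a two-layer MLP that converts the edge embedding $h_{e_{ij}}$ into a matrix $\Theta_{ij}$, which is applied to the neighboring node embedding $h_j^{(0)}$; the architecture of $f_{edge}$ is as follows:
$$\Theta_{ij} = f_{edge}\big(h_{e_{ij}}\big) = \mathrm{reshape}\Big(W_{edge}^{(2)}\, \mathrm{ReLU}\big(W_{edge}^{(1)}\, h_{e_{ij}}\big)\Big)$$
where $\Theta_{ij}$ serves as the weight that controls the influence of the neighboring node embedding $h_j^{(0)}$ on node $v_i$; $W_{edge}^{(1)}$ and $W_{edge}^{(2)}$ are the weight matrices of the two MLP layers;
with the above formula, the initial edge embedding $h_{e_{ij}}$ is first converted into a temporary one-dimensional weight embedding, which is then reshaped into a two-dimensional weight matrix $\Theta_{ij}$ that controls the influence of the neighboring node $v_j$ on node $v_i$; the ReLU activation function is used to capture nonlinear relationships;
$U_{v_i}$ denotes the set of neighboring nodes of $v_i$; the aggregation function $f_{aggregate}$ collects all messages propagated from $U_{v_i}$ to $v_i$, and the update function $f_{update}$ uses the collected messages $m_i$ to update the embedding of $v_i$; the aggregate-update function of $v_i$ is defined as:
$$h_i^{(1)} = f_{update}\Big(h_i^{(0)},\ f_{aggregate}\big(\{\, m_{i \leftarrow j} \mid v_j \in U_{v_i} \,\}\big)\Big)$$
where $h_i^{(0)}$ is the initial node embedding produced by the encoder and $h_i^{(1)}$ is the embedding of node $v_i$ after the update;
the update function is implemented as follows:
$$h_i^{(1)} = \mathrm{ReLU}\big(h_i^{(0)} + m_i\big)$$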
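10. The IP street-level positioning device based on a graph neural network according to claim 6, wherein the model training optimization module adopts the following loss function:
$$\mathcal{L} = \frac{1}{|N(v)_{train}|} \sum_{v_i \in N(v)_{train}} \left\| loc'_i - \widehat{loc}'_i \right\|_2^2 + \lambda \left\| \Theta \right\|_2^2$$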
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211047724 | 2022-08-30 | ||
CN2022110477242 | 2022-08-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115529290A (en) | 2022-12-27 |
Family
ID=84698019
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211144672.0A Pending CN115529290A (en) | 2022-08-30 | 2022-09-20 | IP street level positioning method and device based on graph neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115529290A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111709474A (en) * | 2020-06-16 | 2020-09-25 | 重庆大学 | Graph embedding link prediction method fusing topological structure and node attributes |
CN112286996A (en) * | 2020-11-23 | 2021-01-29 | 天津大学 | Node embedding method based on network link and node attribute information |
CN113158543A (en) * | 2021-02-02 | 2021-07-23 | 浙江工商大学 | Intelligent prediction method for software defined network performance |
Non-Patent Citations (1)
Title |
---|
ZHIYUAN WANG, FAN ZHOU, WENXUAN ZENG, GOCE TRAJCEVSKI, CHUNJING XIAO, YONG WANG, KAI CHEN: "Connecting the Hosts: Street-Level IP Geolocation with Graph Neural Networks", pages 4121 - 4131, XP058951958, Retrieved from the Internet <URL:https://dl.acm.org/doi/10.1145/3534678.3539049> DOI: 10.1145/3534678.3539049 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||