CN114640620A - Method for deducing Internet AS connection relation based on incomplete information - Google Patents

Publication number
CN114640620A
Authority
CN
China
Prior art keywords
connection relation
connection
bgp
credible
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210561377.9A
Other languages
Chinese (zh)
Inventor
蔡冰
嵇程
邢欣
张丽霞
袁艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Branch Center National Computer Network And Information Security Management Center
Original Assignee
Jiangsu Branch Center National Computer Network And Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Branch Center National Computer Network And Information Security Management Center
Priority to CN202210561377.9A
Publication of CN114640620A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Evolutionary Computation (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a method for deducing Internet AS connection relations based on incomplete information. BGP collection points are divided into groups, and the inter-AS path information collected by the BGP collection points in each group is given an initial judgment to obtain a consistent AS connection relation set and a set of AS connection relations that cannot be judged. The groups then judge the same AS connection relation in the consistent AS connection relation set to obtain credible p2p or p2c connection relations, which are added to a credible AS connection relation set; the other AS connection relations in the consistent AS connection relation set are added to the set of AS connection relations that cannot be judged. A classification model is trained on the credible AS connection relation set, the trained model judges the AS connection relations in the set that cannot be judged, and the judgment result is output. The method can infer AS connection relations whose relation type could not otherwise be judged, and thereby construct accurate AS connection relations.

Description

Method for deducing Internet AS connection relation based on incomplete information
Technical Field
The invention belongs to the technical field of network space measurement, and particularly relates to a method for deducing an Internet AS connection relation based on incomplete information.
Background
Today, the Internet consists of tens of thousands of Autonomous Systems (ASes). To provide Internet connectivity, these ASes propagate prefixes and exchange routing information with each other to control the routing of traffic. Connections in an AS-level topology are typically represented as AS-to-AS relationships, including peer-to-peer (p2p), customer-to-provider (c2p), and provider-to-customer (p2c). With the unprecedented growth in the size and complexity of the Internet, understanding the relationships between different ASes is very important for understanding, operating and integrating the Internet, for example when studying the robustness of the Internet, detecting route hijacking, route leaks and routing bottlenecks, designing route-based attacks and their countermeasures, and deploying secure routing mechanisms.
However, it is not easy to obtain sufficient knowledge of AS relationships, since they are usually confidential and must be inferred from various related information. At present, almost all inference algorithms use BGP (Border Gateway Protocol) routes gathered by a set of BGP collection points as their main data for analysis, supplemented by other relevant information such as Internet exchange points and BGP communities. However, the data collected by BGP collection points is fragmentary, and inference algorithms face several fundamental challenges. First, the observations contain non-negligible noise, i.e. routes caused by routing anomalies or configuration errors. Second, the routes seen from a BGP collection point cover only part of the global Internet, so the data is limited. Third, BGP collection points are usually concentrated in the upper levels of the Internet hierarchy, and their overlapping or non-overlapping views introduce observation bias when aggregated. These challenges can lead to false conclusions and seriously affect the inference of AS connection relations.
Disclosure of Invention
Purpose of the invention: the technical problem to be solved by the invention is to provide, in view of the shortcomings of the prior art, a method for deducing Internet AS (Autonomous System) connection relations based on incomplete information.
In order to solve the technical problem, the invention discloses a method for deducing Internet AS connection relations based on incomplete information, which comprises the following steps.
Step S1, obtaining the public BGP collection points and the inter-AS path information collected by each BGP collection point, and grouping the BGP collection points; making an initial judgment on the inter-AS path information collected by the BGP collection points in each group to obtain a consistent AS connection relation set and a set of AS connection relations that cannot be judged.
Step S2, each group judges the same AS connection relation in the consistent AS connection relation set to obtain credible p2p or p2c connection relations, which are added to the credible AS connection relation set; the other AS connection relations in the consistent AS connection relation set are added to the set of AS connection relations that cannot be judged.
Step S3, training a classification model on the credible AS connection relation set, using the trained classification model to judge the AS connection relations in the set that cannot be judged, and outputting the judgment result.
Further, step S1 includes the following steps.
Step S101, downloading public RouteViews data from the Internet, obtaining all BGP collection points, and extracting the BGP routing data collected by the BGP collection points, wherein the BGP routing data comprises the path information of each AS traversed by Internet inter-domain routes during message forwarding.
Step S102, grouping all BGP collection points at random, wherein the number of BGP collection points contained in each group is N.
Step S103, in each group, applying the AS-Rank algorithm separately to the inter-AS path information collected by each BGP collection point in the group to obtain a first relation for each corresponding AS connection.
Step S104, in each group, voting one by one, in an ensemble learning manner, on the first relations that all BGP collection points in the group calculated for the same AS connection, so as to ensure the accuracy of the AS connection relation inference result; the first relation that receives the most votes is judged to be the consistent AS connection relation and placed in the consistent AS connection relation set, and the other AS connection relations are temporarily placed in the set of AS connection relations that cannot be judged.
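The per-group vote of steps S103 and S104 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name `vote_within_group`, the dictionary layout, and the rule that a tie for first place sends a link to the undecided set are all assumptions, and each link `(as_a, as_b)` is assumed to be stored in one canonical orientation so that labels from different collection points are comparable.

```python
from collections import Counter


def vote_within_group(per_collector_relations):
    """Majority vote over the relations inferred by each collection point in one group.

    per_collector_relations: list of dicts, one per BGP collection point, mapping
    an AS link (as_a, as_b) to 'p2p', 'p2c' or 'c2p' (hypothetical layout).
    Returns (consistent, undecided): a dict of winning labels and a set of links.
    """
    ballots = {}
    for relations in per_collector_relations:
        for link, rel in relations.items():
            ballots.setdefault(link, []).append(rel)

    consistent, undecided = {}, set()
    for link, votes in ballots.items():
        ranked = Counter(votes).most_common()
        # Assumption: a tie between the two most frequent labels is "cannot be judged".
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            undecided.add(link)
        else:
            consistent[link] = ranked[0][0]
    return consistent, undecided
```

Each dictionary would come from running a relationship-inference algorithm such as AS-Rank on the paths of a single collection point; that step is outside this sketch.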
Further, in step S102, the number N of BGP collection points contained in each group is set as follows.
Step S1021, downloading the published known AS connection relations from CAIDA (Center for Applied Internet Data Analysis), an international measurement organization, and using these known AS connection relations as a reference data set.
Step S1022, for each BGP collection point obtained in step S101, calculating a second relation for each AS connection in the reference data set using the AS-Rank algorithm, and comparing the second relation with the corresponding AS connection relation in the reference data set, to obtain the misjudgment rate with which each BGP collection point judges the corresponding AS connection relations and the average misjudgment rate p with which the BGP collection points judge all AS connection relations in the reference data set.
Step S1023, using the binomial theorem to determine the number N of BGP collection points in each group.
Further, step S1023 includes: determining the number N of BGP collection points in each group with the target that the probability that at least half of the BGP collection points correctly judge an AS connection relation is not lower than 95%, according to the following formula.
$$\arg\min N \quad \text{s.t.} \quad P\left(X \ge \tfrac{N}{2}\right) = \sum_{k=\lceil N/2 \rceil}^{N} \binom{N}{k} (1-p)^{k}\, p^{N-k} \ge 95\%$$
where X denotes the number of BGP collection points that correctly judge the AS connection relation, and $P(X \ge N/2)$ denotes the probability that at least half of the BGP collection points correctly judge the AS connection relation.
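Steps S1022 and S1023 can be illustrated with a short sketch. It assumes that each collection point judges a relation correctly with probability 1 - p independently of the others, so that the number of correct collection points follows a binomial distribution; the function names, the dictionary layout and the search bound `n_max` are hypothetical.

```python
from math import ceil, comb


def average_misjudgment_rate(inferred_by_collector, reference):
    """Mean over collection points of the share of reference links judged wrongly."""
    rates = []
    for inferred in inferred_by_collector:
        shared = [link for link in inferred if link in reference]
        if shared:
            wrong = sum(inferred[link] != reference[link] for link in shared)
            rates.append(wrong / len(shared))
    if not rates:
        raise ValueError("no collection point overlaps the reference data set")
    return sum(rates) / len(rates)


def prob_at_least_half_correct(n, p):
    """P(X >= n/2) for X ~ Binomial(n, 1 - p)."""
    q = 1.0 - p
    return sum(comb(n, k) * q**k * p**(n - k) for k in range(ceil(n / 2), n + 1))


def min_group_size(p, target=0.95, n_max=1000):
    """Smallest N whose probability of at least half being correct reaches the target."""
    for n in range(1, n_max + 1):
        if prob_at_least_half_correct(n, p) >= target:
            return n
    return None  # target not reachable within n_max
```

Because "at least half" is easier to satisfy for even group sizes than for odd ones, the probability is not monotonic in N, which is why the sketch scans N upward instead of solving in closed form.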
Further, step S2 includes the following steps.
Step S201, each group determines the same AS connection relationship in the consistent AS connection relationship set.
Step S202, if more than a first proportion threshold t1 of the collection point groups judge a certain AS connection relation to be a p2p connection relation, the AS connection relation is judged to be a credible p2p connection relation.
Step S203, if more than a second proportion threshold t2 of the collection point groups judge a certain AS connection relation to be a p2c/c2p connection relation, the AS connection relation is judged to be a credible p2c/c2p connection relation.
Step S204, adding the credible p2p connection relations and the credible p2c/c2p connection relations to the credible AS connection relation set, and adding the other AS connection relations in the consistent AS connection relation set to the set of AS connection relations that cannot be judged.
Further, the value of the first proportion threshold t1 in step S202 is the average of the p2p link proportions obtained by all BGP collection points; the value of the second proportion threshold t2 in step S203 is the average of the p2c/c2p link proportions obtained by all BGP collection points.
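The cross-group filtering of step S2 might look like the sketch below. It is an illustration under stated assumptions: the defaults 0.20 and 0.80 are the example threshold values quoted in the embodiment, the function name `split_credible` is hypothetical, only the most frequent label of each link is tested against its threshold, and p2c and c2p are counted as separate labels on a canonically oriented link.

```python
from collections import Counter


def split_credible(group_results, t1=0.20, t2=0.80):
    """Split links into credible relations and links that cannot be judged.

    group_results: list of dicts, one per collection-point group, mapping an AS
    link to the label that won the vote inside that group (hypothetical layout).
    """
    n_groups = len(group_results)
    tallies = {}
    for consistent in group_results:
        for link, rel in consistent.items():
            tallies.setdefault(link, Counter())[rel] += 1

    credible, undetermined = {}, set()
    for link, counts in tallies.items():
        rel, votes = counts.most_common(1)[0]
        share = votes / n_groups              # fraction of all groups backing this label
        threshold = t1 if rel == "p2p" else t2
        if share > threshold:
            credible[link] = rel
        else:
            undetermined.add(link)
    return credible, undetermined
```

The lower bar for p2p reflects the text's choice of the average p2p link proportion as t1: peering links are observed by fewer collection points, so requiring the same agreement as for p2c/c2p would discard most of them.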
Further, the classification model in step S3 is a Bayesian network model based on expectation maximization. Because the AS connection data collected by all BGP collection points is strongly imbalanced in its distribution, the Bayesian network model is used, as it can effectively eliminate the influence of this imbalance on AS connection relation inference.
Further, step S3 includes the following steps.
Step S301, establishing corresponding feature vectors for each AS connection relation in the credible AS connection relation set and the AS connection relation set which cannot be judged.
Step S302, the feature vector of each credible AS connection relation in all credible AS connection relation sets and the corresponding AS connection relation are used AS training set samples.
Step S303, training a bayesian network model based on expectation maximization.
Step S304, inputting the feature vector of each AS connection relation in the AS connection relation set which cannot be judged into the trained Bayesian network model, wherein the output result of the model is the judgment result of the AS connection relation.
Further, the feature vector in step S301 includes 7 attribute values:
(1) for each of the two ASes in the AS connection relation, the proportions of p2p, p2c and c2p types among all consistent AS connection relations associated with that AS;
(2) for each of the two ASes, the proportions in which it acts as provider, peer and customer in all consistent AS connection relations;
(3) the distance between each of the two ASes and a currently public top-level AS, i.e. the number of AS connections that must be crossed between the AS and a top-level AS such as the publicly known Level 3 Parent LLC;
(4) the number of BGP collection points that observe the AS connection relation, i.e. the number of BGP collection points that collect and infer information about it;
(5) the hierarchical distribution of the BGP collection points that observe the AS connection relation, i.e. the proportions of those BGP collection points at different AS levels;
(6) the number of IXPs (Internet eXchange Points) in which the two ASes co-occur;
(7) the number of hardware facilities in which the two ASes co-occur.
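As one example of how such an attribute could be derived, the sketch below computes the provider, peer and customer role proportions of a single AS from a consistent relation set. The function name and data layout are assumptions; a label `'p2c'` on link `(a, b)` is taken to mean that `a` is the provider of `b`, and `'c2p'` the reverse.

```python
from collections import Counter


def role_proportions(consistent, asn):
    """Share of consistent links in which `asn` acts as provider, peer or customer."""
    roles = Counter()
    for (a, b), rel in consistent.items():
        if asn not in (a, b):
            continue
        if rel == "p2p":
            roles["peer"] += 1
        elif (rel == "p2c") == (asn == a):
            # asn is the first endpoint of a p2c link, or the second of a c2p link.
            roles["provider"] += 1
        else:
            roles["customer"] += 1
    total = sum(roles.values())
    return {r: (roles[r] / total if total else 0.0)
            for r in ("provider", "peer", "customer")}
```

Calling the function once for each endpoint of a link yields the two triples that enter that link's feature vector.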
Further, step S304 includes: denote by C the type of an AS connection relation in the set of AS connection relations that cannot be judged (one of the three types p2p, p2c and c2p); let $P(f_i \mid Pa(f_i), C)$ denote the probability of the i-th attribute value $f_i$ in the feature vector given that the AS connection relation is judged to be of type C, where $Pa(f_i)$ denotes the set of attributes related to $f_i$ and $1 \le i \le 7$; and let $P(C)$ denote the probability that the AS connection relation type is C among all training set samples. The probability $P(C, f_1, f_2, \ldots, f_7)$ that the type is C and the attribute values are $f_1, f_2, \ldots, f_7$ is then as follows.
$$P(C, f_1, f_2, \ldots, f_7) = P(C) \prod_{i=1}^{7} P\left(f_i \mid Pa(f_i), C\right)$$
The final result $\hat{C}$ for the AS connection relation is the type with the highest probability among the types output by the Bayesian network:
$$\hat{C} = \arg\max_{C} P(C, f_1, f_2, \ldots, f_7)$$
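The decision rule of step S304 can be sketched as a lookup over conditional probability tables. The sketch assumes that attribute values have already been discretized and that the network structure and tables are given; the table layout, the function name `classify` and the probability floor used for unseen combinations are illustrative choices, not taken from the patent.

```python
import math

CLASSES = ("p2p", "p2c", "c2p")


def classify(features, prior, parents, cpts, floor=1e-9):
    """Return the type C maximizing P(C) * prod_i P(f_i | Pa(f_i), C).

    features: dict attribute name -> discrete value
    prior:    dict type -> P(C)
    parents:  dict attribute name -> tuple of parent attribute names, i.e. Pa(f_i)
    cpts:     dict attribute name -> {(type, parent_values, value): probability}
    """
    best_type, best_score = None, -math.inf
    for c in CLASSES:
        score = math.log(prior[c])            # log space avoids underflow
        for name, value in features.items():
            pa = tuple(features[p] for p in parents.get(name, ()))
            score += math.log(cpts[name].get((c, pa, value), floor))
        if score > best_score:
            best_type, best_score = c, score
    return best_type
```

With an empty `parents` mapping the rule reduces to naive Bayes; the parent sets are what let the network model dependencies between attributes.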
Beneficial effects: the method for inferring Internet AS connection relations based on incomplete information provided by the invention uses ensemble learning to mitigate observation bias. BGP collection points are divided into groups, the preliminary inference results are collected, and voting is used to find the connection relations on which most groups agree; among these consistent connection relations, only those with a specific observation distribution are regarded as credible. The remaining connection relations and their voting results are then input into the Bayesian network. The Bayesian network captures the interdependencies between the various connection-related features, as well as the uneven distribution of observation features caused by the biased distribution of BGP collection points. The structure of the Bayesian network is determined by a greedy algorithm, and its parameters are estimated by expectation maximization. Finally, the AS connection relations are inferred by the trained classifier. The method addresses the inherent limitations and biases of the data provided by BGP (Border Gateway Protocol) collection points, which existing AS connection relation inference methods have difficulty handling, and can infer AS connection relations whose relation type could not be judged, thereby constructing accurate AS connection relations.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
Fig. 1 is a flowchart illustrating a method for inferring an internet AS connection relationship based on incomplete information according to an embodiment of the present application.
Fig. 2 is a flowchart illustrating step S1 of a method for inferring an internet AS connection relationship based on incomplete information according to an embodiment of the present application.
Fig. 3 is a flowchart illustrating step S2 of a method for inferring an internet AS connection relationship based on incomplete information according to an embodiment of the present application.
Fig. 4 is a flowchart illustrating step S3 of a method for inferring an internet AS connection relationship based on incomplete information according to an embodiment of the present application.
Detailed Description
In order that the objects and advantages of the invention will be more clearly understood, the following description is given in conjunction with the accompanying examples. It is to be understood that the following text is merely illustrative of one or more specific embodiments of the invention and does not strictly limit the scope of the invention as specifically claimed.
The method for deducing Internet AS connection relations based on incomplete information disclosed in this embodiment of the invention, as shown in Fig. 1, comprises the following steps.
Step S1, obtaining the public BGP collection points and the inter-AS path information collected by each BGP collection point, and grouping the BGP collection points; making an initial judgment on the inter-AS path information collected by the BGP collection points in each group to obtain a consistent AS connection relation set and a set of AS connection relations that cannot be judged.
Step S2, each group judges the same AS connection relation in the consistent AS connection relation set to obtain credible p2p or p2c connection relations, which are added to the credible AS connection relation set; the other AS connection relations in the consistent AS connection relation set are added to the set of AS connection relations that cannot be judged.
Step S3, training a classification model on the credible AS connection relation set, using the trained classification model to judge the AS connection relations in the set that cannot be judged, and outputting the judgment result.
The operating environment of this embodiment may be a PC with an Intel/Windows architecture: an eight-core Core CPU with a clock frequency of 2.5 GHz or above, at least 8 GB of memory, a 500 GB hard disk, and the Windows 7 operating system.
In the present embodiment, as shown in fig. 2, step S1 includes the following steps.
Step S101, downloading public RouteViews data from the Internet, obtaining all BGP collection points, and extracting the BGP routing data collected by the BGP collection points, wherein the BGP routing data comprises the path information of each AS traversed by Internet inter-domain routes during message forwarding.
Step S102, grouping all BGP collection points at random, wherein the number of BGP collection points contained in each group is N, and N is set as follows.
Step S1021, downloading the known AS connection relations (of the three types p2p, p2c and c2p, where a p2c relation viewed from the other AS is a c2p relation) from CAIDA, an international measurement organization, and using these known AS connection relations as the reference data set.
Step S1022, for each BGP collection point obtained in step S101, calculating a second relation (one of the three types p2p, p2c and c2p) for each AS connection in the reference data set using the AS-Rank algorithm, and comparing the second relation with the corresponding AS connection relation in the reference data set, to obtain the misjudgment rate with which each BGP collection point judges the corresponding AS connection relations and the average misjudgment rate p with which the BGP collection points judge all AS connection relations in the reference data set.
Step S1023, using the binomial theorem to determine the number N of BGP collection points in each group.
The number N of BGP collection points in each group is determined with the target that the probability that at least half of the BGP collection points correctly judge an AS connection relation is not lower than 95%, according to the following formula.
$$\arg\min N \quad \text{s.t.} \quad P\left(X \ge \tfrac{N}{2}\right) = \sum_{k=\lceil N/2 \rceil}^{N} \binom{N}{k} (1-p)^{k}\, p^{N-k} \ge 95\%$$
where X denotes the number of BGP collection points that correctly judge the AS connection relation, and $P(X \ge N/2)$ denotes the probability that at least half of the BGP collection points correctly judge the AS connection relation.
Step S103, in each group, applying the AS-Rank algorithm separately to the inter-AS path information collected by each BGP collection point in the group to obtain a first relation (one of the three types p2p, p2c and c2p) for each corresponding AS connection.
Step S104, in each group, voting one by one, in an ensemble learning manner, on the first relations that all BGP collection points in the group calculated for the same AS connection; the first relation that receives the most votes is judged to be the consistent AS connection relation and placed in the consistent AS connection relation set, and the other AS connection relations are temporarily placed in the set of AS connection relations that cannot be judged.
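The data preparation in steps S101 and S102 can be sketched as follows. The sketch assumes AS paths have already been parsed out of the routing dumps into lists of AS numbers, and it reads "random grouping" as shuffling followed by a partition into disjoint groups of N; both helper names are hypothetical.

```python
import random


def links_from_as_path(as_path):
    """Turn one AS path into adjacent AS links, collapsing repeated (prepended) ASNs."""
    hops = []
    for asn in as_path:
        if not hops or hops[-1] != asn:
            hops.append(asn)
    return list(zip(hops, hops[1:]))


def random_groups(collectors, n, seed=None):
    """Shuffle the collection points and cut them into groups of n (the last may be smaller)."""
    rng = random.Random(seed)
    shuffled = list(collectors)
    rng.shuffle(shuffled)
    return [shuffled[i:i + n] for i in range(0, len(shuffled), n)]
```

Passing a fixed `seed` makes the grouping reproducible, which helps when comparing runs with different values of N.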
In the present embodiment, as shown in fig. 3, step S2 includes the following steps.
Step S201, each group determines the same AS connection relationship in the consistent AS connection relationship set.
Step S202, if more than a first proportion threshold t1 of the collection point groups judge a certain AS connection relation to be a p2p connection relation, the AS connection relation is judged to be a credible p2p connection relation; the value of the first proportion threshold t1 is the average of the p2p link proportions obtained by all BGP collection points, for example 20%.
Step S203, if more than a second proportion threshold t2 of the collection point groups judge a certain AS connection relation to be a p2c/c2p connection relation, the AS connection relation is judged to be a credible p2c/c2p connection relation; the value of the second proportion threshold t2 is the average of the p2c/c2p link proportions obtained by all BGP collection points, for example 80%.
Step S204, adding the credible p2p connection relations and the credible p2c/c2p connection relations to the credible AS connection relation set, and adding the other AS connection relations in the consistent AS connection relation set, i.e. those not in the credible AS connection relation set, to the set of AS connection relations that cannot be judged.
In this embodiment, the classification model in step S3 is a bayesian network model based on expectation maximization. As shown in fig. 4, step S3 includes the following steps.
Step S301, establishing a corresponding feature vector for each AS connection relation in the credible AS connection relation set and in the set of AS connection relations that cannot be judged; the feature vector includes 7 attribute values:
(1) for each of the two ASes in the AS connection relation, the proportions of p2p, p2c and c2p types among all consistent AS connection relations associated with that AS;
(2) for each of the two ASes, the proportions in which it acts as provider, peer and customer in all consistent AS connection relations;
(3) the distance between each of the two ASes and a currently public top-level AS, i.e. the number of AS connections that must be crossed between the AS and a top-level AS such as the publicly known Level 3 Parent LLC;
(4) the number of BGP collection points that observe the AS connection relation, i.e. the number of BGP collection points that collect and infer information about it;
(5) the hierarchical distribution of the BGP collection points that observe the AS connection relation, i.e. the proportions of those BGP collection points at different AS levels;
(6) the number of IXPs (Internet eXchange Points) in which the two ASes co-occur;
(7) the number of hardware facilities in which the two ASes co-occur.
Step S302, the feature vector of each credible AS connection relation in all credible AS connection relation sets and the corresponding AS connection relation are used AS training set samples.
Step S303, training a bayesian network model based on expectation maximization.
Step S304, inputting the feature vector of each AS connection relation in the AS connection relation set which cannot be judged into the trained Bayesian network model, wherein the output result of the model is the judgment result of the AS connection relation.
Denote by C the type of an AS connection relation in the set of AS connection relations that cannot be judged (one of the three types p2p, p2c and c2p); let $P(f_i \mid Pa(f_i), C)$ denote the probability of the i-th attribute value $f_i$ in the feature vector given that the AS connection relation is judged to be of type C, where $Pa(f_i)$ denotes the set of attributes related to $f_i$ and $1 \le i \le 7$; and let $P(C)$ denote the probability that the AS connection relation type is C among all training set samples. The probability $P(C, f_1, f_2, \ldots, f_7)$ that the type is C and the attribute values are $f_1, f_2, \ldots, f_7$ is then as follows.
$$P(C, f_1, f_2, \ldots, f_7) = P(C) \prod_{i=1}^{7} P\left(f_i \mid Pa(f_i), C\right)$$
The final result $\hat{C}$ for the AS connection relation is the type with the highest probability among the types output by the Bayesian network:
$$\hat{C} = \arg\max_{C} P(C, f_1, f_2, \ldots, f_7)$$
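For the training in step S303, the text specifies expectation maximization. The sketch below deliberately swaps in a simpler technique, smoothed frequency counting, which is what maximum-likelihood estimation amounts to when every training sample has all of its attribute values observed; a full EM loop would additionally cope with missing values. The function name, the sample layout and the smoothing constant `alpha` are assumptions.

```python
from collections import Counter, defaultdict


def estimate_tables(samples, parents, alpha=1.0):
    """Estimate P(C) and P(f_i | Pa(f_i), C) from fully observed training samples.

    samples: list of (features, label) pairs, features being a dict of discrete values
    parents: dict attribute name -> tuple of parent attribute names
    Returns (prior, prob), where prob(name, value, label, pa=()) is a smoothed estimate.
    """
    class_counts = Counter(label for _, label in samples)
    prior = {c: n / len(samples) for c, n in class_counts.items()}

    values = defaultdict(set)       # observed domain of each attribute
    joint = defaultdict(Counter)    # (name, label, parent values) -> value counts
    for feats, label in samples:
        for name, value in feats.items():
            values[name].add(value)
            pa = tuple(feats[p] for p in parents.get(name, ()))
            joint[(name, label, pa)][value] += 1

    def prob(name, value, label, pa=()):
        counts = joint[(name, label, pa)]
        total = sum(counts.values())
        # Additive (Laplace) smoothing keeps unseen combinations above zero.
        return (counts[value] + alpha) / (total + alpha * len(values[name]))

    return prior, prob
```

Here the training samples would be the feature vectors and labels of the credible AS connection relations from step S302.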
In a specific implementation, the present application further provides a computer storage medium and a corresponding data processing unit. The computer storage medium can store a computer program which, when executed by the data processing unit, carries out some or all of the steps of the method for inferring Internet AS connection relations based on incomplete information and of the embodiments provided by the invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
It is clear to those skilled in the art that the technical solutions in the embodiments of the present invention can be implemented by means of a computer program and a corresponding general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present invention may be embodied, wholly or in part, in the form of a computer program, that is, a software product, which may be stored in a storage medium and includes several instructions that enable a device including a data processing unit (which may be a personal computer, a server, a single-chip microcomputer (MCU), or a network device) to execute the method described in each embodiment, or in some parts of the embodiments, of the present invention.
The present invention provides a method for deducing Internet AS connection relations based on incomplete information, and there are many methods and ways to implement this technical scheme; the above description is only a specific embodiment of the invention. It should be noted that those skilled in the art can make a number of improvements and modifications without departing from the principle of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the invention. All components not specified in this embodiment can be realized by the prior art.

Claims (10)

1. A method for deducing an Internet AS connection relation based on incomplete information is characterized by comprising the following steps:
step S1, obtaining the public BGP collection points and the inter-AS path information collected by each BGP collection point, and grouping the BGP collection points; making an initial judgment on the inter-AS path information collected by the BGP collection points in each group to obtain a consistent AS connection relation set and a set of AS connection relations that cannot be judged;
step S2, each group judges the same AS connection relation in the consistent AS connection relation set to obtain credible p2p or p2c connection relations, which are added to the credible AS connection relation set; the other AS connection relations in the consistent AS connection relation set are added to the set of AS connection relations that cannot be judged;
and step S3, training a classification model on the credible AS connection relation set, using the trained classification model to judge the AS connection relations in the set that cannot be judged, and outputting the judgment result.
2. The method of claim 1, wherein step S1 comprises the following steps:
step S101, downloading public RouteViews data from the Internet to obtain all BGP collection points, and extracting the BGP routing data collected by the BGP collection points, wherein the BGP routing data comprises the path information of the ASs that Internet inter-domain routes pass through during message forwarding;
step S102, grouping all BGP collection points at random, wherein the number of BGP collection points contained in each group is N;
step S103, in each group, applying the AS-Rank algorithm separately to the inter-AS path information collected by each BGP collection point in the group to obtain a first relation for each corresponding AS connection;
step S104, in each group, voting, in an ensemble learning manner and one AS connection at a time, on the first relations computed for the same AS connection by all BGP collection points in the group; the first relation receiving the most votes is determined to be the consistent AS connection relation and is placed in the consistent AS connection relation set, and the other AS connection relations are temporarily placed in the undeterminable AS connection relation set.
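Steps S102 and S104 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the dict-based data layout, the relation labels and the rule that a tie for first place counts as undeterminable are all assumptions made for illustration, and the AS-Rank computation of step S103 is represented only by precomputed per-collector labels.

```python
import random
from collections import Counter

def group_collectors(collectors, n, seed=None):
    """Randomly partition collection points into groups of size n.
    The last group may be smaller (assumption: leftovers form their own group)."""
    shuffled = list(collectors)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + n] for i in range(0, len(shuffled), n)]

def vote_in_group(group, inferred):
    """Vote on each AS link inside one group.

    inferred[collector][link] is the label ('p2p', 'p2c' or 'c2p') that the
    collector's own inference produced for that link (hypothetical layout).
    Returns (consistent, undetermined).
    """
    votes = {}
    for collector in group:
        for link, label in inferred.get(collector, {}).items():
            votes.setdefault(link, Counter())[label] += 1

    consistent, undetermined = {}, set()
    for link, counter in votes.items():
        ranked = counter.most_common(2)
        # A tie for first place is treated as "cannot be determined" (assumption).
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            undetermined.add(link)
        else:
            consistent[link] = ranked[0][0]
    return consistent, undetermined
```

Running `vote_in_group` once per group yields one consistent set per group, which is the input that step S2 compares across groups.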
3. The method of claim 2, wherein the number N of BGP collection points contained in each group in step S102 is set as follows:
step S1021, downloading known AS connection relations from the international measurement organization CAIDA and using them as a reference data set;
step S1022, for each BGP collection point obtained in step S101, computing a second relation for each AS connection in the reference data set using the AS-Rank algorithm, and comparing the second relation with the corresponding AS connection relation in the reference data set, to obtain the misjudgment rate with which the BGP collection point determines the corresponding AS connection relations and the average misjudgment rate p with which the BGP collection points determine all the AS connection relations in the reference data set;
step S1023, evaluating with the binomial theorem to determine the number N of BGP collection points in each group.
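The comparison in step S1022 can be sketched as follows. This is one possible reading of the claim, in which each collection point gets its own misjudgment rate and p is the mean of those rates; the function names and the dict layout (link mapped to label) are hypothetical.

```python
def misjudgment_rate(inferred, reference):
    """Fraction of links, among those present in both dicts, whose inferred
    label differs from the reference label. Returns None if nothing overlaps."""
    shared = [link for link in inferred if link in reference]
    if not shared:
        return None
    wrong = sum(1 for link in shared if inferred[link] != reference[link])
    return wrong / len(shared)

def average_misjudgment_rate(inferred_by_collector, reference):
    """Mean of the per-collector misjudgment rates (the value called p in the
    claim, under the assumption that p averages over collection points)."""
    rates = [
        rate
        for rate in (
            misjudgment_rate(inferred, reference)
            for inferred in inferred_by_collector.values()
        )
        if rate is not None
    ]
    return sum(rates) / len(rates) if rates else None
```

Collection points that share no link with the reference data set are skipped rather than counted as error-free, so they do not pull p toward zero.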
4. The method of claim 3, wherein step S1023 comprises: determining the number N of BGP collection points in each group with the target that the probability that at least half of the BGP collection points correctly determine an AS connection relation is not lower than 95%, according to the following formula:
$\arg\min N \quad \text{subject to} \quad P(X \ge N/2) \ge 95\%$
wherein $X$ denotes the number of BGP collection points that correctly determine the AS connection relation, and $P(X \ge N/2)$ denotes the probability that at least half of the BGP collection points correctly determine the AS connection relation.
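A minimal numerical sketch of this selection is given below. It assumes that each collection point errs independently with the same average misjudgment rate p, so that X follows a binomial distribution with N trials and success probability 1 - p; the claim only names the binomial theorem, so this modelling choice, the function names and the search bound `max_n` are assumptions.

```python
from math import ceil, comb

def prob_at_least_half_correct(n, p):
    """P(X >= n/2) for X ~ Binomial(n, 1 - p), where p is the average
    misjudgment rate of a single collection point."""
    q = 1.0 - p
    return sum(
        comb(n, k) * q**k * p**(n - k)
        for k in range(ceil(n / 2), n + 1)
    )

def smallest_group_size(p, target=0.95, max_n=1000):
    """Smallest N whose probability of a correct 'at least half' vote reaches
    the target, or None if no N up to max_n qualifies."""
    for n in range(1, max_n + 1):
        if prob_at_least_half_correct(n, p) >= target:
            return n
    return None
```

Because "at least half" is easier to satisfy for even N than for odd N, the probability is not monotone in N, which is why the sketch scans upward instead of using a bisection search.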
5. The method of claim 4, wherein step S2 comprises the following steps:
step S201, determining, across the groups, each same AS connection relation in the consistent AS connection relation sets;
step S202, if more than a first proportion threshold $t_1$ of the collection point groups determine a given AS connection relation to be a p2p connection relation, determining that AS connection relation to be a credible p2p connection relation;
step S203, if more than a second proportion threshold $t_2$ of the collection point groups determine a given AS connection relation to be a p2c/c2p connection relation, determining that AS connection relation to be a credible p2c/c2p connection relation;
step S204, adding the credible p2p connection relations and the credible p2c/c2p connection relations to the credible AS connection relation set, and adding the other AS connection relations in the consistent AS connection relation sets to the undeterminable AS connection relation set.
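The cross-group check of steps S201 to S204 can be sketched as below. The sketch assumes that only the label with the most group votes is tested against its threshold, that p2c and c2p are distinct labels sharing the second threshold, and that the share is computed over all groups rather than only the groups that observed the link; none of these details is fixed by the claim, and the function name is hypothetical.

```python
from collections import Counter

def credible_relations(group_results, t1, t2):
    """Split links into credible and undeterminable ones.

    group_results: one dict per group, mapping link -> consistent label.
    t1: proportion threshold for 'p2p'; t2: threshold for 'p2c' / 'c2p'.
    """
    n_groups = len(group_results)
    votes = {}
    for result in group_results:
        for link, label in result.items():
            votes.setdefault(link, Counter())[label] += 1

    credible, undetermined = {}, set()
    for link, counter in votes.items():
        label, count = counter.most_common(1)[0]
        threshold = t1 if label == "p2p" else t2
        # The share of groups agreeing on the label must exceed the threshold.
        if count / n_groups > threshold:
            credible[link] = label
        else:
            undetermined.add(link)
    return credible, undetermined
```

Links that end up in `undetermined` are the ones later handed to the trained classification model.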
6. The method of claim 5, wherein the first proportion threshold $t_1$ in step S202 is the average of the p2p link proportions obtained by all BGP collection points, and the second proportion threshold $t_2$ in step S203 is the average of the p2c/c2p link proportions obtained by all BGP collection points.
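One way to read this claim is that each collection point contributes the share of its inferred links that carry a given type, and the threshold is the mean of those shares. The sketch below follows that reading; the function names and the per-collector dict layout are hypothetical.

```python
def average_label_share(inferred_by_collector, labels):
    """Mean, over collection points, of the fraction of each point's links
    whose label is in `labels`. Points with no links are skipped."""
    shares = []
    for inferred in inferred_by_collector.values():
        if inferred:
            hits = sum(1 for label in inferred.values() if label in labels)
            shares.append(hits / len(inferred))
    return sum(shares) / len(shares) if shares else None

def proportion_thresholds(inferred_by_collector):
    """Return (t1, t2): average p2p share and average p2c/c2p share."""
    t1 = average_label_share(inferred_by_collector, {"p2p"})
    t2 = average_label_share(inferred_by_collector, {"p2c", "c2p"})
    return t1, t2
```

When every link carries one of the three labels, the two thresholds sum to 1, so a rarer relation type automatically receives the lower bar.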
7. The method of claim 6, wherein the classification model in step S3 is a Bayesian network model based on expectation maximization.
8. The method of claim 7, wherein the step S3 comprises the following steps:
step S301, constructing a feature vector for each AS connection relation in the credible AS connection relation set and in the undeterminable AS connection relation set;
step S302, using the feature vector of each credible AS connection relation in the credible AS connection relation set, together with its corresponding AS connection relation, as a training set sample;
step S303, training the Bayesian network model based on expectation maximization;
step S304, inputting the feature vector of each AS connection relation in the undeterminable AS connection relation set into the trained Bayesian network model, the output of the model being the determination result for that AS connection relation.
9. The method of claim 8, wherein the feature vector in step S301 includes 7 attribute values: the proportions of p2p-type and p2c/c2p-type relations among all consistent AS connection relations associated with each of the two ASs in the AS connection relation; the proportions in which each of the two ASs acts as a provider, a peer or a customer in all consistent AS connection relations; the distances between each of the two ASs and the currently published top-level ASs; the number of BGP collection points that observe the connection relation between the two ASs; the hierarchical distribution of the BGP collection points that observe the connection relation between the two ASs; the number of Internet exchange centers in which the two ASs appear together; and the number of hardware facilities in which the two ASs appear together.
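As an illustration of one of these attributes, the sketch below computes the proportions in which a single AS acts as provider, peer or customer across a set of consistent relations. It covers only that one attribute; the function name and the convention that `(a, b): 'p2c'` means a is the provider of b are assumptions.

```python
def role_shares(asn, consistent):
    """Share of consistent relations in which `asn` is provider, peer or customer.

    consistent maps (a, b) -> 'p2p', 'p2c' (a provides to b) or 'c2p'
    (a is a customer of b). This labelling convention is hypothetical.
    """
    counts = {"provider": 0, "peer": 0, "customer": 0}
    for (a, b), label in consistent.items():
        if asn not in (a, b):
            continue
        if label == "p2p":
            counts["peer"] += 1
        elif (label == "p2c") == (asn == a):
            # asn sits on the provider side of the link.
            counts["provider"] += 1
        else:
            counts["customer"] += 1
    total = sum(counts.values())
    return {role: (n / total if total else 0.0) for role, n in counts.items()}
```

Calling the function once for each endpoint of a link gives the six role proportions that this attribute contributes for that link.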
10. The method of claim 9, wherein the step S304 comprises:
denoting the type of an AS connection relation in the undeterminable AS connection relation set as $C$; $P(f_i \mid Pa(f_i), C)$ denotes the probability of the $i$-th attribute value $f_i$ in the feature vector given that the AS connection relation type is $C$, wherein $Pa(f_i)$ denotes the set of attributes related to $f_i$, $1 \le i \le 7$; $P(C)$ denotes the probability that the AS connection relation type is $C$ among all training set samples; then the probability $P(C, f_1, f_2, \ldots, f_7)$ that an AS connection relation to be determined, whose attribute values are $f_1, f_2, \ldots, f_7$, is of type $C$ is:
$P(C, f_1, f_2, \ldots, f_7) = P(C) \prod_{i=1}^{7} P(f_i \mid Pa(f_i), C)$
and the final determination result $\hat{C}$ of the AS connection relation is the most probable of the types output by the Bayesian network:
$\hat{C} = \arg\max_{C} P(C) \prod_{i=1}^{7} P(f_i \mid Pa(f_i), C)$
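The decision rule can be illustrated with a deliberately simplified Python sketch. It is a naive Bayes classifier, that is, the special case in which every attribute depends only on the type C and the set Pa(f_i) is empty; it does not learn attribute dependencies and does not use expectation maximization, so it is not the Bayesian network of the claims. Discrete attribute values, Laplace smoothing with `alpha`, and all function names are assumptions.

```python
from collections import Counter, defaultdict
from math import log

def train(samples, alpha=1.0):
    """samples: list of (feature_tuple, label) with discrete feature values."""
    class_counts = Counter(label for _, label in samples)
    value_counts = defaultdict(Counter)  # (i, label) -> counts of values of f_i
    values = defaultdict(set)            # i -> set of observed values of f_i
    for features, label in samples:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            values[i].add(value)
    return class_counts, value_counts, values, alpha

def classify(model, features):
    """Return the label C maximising log P(C) + sum_i log P(f_i | C)."""
    class_counts, value_counts, values, alpha = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label, n in class_counts.items():
        score = log(n / total)  # log P(C), estimated from the training set
        for i, value in enumerate(features):
            # Laplace-smoothed estimate of P(f_i | C).
            numerator = value_counts[(i, label)][value] + alpha
            denominator = n + alpha * len(values[i] | {value})
            score += log(numerator / denominator)
        if score > best_score:
            best, best_score = label, score
    return best
```

Working in log space avoids numerical underflow when the per-attribute probabilities are multiplied, and it leaves the arg max unchanged.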

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210561377.9A CN114640620A (en) 2022-05-23 2022-05-23 Method for deducing Internet AS connection relation based on incomplete information

Publications (1)

Publication Number Publication Date
CN114640620A true CN114640620A (en) 2022-06-17

Family

ID=81953068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210561377.9A Pending CN114640620A (en) 2022-05-23 2022-05-23 Method for deducing Internet AS connection relation based on incomplete information

Country Status (1)

Country Link
CN (1) CN114640620A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112995183A (en) * 2021-03-05 2021-06-18 清华大学 Internet routing information leakage detection method
CN113111910A (en) * 2021-03-05 2021-07-13 清华大学 Inference method for business relation between internet autonomous systems
EP3945710A1 (en) * 2020-07-31 2022-02-02 CatchPoint Systems, Inc. Method and system to reduce a number of border gateway protocol neighbors crossed to reach target autonomous systems

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZITONG JIN et al.: "TopoScope: Recover AS Relationships From Fragmentary Observations", IMC '20 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220617