CN113312895A - Organization mapping method and device of autonomous system AS and electronic equipment - Google Patents

Info

Publication number
CN113312895A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110554627.1A
Other languages
Chinese (zh)
Inventor
张沛
黄小红
严欢
白峻东
舒坤博
徐鹏举
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202110554627.1A
Publication of CN113312895A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/84 Mapping; Conversion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Abstract

The disclosure provides an organization mapping method and device for an autonomous system (AS), and electronic equipment. The method comprises: acquiring data of the multidimensional attributes of each AS entity in a plurality of AS entities to form an AS entity set, wherein the multidimensional attributes comprise a plurality of attribute components, and one of the attribute components indicates the organization to which the AS entity belongs; for each pair of attribute components of the same type of every two AS entities in the AS entity set, calculating the similarity between the pair of attribute components using an algorithm corresponding to the type of the pair, so as to obtain an attribute similarity vector between the two AS entities; determining the similarity between the two AS entities based on the attribute similarity vector; and merging the organizations of the AS entities according to the similarity between every two AS entities in the AS entity set, so as to realize the organization mapping of the AS entities. The accuracy of organization mapping for autonomous systems can thereby be improved, and misjudgments and missed judgments can be reduced.

Description

Organization mapping method and device of autonomous system AS and electronic equipment
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to an organization mapping method and apparatus for an autonomous system AS, and an electronic device.
Background
An Autonomous System (AS) is a basic building block of cyberspace and a carrier of information resources. ASes are interconnected by routing devices through the Border Gateway Protocol (BGP) to achieve global network interconnection, thereby enabling communication between different organizations. Different technical management departments manage and operate different autonomous domains, and each autonomous domain belongs to a specific organization. Likewise, an organization can hold several autonomous systems at the same time by applying for multiple AS numbers. The ASes of the same organization share certain similarities in network management and security protection policies, which forms an organization mapping topology of the ASes. This topology reflects, to a certain extent, the alliance and governance relationships among autonomous domains, and forms an overlay network for governing cyberspace.
However, conventional autonomous domain organization mapping is limited to individual attributes, and the definition of an organization is limited to a single business entity, so misjudgments and missed judgments occur when attributing the autonomous domains of transnational organizations and umbrella-type organizations. In addition, the information about autonomous systems held by Internet registries is diverse, and the data can be ambiguous and inconsistent, which makes autonomous domain organization mapping more difficult.
Disclosure of Invention
In view of the above, the present disclosure is directed to an organization mapping method and apparatus for an autonomous system AS, and an electronic device, which can solve or partially solve the above problems.
Based on the above object, a first aspect of the present disclosure provides an organization mapping method for an autonomous system AS, including:
acquiring data of multidimensional attributes of each AS entity in a plurality of AS entities to form an AS entity set, wherein the multidimensional attributes comprise a plurality of attribute components, and one of the attribute components indicates an organization mechanism to which the AS entity belongs;
for each two AS entities in the set of AS entities,
for each pair of attribute components with the same type of the two AS entities, calculating the similarity between the pair of attribute components by adopting an algorithm corresponding to the type of the pair of attribute components to obtain an attribute similarity vector between the two AS entities;
determining similarity between the two AS entities based on the attribute similarity vector;
merging the organizations of the AS entities according to the similarity between every two AS entities in the AS entity set, so as to realize the organization mapping of the AS entities.
A second aspect of the present disclosure provides an AS entity organizational structure mapping apparatus based on entity similarity, including:
an obtaining module, configured to obtain data of multidimensional attributes of each AS entity in a plurality of AS entities to form an AS entity set, where the multidimensional attributes include a plurality of attribute components, and one of the plurality of attribute components indicates an organization to which the AS entity belongs;
a similarity determining module, configured to calculate, for each two AS entities in the AS entity set, the similarity between each pair of attribute components of the same type of the two AS entities by using an algorithm corresponding to the type of the pair of attribute components, so as to obtain an attribute similarity vector between the two AS entities, and to determine the similarity between the two AS entities based on the attribute similarity vector;
a mapping module, configured to merge the organizations of the AS entities according to the similarity between every two AS entities in the AS entity set, so as to realize the organization mapping of the AS entities.
A third aspect of the disclosure provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable by the processor, the processor implementing the method according to the first aspect when executing the computer program.
A fourth aspect of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of the first aspect.
As can be seen from the foregoing, the organization mapping method, apparatus, and electronic device for an autonomous system AS provided by the present disclosure calculate the similarity of two AS entities for each attribute component according to the type of that component in the acquired multidimensional attribute data, assemble the calculated similarities into an attribute similarity vector between the two AS entities, determine from that vector whether the two AS entities are similar, and then map the organizations of similar AS entities together. The accuracy of organization mapping for autonomous systems can thereby be improved, and the misjudgments and missed judgments caused by inaccurate organization mapping can be reduced.
Drawings
In order to more clearly illustrate the technical solutions in the present disclosure or related technologies, the drawings needed to be used in the description of the embodiments or related technologies are briefly introduced below, and it is obvious that the drawings in the following description are only embodiments of the present disclosure, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of an organizational structure mapping method of an autonomous system AS according to an embodiment of the present disclosure;
FIG. 2 is an expanded schematic diagram illustrating step 100 of an organization mapping method of an autonomous system AS according to an embodiment of the disclosure;
FIG. 3 is a schematic diagram illustrating an execution flow of a first algorithm of an organization mapping method of an autonomous system AS according to an embodiment of the disclosure;
FIG. 4 is a schematic diagram illustrating an execution flow of a second algorithm of the organization mapping method of the autonomous system AS according to the embodiment of the disclosure;
FIG. 5 is a schematic diagram illustrating an execution flow of a third algorithm of an organization mapping method of an autonomous system AS according to an embodiment of the disclosure;
FIG. 6 is an expanded schematic diagram illustrating step 300 of an organization mapping method of an autonomous system AS according to an embodiment of the disclosure;
FIG. 7 is a flowchart of an organizational structure mapping method of an autonomous system AS according to another embodiment of the disclosure;
FIG. 8 is a block diagram illustrating an organization mapping apparatus of an autonomous system AS according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that technical terms or scientific terms used in the embodiments of the present disclosure should have a general meaning as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of the terms "comprising" or "including" and the like in the embodiments of the present disclosure is intended to mean that the elements or items listed before the term cover the elements or items listed after the term and their equivalents, without excluding other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, direct connections, indirect connections, wired connections, and wireless connections.
An AS entity has multidimensional attributes, including the AS name, organization, country of affiliation, description, technical contact, administrative contact, contact telephone, contact mailbox, and the name, technical contact and administrative contact of the affiliated organization. These multidimensional attributes reflect, to a certain extent, the clustering relationships among AS entities. A Regional Internet Registry (RIR) is an organization that manages Internet resources in a particular region of the world, including the autonomous domains, IP addresses, and routing registration information described above. The multidimensional attributes of an autonomous system may be obtained from an Internet registry.
The autonomous system topology abstracts the AS-entity-level topology into an undirected graph, in which each AS entity is regarded as a single node, the interconnection between AS entities through border gateways is regarded as an edge, and the importance and hierarchical relationships of AS entities are expressed by node degree and by the business relationships among the AS entities. To provide Internet researchers with more accurate AS entity network topology models, topology modeling researchers have successively proposed a large number of models, such as the random network model, the hierarchical model, the small-world model, the power-law model and the local-world model. Based on these models, a number of AS-entity-level topology generation algorithms have been developed, including tree layout, grid layout and force-directed layout algorithms, which address the layout quality and computational efficiency of AS entity topology nodes.
In the related art, there are two methods for mapping autonomous systems to organizations. In the first, the registering organization fills in the data itself; an example is PCH (Packet Clearing House), an international organization responsible for providing operational support and security for key Internet infrastructure, including Internet exchange points and the core of the domain name system. The second realizes organization mapping through AS entity clustering: autonomous system data is obtained from the RIR WHOIS databases; an object is first created for each AS entity in each database; other objects linked to a given AS entity in the RIR database are then considered, and their fields are assigned to that AS entity object; finally, the similarity between the objects is analyzed with a machine-learning-based algorithm to complete the organization mapping of the autonomous systems.
However, such autonomous domain organization mapping is limited to individual attributes, and the definition of an organization is limited to a single business entity, so misjudgments and missed judgments may occur when attributing the autonomous domains of a transnational organization or an umbrella-type organization. In addition, the information about autonomous systems held by Internet registries is diverse, and the data can be ambiguous and inconsistent, which makes autonomous domain organization mapping more difficult.
As shown in FIG. 1, the organization mapping method of an autonomous system AS provided in this embodiment includes:
Step 100, obtaining data of multidimensional attributes of each AS entity in a plurality of AS entities to form an AS entity set, where the multidimensional attributes include a plurality of attribute components, and one of the plurality of attribute components indicates the organization to which the AS entity belongs.
In this step, the entity attribute information of an AS entity generally includes several kinds of information, each of which forms one dimension of the multidimensional attribute data. The data of the multidimensional attributes includes at least one of: AS name, organization, country of affiliation, description, technical contact, administrative contact, contact telephone, contact mailbox, and the name, technical contact and administrative contact of the affiliated organization.
The acquired multidimensional attribute data of all AS entities is consolidated into a table keyed by AS name: the AS name is the first column, each other dimension is a further column, and the acquired attribute information of each AS entity is filled into the corresponding row, forming the AS entity set.
Step 200, for each two AS entities in the AS entity set, determining the similarity between the two AS entities.
This step specifically comprises:
for each pair of attribute components of the same type of every two AS entities in the AS entity set, calculating the similarity between the pair of attribute components using an algorithm corresponding to the type of the pair, so as to obtain an attribute similarity vector between the two AS entities; and determining the similarity between the two AS entities based on the attribute similarity vector.
Step 300, according to the similarity between every two AS entities in the AS entity set, merging the organizations of the plurality of AS entities to realize the organization mapping of the plurality of AS entities.
In this step, similar AS entities are grouped into a similarity group according to the similarity, so that a plurality of similarity groups is obtained. An AS entity cannot appear more than once within a similarity group, but the same AS entity may appear in different similarity groups. Each similarity group is a result of the organization mapping of the plurality of AS entities.
With this scheme, the similarity of two AS entities is calculated for each attribute component according to the type of that component in the acquired multidimensional attribute data, the calculated similarities form the attribute similarity vector between the two AS entities, the vector determines whether the two AS entities are similar, and the organizations of similar AS entities are mapped together. The accuracy of the organization mapping of AS entities can thereby be improved, and the misjudgments and missed judgments caused by inaccurate organization mapping can be reduced.
In a specific embodiment, as shown in FIG. 2, step 100 specifically includes:
Step 110, obtaining data of multidimensional attributes of each of a plurality of AS entities from an Internet registry, wherein the data of the multidimensional attributes includes at least one of: character string attribute, text attribute, and list attribute information.
There is at least one of each of the character string attribute, text attribute, and list attribute information.
Step 120, setting missing attribute components in the multidimensional attributes to null values, and normalizing the multidimensional attributes to form the AS entity set.
In this step, to facilitate subsequent processing of the multidimensional attribute data, each missing component is replaced with a null value, which may be a space character or "0".
The acquired character string attribute, text attribute, and list attribute information is stored as a list in the order of the AS entities (which may be arbitrary or alphabetical by initial) to form the AS entity set.
Alternatively, a table is constructed for each AS entity, the corresponding character string attribute, text attribute, and list attribute information is added to that table, and all AS entity tables are stored together in one folder to form the AS entity set.
With this scheme, the acquired multidimensional attribute data is classified by type, so that similarity can be calculated with the algorithm matching each type, which makes the calculated similarity more accurate.
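Steps 110 and 120 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the attribute names in `SCHEMA`, the use of an empty string or empty list as the null value, and the lower-casing and whitespace folding used as "normalization" are all hypothetical choices made for the example.

```python
from typing import Dict, List, Union

AttrValue = Union[str, List[str], None]

# Hypothetical schema: attribute name -> attribute type (string, text or list).
SCHEMA = {
    "country": "string",
    "org": "string",
    "as_name": "text",
    "descr": "text",
    "import": "list",
    "export": "list",
}


def normalize_record(raw: Dict[str, AttrValue]) -> Dict[str, AttrValue]:
    """Fill missing components with a null value and normalize the rest."""
    record: Dict[str, AttrValue] = {}
    for name, kind in SCHEMA.items():
        value = raw.get(name)
        if kind == "list":
            # Assumption: an empty list is the null value for list attributes.
            items = value or []
            record[name] = sorted(
                {item.strip().upper() for item in items if item and item.strip()}
            )
        else:
            # Assumption: an empty string is the null value for string/text attributes.
            record[name] = " ".join(value.split()).lower() if value else ""
    return record


def build_entity_set(
    raw_records: Dict[str, Dict[str, AttrValue]]
) -> Dict[str, Dict[str, AttrValue]]:
    """Build the AS entity set: one normalized row per AS, keyed by AS identifier."""
    return {asn: normalize_record(raw) for asn, raw in raw_records.items()}
```

Every row produced this way has the same columns, so each attribute pair compared later is guaranteed to have the same type.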
In a specific embodiment, in step 200, for each pair of attribute components of the same type of every two AS entities in the AS entity set, in response to determining that the type of the pair of attribute components is the character string attribute, the following first algorithm is used to calculate the similarity between the pair of attribute components. As shown in FIG. 3, the first algorithm includes:
Step 211, determining whether the character strings of the pair of attribute components are the same; if so, going to step 212, otherwise going to step 213.
Step 212, determining that the similarity between the pair of attribute components is equal to a first value indicating that the attributes are similar.
Step 213, determining that the similarity between the pair of attribute components is equal to a second value indicating that the attributes are dissimilar.
The character string attributes include at least one of: country of affiliation, organization information, manager, technical contact, and route maintenance organization.
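The exact-match comparison of steps 211 to 213 can be sketched as below. The constants `SIMILAR` and `DISSIMILAR` stand for the first and second values; treating two null (empty) components as dissimilar is an assumption added for this example, since the text does not say how a pair of missing values should be compared.

```python
SIMILAR = 1      # first value: the attributes are similar
DISSIMILAR = 0   # second value: the attributes are dissimilar


def string_similarity(a: str, b: str) -> int:
    """Steps 211-213: exact comparison of two character string attributes."""
    # Assumption (not stated in the source): a missing value never matches,
    # otherwise two entities lacking the same field would count as similar.
    if not a or not b:
        return DISSIMILAR
    return SIMILAR if a == b else DISSIMILAR
```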
In a specific embodiment, in step 200, for each pair of attribute components of the same type of every two AS entities in the AS entity set, in response to determining that the type of the pair of attribute components is the text attribute, the following second algorithm is used to calculate the similarity between the pair of attribute components. As shown in FIG. 4, the second algorithm includes:
Step 221, performing word segmentation on the respective texts of the pair of attribute components to obtain two term frequency-inverse document frequency (TF-IDF) vectors corresponding to the pair of attribute components respectively.
Step 222, calculating the cosine similarity between the two TF-IDF vectors.
Step 223, normalizing the calculated cosine similarity.
Step 224, determining whether the normalized cosine similarity is greater than a first predetermined threshold; if so, going to step 225, otherwise going to step 226.
Step 225, determining that the similarity between the pair of attribute components is equal to a first value indicating that the attributes are similar.
Step 226, determining that the similarity between the pair of attribute components is equal to a second value indicating that the attributes are dissimilar.
In the above steps, the text attributes include at least one of: AS name and description information, where the description information is a text description of the AS entity.
The text attributes of the two AS entities are segmented into words in advance, and adverbs and function words (for example, "of", "etc.") are deleted.
The TF-IDF score of every word of one AS entity is then calculated. The score consists of two parts: the TF value and the IDF value.
The TF value is calculated as:
$$\mathrm{TF}(t) = \frac{\text{number of times the word } t \text{ appears in the document}}{\text{total number of words in the document}}$$
The IDF value is calculated as:
$$\mathrm{IDF}(t) = \ln \frac{\text{total number of documents}}{\text{number of documents in which the word } t \text{ appears}}$$
where t is the sequence number of the corresponding word in the text attribute.
The TF value and the IDF value are combined into the TF-IDF score (conventionally their product), and the TF-IDF scores of all words in the text attribute of the AS entity are assembled into a TF-IDF vector.
In this way, two TF-IDF vectors are obtained for the text attributes of the two AS entities whose similarity is to be judged, denoted $AS1_i$ and $AS2_i$.
The cosine similarity between two TF-IDF vectors is calculated as follows:
$$\cos\theta = \frac{\sum_{i=1}^{n} AS1_i \, AS2_i}{\sqrt{\sum_{i=1}^{n} AS1_i^{2}} \, \sqrt{\sum_{i=1}^{n} AS2_i^{2}}}$$
where $AS1_i$ and $AS2_i$ are the i-th components of the TF-IDF vectors of the two AS entities, n is the number of text attributes, and $i \in \{1, \dots, n\}$.
Cosine similarity measures the similarity between two vectors by the cosine of the angle between them. Since the normalized cosine takes values in [0, 1], the first predetermined threshold is a value in that interval (e.g., 0.7); the specific value may be selected according to the actual situation and is not specifically limited here.
Because the cosine similarity of the text attributes of the two AS entities is calculated from TF-IDF vectors, the text attribute similarity determined against the first predetermined threshold fits the actual situation more closely and is more accurate.
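The second algorithm (steps 221 to 226) can be sketched in plain Python as follows. This is a sketch under assumptions: whitespace splitting stands in for word segmentation, `STOP_WORDS` is a hypothetical list of function words, and the IDF table is computed over the text attributes of all AS entities in the set. Because TF-IDF weights computed this way are non-negative, the cosine already lies in [0, 1], so no separate rescaling is applied for step 223.

```python
import math
from collections import Counter
from typing import Dict, List

STOP_WORDS = {"of", "the", "and", "etc"}  # hypothetical function-word list


def tokenize(text: str) -> List[str]:
    """Whitespace segmentation with function words removed (an assumption)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def idf_table(corpus: List[List[str]]) -> Dict[str, float]:
    """IDF(t) = ln(total documents / documents containing t)."""
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {word: math.log(n_docs / count) for word, count in doc_freq.items()}


def tfidf_vector(doc: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse TF-IDF vector: TF(t) * IDF(t) for each word of the document."""
    if not doc:
        return {}
    counts = Counter(doc)
    return {w: (c / len(doc)) * idf.get(w, 0.0) for w, c in counts.items()}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 for a zero vector."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def text_similarity(a: str, b: str, idf: Dict[str, float], threshold: float = 0.7) -> int:
    """Steps 224-226: 1 (similar) if the cosine exceeds the threshold, else 0."""
    score = cosine(tfidf_vector(tokenize(a), idf), tfidf_vector(tokenize(b), idf))
    return 1 if score > threshold else 0
```

Note that with the IDF formula given in the text, a word that occurs in every document gets weight ln(1) = 0, so the IDF table is only informative when it is built over many AS entities rather than over the two texts being compared.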
In a specific embodiment, in step 200, for each pair of attribute components of the same type of every two AS entities in the AS entity set, in response to determining that the type of the pair of attribute components is the list attribute, the following third algorithm is used to calculate the similarity between the pair of attribute components. As shown in FIG. 5, the third algorithm includes:
Step 231, calculating the Jaccard similarity coefficient of the lists of the pair of attribute components.
Step 232, determining whether the calculated Jaccard similarity coefficient is greater than a second predetermined threshold; if so, going to step 233, otherwise going to step 234.
Step 233, determining that the similarity between the pair of attribute components is equal to a first value indicating that the attributes are similar.
Step 234, determining that the similarity between the pair of attribute components is equal to a second value indicating that the attributes are dissimilar.
In the above steps, the list attributes include at least one of: route import and route export. The list attribute information A and B of every two AS entities in the AS entity set is obtained. The Jaccard similarity coefficient of A and B is calculated according to the following formula and taken as the list attribute similarity of the two AS entities:
$$\mathrm{sim}(A, B) = \mathrm{Jaccard}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
where sim(A, B) is the list attribute similarity and Jaccard(A, B) is the Jaccard similarity coefficient. The Jaccard similarity coefficient is used to compare the similarity and difference between finite sample sets. The larger the coefficient, the more similar the samples. Its value satisfies $0 \le \mathrm{Jaccard}(A, B) \le 1$.
List attributes carry a large amount of information, and comparing every item in sequence would waste time. Judging whether the list attribute information of two AS entities is similar by means of the Jaccard similarity coefficient reduces the workload while still giving a good comparison of the two lists.
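The third algorithm (steps 231 to 234) can be sketched as below. The default threshold of 0.5 and the convention that two empty lists have coefficient 0 are assumptions made for the example; the text leaves the second predetermined threshold open.

```python
from typing import Iterable


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard(A, B) = |A intersect B| / |A union B|, with 0.0 when both are empty."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two missing lists carry no evidence of similarity.
        return 0.0
    return len(set_a & set_b) / len(union)


def list_similarity(a: Iterable[str], b: Iterable[str], threshold: float = 0.5) -> int:
    """Steps 232-234: 1 (similar) if the coefficient exceeds the threshold, else 0."""
    return 1 if jaccard(a, b) > threshold else 0
```

Converting both lists to sets makes the comparison independent of order and of duplicate entries, which is why no item-by-item scan is needed.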
In a specific embodiment, for each two AS entities in the set of AS entities, determining the similarity between the two AS entities based on the attribute similarity vector comprises:
determining that the two AS entities are similar in response to determining that at least one component of the attribute similarity vector between the two AS entities is equal to the first value.
In a specific embodiment, for each two AS entities in the set of AS entities, determining the similarity between the two AS entities based on the attribute similarity vector further includes:
determining that the two AS entities are dissimilar in response to determining that each component of the attribute similarity vector between the two AS entities is equal to the second value.
In the above scheme, the similarity of the character string attributes, and/or the similarity of the text attributes, and/or the similarity of the list attributes are assembled into the attribute similarity vector of the two AS entities. Based on the attribute similarity vector, the similarity between the two AS entities is determined.
The first value is "1" and the second value is "0": "1" indicates "similar" and "0" indicates "dissimilar". The attribute similarity vector is formed from these values. If the attribute similarity vector contains at least one "1", the two AS entities are determined to be "similar"; otherwise they are determined to be "dissimilar".
For example, if the attribute similarity vector of two AS entities is (0, 1, 1), it contains two "1" components, and thus the two AS entities are similar.
In addition, weight values may be applied: the similarity of the character string attributes, the similarity of the text attributes and the similarity of the list attributes are each multiplied by a corresponding weight value, and the resulting values are assembled into the attribute similarity vector. Each weight value can be set according to how much the corresponding attribute similarity contributes to the similarity of the two AS entities, and is a real number greater than 0.
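The decision rule and the weighted variant can be sketched as follows. The function names are hypothetical. The text does not specify how a weighted vector is turned into a similar/dissimilar decision, so `weighted_vector` only performs the multiplication and leaves any further thresholding to the caller.

```python
from typing import List, Sequence

SIMILAR = 1  # first value


def entities_similar(similarity_vector: Sequence[int]) -> bool:
    """Two AS entities are similar if at least one component equals the first value."""
    return any(component == SIMILAR for component in similarity_vector)


def weighted_vector(similarity_vector: Sequence[int], weights: Sequence[float]) -> List[float]:
    """Multiply each attribute similarity by its weight (each weight must be > 0)."""
    if len(similarity_vector) != len(weights):
        raise ValueError("one weight is required per attribute similarity")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be real numbers greater than 0")
    return [w * s for s, w in zip(similarity_vector, weights)]
```

With the vector (0, 1, 1) from the example in the text, `entities_similar` returns `True`; with (0, 0, 0) it returns `False`.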
In a specific embodiment, as shown in fig. 6, step 300 specifically includes:
step 310, according to the similarity between every two AS entities in the AS entity set, dividing the AS entity set into a plurality of similar AS entity sets, wherein each AS entity in each similar AS entity set is similar.
And 320, for each similar AS entity set, extracting the name character strings of the organization mechanism to which the AS entity belongs from the data of the multidimensional attribute of each AS entity in the similar AS entity set, and combining the extracted name character strings to serve AS the attributive organization mechanism identifier of each AS entity in the similar AS entity set.
Step 330, merging the AS entities with the same home organization identifier in the set of AS entities. Thereby completing the organizational structure mapping.
In the above steps, combining the extracted name character strings specifically means either accumulating the name character strings to obtain an accumulated value, or arranging and combining the name character strings to form a combined character string; the accumulated value or the combined character string is then used as the home organization identifier of each AS entity in the similar AS entity set.
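Steps 310 to 330 can be sketched in Python as follows. Several choices here are assumptions rather than details from the disclosure: the similar AS entity sets are taken to be the connected components of the pairwise "similar" relation (computed with union-find), the name strings are combined by sorting the distinct names and joining them with "|", and a set with no organization name falls back to a placeholder identifier. All function names and sample AS numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def group_similar(
    asns: Iterable[int], similar_pairs: Iterable[Tuple[int, int]]
) -> List[List[int]]:
    """Step 310 (assumed): connected components of the 'similar' relation."""
    parent = {asn: asn for asn in asns}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for asn in parent:
        groups[find(asn)].append(asn)
    return [sorted(group) for group in groups.values()]


def assign_org_identifiers(
    org_names: Dict[int, Optional[str]],
    similar_pairs: Iterable[Tuple[int, int]],
) -> Dict[int, str]:
    """Step 320: give every AS in a similar set the same combined identifier."""
    identifiers = {}
    for group in group_similar(org_names, similar_pairs):
        names = sorted({org_names[asn] for asn in group if org_names[asn]})
        # Assumption: sets without any organization name get a placeholder.
        identifier = "|".join(names) if names else f"unknown-AS{group[0]}"
        for asn in group:
            identifiers[asn] = identifier
    return identifiers


def merge_by_identifier(identifiers: Dict[int, str]) -> Dict[str, List[int]]:
    """Step 330: collect the AS entities that share an identifier."""
    merged = defaultdict(list)
    for asn, identifier in identifiers.items():
        merged[identifier].append(asn)
    return {identifier: sorted(members) for identifier, members in merged.items()}


org_names = {64500: "Example Net", 64501: "Example Network Ltd", 64502: "Other Org"}
identifiers = assign_org_identifiers(org_names, [(64500, 64501)])
print(merge_by_identifier(identifiers))
```

Using connected components makes the grouping transitive: if A is similar to B and B is similar to C, all three end up in one set even when A and C were not judged similar directly. Whether that matches the intended division is not stated in the text.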
Once the organization identifiers have been merged, the organization mapping is complete, so that in subsequent identification it is enough to compare the organization identifiers of two AS entities to judge whether they are similar. This plays a crucial role in detecting route hijacking events.
Wherein each AS entity in the set of AS entities may be correspondingly tagged with one or more organizational identifiers.
For example, for a route hijacking event detected by a route monitoring system, the affected AS entity and the attacking AS entity may in fact belong to the same organization. Identifying the similarity between the two AS entities through the entity similarity calculation avoids false hijacking judgments caused by the system failing to recognize that the AS entities are related.
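A minimal sketch of how such a check might look once each AS entity carries one or more organization identifiers. The function name, the data layout (a dictionary from AS number to a set of identifiers), and the sample values are assumptions for illustration only.

```python
from typing import Dict, Set


def share_organization(asn_a: int, asn_b: int, org_ids: Dict[int, Set[str]]) -> bool:
    """Return True if the two ASes have at least one organization identifier in common."""
    ids_a = org_ids.get(asn_a, set())
    ids_b = org_ids.get(asn_b, set())
    return not ids_a.isdisjoint(ids_b)


# Hypothetical mapping result: AS number -> organization identifiers.
org_ids = {
    64500: {"Example Net|Example Network Ltd"},
    64501: {"Example Net|Example Network Ltd"},
    64502: {"Other Org"},
}

# A reported hijack between 64500 and 64501 involves a single organization,
# so a monitoring system could treat it as a likely false alarm.
print(share_organization(64500, 64501, org_ids))  # True
print(share_organization(64500, 64502, org_ids))  # False
```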
The flow of the organization mapping method of the autonomous system AS proposed in this embodiment is shown in fig. 7, and its main technical solution is as follows:
Step A, obtaining multidimensional attributes of AS entities from an Internet registry to form an AS entity set, and normalizing each attribute of the AS entities.
Step B, taking any two AS entities from the AS entity set as an AS entity pair.
Step C, calculating the similarity of each pair of corresponding attributes of the selected AS entity pair, and determining the similarity of the two AS entities from these attribute similarities.
Step D, judging whether the AS entity set still contains an AS entity pair whose similarity has not been calculated; if so, returning to step C, otherwise proceeding to step E.
Step E, merging the organizations of the AS entities according to the similarity of the AS entities, thereby realizing the organization mapping of the AS entities.
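The pairwise loop of steps B to D can be sketched in Python as below. The function `find_similar_pairs` and the stand-in comparison `same_org_name` are hypothetical; the real comparison would be the attribute-based similarity of step C.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Entity = Dict[str, object]


def find_similar_pairs(
    entities: Dict[int, Entity],
    is_similar: Callable[[Entity, Entity], bool],
) -> List[Tuple[int, int]]:
    """Visit every unordered pair of AS entities once and keep the similar ones."""
    similar_pairs = []
    # Steps B and D: combinations() yields each pair exactly once, so the loop
    # ends when no pair is left without a computed similarity.
    for asn_a, asn_b in combinations(sorted(entities), 2):
        # Step C: delegate the actual similarity decision to the caller.
        if is_similar(entities[asn_a], entities[asn_b]):
            similar_pairs.append((asn_a, asn_b))
    return similar_pairs


def same_org_name(a: Entity, b: Entity) -> bool:
    """Stand-in for step C (assumption): compare only the organization name."""
    return a.get("org_name") is not None and a.get("org_name") == b.get("org_name")


entities = {
    64500: {"org_name": "Example Org"},
    64501: {"org_name": "Example Org"},
    64502: {"org_name": None},
}
print(find_similar_pairs(entities, same_org_name))  # [(64500, 64501)]
```

For n AS entities this visits n(n-1)/2 pairs, which is worth keeping in mind for large entity sets.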
Step A specifically comprises the following steps:
A1. Obtaining multidimensional information of an AS entity from a Regional Internet Registry (RIR) to form the multidimensional attributes of the AS entity, wherein the multidimensional attributes comprise an AS name, a country attribution, an organization name, description information, an administrator, a technical contact, a route maintenance mechanism, route input, and route output. The country attribution, the organization name, the administrator, the technical contact, and the route maintenance mechanism are character string attributes; the AS name and the description information are text attributes; and the route input and the route output are list attributes.
A2. Setting the missing attributes of each AS entity to null values to obtain the complete AS entity set.
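Step A2 can be sketched as follows. The field names in `ATTRIBUTES` are hypothetical labels for the nine attributes listed in A1, and stripping whitespace and treating empty strings as missing are assumptions; the text only states that missing attributes are set to null.

```python
from typing import Dict, Optional

# Hypothetical field names for the nine attributes listed in step A1.
ATTRIBUTES = (
    "as_name", "country", "org_name", "description", "admin",
    "tech_contact", "maintainer", "route_input", "route_output",
)


def normalize_entity(raw: Dict[str, object]) -> Dict[str, Optional[object]]:
    """Return a record that has every expected attribute; missing ones become None."""
    record = {}
    for name in ATTRIBUTES:
        value = raw.get(name)
        if isinstance(value, str):
            # Assumption: surrounding whitespace is noise and "" means missing.
            value = value.strip() or None
        record[name] = value
    return record


record = normalize_entity({"as_name": " EXAMPLE-AS ", "country": ""})
print(record["as_name"])      # EXAMPLE-AS
print(record["country"])      # None
print(record["route_input"])  # None
```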
Step C calculates the similarity of each pair of corresponding attributes of the two selected AS entities as follows:
C1. If the corresponding attributes of the two AS entities are character string attributes and the character strings are identical, the attributes are similar; otherwise, they are not similar.
C2. For the text attributes of the two AS entities, the attribute similarity is calculated as follows:
C21. Performing word segmentation on the text of the AS entity attribute to obtain the TF-IDF vector of the text attribute of each AS entity.
C22. Cosine similarity is used to measure the similarity between the two n-dimensional TF-IDF vectors of AS entity 1 and AS entity 2, and can be calculated by the following formula:
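$$\cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}}\,\sqrt{\sum_{i=1}^{n} y_i^{2}}}$$
where $x_i$ and $y_i$ denote the i-th components of the TF-IDF vectors of AS entity 1 and AS entity 2, respectively.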
C23. The cosine similarity value is normalized to a number in the interval [0, 1].
C24. If the normalized similarity value is greater than 0.7, the corresponding attributes of the two AS entities are similar; otherwise, they are not similar.
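Steps C21 to C24 can be sketched in plain Python as below. Only the cosine similarity and the 0.7 threshold come from the text; the regular-expression tokenizer, the smoothed IDF formula, the use of all texts of the attribute as the IDF corpus, and the clamping used as "normalization" are assumptions, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional


def tokenize(text: Optional[str]) -> List[str]:
    """C21 (assumed tokenizer): lower-case alphanumeric word segmentation."""
    return re.findall(r"[a-z0-9]+", (text or "").lower())


def idf_table(corpus: List[str]) -> Dict[str, float]:
    """Smoothed inverse document frequency over all texts of one attribute."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for text in corpus:
        doc_freq.update(set(tokenize(text)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(text: Optional[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse TF-IDF vector; terms unknown to the IDF table get weight 0."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: (c / total) * idf.get(t, 0.0) for t, c in counts.items()}


def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """C22: cosine of the angle between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def text_attributes_similar(a, b, idf: Dict[str, float], threshold: float = 0.7) -> int:
    """C23 and C24: clamp the score to [0, 1], then compare with the threshold."""
    score = cosine_similarity(tfidf_vector(a, idf), tfidf_vector(b, idf))
    score = min(1.0, max(0.0, score))
    return 1 if score > threshold else 0


corpus = ["Example Network Backbone", "Example Network backbone", "Unrelated Telecom Carrier"]
idf = idf_table(corpus)
print(text_attributes_similar(corpus[0], corpus[1], idf))  # same tokens -> 1
print(text_attributes_similar(corpus[0], corpus[2], idf))  # no shared tokens -> 0
```

Because TF-IDF weights are non-negative, the cosine similarity already lies in [0, 1]; the clamp in this sketch only guards against floating-point rounding.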
C3. For the list attributes of the two AS entities, the Jaccard similarity coefficient is calculated to measure the similarity of the corresponding attributes. For example, the similarity of list attribute A and list attribute B is given by the following formula:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
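A short Python sketch of the Jaccard comparison for list attributes follows. The default threshold of 0.5 is a placeholder, since the text does not give a value for the list-attribute threshold, and treating two empty or missing lists as dissimilar is likewise an assumption. Function names and sample values are hypothetical.

```python
from typing import Iterable, Optional


def jaccard(a: Optional[Iterable[str]], b: Optional[Iterable[str]]) -> float:
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B| of two lists."""
    set_a, set_b = set(a or ()), set(b or ())
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty or missing lists are not similar
    return len(set_a & set_b) / len(union)


def list_attributes_similar(a, b, threshold: float = 0.5) -> int:
    """Return 1 if the coefficient exceeds the (placeholder) threshold, else 0."""
    return 1 if jaccard(a, b) > threshold else 0


route_input_1 = ["AS64500", "AS64501", "AS64502"]
route_input_2 = ["AS64501", "AS64502", "AS64503"]
print(jaccard(route_input_1, route_input_2))  # 2 shared / 4 total = 0.5
print(list_attributes_similar(route_input_1, route_input_2, threshold=0.4))  # 1
```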
C4. After the similarity of each pair of corresponding attributes has been calculated, the attribute similarity vector of the two AS entities is obtained. Each component of the vector is 0 or 1, where 0 indicates that the attributes are not similar and 1 indicates that they are similar.
C5. If all components of the attribute similarity vector of the two AS entities are 0, the two AS entities are not similar; otherwise, the two AS entities are similar.
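Steps C4 and C5 amount to dispatching each attribute pair to the comparison that matches its type. The sketch below shows one way this could be organized in Python. The schema, the field names, and the three comparators are illustrative assumptions; in particular `texts_match` is a simplified stand-in and not the TF-IDF procedure of step C2, and treating null values as dissimilar is an assumption that the text does not address.

```python
from typing import Callable, Dict, List

# Hypothetical schema: attribute name -> attribute type.
SCHEMA = {
    "country": "string",
    "org_name": "string",
    "description": "text",
    "route_input": "list",
}


def strings_equal(a, b) -> bool:
    return a == b


def texts_match(a, b) -> bool:
    # Stand-in only: compares token sets, NOT the TF-IDF cosine of step C2.
    return set(a.lower().split()) == set(b.lower().split())


def lists_overlap(a, b) -> bool:
    union = set(a) | set(b)
    return bool(union) and len(set(a) & set(b)) / len(union) > 0.5


COMPARATORS: Dict[str, Callable[[object, object], bool]] = {
    "string": strings_equal,
    "text": texts_match,
    "list": lists_overlap,
}


def attribute_similarity_vector(entity_a, entity_b, schema=SCHEMA) -> List[int]:
    """C4: one 0/1 component per attribute, chosen by the attribute's type."""
    vector = []
    for name, kind in schema.items():
        value_a, value_b = entity_a.get(name), entity_b.get(name)
        if value_a is None or value_b is None:
            vector.append(0)  # assumption: a null value never counts as similar
        else:
            vector.append(1 if COMPARATORS[kind](value_a, value_b) else 0)
    return vector


a = {"country": "CN", "org_name": "Example Org",
     "description": "backbone network", "route_input": ["AS64500", "AS64501"]}
b = {"country": "CN", "org_name": None,
     "description": "edge network", "route_input": ["AS64502"]}

vector = attribute_similarity_vector(a, b)
print(vector)       # [1, 0, 0, 0]
print(any(vector))  # C5: True, because the country attributes match
```

The example also shows a consequence of the "at least one component" rule: a single shared value such as the country is enough to mark two AS entities as similar.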
Step E specifically comprises the following steps:
E1. Acquiring the similar AS entity set of each AS entity, extracting the organization name of each AS entity in the similar AS entity set to form the set of organization names of the similar AS entities, and accumulating the organization name character strings in that set to serve as the home organization identifier of the AS entity and of each AS entity in its similar AS entity set.
E2. Merging the AS entities with the same organization identifier in the AS entity set, thereby mapping the organizations of the AS entities.
It should be noted that the method of the embodiments of the present disclosure may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may only perform one or more steps of the method of the embodiments of the present disclosure, and the devices may interact with each other to complete the method.
It should be noted that the above describes some embodiments of the disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, corresponding to the method of any embodiment, the disclosure also provides an organization mapping device of the autonomous system AS.
Referring to fig. 8, the apparatus includes:
an obtaining module 21, configured to obtain data of multidimensional attributes of each AS entity in multiple AS entities to form an AS entity set, where the multidimensional attributes include multiple attribute components, and one of the multiple attribute components indicates an organization to which the AS entity belongs;
a similarity determining module 22, configured to calculate, for every two AS entities in the AS entity set, the similarity between each pair of attribute components of the two AS entities by using an algorithm corresponding to the type of the pair of attribute components, so as to obtain an attribute similarity vector between the two AS entities, and to determine the similarity between the two AS entities based on the attribute similarity vector;
a mapping module 23, configured to merge the organizations of the multiple AS entities according to the similarity between every two AS entities in the AS entity set, so as to implement the organization mapping of the multiple AS entities.
In a specific embodiment, the obtaining module 21 specifically includes:
an acquisition unit, configured to acquire data of the multidimensional attributes of each AS entity in the plurality of AS entities from an Internet registry;
a completion and normalization unit, configured to normalize the multidimensional attributes by setting the missing attribute components in the multidimensional attributes to null values, so as to form the AS entity set.
In a specific embodiment, for each pair of attribute components of every two AS entities in the set of AS entities, where the type of the pair of attribute components is the same, in response to determining that the type of the pair of attribute components belongs to the character string attribute, a first algorithm unit is employed to calculate a similarity between the pair of attribute components, where the first algorithm unit is specifically configured to:
in response to determining that the strings of the pair of attribute components are the same, determining that a similarity between the pair of attribute components is equal to a first value indicating that the attributes are similar; in response to determining that the strings of the pair of attribute components are not the same, determining that the similarity between the pair of attribute components is equal to a second value indicating that the attributes are not similar.
In a specific embodiment, for each pair of attribute components of every two AS entities in the set of AS entities, where the type of the pair of attribute components is the same, in response to determining that the type of the pair of attribute components belongs to a text attribute, a second algorithm unit is employed to calculate a similarity between the pair of attribute components, where the second algorithm unit is specifically configured to:
respectively carrying out word segmentation on the texts of the attribute components to obtain two word frequency-inverse document frequency TF-IDF vectors respectively corresponding to the attribute components; calculating cosine similarity between the two TF-IDF vectors; normalizing the calculated cosine similarity; in response to determining that the cosine similarity after the normalization process is greater than a first predetermined threshold, determining that the similarity between the pair of attribute components is equal to a first value representing that the attributes are similar; in response to determining that the normalized cosine similarity is not greater than a first predetermined threshold, determining that the similarity between the pair of attribute components is equal to a second value indicating that the attributes are dissimilar.
In a specific embodiment, for each pair of attribute components of every two AS entities in the set of AS entities, where the type of the pair of attribute components is the same, in response to determining that the type of the pair of attribute components belongs to the list attribute, a third algorithm unit is employed to calculate a similarity between the pair of attribute components, where the third algorithm unit is specifically configured to:
calculating the Jaccard similarity coefficient of the list of the pair of attribute components; in response to determining that the calculated Jaccard similarity coefficient is greater than a second predetermined threshold, determining that the similarity between the pair of attribute components is equal to a first value indicative of the attributes being similar; in response to determining that the calculated Jaccard similarity coefficient is not greater than the second predetermined threshold, determining that the degree of similarity between the pair of attribute components is equal to a second value indicating that the attributes are dissimilar.
In a specific embodiment, the similarity determination module 22 is specifically configured to:
determining that the two AS entities are similar in response to determining that at least one component of the attribute similarity vector between the two AS entities is equal to the first value.
In a specific embodiment, the similarity determination module 22 is further specifically configured to:
determining that the two AS entities are dissimilar in response to determining that each component of the attribute similarity vector between the two AS entities is equal to the second value.
In a specific embodiment, the mapping module 23 specifically includes:
a dividing unit, configured to divide the AS entity set into a plurality of similar AS entity sets according to the similarity between every two AS entities in the AS entity set, wherein the AS entities in each similar AS entity set are similar to one another;
an extracting unit, configured to extract, for each similar AS entity set, the name character string of the organization to which each AS entity belongs from the data of the multidimensional attributes of each AS entity in the similar AS entity set;
a combining unit, configured to combine the extracted name character strings as the home organization identifier of each AS entity in the similar AS entity set;
a merging unit, configured to merge the AS entities with the same home organization identifier in the AS entity set.
For convenience of description, the above device is described as being divided by function into various modules, which are described separately. Of course, when implementing the present disclosure, the functions of the various modules may be implemented in one or more pieces of software and/or hardware.
The apparatus of the foregoing embodiment is used to implement the corresponding organization mapping method of the autonomous system AS in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the above embodiments, the present disclosure further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and operable on the processor, and when the processor executes the program, the organization mapping method of the autonomous system AS according to any of the above embodiments is implemented.
Fig. 9 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The input/output module may be configured as a component within the device (not shown in fig. 9) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in fig. 9) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The electronic device of the foregoing embodiment is used to implement the corresponding organization mapping method of the autonomous system AS in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the above-described embodiment methods, the present disclosure also provides a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the organization mapping method of the autonomous system AS according to any of the above embodiments.
Computer-readable media of the present embodiments, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
The computer instructions stored in the storage medium of the foregoing embodiment are used to enable the computer to execute the organization mapping method of the autonomous system AS according to any embodiment, and have the beneficial effects of corresponding method embodiments, and are not described herein again.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the present disclosure, also technical features in the above embodiments or in different embodiments may be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present disclosure as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the disclosure. Furthermore, devices may be shown in block diagram form in order to avoid obscuring embodiments of the present disclosure, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the present disclosure are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that the embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The disclosed embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalents, improvements, and the like that may be made within the spirit and principles of the embodiments of the disclosure are intended to be included within the scope of the disclosure.

Claims (10)

1. An organization mapping method of an Autonomous System (AS), comprising the following steps:
acquiring data of multidimensional attributes of each AS entity in a plurality of AS entities to form an AS entity set, wherein the multidimensional attributes comprise a plurality of attribute components, and one of the attribute components indicates an organization mechanism to which the AS entity belongs;
for each two AS entities in the set of AS entities,
for each pair of attribute components with the same type of the two AS entities, calculating the similarity between the pair of attribute components by adopting an algorithm corresponding to the type of the pair of attribute components to obtain an attribute similarity vector between the two AS entities;
determining similarity between the two AS entities based on the attribute similarity vector;
merging the organizations of the plurality of AS entities according to the similarity between every two AS entities in the AS entity set, so as to achieve organization mapping of the plurality of AS entities.
2. The method of claim 1, wherein the obtaining data of the multidimensional attributes of each of the plurality of AS entities to form the set of AS entities comprises:
obtaining data of the multidimensional attribute of each of the plurality of AS entities from an Internet registry;
and setting the missing attribute components in the multidimensional attribute to null values, and normalizing the multidimensional attribute to form the AS entity set.
3. The method of claim 1, wherein, for each pair of attribute components of every two AS entities in the set of AS entities having the same type, in response to determining that the type of the pair of attribute components belongs to a string attribute, employing the following first algorithm to calculate a similarity between the pair of attribute components:
in response to determining that the strings of the pair of attribute components are the same, determining that a similarity between the pair of attribute components is equal to a first value indicating that the attributes are similar;
in response to determining that the strings of the pair of attribute components are not the same, determining that the similarity between the pair of attribute components is equal to a second value indicating that the attributes are not similar.
4. The method of claim 1, wherein, for each pair of attribute components of every two AS entities in the set of AS entities having the same type, in response to determining that the type of the pair of attribute components belongs to a text attribute, employing the following second algorithm to calculate a similarity between the pair of attribute components:
respectively carrying out word segmentation on the texts of the attribute components to obtain two word frequency-inverse document frequency TF-IDF vectors respectively corresponding to the attribute components;
calculating cosine similarity between the two TF-IDF vectors;
normalizing the calculated cosine similarity;
in response to determining that the cosine similarity after the normalization process is greater than a first predetermined threshold, determining that the similarity between the pair of attribute components is equal to a first value representing attribute similarity;
in response to determining that the cosine similarity after the normalization process is not greater than the first predetermined threshold, determining that the similarity between the pair of attribute components is equal to a second value indicating that the attributes are dissimilar.
5. The method of claim 1, wherein for each pair of attribute components of each two AS entities in the set of AS entities having the same type, in response to determining that the type of the pair of attribute components belongs to a list attribute, employing the following third algorithm to calculate a similarity between the pair of attribute components:
calculating the Jaccard similarity coefficient of the list of the pair of attribute components;
in response to determining that the calculated Jaccard similarity coefficient is greater than a second predetermined threshold, determining that a similarity between the pair of attribute components is equal to a first value indicative of attribute similarity;
in response to determining that the calculated Jaccard similarity coefficient is not greater than the second predetermined threshold, determining that the degree of similarity between the pair of attribute components is equal to a second value indicating that the attributes are dissimilar.
6. The method of any of claims 3-5, wherein for each two AS entities in the set of AS entities, determining a similarity between the two AS entities based on the attribute similarity vector comprises:
determining that the two AS entities are similar in response to determining that at least one component of the attribute similarity vector between the two AS entities is equal to the first value.
7. The method of claim 6, wherein for each two AS entities in the set of AS entities, determining a similarity between the two AS entities based on the attribute similarity vector further comprises:
determining that the two AS entities are dissimilar in response to determining that each component of the attribute similarity vector between the two AS entities is equal to the second value.
8. The method of any of claims 1-5, wherein the organizational merging of the plurality of AS entities comprises:
dividing the AS entity set into a plurality of similar AS entity sets according to the similarity between every two AS entities in the AS entity set, wherein each AS entity in each similar AS entity set is similar;
for each similar AS entity set, extracting the name character string of the organization to which each AS entity belongs from the data of the multidimensional attributes of each AS entity in the similar AS entity set, and combining the extracted name character strings to serve as the home organization identifier of each AS entity in the similar AS entity set;
merging the AS entities with the same home organization identifier in the AS entity set.
9. An organizational structure mapping apparatus of an AS, comprising:
an obtaining module, configured to obtain data of multidimensional attributes of each AS entity in a plurality of AS entities to form an AS entity set, where the multidimensional attributes include a plurality of attribute components, and one of the plurality of attribute components indicates an organization to which the AS entity belongs;
a similarity determining module, configured to calculate, for each two AS entities in the AS entity set, a similarity between each pair of attribute components of the two AS entities by using an algorithm corresponding to the type of the pair of attribute components to obtain an attribute similarity vector between the two AS entities, and determine a similarity between the two AS entities based on the attribute similarity vector;
and a mapping module, configured to merge the organizations of the AS entities according to the similarity between every two AS entities in the AS entity set, so as to realize the organization mapping of the AS entities.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable by the processor, the processor implementing the method of any one of claims 1 to 8 when executing the computer program.
CN202110554627.1A 2021-05-20 2021-05-20 Organization mapping method and device of autonomous system AS and electronic equipment Pending CN113312895A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110554627.1A CN113312895A (en) 2021-05-20 2021-05-20 Organization mapping method and device of autonomous system AS and electronic equipment

Publications (1)

Publication Number Publication Date
CN113312895A true CN113312895A (en) 2021-08-27

Family

ID=77373794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110554627.1A Pending CN113312895A (en) 2021-05-20 2021-05-20 Organization mapping method and device of autonomous system AS and electronic equipment

Country Status (1)

Country Link
CN (1) CN113312895A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114978932A (en) * 2022-05-20 2022-08-30 深信服科技股份有限公司 Fault case recommendation method and device and computer-readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190007371A1 (en) * 2017-06-30 2019-01-03 Microsoft Technology Licensing, Llc MAPPING IPv4 KNOWLEDGE TO IPv6
US20190303459A1 (en) * 2018-03-29 2019-10-03 International Business Machines Corporation Similarity-based clustering search engine
CN110427406A (en) * 2019-08-10 2019-11-08 吴诚诚 The method for digging and device of organization's related personnel's relationship
CN111130876A (en) * 2019-12-20 2020-05-08 北京邮电大学 Method and device for displaying three-dimensional geographic space of autonomous domain system
CN112632954A (en) * 2020-12-29 2021-04-09 中译语通科技股份有限公司 Method and device for acquiring technical similarity of mechanisms

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
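悦光阴: "Data Mining, Part 2: Evaluating the Similarity and Dissimilarity of Data Based on Distance", 《博客园 HTTPS://CNBLOGS.COM/LJHDO/P/4876877.HTML》 *
李阳 et al.: "Research on Entity Similarity Calculation in Knowledge Graphs", Journal of Chinese Information Processing (《中文信息学报》) *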

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114978932A (en) * 2022-05-20 2022-08-30 深信服科技股份有限公司 Fault case recommendation method and device and computer-readable storage medium
CN114978932B (en) * 2022-05-20 2024-05-24 深信服科技股份有限公司 Fault case recommendation method, device and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN109347787B (en) Identity information identification method and device
CN110162695B (en) Information pushing method and equipment
CN110717076B (en) Node management method, device, computer equipment and storage medium
US7765505B2 (en) Design rule management method, design rule management program, rule management apparatus and rule verification apparatus
CN104077723B (en) A kind of social networks commending system and method
CN112463991B (en) Historical behavior data processing method and device, computer equipment and storage medium
CN109918678B (en) Method and device for identifying field meaning
CN110263104B (en) JSON character string processing method and device
US9355166B2 (en) Clustering signifiers in a semantics graph
CN114138246B (en) Topology automatic generation method, device, computing equipment and storage medium
CN106156126A (en) Process the data collision detection method in data task and server
JP6244992B2 (en) Configuration information management program, configuration information management method, and configuration information management apparatus
CN114640590B (en) Method for detecting conflict of policy set in intention network and related equipment
JP7292368B2 (en) A non-transitory computer-readable storage medium storing a method for identifying a device using attributes and location signatures from the device, a server of uniquely generated identifiers for the method, and a sequence of instructions for the method
CN113312895A (en) Organization mapping method and device of autonomous system AS and electronic equipment
CN114238767A (en) Service recommendation method and device, computer equipment and storage medium
CN111431962B (en) Cross-domain resource access Internet of things service discovery method based on context awareness calculation
CN116225690A (en) Memory multidimensional database calculation load balancing method and system based on docker
CN116186337A (en) Business scene data processing method, system and electronic equipment
CN113220949B (en) Construction method and device of private data identification system
CN112364181A (en) Insurance product matching degree determination method and device
JP7482159B2 (en) Computer system and security risk impact analysis method
CN115982508B (en) Heterogeneous information network-based website detection method, electronic equipment and medium
CN112016081B (en) Method, device, medium and electronic equipment for realizing identifier mapping
CN113065071B (en) Product information recommendation method and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210827