CN113312895A - Organization mapping method and device of autonomous system AS and electronic equipment - Google Patents
- Publication number
- CN113312895A (application CN202110554627.1A)
- Authority
- CN
- China
- Prior art keywords
- entities
- similarity
- attribute
- pair
- entity
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/84—Mapping; Conversion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The disclosure provides an organization mapping method and device of an autonomous system AS and electronic equipment. The method comprises the following steps: acquiring data of multidimensional attributes of each AS entity in a plurality of AS entities to form an AS entity set, wherein the multidimensional attributes comprise a plurality of attribute components, one of which indicates the organization to which the AS entity belongs; for each pair of same-type attribute components of every two AS entities in the AS entity set, calculating the similarity between the pair with an algorithm corresponding to their type, so as to obtain an attribute similarity vector between the two AS entities; determining the similarity between the two AS entities based on the attribute similarity vector; and merging the organizations of the AS entities according to the similarity between every two AS entities in the AS entity set, thereby realizing organization mapping of the AS entities. This improves the accuracy of autonomous system organization mapping and reduces misjudgments and missed judgments.
Description
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to an organization mapping method and apparatus for an autonomous system AS, and an electronic device.
Background
An Autonomous System (AS) is a basic building block of cyberspace and a carrier of information resources. ASes are interconnected through routing devices using the Border Gateway Protocol (BGP) to achieve global network interconnection, which enables communication between different organizations. Different technical management departments manage and operate different autonomous domains, and each autonomous domain belongs to a specific organization; conversely, one organization may hold several autonomous systems by applying for multiple AS numbers. ASes of the same organization tend to share network management and security protection policies. Together these relationships form an organization mapping topology of ASes, which to some extent reflects the alliance governance relationships among autonomous domains and forms an overlay network for governing cyberspace.
However, conventional autonomous domain organization mapping relies on individual attributes and limits the definition of an organization to a single business entity, so the autonomous domains of transnational and umbrella-type organizations are prone to misjudgment and missed judgment. In addition, AS information is recorded in diverse forms across Internet registries and contains ambiguities and inconsistencies, which further complicates autonomous domain organization mapping.
Disclosure of Invention
In view of the above, the present disclosure is directed to an organization mapping method and apparatus for an autonomous system AS, and an electronic device, which can solve or partially solve the above problems.
Based on the above object, a first aspect of the present disclosure provides an organization mapping method for an autonomous system AS, including:
acquiring data of multidimensional attributes of each AS entity in a plurality of AS entities to form an AS entity set, wherein the multidimensional attributes comprise a plurality of attribute components, and one of the attribute components indicates an organization mechanism to which the AS entity belongs;
for each two AS entities in the set of AS entities,
for each pair of attribute components with the same type of the two AS entities, calculating the similarity between the pair of attribute components by adopting an algorithm corresponding to the type of the pair of attribute components to obtain an attribute similarity vector between the two AS entities;
determining similarity between the two AS entities based on the attribute similarity vector;
merging the organizations of the AS entities according to the similarity between every two AS entities in the AS entity set, so as to achieve organization mapping of the AS entities.
A second aspect of the present disclosure provides an AS entity organizational structure mapping apparatus based on entity similarity, including:
an obtaining module, configured to obtain data of multidimensional attributes of each AS entity in a plurality of AS entities to form an AS entity set, where the multidimensional attributes include a plurality of attribute components, and one of the plurality of attribute components indicates an organization to which the AS entity belongs;
a similarity determining module, configured to, for each two AS entities in the AS entity set, calculate the similarity between each pair of same-type attribute components of the two AS entities by using an algorithm corresponding to the type of the pair, so as to obtain an attribute similarity vector between the two AS entities, and to determine the similarity between the two AS entities based on the attribute similarity vector;
and a mapping module, configured to merge the organizations of the AS entities according to the similarity between every two AS entities in the AS entity set, so as to realize organization mapping of the AS entities.
A third aspect of the disclosure provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable by the processor, the processor implementing the method according to the first aspect when executing the computer program.
A fourth aspect of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of the first aspect.
As can be seen from the foregoing, the organization mapping method, apparatus, and electronic device for an autonomous system AS provided by the present disclosure calculate the similarity of two AS entities for each attribute component according to that component's type in the acquired multidimensional attribute data, assemble the calculated similarities into an attribute similarity vector, determine from this vector whether the two AS entities are similar, and then map the organizations of similar AS entities together. This improves the accuracy of autonomous system organization mapping and avoids misjudgments and missed judgments caused by inaccurate mapping.
Drawings
In order to more clearly illustrate the technical solutions in the present disclosure or related technologies, the drawings needed to be used in the description of the embodiments or related technologies are briefly introduced below, and it is obvious that the drawings in the following description are only embodiments of the present disclosure, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of an organizational structure mapping method of an autonomous system AS according to an embodiment of the present disclosure;
FIG. 2 is an expanded schematic diagram illustrating step 100 of an organization mapping method of an autonomous system AS according to an embodiment of the disclosure;
FIG. 3 is a schematic diagram illustrating an execution flow of a first algorithm of an organization mapping method of an autonomous system AS according to an embodiment of the disclosure;
FIG. 4 is a schematic diagram illustrating an execution flow of a second algorithm of the organization mapping method of the autonomous system AS according to the embodiment of the disclosure;
FIG. 5 is a schematic diagram illustrating an execution flow of a third algorithm of an organization mapping method of an autonomous system AS according to an embodiment of the disclosure;
FIG. 6 is an expanded schematic diagram illustrating step 300 of an organization mapping method of an autonomous system AS according to an embodiment of the disclosure;
FIG. 7 is a flowchart of an organizational structure mapping method of an autonomous system AS according to another embodiment of the disclosure;
FIG. 8 is a block diagram illustrating an organization mapping apparatus of an autonomous system AS according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that technical terms or scientific terms used in the embodiments of the present disclosure should have a general meaning as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of the terms "comprising" or "including" and the like in the embodiments of the present disclosure is intended to mean that the elements or items listed before the term cover the elements or items listed after the term and their equivalents, without excluding other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, direct connections, indirect connections, wired connections, and wireless connections.
An AS entity has multidimensional attributes, including an AS name, an organization, a country, a description, a technical contact, an administrative contact, a contact telephone number, a contact e-mail address, and the name, technical contact, and administrative contact of the affiliated organization. Together these attributes reflect, to some extent, how AS entities cluster. A Regional Internet Registry (RIR) is an organization that manages Internet number resources in a particular region of the world, including AS numbers, IP addresses, and routing registration information. The multidimensional attributes of an autonomous system can be obtained from an RIR.
AS-level topology modeling abstracts the AS topology into an undirected graph in which each AS entity is a node and each interconnection between AS entities through a border gateway is an edge; the importance and hierarchy of AS entities are reflected by node degree and by the business relationships between AS entities. To give Internet researchers more accurate AS-level network topology models, topology researchers have proposed many models, such as the random network model, hierarchical model, small-world model, power-law model, and local-world model. Building on these models, a number of AS-level topology layout algorithms have been developed, including tree layout, grid layout, and force-directed layout algorithms, which address sensible node placement and computational efficiency.
In the related art, there are two methods for organization mapping of autonomous systems. In the first, registered organizations fill in the data themselves, as with PCH (Packet Clearing House), an international organization that provides operational support and security for critical Internet infrastructure, including Internet exchange points and the core of the Domain Name System. The second realizes organization mapping by clustering AS entities: autonomous system data are obtained from the RIR WHOIS databases; an object is first created for each AS entity in each database; the other objects linked to a given AS entity in the RIR database are then considered and their fields are assigned to that AS entity's object; finally, a machine learning algorithm analyzes the similarity between objects to complete the organization mapping of the autonomous systems.
However, the autonomous domain organization mapping is limited to individual attributes, the organization definition is limited to a single business entity, and misjudgment may occur in regard to the autonomous domain attribution of a transnational organization or an umbrella-type organization. In addition, the information of the autonomous system is diversified in the internet registration mechanism, and the situations of data ambiguity and inconsistency exist, which brings certain difficulty for the mapping of the autonomous domain organization mechanism.
As shown in FIG. 1, the organization mapping method of an autonomous system AS provided in this embodiment includes:
Step 100, acquiring data of multidimensional attributes of each AS entity in a plurality of AS entities to form an AS entity set.
In this step, the entity attribute information of an AS entity generally comprises several types, and each type forms one dimension of the multidimensional attribute data. The multidimensional attribute data include at least one of: AS name, organization, country, description, technical contact, administrative contact, contact telephone number, contact e-mail address, and the name, technical contact, and administrative contact of the affiliated organization.
The acquired multidimensional attribute data of all AS entities are then consolidated into a table listed by AS name: the AS name forms the first column, each other attribute dimension forms its own column, and the attribute values of each AS entity are filled into its row to form the AS entity set.
In step 200, for each two AS entities in the set of AS entities, the similarity between the two AS entities is determined.
The method specifically comprises the following steps:
for each pair of same-type attribute components of every two AS entities in the AS entity set, an algorithm corresponding to the type of the pair is used to calculate the similarity between the pair, so as to obtain an attribute similarity vector between the two AS entities; the similarity between the two AS entities is then determined based on the attribute similarity vector.
In step 300, the organizations of the AS entities are merged according to the similarity between every two AS entities in the AS entity set. In this step, similar AS entities are grouped into similar groups according to their similarity, yielding a plurality of similar groups. An AS entity appears at most once within a given group, but the same AS entity may belong to more than one group. Each similar group is the organization mapping result for the AS entities it contains.
With this scheme, the similarity of two AS entities is calculated for each attribute component according to that component's type, the results form an attribute similarity vector, the vector decides whether the two AS entities are similar, and the organizations of similar AS entities are mapped together. This improves the accuracy of AS entity organization mapping and avoids misjudgments and missed judgments caused by inaccurate mapping.
In a specific embodiment, as shown in FIG. 2, step 100 specifically includes:
There is at least one character string attribute, at least one text attribute, and at least one list attribute.
Step 120, setting missing attribute components in the multidimensional attributes to null values and normalizing the multidimensional attributes to form the AS entity set.
In this step, to facilitate subsequent calculation on the multidimensional attribute data, each missing component is replaced with a null value, which may be a space character or "0".
The acquired character string, text, and list attributes are then stored as a list in the order of the AS entities (either arbitrary or alphabetical by initial) to form the AS entity set.
Alternatively, a table is constructed for each AS entity, its character string, text, and list attributes are added to that table, and all AS entity tables are stored together in a folder to form the AS entity set.
With this scheme, the acquired multidimensional attribute data are classified by type, so that similarity can be calculated per type and the calculated similarity is more accurate.
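Step 120 can be pictured with the following minimal Python sketch. It assumes raw records arrive as dictionaries; the field names (`asn`, `country`, `org`, `admin_c`, `tech_c`, `mnt_by`, `as_name`, `descr`, `import`, `export`) are hypothetical stand-ins for the attributes listed in step A1, the entity set is keyed by AS number rather than AS name, and "normalization" is assumed to mean trimming and case-folding, since the text does not define it.

```python
from typing import Dict, List

# Hypothetical attribute keys modelled on step A1; real WHOIS field names differ by registry.
STRING_ATTRS = ["country", "org", "admin_c", "tech_c", "mnt_by"]
TEXT_ATTRS = ["as_name", "descr"]
LIST_ATTRS = ["import", "export"]

NULL = ""  # null value for a missing scalar component (the text allows a space or "0")


def normalize(value: object) -> str:
    """Assumed normalization: trim surrounding whitespace and case-fold."""
    return str(value).strip().casefold()


def build_entity_set(records: List[Dict[str, object]]) -> Dict[str, Dict[str, object]]:
    """Step 120 sketch: one row per AS entity, keyed by a hypothetical 'asn' field."""
    entity_set: Dict[str, Dict[str, object]] = {}
    for rec in records:
        row: Dict[str, object] = {}
        for attr in STRING_ATTRS + TEXT_ATTRS:
            value = rec.get(attr)
            row[attr] = NULL if value is None else normalize(value)
        for attr in LIST_ATTRS:
            # Missing list attributes become an empty list; duplicate entries are dropped.
            row[attr] = sorted({normalize(v) for v in rec.get(attr) or []})
        entity_set[str(rec["asn"])] = row
    return entity_set
```

Representing a missing list attribute as an empty list, rather than a space or "0", keeps the later set-based comparison simple; this is a design choice of the sketch, not of the disclosure.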
In a specific embodiment, in step 200, for each pair of same-type attribute components of every two AS entities in the AS entity set, in response to determining that the pair belongs to the character string attribute type, the following first algorithm is used to calculate the similarity between the pair, as shown in FIG. 3. The first algorithm includes:
In step 211, it is determined whether the character strings of the pair of attribute components are identical; if so, step 212 is performed, otherwise step 213 is performed.
At step 212, it is determined that the similarity between the pair of attribute components is equal to a first value indicating that the attributes are similar.
At step 213, it is determined that the similarity between the pair of attribute components is equal to a second value indicating that the attributes are dissimilar.
The character string attributes include at least one of: country, organization information, administrator, technical contact, and route maintenance organization.
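Steps 211 to 213 amount to an exact string comparison that returns the first or second value. The sketch below assumes, as in the later example, that the first value is 1 and the second is 0. The treatment of null values is an assumption not stated in the text: two missing components are scored as dissimilar, because otherwise any two sparse records would match on their empty fields.

```python
FIRST_VALUE = 1   # attributes similar
SECOND_VALUE = 0  # attributes dissimilar
NULL = ""         # null value used for missing components (assumed, see step 120)


def string_attr_similarity(a: str, b: str) -> int:
    """First algorithm (steps 211-213): exact comparison of two character strings."""
    # Assumption: a missing (null) value never counts as evidence of similarity.
    if a == NULL or b == NULL:
        return SECOND_VALUE
    return FIRST_VALUE if a == b else SECOND_VALUE
```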
In a specific embodiment, in step 200, for each pair of same-type attribute components of every two AS entities in the AS entity set, in response to determining that the pair belongs to the text attribute type, the following second algorithm is used to calculate the similarity between the pair, as shown in FIG. 4. The second algorithm includes:
In step 222, the cosine similarity between the two TF-IDF vectors is calculated.
At step 226, it is determined that the similarity between the pair of attribute components is equal to a second value indicating that the attributes are dissimilar.
In the above steps, the text attributes include at least one of: AS name and description information, where the description information is a textual description of the AS entity.
The text attributes of the two AS entities are first segmented into words, and stop words such as adverbs and function words (for example, "of", "etc.") are removed.
The TF-IDF score of each word in an AS entity's text attribute is then calculated. It consists of two parts: the TF value and the IDF value.
The TF value is calculated as:
$$\mathrm{TF}(t)=\frac{\text{number of times word } t \text{ appears in the document}}{\text{total number of words in the document}}$$
The IDF value is calculated as:
$$\mathrm{IDF}(t)=\ln\frac{\text{total number of documents}}{\text{number of documents containing word } t}$$
where t denotes a word of the text attribute.
The TF value and the IDF value are multiplied to give the TF-IDF score of each word, and the TF-IDF scores of all words in the AS entity's text attribute together form its TF-IDF vector.
Thus, two TF-IDF vectors are obtained for the text attributes of the two AS entities being compared; their components are denoted $AS_{1i}$ and $AS_{2i}$, respectively.
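The TF and IDF formulas can be illustrated with a minimal Python sketch that builds sparse TF-IDF vectors as dictionaries. Several details are assumptions: the stop-word list is hypothetical, the tokenizer is a naive regular expression that only suits Latin-script text (Chinese descriptions would need a proper word segmenter), and the "documents" used for IDF are assumed to be the text attributes of all AS entities in the set.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical stop-word list standing in for the removed adverbs and function words.
STOP_WORDS = {"of", "the", "and", "etc", "a", "an", "for", "in"}


def tokenize(text: str) -> List[str]:
    """Naive word segmentation for Latin-script text, followed by stop-word removal."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def tfidf_vector(doc: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """Sparse TF-IDF vector of one tokenized text attribute.

    corpus is assumed to hold the tokenized text attributes of all AS entities,
    including doc itself, so each term of doc has a document frequency of at least 1.
    """
    if not doc:
        return {}
    doc_sets = [set(d) for d in corpus]
    n_docs = max(len(doc_sets), 1)
    vector: Dict[str, float] = {}
    for term, count in Counter(doc).items():
        tf = count / len(doc)                                # TF(t)
        df = max(sum(1 for d in doc_sets if term in d), 1)   # documents containing t
        vector[term] = tf * math.log(n_docs / df)            # TF(t) * IDF(t)
    return vector
```

A word that occurs in every document receives weight ln(1) = 0, so very common words such as a shared country name contribute nothing to the later cosine comparison.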
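The cosine similarity between the two TF-IDF vectors is calculated as follows:
$$\cos\theta=\frac{\sum_{i=1}^{n}AS_{1i}\,AS_{2i}}{\sqrt{\sum_{i=1}^{n}AS_{1i}^{2}}\;\sqrt{\sum_{i=1}^{n}AS_{2i}^{2}}}$$
where $AS_{1i}$ and $AS_{2i}$ are the i-th components of the TF-IDF vectors of the two AS entities, n is the number of vector components, and $i = 1, \dots, n$.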
Cosine similarity measures the similarity of two vectors by the cosine of the angle between them. Because TF-IDF weights are non-negative, the cosine similarity of two TF-IDF vectors lies in [0, 1], so the first predetermined threshold is a value in this interval (e.g., 0.7); the specific value may be chosen according to the actual situation and is not limited here.
The cosine similarity of the text attributes of the two AS entities is calculated based on the TF-IDF vector, so that the text attribute similarity of the two AS entities determined based on the first preset threshold value can be more fit with the actual situation, and the accuracy is higher.
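A minimal Python sketch of the cosine comparison on sparse TF-IDF dictionaries follows. It uses the example threshold 0.7 from the text, assumes a strict "greater than" comparison (mirroring the Jaccard step), and treats an empty vector as dissimilar; both choices are assumptions, since the intermediate steps of the second algorithm are not reproduced here. Summing over the keys of one dictionary is equivalent to the formula above, because terms missing from either vector contribute zero.

```python
import math
from typing import Dict

FIRST_PREDETERMINED_THRESHOLD = 0.7  # example value from the text; tune per data set


def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an empty or all-zero vector is never similar
    return dot / (norm_u * norm_v)


def text_attr_similarity(u: Dict[str, float], v: Dict[str, float],
                         threshold: float = FIRST_PREDETERMINED_THRESHOLD) -> int:
    """Second algorithm sketch: 1 (similar) above the threshold, else 0 (dissimilar)."""
    return 1 if cosine_similarity(u, v) > threshold else 0
```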
In a specific embodiment, in step 200, for each pair of same-type attribute components of every two AS entities in the AS entity set, in response to determining that the pair belongs to the list attribute type, the following third algorithm is used to calculate the similarity between the pair, as shown in FIG. 5. The third algorithm includes:
In step 232, it is determined whether the calculated Jaccard similarity coefficient is greater than a second predetermined threshold, if yes, step 233 is performed, otherwise step 234 is performed.
At step 233, it is determined that the similarity between the pair of attribute components is equal to a first value indicating that the attributes are similar.
At step 234, it is determined that the similarity between the pair of attribute components is equal to a second value indicating that the attributes are dissimilar.
In the above steps, the list attributes include at least one of: route input (import) and route output (export). The list attribute information A and B of every two AS entities in the AS entity set is obtained, and the Jaccard similarity coefficient of A and B is calculated according to the following formula and taken as the list attribute similarity of the two AS entities:
$$\mathrm{sim}(A,B)=\mathrm{Jaccard}(A,B)=\frac{|A\cap B|}{|A\cup B|}$$
where sim(A, B) is the list attribute similarity and Jaccard(A, B) is the Jaccard similarity coefficient. The Jaccard similarity coefficient compares the similarity and difference between finite sample sets; the larger its value, the more similar the samples. Its value satisfies $0 \le \mathrm{Jaccard}(A,B) \le 1$.
With this scheme: list attributes carry a large amount of data, and comparing every entry one by one would be time-consuming. Using the Jaccard similarity coefficient to judge whether the list attributes of two AS entities are similar reduces the workload while still comparing them effectively.
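The third algorithm can be sketched in Python with built-in set operations. The second predetermined threshold is not given in the text, so the value 0.5 below is purely hypothetical, and scoring two empty lists as 0.0 is an assumption (the coefficient is undefined when both sets are empty).

```python
from typing import Iterable

SECOND_PREDETERMINED_THRESHOLD = 0.5  # hypothetical value; the text does not specify one


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard(A, B) = |A intersect B| / |A union B|."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0  # assumption: two empty route lists are not evidence of similarity
    return len(sa & sb) / len(union)


def list_attr_similarity(a: Iterable[str], b: Iterable[str],
                         threshold: float = SECOND_PREDETERMINED_THRESHOLD) -> int:
    """Steps 232-234: 1 if the coefficient exceeds the threshold, else 0."""
    return 1 if jaccard(a, b) > threshold else 0
```

For example, import lists {AS1, AS2, AS3} and {AS2, AS3, AS4} share two of four distinct entries, giving 0.5, which does not exceed a threshold of 0.5.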
In a specific embodiment, for each two AS entities in the set of AS entities, determining the similarity between the two AS entities based on the attribute similarity vector comprises:
determining that the two AS entities are similar in response to determining that at least one component of the attribute similarity vector between them is equal to the first value.
In a specific embodiment, for each two AS entities in the set of AS entities, determining the similarity between the two AS entities based on the attribute similarity vector further includes:
determining that the two AS entities are dissimilar in response to determining that each component of the attribute similarity vector between the two AS entities is equal to the second value.
In the above scheme, the character string attribute similarity, and/or the text attribute similarity, and/or the list attribute similarity are combined into the attribute similarity vector of the two AS entities, and the similarity between the two AS entities is determined based on this vector.
The first value is "1" and the second value is "0": "0" indicates "dissimilar" and "1" indicates "similar", and the attribute similarity vector is assembled from these values. If at least one component of the attribute similarity vector is "1", the two AS entities are determined to be "similar"; otherwise they are determined to be "dissimilar".
For example, the obtained attribute similarity vector of the two AS entities is (0, 1, 1), where there are two "1" s and thus the two AS entities are similar.
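The decision rule can be sketched in Python as below. The per-attribute comparators are passed in as plain functions (for instance the string, text, and list algorithms above), so the attribute names and the simple equality comparator used in the usage check are hypothetical.

```python
from typing import Callable, Dict, Mapping, Tuple

FIRST_VALUE = 1  # "similar"

Comparator = Callable[[object, object], int]


def attribute_similarity_vector(e1: Mapping[str, object], e2: Mapping[str, object],
                                comparators: Dict[str, Comparator]) -> Tuple[int, ...]:
    """One 0/1 component per attribute, each produced by the algorithm for its type."""
    return tuple(compare(e1[name], e2[name]) for name, compare in comparators.items())


def entities_similar(vector: Tuple[int, ...]) -> bool:
    """Similar if at least one component equals the first value; otherwise dissimilar."""
    return FIRST_VALUE in vector
```

With three equality comparators on `country`, `org`, and `mnt_by`, two entities that differ only in country yield the vector (0, 1, 1), matching the example above, and are therefore judged similar.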
In addition, weights may be introduced: the character string attribute similarity, the text attribute similarity, and the list attribute similarity are each multiplied by a corresponding weight, and the products form the attribute similarity vector. Each weight is a real number greater than 0 and may be set according to how much the corresponding attribute similarity contributes to the similarity of the two AS entities.
In a specific embodiment, as shown in FIG. 6, step 300 specifically includes:
Step 320, for each similar AS entity set, extracting the name strings of the organizations to which the AS entities belong from the multidimensional attribute data of each AS entity in the set, and combining the extracted name strings as the home organization identifier of each AS entity in that similar AS entity set.
In the above step, the extracted name strings are combined either by accumulating them into an accumulated value or by arranging and concatenating them into a combined string; the accumulated value or combined string is used as the home organization identifier of each AS entity in the similar AS entity set.
Once the organization identifiers are merged, organization mapping is complete, and in subsequent identification it suffices to compare the corresponding organization identifiers to judge whether two AS entities are similar. This plays a crucial role in detecting route hijacking events.
Wherein each AS entity in the set of AS entities may be correspondingly tagged with one or more organizational identifiers.
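Because the text of step 310 is not reproduced here, the grouping below is an assumption chosen to match the stated properties: each AS entity's similar set is assumed to be the entity itself plus every entity judged similar to it, so one entity may belong to several sets and may carry several identifiers. The identifier is formed by the "arrange and combine" option, here sorted unique organization names joined by a hypothetical `|` separator.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def similar_sets(entities: List[str],
                 pairs: Iterable[Tuple[str, str]]) -> List[FrozenSet[str]]:
    """Assumed step 310: each entity plus its similar neighbours forms one similar set."""
    neighbours: Dict[str, Set[str]] = {e: {e} for e in entities}
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Drop identical sets; distinct sets may still share members.
    return sorted({frozenset(s) for s in neighbours.values()}, key=sorted)


def org_identifier(group: FrozenSet[str], org_name: Dict[str, str]) -> str:
    """Step 320: combine the organization name strings of one similar set."""
    return "|".join(sorted({org_name[asn] for asn in group if org_name.get(asn)}))


def map_organizations(entities: List[str], pairs: Iterable[Tuple[str, str]],
                      org_name: Dict[str, str]) -> Dict[str, Set[str]]:
    """Tag every AS entity with one or more organization identifiers."""
    tags: Dict[str, Set[str]] = {e: set() for e in entities}
    for group in similar_sets(entities, pairs):
        ident = org_identifier(group, org_name)
        for asn in group:
            tags[asn].add(ident)
    return tags
```

A stricter alternative would merge pairs transitively (connected components), which gives each entity exactly one identifier. The neighbourhood-based version is used here only because the text allows an AS entity to appear in several similar groups.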
For example, in a route hijacking event detected by a route monitoring system, the affected AS entity and the suspected attacking AS entity may belong to the same organization. Identifying the similarity between the two AS entities through entity similarity calculation avoids a false hijacking verdict caused by the system's failure to recognize that the AS entities are related.
The organization mapping method of the autonomous system AS proposed in this embodiment follows the flow shown in fig. 7. Its main technical solution is as follows:
Step A: obtain the multidimensional attributes of AS entities from an Internet registry to form an AS entity set, and normalize each attribute of the AS entities.
Step B: take any two AS entities from the AS entity set as an AS entity pair.
Step C: calculate the similarity of each pair of corresponding attributes of the selected AS entity pair, and from these attribute similarities calculate the similarity of the two AS entities.
Step D: determine whether the AS entity set contains any AS entity pair whose similarity has not yet been calculated. If so, take such a pair and return to step C; otherwise, proceed to step E.
Step E: merge the organizations of the AS entities according to their similarity, thereby mapping the AS entities to organizations.
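Steps B to D amount to visiting every unordered pair of AS entities exactly once. A minimal sketch using `itertools.combinations`. The `is_similar` callback is a hypothetical stand-in for the step C computation, and the toy rule in the example exists only to make the loop observable.

```python
from itertools import combinations
from typing import Callable


def find_similar_pairs(asns: list[int],
                       is_similar: Callable[[int, int], bool]) -> set[tuple[int, int]]:
    """Steps B-D: visit every unordered AS pair exactly once, keep similar ones."""
    found = set()
    for a, b in combinations(asns, 2):  # n * (n - 1) / 2 pairs in total
        if is_similar(a, b):            # step C for this pair
            found.add((a, b))
    return found


# Toy similarity rule, for illustration only.
print(find_similar_pairs([1, 2, 3], lambda a, b: (a + b) % 2 == 0))  # {(1, 3)}
```

The pair count grows quadratically, so on a full registry dump the per-pair comparison should be kept cheap.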
Step A specifically comprises the following steps:
A1. Obtain multidimensional information about each AS entity from a Regional Internet Registry (RIR) to form its multidimensional attributes. These are: AS name, country, organization name, description, administrator, technical contact, route maintainer, route import, and route export. Country, organization name, administrator, technical contact, and route maintainer are string attributes; AS name and description are text attributes; route import and route export are list attributes.
A2. Set any missing attribute of an AS entity to a null value, yielding the complete AS entity set.
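A minimal sketch of steps A1 and A2. The dictionary keys are hypothetical field names loosely modeled on RPSL `aut-num` attributes, not a registry's actual schema. Trimming whitespace and splitting a whitespace-separated list string are also assumptions about what "normalization" involves; the patent itself only specifies that missing attributes become null.

```python
from typing import Any

# Attribute types from step A1. The keys are hypothetical field names.
ATTRIBUTE_TYPES = {
    "country": "string", "org_name": "string", "admin_c": "string",
    "tech_c": "string", "mnt_by": "string",
    "as_name": "text", "descr": "text",
    "import": "list", "export": "list",
}


def normalize_entity(raw: dict[str, Any]) -> dict[str, Any]:
    """Step A2: give every AS entity the full attribute set, missing -> None."""
    entity: dict[str, Any] = {}
    for key, kind in ATTRIBUTE_TYPES.items():
        value = raw.get(key)
        if value is None or value == "" or value == []:
            entity[key] = None                      # missing attribute -> null
        elif kind == "list":
            items = value.split() if isinstance(value, str) else value
            entity[key] = [str(item).strip() for item in items]
        else:
            entity[key] = str(value).strip()
    return entity


print(normalize_entity({"country": " CN ", "import": "AS64496 AS64497"}))
```

Every normalized entity carries the same keys, so later steps can compare attributes pairwise without checking for their presence.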
In step C, the similarity of each pair of corresponding attributes of the two selected AS entities is calculated as follows:
C1. If the corresponding attributes of the two AS entities are string attributes and the two strings are identical, the attributes are similar; otherwise, they are not similar.
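Rule C1 reduces to an exact string comparison that returns 1 or 0. One assumption in this sketch is not stated in the text: two null values do not count as a match. Otherwise, any two AS entities that both lack an attribute would be judged similar on it.

```python
from typing import Optional


def string_attr_similarity(a: Optional[str], b: Optional[str]) -> int:
    """Step C1: 1 if both strings are present and identical, otherwise 0."""
    if a is None or b is None:
        return 0  # assumption: two null values do not count as a match
    return 1 if a == b else 0


print(string_attr_similarity("MNT-EXAMPLE", "MNT-EXAMPLE"))  # 1
```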
C2. For the text attributes of the two AS entities, the attribute similarity is calculated as follows:
C21. Perform word segmentation on the text of the AS entity attribute to obtain a TF-IDF vector for the text attribute of each AS entity.
C22. Use cosine similarity to measure the similarity between the n-dimensional TF-IDF vectors $\mathbf{x}=(x_1,\dots,x_n)$ of AS entity 1 and $\mathbf{y}=(y_1,\dots,y_n)$ of AS entity 2:
$$\cos(\mathbf{x},\mathbf{y})=\frac{\sum_{i=1}^{n}x_i y_i}{\sqrt{\sum_{i=1}^{n}x_i^{2}}\,\sqrt{\sum_{i=1}^{n}y_i^{2}}}$$
C23. Normalize the cosine similarity value to a number in the interval [0, 1].
C24. If the normalized similarity value is greater than 0.7, the corresponding attributes of the two AS entities are similar; otherwise, the attributes are not similar.
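A self-contained sketch of steps C21 to C24, with several assumptions.

- Word segmentation is a plain lowercase whitespace split. Chinese text would need a real segmenter.
- IDF uses one common smoothed variant, ln((1 + N) / (1 + df)) + 1, computed over the text attribute values of all AS entities.
- The normalization in C23 is a clamp to [0, 1]. TF-IDF weights are non-negative, so the cosine already lies in that interval.

```python
import math
from collections import Counter
from typing import Optional


def tfidf_vectors(docs: list[Optional[str]]) -> list[dict[str, float]]:
    """Step C21: one sparse TF-IDF vector per document (null -> empty text)."""
    tokenized = [(doc or "").lower().split() for doc in docs]  # whitespace split
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # smoothed IDF is always > 0, so every weight is non-negative
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Step C22: cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def text_attr_similarity(u: dict[str, float], v: dict[str, float],
                         threshold: float = 0.7) -> int:
    """Steps C23-C24: clamp the score to [0, 1], then compare with the threshold."""
    score = min(max(cosine(u, v), 0.0), 1.0)
    return 1 if score > threshold else 0


vecs = tfidf_vectors(["Example Telecom backbone", "example telecom backbone",
                      "University campus network"])
print(text_attr_similarity(vecs[0], vecs[1]), text_attr_similarity(vecs[0], vecs[2]))  # 1 0
```

The vectors are stored sparsely as dictionaries, so only terms that actually occur in a description cost memory. This matters because the vocabulary across all registry descriptions can be large.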
C3. For the list attributes of the two AS entities, the Jaccard similarity coefficient is calculated to measure the similarity of the corresponding attributes. For example, the similarity of attribute A and attribute B, each treated as the set of its list entries, is:
$$J(A,B)=\frac{|A\cap B|}{|A\cup B|}$$
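A minimal sketch of the Jaccard comparison for list attributes such as route import and route export. The decision threshold of 0.5 is a hypothetical placeholder, since the text gives no value for it. Returning 0.0 when either list is null or empty is also an assumption.

```python
from typing import Iterable, Optional


def jaccard(a: Optional[Iterable[str]], b: Optional[Iterable[str]]) -> float:
    """Size of the intersection divided by size of the union of the two sets."""
    set_a, set_b = set(a or ()), set(b or ())
    if not set_a or not set_b:
        return 0.0  # assumption: a null or empty list never matches
    return len(set_a & set_b) / len(set_a | set_b)


def list_attr_similarity(a, b, threshold: float = 0.5) -> int:
    """1 if the Jaccard coefficient exceeds the (hypothetical) threshold, else 0."""
    return 1 if jaccard(a, b) > threshold else 0


imports_1 = ["AS64496", "AS64497", "AS64498"]
imports_2 = ["AS64497", "AS64498", "AS64499"]
print(jaccard(imports_1, imports_2))               # 0.5
print(list_attr_similarity(imports_1, imports_2))  # 0 (0.5 is not > 0.5)
```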
C4. After the similarity of every pair of corresponding attributes has been calculated, the attribute similarity vector of the two AS entities is obtained. Each component of the vector is 0 or 1, where 0 indicates that the attributes are not similar and 1 indicates that they are similar.
C5. If all components of the attribute similarity vector of the two AS entities are 0, the two AS entities are not similar; otherwise, they are similar.
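Steps C4 and C5 can be sketched as follows: build one 0/1 component per attribute, then declare the entities similar if any component is nonzero. The two comparators and the attribute keys are hypothetical simplifications. In practice the per-type rules from C1 to C3 would be plugged in instead.

```python
from typing import Any, Callable

Comparator = Callable[[Any, Any], int]


def attr_similarity_vector(e1: dict, e2: dict,
                           comparators: dict[str, Comparator]) -> list[int]:
    """Step C4: one 0/1 component per attribute, in the comparators' order."""
    return [compare(e1.get(key), e2.get(key)) for key, compare in comparators.items()]


def entities_similar(vector: list[int]) -> bool:
    """Step C5: dissimilar only if every component is 0."""
    return any(component != 0 for component in vector)


def exact(a: Any, b: Any) -> int:
    """Hypothetical string rule: present and identical."""
    return int(a is not None and a == b)


def overlap(a: Any, b: Any) -> int:
    """Hypothetical list rule: at least one shared entry."""
    return int(bool(set(a or ()) & set(b or ())))


comparators = {"mnt_by": exact, "import": overlap}
as1 = {"mnt_by": "MNT-EXAMPLE", "import": ["AS64496"]}
as2 = {"mnt_by": "MNT-OTHER", "import": ["AS64496", "AS64497"]}
vector = attr_similarity_vector(as1, as2, comparators)
print(vector, entities_similar(vector))  # [0, 1] True
```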
Step E specifically comprises the following steps:
E1. Obtain the similar AS entity set of each AS entity, and extract the organization name of each AS entity in that set to form the AS entity's set of similar organization names. Add together the organization name strings in this set to serve as the home organization identifier of the AS entity and of each AS entity in its similar AS entity set.
E2. Merge the AS entities in the AS entity set that share the same organization identifier, completing the organization mapping of the AS entities.
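A minimal sketch of steps E1 and E2, with four assumptions.

- The similar AS entity set of an AS is the AS itself plus every AS found similar to it.
- The name strings are "added together" as a sorted `|`-joined concatenation.
- Every member of the set is tagged with the resulting identifier.
- AS entities without any organization name are left unmapped.

Similarity need not be transitive, so an AS can receive more than one identifier, which matches the note that an AS entity may carry one or more organization identifiers.

```python
from collections import defaultdict
from typing import Iterable, Optional


def map_organizations(org_names: dict[int, Optional[str]],
                      similar_pairs: Iterable[tuple[int, int]]) -> dict[str, set[int]]:
    """Steps E1-E2: map each home-organization identifier to its merged ASes."""
    # E1: the similar AS entity set of each AS is itself plus every AS similar to it.
    similar_sets = {asn: {asn} for asn in org_names}
    for a, b in similar_pairs:
        similar_sets[a].add(b)
        similar_sets[b].add(a)

    groups: dict[str, set[int]] = defaultdict(set)
    for members in similar_sets.values():
        names = sorted({org_names[m] for m in members if org_names[m]})
        if not names:
            continue                      # no organization name to combine
        identifier = "|".join(names)      # combined name strings
        # Tag every member with the identifier; E2 merges ASes sharing it.
        groups[identifier] |= members
    return dict(groups)


org_names = {64496: "ExampleNet", 64497: "ExampleNet Asia", 64498: "OtherNet"}
print(map_organizations(org_names, [(64496, 64497)]))
```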
It should be noted that the method of the embodiments of the present disclosure may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may only perform one or more steps of the method of the embodiments of the present disclosure, and the devices may interact with each other to complete the method.
It should be noted that the above describes some embodiments of the disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, corresponding to the method of any embodiment, the disclosure also provides an organization mapping device of the autonomous system AS.
Referring to fig. 8, the apparatus includes:
an obtaining module 21, configured to obtain data of multidimensional attributes of each AS entity in multiple AS entities to form an AS entity set, where the multidimensional attributes include multiple attribute components, and one of the multiple attribute components indicates an organization to which the AS entity belongs;
a similarity determining module 22, configured to calculate, for every two AS entities in the AS entity set, the similarity between each pair of attribute components of the two AS entities by using an algorithm corresponding to the type of the pair of attribute components, so as to obtain an attribute similarity vector between the two AS entities, and to determine the similarity between the two AS entities based on the attribute similarity vector;
a mapping module 23, configured to merge the organizations of the multiple AS entities according to the similarity between every two AS entities in the AS entity set, so as to implement organization mapping of the multiple AS entities.
In a specific embodiment, the obtaining module 21 specifically includes:
an acquisition unit, configured to acquire data of the multidimensional attributes of each of a plurality of AS entities from an Internet registry;
a completion and normalization unit, configured to normalize the multidimensional attributes by setting missing attribute components to null values, so as to form the AS entity set.
In a specific embodiment, for each pair of attribute components of every two AS entities in the set of AS entities, where the type of the pair of attribute components is the same, in response to determining that the type of the pair of attribute components belongs to the character string attribute, a first algorithm unit is employed to calculate a similarity between the pair of attribute components, where the first algorithm unit is specifically configured to:
in response to determining that the strings of the pair of attribute components are the same, determining that a similarity between the pair of attribute components is equal to a first value indicating that the attributes are similar; in response to determining that the strings of the pair of attribute components are not the same, determining that the similarity between the pair of attribute components is equal to a second value indicating that the attributes are not similar.
In a specific embodiment, for each pair of attribute components of every two AS entities in the set of AS entities, where the type of the pair of attribute components is the same, in response to determining that the type of the pair of attribute components belongs to a text attribute, a second algorithm unit is employed to calculate a similarity between the pair of attribute components, where the second algorithm unit is specifically configured to:
respectively carrying out word segmentation on the texts of the attribute components to obtain two word frequency-inverse document frequency TF-IDF vectors respectively corresponding to the attribute components; calculating cosine similarity between the two TF-IDF vectors; normalizing the calculated cosine similarity; in response to determining that the cosine similarity after the normalization process is greater than a first predetermined threshold, determining that the similarity between the pair of attribute components is equal to a first value representing that the attributes are similar; in response to determining that the normalized cosine similarity is not greater than a first predetermined threshold, determining that the similarity between the pair of attribute components is equal to a second value indicating that the attributes are dissimilar.
In a specific embodiment, for each pair of attribute components of every two AS entities in the set of AS entities, where the type of the pair of attribute components is the same, in response to determining that the type of the pair of attribute components belongs to the list attribute, a third algorithm unit is employed to calculate a similarity between the pair of attribute components, where the third algorithm unit is specifically configured to:
calculating the Jaccard similarity coefficient of the list of the pair of attribute components; in response to determining that the calculated Jaccard similarity coefficient is greater than a second predetermined threshold, determining that the similarity between the pair of attribute components is equal to a first value indicative of the attributes being similar; in response to determining that the calculated Jaccard similarity coefficient is not greater than the second predetermined threshold, determining that the degree of similarity between the pair of attribute components is equal to a second value indicating that the attributes are dissimilar.
In a specific embodiment, the similarity determination module 22 is specifically configured to:
determining that the two AS entities are similar in response to determining that at least one component of the attribute similarity vector between the two AS entities is equal to the first value.
In a specific embodiment, the similarity determination module 22 is further specifically configured to:
determining that the two AS entities are dissimilar in response to determining that each component of the attribute similarity vector between the two AS entities is equal to the second value.
In a specific embodiment, the mapping module 23 specifically includes:
a unit configured to divide the AS entity set into a plurality of similar AS entity sets according to the similarity between every two AS entities in the AS entity set, wherein the AS entities within each similar AS entity set are similar;
an extracting unit, configured to extract, for each similar AS entity set, the name string of the organization to which each AS entity belongs from that AS entity's multidimensional attribute data;
a combining unit, configured to combine the extracted name strings as the home organization identifier of each AS entity in the similar AS entity set;
a merging unit, configured to merge the AS entities in the AS entity set that have the same home organization identifier.
For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, the functionality of the various modules may be implemented in the same one or more software and/or hardware implementations of the present disclosure.
The apparatus of the foregoing embodiment is used to implement the corresponding organization mapping method of the autonomous system AS in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the above embodiments, the present disclosure further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and operable on the processor, and when the processor executes the program, the organization mapping method of the autonomous system AS according to any of the above embodiments is implemented.
Fig. 9 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The input/output module may be configured as a component within the device (not shown in fig. 9) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in fig. 9) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The electronic device of the foregoing embodiment is used to implement the corresponding organization mapping method of the autonomous system AS in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the above-described embodiment methods, the present disclosure also provides a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the organization mapping method of the autonomous system AS according to any of the above embodiments.
Computer-readable media of the present embodiments include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The computer instructions stored in the storage medium of the foregoing embodiment are used to enable the computer to execute the organization mapping method of the autonomous system AS according to any embodiment, and have the beneficial effects of corresponding method embodiments, and are not described herein again.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the present disclosure, also technical features in the above embodiments or in different embodiments may be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present disclosure as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the disclosure. Furthermore, devices may be shown in block diagram form in order to avoid obscuring embodiments of the present disclosure, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the present disclosure are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that the embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The disclosed embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalents, improvements, and the like that may be made within the spirit and principles of the embodiments of the disclosure are intended to be included within the scope of the disclosure.
Claims (10)
1. An organization mapping method of an Autonomous System (AS), comprising the following steps:
acquiring data of multidimensional attributes of each AS entity in a plurality of AS entities to form an AS entity set, wherein the multidimensional attributes comprise a plurality of attribute components, and one of the attribute components indicates an organization mechanism to which the AS entity belongs;
for each two AS entities in the set of AS entities,
for each pair of attribute components with the same type of the two AS entities, calculating the similarity between the pair of attribute components by adopting an algorithm corresponding to the type of the pair of attribute components to obtain an attribute similarity vector between the two AS entities;
determining similarity between the two AS entities based on the attribute similarity vector;
merging the organizations of the AS entities according to the similarity between every two AS entities in the AS entity set, so as to achieve organization mapping of the AS entities.
2. The method of claim 1, wherein the obtaining data of the multidimensional attributes of each of the plurality of AS entities to form the set of AS entities comprises:
obtaining data of the multidimensional attribute of each of the plurality of AS entities from an Internet registry;
setting the missing attribute components in the multidimensional attributes as null values, and normalizing the multidimensional attributes to form the AS entity set.
3. The method of claim 1, wherein, for each pair of attribute components of every two AS entities in the set of AS entities having the same type, in response to determining that the type of the pair of attribute components belongs to a string attribute, employing the following first algorithm to calculate a similarity between the pair of attribute components:
in response to determining that the strings of the pair of attribute components are the same, determining that a similarity between the pair of attribute components is equal to a first value indicating that the attributes are similar;
in response to determining that the strings of the pair of attribute components are not the same, determining that the similarity between the pair of attribute components is equal to a second value indicating that the attributes are not similar.
4. The method of claim 1, wherein, for each pair of attribute components of every two AS entities in the set of AS entities having the same type, in response to determining that the type of the pair of attribute components belongs to a text attribute, employing the following second algorithm to calculate a similarity between the pair of attribute components:
respectively carrying out word segmentation on the texts of the attribute components to obtain two word frequency-inverse document frequency TF-IDF vectors respectively corresponding to the attribute components;
calculating cosine similarity between the two TF-IDF vectors;
normalizing the calculated cosine similarity;
in response to determining that the cosine similarity after the normalization process is greater than a first predetermined threshold, determining that the similarity between the pair of attribute components is equal to a first value representing attribute similarity;
in response to determining that the cosine similarity after the normalization process is not greater than the first predetermined threshold, determining that the similarity between the pair of attribute components is equal to a second value indicating that the attributes are dissimilar.
5. The method of claim 1, wherein for each pair of attribute components of each two AS entities in the set of AS entities having the same type, in response to determining that the type of the pair of attribute components belongs to a list attribute, employing the following third algorithm to calculate a similarity between the pair of attribute components:
calculating the Jaccard similarity coefficient of the list of the pair of attribute components;
in response to determining that the calculated Jaccard similarity coefficient is greater than a second predetermined threshold, determining that a similarity between the pair of attribute components is equal to a first value indicative of attribute similarity;
in response to determining that the calculated Jaccard similarity coefficient is not greater than the second predetermined threshold, determining that the degree of similarity between the pair of attribute components is equal to a second value indicating that the attributes are dissimilar.
6. The method of any of claims 3-5, wherein for each two AS entities in the set of AS entities, determining a similarity between the two AS entities based on the attribute similarity vector comprises:
determining that the two AS entities are similar in response to determining that at least one component of the attribute similarity vector between the two AS entities is equal to the first value.
7. The method of claim 6, wherein for each two AS entities in the set of AS entities, determining a similarity between the two AS entities based on the attribute similarity vector further comprises:
determining dissimilarity between the two AS entities in response to determining that each component of the attribute similarity vector between the two AS entities is equal to the second value.
8. The method of any of claims 1-5, wherein the organizational merging of the plurality of AS entities comprises:
dividing the AS entity set into a plurality of similar AS entity sets according to the similarity between every two AS entities in the AS entity set, wherein each AS entity in each similar AS entity set is similar;
for each similar AS entity set, extracting the name strings of the organizations to which the AS entities belong from the multidimensional attribute data of each AS entity in the similar AS entity set, and combining the extracted name strings to serve as the home organization identifier of each AS entity in the similar AS entity set;
merging the AS entities with the same home organization identifier in the AS entity set.
9. An organization mapping apparatus of an autonomous system (AS), comprising:
an obtaining module, configured to obtain data of multidimensional attributes of each AS entity in a plurality of AS entities to form an AS entity set, where the multidimensional attributes include a plurality of attribute components, and one of the plurality of attribute components indicates an organization to which the AS entity belongs;
a similarity determining module, configured to calculate, for each two AS entities in the AS entity set, a similarity between each pair of attribute components of the two AS entities by using an algorithm corresponding to the type of the pair of attribute components to obtain an attribute similarity vector between the two AS entities, and determine a similarity between the two AS entities based on the attribute similarity vector;
a mapping module, configured to merge the organizations of the AS entities according to the similarity between every two AS entities in the AS entity set, so as to realize organization mapping of the AS entities.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable by the processor, the processor implementing the method of any one of claims 1 to 8 when executing the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110554627.1A CN113312895A (en) | 2021-05-20 | 2021-05-20 | Organization mapping method and device of autonomous system AS and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110554627.1A CN113312895A (en) | 2021-05-20 | 2021-05-20 | Organization mapping method and device of autonomous system AS and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113312895A true CN113312895A (en) | 2021-08-27 |
Family ID: 77373794
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110554627.1A Pending CN113312895A (en) | 2021-05-20 | 2021-05-20 | Organization mapping method and device of autonomous system AS and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113312895A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114978932A (en) * | 2022-05-20 | 2022-08-30 | 深信服科技股份有限公司 | Fault case recommendation method and device and computer-readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190007371A1 (en) * | 2017-06-30 | 2019-01-03 | Microsoft Technology Licensing, Llc | MAPPING IPv4 KNOWLEDGE TO IPv6 |
US20190303459A1 (en) * | 2018-03-29 | 2019-10-03 | International Business Machines Corporation | Similarity-based clustering search engine |
CN110427406A (en) * | 2019-08-10 | 2019-11-08 | 吴诚诚 | The method for digging and device of organization's related personnel's relationship |
CN111130876A (en) * | 2019-12-20 | 2020-05-08 | 北京邮电大学 | Method and device for displaying three-dimensional geographic space of autonomous domain system |
CN112632954A (en) * | 2020-12-29 | 2021-04-09 | 中译语通科技股份有限公司 | Method and device for acquiring technical similarity of mechanisms |
Non-Patent Citations (2)
Title |
---|
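悦光阴 (Yue Guangyin): "Data Mining, Part 2: Evaluating the Similarity and Dissimilarity of Data Based on Distance", Cnblogs (博客园), HTTPS://CNBLOGS.COM/LJHDO/P/4876877.HTML * |
李阳 (Li Yang) et al.: "Research on Entity Similarity Calculation in Knowledge Graphs", Journal of Chinese Information Processing (中文信息学报) * |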
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114978932A (en) * | 2022-05-20 | 2022-08-30 | 深信服科技股份有限公司 | Fault case recommendation method and device and computer-readable storage medium |
CN114978932B (en) * | 2022-05-20 | 2024-05-24 | 深信服科技股份有限公司 | Fault case recommendation method, device and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109347787B (en) | Identity information identification method and device | |
CN110162695B (en) | Information pushing method and equipment | |
CN110717076B (en) | Node management method, device, computer equipment and storage medium | |
US7765505B2 (en) | Design rule management method, design rule management program, rule management apparatus and rule verification apparatus | |
CN104077723B (en) | A kind of social networks commending system and method | |
CN112463991B (en) | Historical behavior data processing method and device, computer equipment and storage medium | |
CN109918678B (en) | Method and device for identifying field meaning | |
CN110263104B (en) | JSON character string processing method and device | |
US9355166B2 (en) | Clustering signifiers in a semantics graph | |
CN114138246B (en) | Topology automatic generation method, device, computing equipment and storage medium | |
CN106156126A (en) | Data conflict detection method for data processing tasks, and server | |
JP6244992B2 (en) | Configuration information management program, configuration information management method, and configuration information management apparatus | |
CN114640590B (en) | Method for detecting conflict of policy set in intention network and related equipment | |
JP7292368B2 (en) | A non-transitory computer-readable storage medium storing a method for identifying a device using attributes and location signatures from the device, a server of uniquely generated identifiers for the method, and a sequence of instructions for the method | |
CN113312895A (en) | Organization mapping method and device of autonomous system AS and electronic equipment | |
CN114238767A (en) | Service recommendation method and device, computer equipment and storage medium | |
CN111431962B (en) | Cross-domain resource access Internet of things service discovery method based on context awareness calculation | |
CN116225690A (en) | Memory multidimensional database calculation load balancing method and system based on docker | |
CN116186337A (en) | Business scene data processing method, system and electronic equipment | |
CN113220949B (en) | Construction method and device of private data identification system | |
CN112364181A (en) | Insurance product matching degree determination method and device | |
JP7482159B2 (en) | Computer system and security risk impact analysis method | |
CN115982508B (en) | Heterogeneous information network-based website detection method, electronic equipment and medium | |
CN112016081B (en) | Method, device, medium and electronic equipment for realizing identifier mapping | |
CN113065071B (en) | Product information recommendation method and computer equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210827 |