CN113032636A - Complete subgraph data searching method, device, equipment and medium - Google Patents

Complete subgraph data searching method, device, equipment and medium

Publication number
CN113032636A
Authority
CN
China
Prior art keywords
data table
node
nodes
order
connection relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911352614.5A
Other languages
Chinese (zh)
Inventor
李三川
谢笑娟
李金柱
吴丽丽
余韦
梁恩磊
杨猛
陶涛
徐海勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Application filed by China Mobile Communications Group Co Ltd, China Mobile Information Technology Co Ltd
Priority to CN201911352614.5A
Publication of CN113032636A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

Embodiments of the invention provide a method, a device, equipment and a medium for searching complete subgraph data. The method comprises: acquiring connection relationship information of nodes among network nodes and storing it as a first data table; determining an ith data table according to the first data table, wherein the ith data table comprises connection relationship information of first K-order nodes, the initial value of i is 2, and the initial value of K is 3; determining connection relationship information of second K-order nodes according to the ith data table and a first preset screening condition, and storing it as an (i+1)th data table; determining an (i+2)th data table according to the (i+1)th data table and the first data table, wherein the (i+2)th data table comprises connection relationship information of first (K+1)-order nodes; determining connection relationship information of second (K+1)-order nodes according to the (i+2)th data table and a second preset screening condition, and storing the connection relationship information of the first K nodes of the second (K+1)-order nodes as an (i+3)th data table, wherein any two of those first K nodes are connected; and, when the (i+3)th data table exists, setting i = i+4 and K = K+1. In this way, complete subgraphs can be acquired rapidly.

Description

Complete subgraph data searching method, device, equipment and medium
Technical Field
The present invention relates to the field of graph theory, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for searching complete sub-graph data.
Background
A complex network generally refers to a network with high complexity, such as an interpersonal relationship network, a traffic network or a communication network. A complete subgraph of a complex network is a sub-network in which any two nodes have a connection relationship. Complete subgraphs in a complex network have important practical significance. For example, in an interpersonal relationship network, people are taken as nodes, and a complete subgraph whose members are related pairwise represents a closely related group of friends; in a product part relationship network, parts are taken as nodes, any two nodes are connected to form an edge, and the nodes and edges form a complete subgraph. Complete subgraphs can be used to cluster a complex network effectively, and searching for complete subgraphs is the basis of clustering a complex network.
At present, the conventional complete subgraph search method needs to traverse all nodes twice per iteration, so that the time complexity of the search is rapidly increased along with the increase of the number of the nodes.
Disclosure of Invention
The embodiment of the invention provides a method, a device and equipment for searching complete subgraph data and a computer-readable storage medium, which can quickly search out complete subgraphs and reduce the time for finding out all complete subgraphs of a complex network.
In a first aspect, the present invention provides a method for searching complete subgraph data, including: acquiring connection relationship information between each node and its adjacent nodes among network nodes, and storing the connection relationship information as a first data table; determining an ith data table according to the first data table, wherein the ith data table is an extended data table of the first data table and comprises connection relationship information of first K-order nodes, wherein i = 2, 3, 4, …, M, the initial value of i is 2, K = 3, 4, 5, …, N, and the initial value of K is 3; determining connection relationship information of second K-order nodes in the ith data table according to the ith data table and a first preset screening condition, and storing the connection relationship information as an (i+1)th data table; determining an (i+2)th data table according to the (i+1)th data table and the first data table, wherein the (i+2)th data table comprises connection relationship information of first (K+1)-order nodes; determining connection relationship information of second (K+1)-order nodes in the (i+2)th data table according to the (i+2)th data table and a second preset screening condition, and storing the connection relationship information of the first K nodes of the second (K+1)-order nodes as an (i+3)th data table, wherein any two of the first K nodes of the second (K+1)-order nodes are connected with each other; and, when the (i+3)th data table exists, setting i = i+4 and K = K+1.
In some implementations of the first aspect, acquiring the connection relationship information between each node and its adjacent nodes among the network nodes and storing it as the first data table includes: extracting each node and its adjacent nodes among the network nodes, together with the edge weight between each node and each adjacent node, to form the connection relationship information between each node and its adjacent nodes, and storing the connection relationship information as the first data table.
In some implementations of the first aspect, when i is 2, determining the ith data table from the first data table includes: self-joining the first data table to determine the ith data table.
In some implementations of the first aspect, the first preset screening condition is to retain, as the connection relationship information of the second K-order nodes, the connection relationship information of those first K-order nodes whose Kth node differs from the (K-2)th node.
In some implementations of the first aspect, determining the (i+2)th data table from the (i+1)th data table and the first data table includes: outer-joining the (i+1)th data table with the first data table to determine the (i+2)th data table.
In some implementations of the first aspect, the second preset screening condition is that, when among K-2 first (K+1)-order nodes the first K nodes of any two of them are the same and in the same order, and the (K+1)th node of each of them is the same as one of the first K-2 nodes, the connection relationship information of these K-2 first (K+1)-order nodes is retained as the connection relationship information of the second (K+1)-order nodes.
In some realizations of the first aspect, after determining the connection relationship information of the second K +1 th node in the i +2 th data table according to the i +2 th data table and the second preset screening condition, and storing the connection relationship information of the first K nodes in the second K +1 th node as the i +3 th data table, the method further includes: removing duplication of the connection relation information of the K-order nodes with the same node in the (i + 3) th data table to obtain the connection relation information of the K-order nodes corresponding to each group of K-order nodes; determining a common node according to the connection relation information of the K-order node in the i +3 th data table after the duplication removal; calculating the compactness of the connection relation information of the K-order node in the i +3 th data table after the duplication is removed; and distributing the common nodes according to the compactness of the connection relation information of the K-order nodes in the i +3 th data table after the duplication is removed.
In a second aspect, the present invention provides a complete subgraph data searching device, which includes: an acquisition module, used for acquiring connection relationship information between each node and its adjacent nodes among network nodes and storing the connection relationship information as a first data table; a determining module, used for determining an ith data table according to the first data table, wherein the ith data table is an extended data table of the first data table and comprises connection relationship information of first K-order nodes, i = 2, 3, 4, …, M, the initial value of i is 2, K = 3, 4, 5, …, N, and the initial value of K is 3; the determining module being further used for determining connection relationship information of second K-order nodes in the ith data table according to the ith data table and a first preset screening condition, and storing the connection relationship information as an (i+1)th data table; the determining module being further used for determining an (i+2)th data table according to the (i+1)th data table and the first data table, wherein the (i+2)th data table comprises connection relationship information of first (K+1)-order nodes; the determining module being further used for determining connection relationship information of second (K+1)-order nodes in the (i+2)th data table according to the (i+2)th data table and a second preset screening condition, and storing the connection relationship information of the first K nodes of the second (K+1)-order nodes as an (i+3)th data table, wherein any two of the first K nodes of the second (K+1)-order nodes are connected with each other; and a processing module, used for setting i = i+4 and K = K+1 when the (i+3)th data table exists.
In some implementations of the second aspect, the acquisition module is specifically configured to: extract each node and its adjacent nodes among the network nodes, together with the edge weight between each node and each adjacent node, to form the connection relationship information between each node and its adjacent nodes, and store the connection relationship information as the first data table.
In some implementations of the second aspect, the determining module is further configured to: when i is 2, self-join the first data table to determine the ith data table.
In some implementations of the second aspect, the first preset screening condition is to retain, as the connection relationship information of the second K-order nodes, the connection relationship information of those first K-order nodes whose Kth node differs from the (K-2)th node.
In some implementations of the second aspect, the determining module is further configured to: outer-join the (i+1)th data table with the first data table to determine the (i+2)th data table.
In some implementations of the second aspect, the second preset screening condition is that, when among K-2 first (K+1)-order nodes the first K nodes of any two of them are the same and in the same order, and the (K+1)th node of each of them is the same as one of the first K-2 nodes, the connection relationship information of these K-2 first (K+1)-order nodes is retained as the connection relationship information of the second (K+1)-order nodes.
In some implementations of the second aspect, the determining module is further to: after determining the connection relation information of a second K + 1-order node in an i +2 data table according to the i +2 data table and a second preset screening condition and storing the connection relation information of the first K nodes in the second K + 1-order node as an i +3 data table, removing the connection relation information of the K-order nodes with the same node in the i +3 data table to obtain the connection relation information of the K-order nodes corresponding to each group of K-order nodes; determining a common node according to the connection relation information of the K-order node in the i +3 th data table after the duplication removal; calculating the compactness of the connection relation information of the K-order node in the i +3 th data table after the duplication is removed; and distributing the common nodes according to the compactness of the connection relation information of the K-order nodes in the i +3 th data table after the duplication is removed.
In a third aspect, the present invention provides a complete subgraph data searching device, including: a processor and a memory storing computer program instructions; the processor, when executing the computer program instructions, implements the method for searching complete subgraph data described in the first aspect or any implementation of the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium on which computer program instructions are stored; the computer program instructions, when executed by a processor, implement the method for searching complete subgraph data described in the first aspect or any implementation of the first aspect.
According to the method, device, equipment and computer-readable storage medium for searching complete subgraph data of the present invention, the connection relationship information between each node and its adjacent nodes among the network nodes is stored in the first data table; the stored connection relationship information is extended on the basis of the first data table to obtain the connection relationship information of each node; and the connection relationship information of nodes meeting the complete subgraph condition is screened out according to the first preset screening condition and the second preset screening condition, so that node groups capable of forming complete subgraphs are obtained directly. Complete subgraphs can therefore be searched out quickly, the iteration efficiency is improved, and the time needed to find all complete subgraphs of a complex network is reduced.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a method for searching complete sub-graph data according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a complete subgraph data searching apparatus according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a complete subgraph data searching device according to an embodiment of the present invention.
Detailed Description
Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone.
At present, searching a network for complete subgraphs generally requires traversing the nodes. When the number of nodes is large and the network structure is complex, the time cost of traversal is very high, and the traversal may even be impractical. Therefore, searching for complete subgraphs by node traversal is not suitable for a large-scale node network.
In view of the above, embodiments of the present invention provide a method, an apparatus, a device, and a computer-readable storage medium for searching complete subgraph data, which are capable of obtaining connection relationship information of each node by storing the connection relationship information of each node and adjacent nodes in a network node in a data table, expanding the stored connection relationship information of the nodes based on the data table, and screening connection relationship information of nodes meeting complete subgraph conditions according to preset screening conditions to directly obtain a node group capable of forming a complete subgraph, so that a complete subgraph can be quickly searched, and time required for finding all complete subgraphs of a complex network is reduced.
The following describes a complete sub-graph data search method provided by the embodiment of the present invention with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a method for searching complete sub-graph data according to an embodiment of the present invention. As shown in fig. 1, the method 100 for searching complete subgraph data may include S110 to S160.
S110, acquiring connection relation information between each node and adjacent nodes in the network nodes, and storing the connection relation information as a first data table.
Specifically, each node and adjacent nodes in the network nodes and the edge weights of each node and adjacent nodes may be extracted to form connection relationship information between each node and adjacent nodes, and the connection relationship information may be stored as the first data table. The network nodes can be from a complex network and are large in number. Also, the first data table may be a Hive table.
As a specific example, the first data table may be as shown in Table 1. For example, in the second row the node is A, the neighbor node is B, and edge weight 1 is P_AB; the connection relationship information of the two nodes A and B can be written A-B, and the connection relationship information of higher-order nodes can be written X-X-X…. Since the connection relationship information in the first data table describes connections between 2 nodes, it can be regarded as the connection relationship information of 2nd-order nodes; the connection relationship information of higher-order nodes is named in the same way.
TABLE 1
Node    Neighbor node    Edge weight 1
A       B                P_AB
A       C                P_AC
B       D                P_BD
C       A                P_CA
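As an illustration only, the rows of Table 1 could be held in memory as follows. This is a minimal sketch: the list-of-tuples layout, the helper names neighbors_of and second_order_info, and the numeric weights standing in for P_AB, P_AC, P_BD and P_CA are assumptions, not part of the described embodiment, which may use a Hive table.

```python
# Hypothetical in-memory stand-in for the first data table (the source suggests a Hive table).
# The numeric weights are placeholders for P_AB, P_AC, P_BD and P_CA.
first_table = [
    ("A", "B", 0.9),  # node, neighbor node, edge weight 1
    ("A", "C", 0.4),
    ("B", "D", 0.7),
    ("C", "A", 0.4),
]


def neighbors_of(table, node):
    """Return the neighbor nodes recorded for `node`, in table order."""
    return [nbr for src, nbr, _weight in table if src == node]


def second_order_info(table):
    """Render each row as 2nd-order connection relationship information, e.g. 'A-B'."""
    return [f"{src}-{nbr}" for src, nbr, _weight in table]
```

Each row corresponds to one piece of 2nd-order connection relationship information, so second_order_info(first_table) lists A-B, A-C, B-D and C-A.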
S120, determining the ith data table according to the first data table.
The ith data table is an extended data table of the first data table, and the ith data table comprises connection relation information of the first K-order node. Further, i is 2, 3, and 4 … … M, i has an initial value of 2, K is 3, 4, and 5 … … N, and K has an initial value of 3.
When i takes its initial value 2 and K takes its initial value 3, the first data table may be self-joined to extend the connection relationship information of the nodes and determine the 2nd data table, which comprises the connection relationship information of the first 3rd-order nodes.
As a specific example, the 2nd data table may be as shown in Table 2. For example, in the second row the node is A, the neighbor node is B, the extension node is D, edge weight 1 is P_AB and edge weight 2 is P_BD; the connection relationship information of the three nodes A, B and D can be written A-B-D, and the other rows are read in the same way.
TABLE 2
Node    Neighbor node    Extension node    Edge weight 1    Edge weight 2
A       B                D                 P_AB             P_BD
A       C                A                 P_AC             P_CA
B       D                E                 P_BD             P_DE
C       A                B                 P_CA             P_AB
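The self-join used to extend 2nd-order information to 3rd-order information can be sketched as below. This is an assumed in-memory illustration, not the embodiment's table query: the function name self_join, the tuple layout and the numeric weights are hypothetical. The input contains only the four rows shown in Table 1, so the output is not expected to match Table 2 row for row.

```python
def self_join(first_table):
    """Join the table with itself on (left neighbor node == right node).

    Each output row is (node, neighbor node, extension node, edge weight 1, edge weight 2).
    """
    by_source = {}
    for src, nbr, weight in first_table:
        by_source.setdefault(src, []).append((nbr, weight))

    rows = []
    for src, nbr, w1 in first_table:
        # Inner join: rows whose neighbor has no outgoing row produce nothing.
        for ext, w2 in by_source.get(nbr, []):
            rows.append((src, nbr, ext, w1, w2))
    return rows


# Placeholder weights stand in for P_AB, P_AC, P_BD and P_CA.
first_table = [("A", "B", 1.0), ("A", "C", 2.0), ("B", "D", 3.0), ("C", "A", 2.0)]
second_table = self_join(first_table)
```

The result still contains rows such as A-C-A, in which a path returns to the node it started from; these are the rows the first preset screening condition is meant to remove.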
S130, determining the connection relationship information of the second K-order nodes in the ith data table according to the ith data table and the first preset screening condition, and storing the connection relationship information as an (i+1)th data table.
The first preset screening condition may be to retain, as the connection relationship information of the second K-order nodes, the connection relationship information of those first K-order nodes whose Kth node differs from the (K-2)th node.
Determining the ith data table from the first data table also generates invalid connection relationship information of first K-order nodes. This invalid information needs to be removed according to the first preset screening condition, and the valid connection relationship information of the first K-order nodes is retained as the connection relationship information of the second K-order nodes, which ensures that each node in the connection relationship information of a second K-order node is unique; the result is stored as the (i+1)th data table. Connection relationship information of a first K-order node whose Kth node is the same as its (K-2)th node is regarded as invalid. That is to say, the connection relationship information of all first K-order nodes in the ith data table is screened according to the first preset screening condition to obtain the connection relationship information of the second K-order nodes.
As a specific example, among the connection relationship information of the first K-order nodes in the ith data table, when K is 4 the 4th-order connection relationship information A-B-C-B is invalid, and 5th-order connection relationship information of the same kind is likewise invalid when K is 5. If such information is not removed, even more invalid connection relationship information will be generated in subsequent extensions.
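The first preset screening condition can be sketched as a simple row filter. This is a minimal sketch under the assumption that each piece of connection relationship information is held as a tuple of node names; the function name first_screen is hypothetical.

```python
def first_screen(paths, k):
    """Keep only K-order paths whose K-th node differs from the (K-2)-th node.

    `paths` is a list of tuples of node names, each of length at least k.
    Positions are 1-based in the text, hence the index shifts below.
    """
    return [p for p in paths if p[k - 1] != p[k - 3]]


third_order = [("A", "B", "D"), ("A", "C", "A"), ("C", "A", "B")]
fourth_order = [("A", "B", "C", "B"), ("A", "B", "C", "D")]

valid_third = first_screen(third_order, 3)    # drops A-C-A
valid_fourth = first_screen(fourth_order, 4)  # drops A-B-C-B
```

The check compares only two positions per row, so it can be expressed as a single filter over the table rather than a traversal of the graph.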
For example, when i is 2 and K is 3, the connection relationship information of the second 3rd-order nodes in the 2nd data table may be determined according to the 2nd data table and the first preset screening condition and stored as the 3rd data table.
S140, determining an (i+2)th data table according to the (i+1)th data table and the first data table.
Specifically, the (i+1)th data table may be outer-joined with the first data table, for example by a left join, to determine the (i+2)th data table. The (i+2)th data table comprises the connection relationship information of the first (K+1)-order nodes, which is obtained by extending the connection relationship information of the second K-order nodes in the (i+1)th data table with the connection relationship information of the nodes stored in the first data table.
For example, when i is 2 and K is 3, a 4 th data table may be determined from the 3 rd data table and the first data table, the 4 th data table including connection relationship information of the first 4 th-order node.
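The left join that extends each K-order path by one node can be sketched as follows. This is an illustration under assumptions: paths are tuples of node names, the join key is taken to be the last node of each path, unmatched rows are kept with None in the new column (mirroring a NULL in a left join), and the name left_join_extend is hypothetical.

```python
def left_join_extend(paths, first_table):
    """Extend every path by each neighbor of its last node (left join).

    A path whose last node has no row in the first data table is kept with
    None as the new node, as a left join would keep it with a NULL.
    """
    by_source = {}
    for src, nbr, _weight in first_table:
        by_source.setdefault(src, []).append(nbr)

    extended = []
    for path in paths:
        matches = by_source.get(path[-1])
        if not matches:
            extended.append(path + (None,))
        else:
            for nbr in matches:
                extended.append(path + (nbr,))
    return extended


# Placeholder weights; only the node columns matter for the join.
first_table = [("A", "B", 1.0), ("A", "C", 2.0), ("B", "D", 3.0), ("C", "A", 2.0)]
third_table = [("A", "B", "D"), ("C", "A", "B")]
fourth_table = left_join_extend(third_table, first_table)
```

Rows padded with None cannot satisfy the second preset screening condition, so an inner join would give the same final result in this sketch; the left join is kept here only because the text names it as an example.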
S150, determining the connection relation information of the second K + 1-order node in the (i + 2) th data table according to the (i + 2) th data table and a second preset screening condition, and storing the connection relation information of the previous K nodes in the second K + 1-order node as an (i + 3) th data table.
Any two nodes in the first K nodes in the second K + 1-order node are connected with each other, and a relationship exists. That is to say, the connection relationship information of the first K nodes in the second K + 1-order node is the connection relationship information of the nodes meeting the condition of the complete subgraph, and the connection relationship information of the first K nodes in the second K + 1-order node can form the complete subgraph.
Here, the second preset screening condition may be that when the first K nodes of any two first K + 1-order nodes in the K-2 first K + 1-order nodes are the same and the sequence of the first K nodes is the same, and the K + 1-th node of each first K + 1-order node is the same as any one node in the first K-2-order nodes, the connection relationship information of the K-2 first K + 1-order nodes is retained as the connection relationship information of the second K + 1-order node. That is to say, the connection relationship information of all the first K +1 order nodes in the i +2 data table is screened according to the second preset screening condition, so as to obtain the connection relationship information of the second K +1 order node.
As a specific example, when K is 5 and the connection relationship information of three 6th-order nodes, A-B-F-S-D-F, A-B-F-S-D-B and A-B-F-S-D-A, exists simultaneously among the connection relationship information of the first (K+1)-order nodes in the (i+2)th data table, then A-B-F-S-D-F, A-B-F-S-D-B and A-B-F-S-D-A are retained as the connection relationship information of the second (K+1)-order nodes, and the 5th-order connection relationship information A-B-F-S-D is considered to form a 5th-order complete subgraph.
For example, when i is 2 and K is 3, the connection relationship information of the second 4 th-order node in the 4 th data table may be determined according to the 4 th data table and the second preset screening condition, and the connection relationship information of the first 3 nodes in the second 4 th-order node may be stored as the 5 th data table. Any two nodes in the first 3 nodes in the second 4-order node are connected with each other, that is, the connection relation information of the first 3 nodes in the second 4-order node can form a 3-order complete subgraph.
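One way to read the second preset screening condition is as a group-by over the first K nodes, followed by a check that every one of the first K-2 nodes appears as the (K+1)th node in the group. The sketch below follows that reading; the function name second_screen and the tuple representation are assumptions.

```python
from collections import defaultdict


def second_screen(extended_paths, k):
    """Return the K-node prefixes that pass the second screening condition.

    Rows are grouped by their first K nodes (same nodes, same order). A prefix
    is kept when each of its first K-2 nodes occurs as the (K+1)-th node of
    some row in the group, i.e. the K-th node is linked back to all of them.
    """
    last_nodes = defaultdict(set)
    for row in extended_paths:
        last_nodes[row[:k]].add(row[k])

    kept = []
    for prefix, closing in last_nodes.items():
        if set(prefix[: k - 2]) <= closing:
            kept.append(prefix)
    return kept


sixth_order = [
    ("A", "B", "F", "S", "D", "F"),
    ("A", "B", "F", "S", "D", "B"),
    ("A", "B", "F", "S", "D", "A"),
    ("A", "B", "F", "S", "E", "A"),  # only one of the three required rows
]
fifth_order_complete = second_screen(sixth_order, 5)
```

With K equal to 5, the prefix A-B-F-S-D is kept because rows ending in A, B and F all exist, while A-B-F-S-E is dropped because rows ending in B and F are missing. The sketch relies on the first K-1 nodes already being pairwise connected from the previous round, as in the described iteration.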
S160, when the (i+3)th data table exists, setting i = i+4 and K = K+1.
Specifically, when the (i + 3) th data table exists, on the basis of the (i + 3) th data table, i is equal to i +4, and K is equal to K +1, and S120-S160 are repeatedly executed until a higher-order complete sub-graph cannot be obtained, in other words, when the (i + 3) th data table does not exist, the process is ended.
For example, when i is 2, K is 3, and the 5 th data table exists, i may be 6, and K may be 4, based on the 5 th data table, and the determining of the i-th data table according to the first data table in S120 may be that the 5 th data table is externally connected to the first data table to determine the 6 th data table, where the 6 th data table includes connection relationship information of the first 4 th-order node. That is, S120-S160 are repeated until a higher order complete sub-graph is not available.
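The overall iteration S120 to S160 can be sketched end to end as below. This is a simplified in-memory illustration, not the embodiment itself: it assumes an undirected graph without self-loops and with sortable node names, stores every edge in both directions, skips unmatched left-join rows (which could never pass the second screening), keeps one ordering per node group between rounds, and ignores edge weights. The function name find_complete_subgraphs is hypothetical.

```python
from collections import defaultdict


def find_complete_subgraphs(edges):
    """Return {order K: set of frozensets of nodes forming a K-order complete subgraph}."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # assumption: self-loops are ignored
            adj[u].add(v)
            adj[v].add(u)
    adj = dict(adj)

    # S120 with K = 3: self-join of the first data table gives 3-node paths.
    paths = [(a, b, c) for a in adj for b in adj[a] for c in adj[b]]
    k = 3
    found = {}
    while paths:
        # S130: first screening, the K-th node must differ from the (K-2)-th node.
        paths = [p for p in paths if p[k - 1] != p[k - 3]]
        # S140: extend each path by the neighbors of its last node.
        closing = {p: adj[p[-1]] for p in paths}
        # S150: second screening, the last node must link back to the first K-2 nodes.
        groups = {frozenset(p) for p, last in closing.items() if set(p[: k - 2]) <= last}
        if not groups:
            break  # no (i+3)th data table: stop
        found[k] = groups
        # S160 then S120: keep one ordering per group and extend it by one node.
        paths = []
        for group in groups:
            ordered = tuple(sorted(group))
            paths.extend(ordered + (n,) for n in adj[ordered[-1]])
        k += 1
    return found


edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
complete = find_complete_subgraphs(edges)
```

For this input the sketch reports the four triangles among A, B, C and D at order 3 and the single group {A, B, C, D} at order 4, then stops because no 5th-order group passes the second screening.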
According to the method for searching the complete subgraph data, the connection relation information of each node and adjacent nodes in the network nodes is stored in the first data table, the stored connection relation information of the nodes is expanded on the basis of the first data table to obtain the connection relation information of each node, the connection relation information of the nodes meeting the complete subgraph conditions is screened out according to the first preset screening condition and the second preset screening condition, and the node group capable of forming the complete subgraph is directly obtained, so that the complete subgraph can be quickly searched out, the iteration efficiency is improved, and the time required for finding out all the complete subgraphs of the complex network is reduced.
In some embodiments, after determining the connection relationship information of the second K + 1-order node in the i + 2-th data table according to the i + 2-th data table and the second preset screening condition and storing the connection relationship information of the previous K nodes in the second K + 1-order node as the i + 3-th data table, the connection relationship information of the K-order nodes with the same node in the i + 3-th data table may be deduplicated to obtain the connection relationship information of the unique K-order node corresponding to each group of K-order nodes.
As a specific example, the same group of nodes yields different connection relationship information when the nodes appear in different orders, yet the complete subgraph formed by that group of nodes is the same. If the repeated connection relationship information were retained, a large amount of unnecessary calculation would be added in the subsequent iterations, so the repeated connection relationship information needs to be deduplicated, with each group of nodes retaining a single piece of connection relationship information. For example, when K is 3 and the connection relationship information of the K-order nodes in the (i+3)th data table includes the 3rd-order connection relationship information A-B-C, B-C-A and C-A-B, these three pieces describe the same nodes A, B and C, and only the 3rd-order connection relationship information A-B-C may be retained.
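The deduplication step can be sketched by keying each path on its set of nodes and keeping the first ordering encountered. The function name deduplicate and the choice to keep the first ordering seen are assumptions; the text only requires that one piece of information remain per node group.

```python
def deduplicate(paths):
    """Keep one path per group of nodes, regardless of node order."""
    first_seen = {}
    for path in paths:
        # frozenset ignores order, so A-B-C, B-C-A and C-A-B share one key.
        first_seen.setdefault(frozenset(path), path)
    return list(first_seen.values())


table = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B"), ("A", "B", "D")]
unique_paths = deduplicate(table)
```

Here unique_paths retains A-B-C for the group {A, B, C} and A-B-D for the group {A, B, D}.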
In addition, common nodes can be determined from the deduplicated K-order connection relation information in the (i+3)th data table; the compactness of each piece of deduplicated K-order connection relation information is then calculated, and the common nodes are allocated according to that compactness.
As a specific example, after deduplication the same node may belong to several pieces of K-order connection relation information in the (i+3)th data table; such a common node needs to be reallocated. Reallocation requires the compactness Y of each piece of deduplicated K-order connection relation information in the (i+3)th data table, in other words the compactness Y of each complete subgraph, which may be calculated using formulas (1), (2) and (3), specifically as follows:
[Formulas (1) to (3): equation images BDA0002335023760000101 to BDA0002335023760000103, not reproduced in this text]
where $P_j$ is the weight of each edge between the nodes in a piece of deduplicated K-order connection relation information in the (i+3)th data table, n is the number of nodes in that connection relation information, $\bar{P}$ is the mean of its edge weights, σ is the standard deviation of its edge weights, and α is a weighting coefficient that can be used to adjust the balance between the mean and the standard deviation according to the actual application scenario.
In other words, $P_j$ is the weight of each edge between the nodes of a deduplicated complete subgraph, n is the number of nodes in that complete subgraph, $\bar{P}$ is the mean of its edge weights, and σ is the standard deviation of its edge weights.
The compactness Y takes into account both the mean and the standard deviation of the edge weights between the nodes: the larger the mean and the smaller the standard deviation, the tighter the complete subgraph. When σ ≠ 0, a common node can be preferentially allocated to the complete subgraph with the higher compactness Y; when σ = 0, it can be allocated to the complete subgraph with the higher mean $\bar{P}$.
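The allocation rule can be illustrated with a short Python sketch. The exact expression for Y is given only as an image in the publication and is not available in this text, so the score used below, `mean ** alpha / sigma`, is purely a stand-in chosen because it grows with the mean, shrinks with the standard deviation, and is undefined when σ = 0; it should not be read as formula (3). The function names, the use of the population standard deviation, and the rule of falling back to the mean whenever any candidate has σ = 0 are likewise assumptions.

```python
from statistics import mean, pstdev


def compactness(weights, alpha=1.0):
    """Return (mean, sigma, score) for the edge weights of one complete subgraph.

    The score is a hypothetical stand-in for Y and is None when sigma is 0.
    """
    mu = mean(weights)
    sigma = pstdev(weights)  # population standard deviation (assumed)
    score = None if sigma == 0 else mu ** alpha / sigma
    return mu, sigma, score


def assign_common_node(candidates, alpha=1.0):
    """Pick the group that should keep a common node.

    candidates -- dict mapping a group label to the list of its edge weights
    """
    stats = {group: compactness(w, alpha) for group, w in candidates.items()}
    if any(s[2] is None for s in stats.values()):
        # Some group has sigma = 0, so compare by mean edge weight instead.
        return max(stats, key=lambda g: stats[g][0])
    return max(stats, key=lambda g: stats[g][2])


groups = {"g1": [0.9, 0.8, 0.85], "g2": [0.6, 0.9, 0.3]}
print(assign_common_node(groups))  # g1
```

With these weights `g1` has both the higher mean and the smaller spread, so it keeps the common node under the stand-in score.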
The following describes the search process for complete subgraph data in detail with reference to a specific embodiment. The purpose of this embodiment is to identify family groups that live together, starting from 11.45 million family relationship pairs with the same place of residence identified in one province. Because living together is a strong relationship, the members of a family group can be considered to live together only if every two members of the group have this relationship. The problem can therefore be treated as searching the 11.45 million user relationship pairs for user node groups that satisfy the complete subgraph condition. The specific implementation process is as follows:
Step 1: take any two related users as a node and its neighbor node, take the probability that a real family relationship exists between them as the edge weight, extract all user relationship pairs, and store them in a Hive table to form the first user family relationship table.
Step 2: expand the first user family relationship table into the second user family relationship table by a self-join; the second table contains 3-member family relationship chains.
Step 3: screen the 3-member chains in the second user family relationship table, discard the invalid chains in which a family relationship is repeated, keep the valid chains, and store them as the third user family relationship table, so that every user in each 3-member chain of the third table is unique.
Step 4: outer-join the third user family relationship table with the first user family relationship table to expand it into the fourth user family relationship table, which contains 4-member family relationship chains.
Step 5: screen the 4-member chains in the fourth user family relationship table and keep those whose 4th user is the same as the 1st user, which ensures that the chain satisfies the complete subgraph condition, namely that any two users have a family relationship; take the chain formed by the first 3 users as a 3-member family group, and store these chains as the fifth user family relationship table.
Step 6: deduplicate the 3-member family groups in the fifth user family relationship table so that each group keeps only one 3-member chain, calculate the compactness of each family group from the probabilities that real family relationships exist between its users, and allocate common users to the family group with the higher compactness.
Step 7: repeat the chain expansion, screening and deduplication, iterating over the family groups until no new family group can be found.
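Steps 1 to 6 for 3-member groups can be sketched in plain Python as a sequence of in-memory "tables". This is only an illustration under stated assumptions, not the Hive implementation of the embodiment: the function name `find_triangles` is hypothetical, each relationship is assumed to be stored in both directions so that a join can follow either end, and the compactness calculation of step 6 is left out.

```python
from collections import defaultdict


def find_triangles(edges):
    """Return the 3-member groups in which every two users are related.

    edges -- iterable of (user_a, user_b, weight) relationship pairs
    """
    # Table 1: relationship pairs, stored in both directions (assumption).
    table1 = set()
    for a, b, _weight in edges:
        table1.add((a, b))
        table1.add((b, a))
    by_first = defaultdict(list)
    for a, b in table1:
        by_first[a].append(b)

    # Table 2: self-join of table 1 on "2nd user = 1st user" gives 3-member chains.
    table2 = [(a, b, c) for a, b in table1 for c in by_first[b]]
    # Table 3: first screening, drop chains whose 3rd user repeats the 1st.
    table3 = [chain for chain in table2 if chain[2] != chain[0]]
    # Table 4: join with table 1 again to get 4-member chains.
    table4 = [(a, b, c, d) for a, b, c in table3 for d in by_first[c]]
    # Table 5: second screening, keep chains that close back on the 1st user.
    table5 = [chain[:3] for chain in table4 if chain[3] == chain[0]]
    # Deduplicate: one entry per distinct set of three users.
    return {frozenset(chain) for chain in table5}


edges = [("A", "B", 0.9), ("B", "C", 0.8), ("C", "A", 0.7), ("C", "D", 0.6)]
print(sorted(sorted(group) for group in find_triangles(edges)))  # [['A', 'B', 'C']]
```

User D is related only to C, so no chain through D closes back on its first user and D joins no group.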
It can be understood that the complete sub-graph data searching method of the embodiment of the invention can be applied to identification of family groups, and also can be applied to identification of groups with corresponding contact between every two members, such as friend groups, work partner groups and the like, so that the time for searching and identifying the groups can be effectively reduced, and the efficiency is improved.
Fig. 2 is a schematic structural diagram of a full sub-graph data search apparatus according to an embodiment of the present invention, and as shown in fig. 2, the full sub-graph data search apparatus 200 may include: an acquisition module 210, a determination module 220, and a processing module 230.
The acquisition module 210 is configured to acquire connection relation information between each node of the network nodes and its adjacent nodes, and to store it as a first data table. The determining module 220 is configured to determine an ith data table according to the first data table, where the ith data table is an extended data table of the first data table and contains connection relation information of first K-order nodes, with i = 2, 3, 4, …, M (initial value 2) and K = 3, 4, 5, …, N (initial value 3). The determining module 220 is further configured to determine connection relation information of second K-order nodes in the ith data table according to the ith data table and the first preset screening condition, and to store it as an (i+1)th data table. The determining module 220 is further configured to determine an (i+2)th data table according to the (i+1)th data table and the first data table, where the (i+2)th data table contains connection relation information of first (K+1)-order nodes. The determining module 220 is further configured to determine connection relation information of second (K+1)-order nodes in the (i+2)th data table according to the (i+2)th data table and the second preset screening condition, and to store the connection relation information of the first K nodes of the second (K+1)-order nodes as an (i+3)th data table, where any two of the first K nodes of a second (K+1)-order node are connected to each other. The processing module 230 is configured to set i = i + 4 and K = K + 1 when the (i+3)th data table exists.
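The iteration controlled by the processing module (advance to the next order only while the current order produced a result) can be sketched as below. Note that this sketch swaps the table joins for a set-intersection formulation, so it illustrates the stopping rule and the growth of K rather than the data tables themselves; the function name `cliques_by_order` and the edge-list input are assumptions, and self-loops are assumed to be absent.

```python
from collections import defaultdict


def cliques_by_order(edges):
    """Return {K: set of K-node complete subgraphs} for every order K >= 3 found."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    current = {frozenset((u, v)) for u, v in edges}  # order-2 groups: the edges
    found = {}
    k = 3
    while True:
        grown = set()
        for group in current:
            # A node adjacent to every member extends the group by one.
            common = set.intersection(*(neighbours[n] for n in group))
            for node in common:
                grown.add(group | {node})
        if not grown:  # nothing of this order exists: stop iterating
            break
        found[k] = grown
        current = grown
        k += 1  # mirrors K = K + 1 in the text
    return found


edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
result = cliques_by_order(edges)
print(sorted(result))  # [3, 4]
```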
In some embodiments, the acquisition module 210 is specifically configured to extract each node of the network nodes, its adjacent nodes, and the edge weight between each node and its adjacent nodes, to form the connection relation information between each node and its adjacent nodes, and to store it as the first data table.
In some embodiments, the determining module 220 is further configured to self-join the first data table to determine the ith data table when i is 2.
In some embodiments, the first preset screening condition is to retain, as the connection relation information of the second K-order nodes, the connection relation information of those first K-order nodes whose Kth node is different from the (K-2)th node.
In some embodiments, the determining module 220 is further configured to outer-join the (i+1)th data table with the first data table to determine the (i+2)th data table.
In some embodiments, the second preset screening condition is as follows: when, among K-2 first (K+1)-order nodes, any two have the same first K nodes in the same order, and the (K+1)th node of each of them is the same as one of the first K-2 nodes, the connection relation information of these K-2 first (K+1)-order nodes is retained as the connection relation information of the second (K+1)-order nodes.
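One way to read the second screening condition: a K-order chain passes when each of the K-2 extended chains whose (K+1)th node equals one of the chain's first K-2 nodes is present, which means that the Kth node is adjacent to every one of the first K-2 nodes. A minimal Python sketch under that reading follows; the function name and the `neighbours` adjacency dictionary are hypothetical and introduced only for illustration.

```python
def passes_second_screening(chain, neighbours):
    """Return True when the last node of `chain` is adjacent to each of the
    first K-2 nodes, i.e. every required (K+1)-order extension exists.

    chain      -- tuple of K node labels (a K-order chain)
    neighbours -- dict mapping a node label to the set of adjacent labels
    """
    last = chain[-1]
    # The extension chain + (node,) exists only if `node` is adjacent to `last`.
    return all(node in neighbours[last] for node in chain[:-2])


# A, B, C and D are pairwise connected; E is connected only to D.
neighbours = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C", "E"},
    "E": {"D"},
}
print(passes_second_screening(("A", "B", "C", "D"), neighbours))  # True
print(passes_second_screening(("B", "C", "D", "E"), neighbours))  # False
```

For K = 3 the check reduces to the condition used in the family example: the 4th user of the extended chain must be the same as the 1st user.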
In some embodiments, the determining module 220 is further configured to, after the connection relation information of the second (K+1)-order nodes in the (i+2)th data table has been determined according to the (i+2)th data table and the second preset screening condition and the connection relation information of the first K nodes of the second (K+1)-order nodes has been stored as the (i+3)th data table, deduplicate the connection relation information of K-order nodes in the (i+3)th data table that consist of the same nodes, to obtain unique K-order connection relation information for each group of K nodes; determine common nodes from the deduplicated K-order connection relation information in the (i+3)th data table; calculate the compactness of the deduplicated K-order connection relation information in the (i+3)th data table; and allocate the common nodes according to that compactness.
In the device for searching complete subgraph data according to the embodiment of the invention, the connection relation information between each node of the network nodes and its adjacent nodes is stored in a first data table; the stored connection relation information is then expanded on the basis of the first data table to obtain extended connection relation information of the nodes; and the connection relation information that satisfies the complete subgraph condition is screened out according to the first preset screening condition and the second preset screening condition. Node groups that form complete subgraphs are thus obtained directly, so that complete subgraphs can be found quickly, iteration efficiency is improved, and the time required to find all complete subgraphs of a complex network is reduced.
It can be understood that the complete sub-graph data search apparatus 200 according to the embodiment of the present invention may correspond to the execution main body of the complete sub-graph data search method according to the embodiment of the present invention in fig. 1, and specific details of operations and/or functions of each module/unit of the complete sub-graph data search apparatus 200 may refer to the description of the corresponding part in the complete sub-graph data search method according to the embodiment of the present invention in fig. 1, and for brevity, no further description is provided here.
Fig. 3 is a schematic diagram of a hardware structure of a complete subgraph data search device according to an embodiment of the present invention.
As shown in fig. 3, the search device 300 of the complete subgraph data in the present embodiment includes an input device 301, an input interface 302, a central processor 303, a memory 304, an output interface 305, and an output device 306. The input interface 302, the central processing unit 303, the memory 304, and the output interface 305 are connected to each other through a bus 310, and the input device 301 and the output device 306 are connected to the bus 310 through the input interface 302 and the output interface 305, respectively, and further connected to other components of the complete subgraph data search device 300.
Specifically, the input device 301 receives input information from the outside and transmits it to the central processor 303 through the input interface 302; the central processor 303 processes the input information based on computer-executable instructions stored in the memory 304 to generate output information, stores the output information temporarily or permanently in the memory 304, and then transmits it to the output device 306 through the output interface 305; the output device 306 outputs the output information to the outside of the complete subgraph data search device 300 for use by the user.
That is, the complete subgraph data search device shown in fig. 3 may also be implemented to include: a memory storing computer-executable instructions; and a processor that, when executing the computer-executable instructions, can implement the method for searching complete subgraph data described in connection with the example shown in fig. 1.
In one embodiment, the search apparatus 300 of full sub-graph data shown in fig. 3 includes: a memory 304 for storing programs; a processor 303, configured to execute a program stored in the memory to perform the method for searching complete subgraph data provided in the embodiment shown in fig. 1.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement the method for searching complete subgraph data provided by the embodiment shown in fig. 1.
It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.
The functional blocks shown in the above structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they may be, for example, an electronic circuit, an application-specific integrated circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, read-only memories (ROMs), flash memories, erasable ROMs (EROMs), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the Internet or an intranet.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
As described above, only the specific embodiments of the present invention are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present invention, and these modifications or substitutions should be covered within the scope of the present invention.

Claims (13)

1. A method for searching complete subgraph data, the method comprising:
acquiring connection relation information between each node and adjacent nodes in the network nodes, and storing the connection relation information as a first data table;
determining an ith data table according to the first data table, wherein the ith data table is an extended data table of the first data table and comprises connection relation information of first K-order nodes, wherein i = 2, 3, 4, …, M with an initial value of 2, and K = 3, 4, 5, …, N with an initial value of 3;
determining connection relation information of a second K-order node in the ith data table according to the ith data table and a first preset screening condition, and storing the connection relation information as an (i + 1) th data table;
determining an i +2 data table according to the i +1 data table and the first data table, wherein the i +2 data table comprises connection relation information of a first K + 1-order node;
determining connection relation information of second K + 1-order nodes in the i +2 data table according to the i +2 data table and a second preset screening condition, and storing the connection relation information of first K nodes in the second K + 1-order nodes as an i +3 data table, wherein any two nodes in the first K nodes in the second K + 1-order nodes are connected with each other;
when the (i + 3)th data table exists, setting i = i + 4 and K = K + 1.
2. The method according to claim 1, wherein the obtaining and storing connection relation information of each node in the network nodes and the neighboring nodes as a first data table comprises:
and extracting each node and adjacent nodes in the network nodes and the edge weight of each node and adjacent nodes to form the connection relation information of each node and adjacent nodes, and storing the connection relation information as the first data table.
3. The method of claim 1, wherein when i is 2, determining an ith data table from the first data table comprises:
and self-connecting the first data table, and determining the ith data table.
4. The method according to claim 1, wherein the first preset screening condition is to retain connection relation information of first K-order nodes whose Kth node is different from the (K-2)th node as the connection relation information of the second K-order nodes.
5. The method of claim 1, wherein determining the (i + 2) th data table from the (i + 1) th data table and the first data table comprises:
and externally connecting the (i + 1) th data table and the first data table, and determining the (i + 2) th data table.
6. The method according to claim 1, wherein the second preset screening condition is that, when the first K nodes of any two of K-2 first (K+1)-order nodes are the same and in the same order, and the (K+1)th node of each of these first (K+1)-order nodes is the same as one of the first K-2 nodes, the connection relation information of the K-2 first (K+1)-order nodes is retained as the connection relation information of the second (K+1)-order nodes.
7. The method according to claim 1, wherein after determining the connection relation information of the second (K+1)-order nodes in the (i+2)th data table according to the (i+2)th data table and the second preset screening condition, and storing the connection relation information of the first K nodes of the second (K+1)-order nodes as the (i+3)th data table, the method further comprises:
removing duplication of the connection relation information of the K-order nodes with the same node in the (i + 3) th data table to obtain the connection relation information of the K-order nodes corresponding to each group of K-order nodes;
determining a common node according to the connection relation information of the K-order node in the i +3 th data table after the duplication removal;
calculating the compactness of the connection relation information of the K-order node in the i +3 th data table after the duplication is removed;
and distributing the common nodes according to the compactness of the connection relation information of the K-order nodes in the i +3 th data table after the duplication is removed.
8. An apparatus for searching complete sub-graph data, the apparatus comprising:
the acquisition module is used for acquiring the connection relation information between each node and the adjacent node in the network nodes and storing the connection relation information as a first data table;
the determining module is used for determining an ith data table according to the first data table, wherein the ith data table is an extended data table of the first data table and comprises connection relation information of first K-order nodes, wherein i = 2, 3, 4, …, M with an initial value of 2, and K = 3, 4, 5, …, N with an initial value of 3;
the determining module is further used for determining connection relation information of a second K-order node in the ith data table according to the ith data table and a first preset screening condition, and storing the connection relation information as an i +1 th data table;
the determining module is further configured to determine an i +2 th data table according to the i +1 th data table and the first data table, where the i +2 th data table includes connection relationship information of a first K +1 order node;
the determining module is further configured to determine connection relationship information of a second K + 1-order node in the i +2 data table according to the i +2 data table and a second preset screening condition, and store the connection relationship information of first K nodes in the second K + 1-order node as an i +3 data table, where any two nodes in the first K nodes in the second K + 1-order node are connected with each other;
and the processing module is used for setting i = i + 4 and K = K + 1 when the (i + 3)th data table exists.
9. The apparatus according to claim 8, wherein the first preset screening condition is to retain connection relation information of first K-order nodes whose Kth node is different from the (K-2)th node as the connection relation information of the second K-order nodes.
10. The apparatus according to claim 8, wherein the second preset screening condition is that, when the first K nodes of any two of K-2 first (K+1)-order nodes are the same and in the same order, and the (K+1)th node of each of these first (K+1)-order nodes is the same as one of the first K-2 nodes, the connection relation information of the K-2 first (K+1)-order nodes is retained as the connection relation information of the second (K+1)-order nodes.
11. The apparatus of claim 8, wherein the determining module is further configured to:
after determining the connection relation information of the second (K+1)-order nodes in the (i+2)th data table according to the (i+2)th data table and the second preset screening condition and storing the connection relation information of the first K nodes of the second (K+1)-order nodes as the (i+3)th data table, deduplicating the connection relation information of K-order nodes in the (i+3)th data table that consist of the same nodes, to obtain the connection relation information of the K-order nodes corresponding to each group of K-order nodes;
determining a common node according to the connection relation information of the K-order node in the i +3 th data table after the duplication removal;
calculating the compactness of the connection relation information of the K-order node in the i +3 th data table after the duplication is removed;
and distributing the common nodes according to the compactness of the connection relation information of the K-order nodes in the i +3 th data table after the duplication is removed.
12. A device for searching complete subgraph data, characterized in that it comprises: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements a method of searching full sub-graph data according to any of claims 1-7.
13. A computer-readable storage medium, having stored thereon computer program instructions, which when executed by a processor, implement the method of searching full sub-graph data according to any of claims 1-7.
CN201911352614.5A 2019-12-25 2019-12-25 Complete subgraph data searching method, device, equipment and medium Pending CN113032636A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911352614.5A CN113032636A (en) 2019-12-25 2019-12-25 Complete subgraph data searching method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN113032636A true CN113032636A (en) 2021-06-25

Family

ID=76452337

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911352614.5A Pending CN113032636A (en) 2019-12-25 2019-12-25 Complete subgraph data searching method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113032636A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306581A (en) * 2023-05-08 2023-06-23 中新宽维传媒科技有限公司 Event extraction method and device

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040148275A1 (en) * 2003-01-29 2004-07-29 Dimitris Achlioptas System and method for employing social networks for information discovery
CN103426042A (en) * 2012-05-15 2013-12-04 腾讯科技(深圳)有限公司 Method and system for grouping in social network
US20150356444A1 (en) * 2013-01-09 2015-12-10 Peking University Founder Group Co., Ltd. Method and system of discovering and analyzing structures of user groups in microblog
WO2014107988A1 (en) * 2013-01-09 2014-07-17 北大方正集团有限公司 Method and system for discovering and analyzing micro-blog user group structure
CN103914493A (en) * 2013-01-09 2014-07-09 北大方正集团有限公司 Method and system for discovering and analyzing microblog user group structure
CN104933624A (en) * 2015-06-29 2015-09-23 电子科技大学 Community discovery method of complex network and important node discovery method of community
CN105117422A (en) * 2015-07-30 2015-12-02 中国传媒大学 Intelligent social network recommender system
CN108733686A (en) * 2017-04-17 2018-11-02 伊姆西Ip控股有限责任公司 Information processing method and equipment
CN107679097A (en) * 2017-09-08 2018-02-09 广州汉邮通信有限公司 A kind of distributed data processing method, system and storage medium
CN108182265A (en) * 2018-01-09 2018-06-19 清华大学 For the Multilevel Iteration screening technique and device of relational network
CN108429683A (en) * 2018-03-20 2018-08-21 苏州大学 A kind of network data routing method, system and device
CN109697467A (en) * 2018-12-24 2019-04-30 宁波大学 A kind of summarization methods of complex network figure
CN109978705A (en) * 2019-02-26 2019-07-05 华中科技大学 Combo discovering method in a kind of social networks enumerated based on Maximum Clique

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BOWIEE: "极大团(maximal clique)算法:Bron-Kerbosch算法", Retrieved from the Internet <URL:https://www.jianshu.com/p/437bd6936dad> *
JAYPHONE17: "回溯、图论——最大团问题(求最大完全子图)", pages 2 - 3, Retrieved from the Internet <URL:https://blog.csdn.net/Jayphone17/article/details/102962990> *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination