CN113032636A

CN113032636A - Complete subgraph data searching method, device, equipment and medium

Info

Publication number: CN113032636A
Application number: CN201911352614.5A
Authority: CN
Inventors: 李三川; 谢笑娟; 李金柱; 吴丽丽; 余韦; 梁恩磊; 杨猛; 陶涛; 徐海勇
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Information Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Information Technology Co Ltd
Priority date: 2019-12-25
Filing date: 2019-12-25
Publication date: 2021-06-25

Abstract

Embodiments of the invention provide a method, an apparatus, a device, and a medium for searching complete subgraph data. The method comprises: acquiring connection relationship information of the nodes in a network and storing it as a first data table; determining an i-th data table according to the first data table, wherein the i-th data table comprises connection relationship information of first K-order nodes, the initial value of i is 2, and the initial value of K is 3; determining connection relationship information of second K-order nodes according to the i-th data table and a first preset screening condition, and storing it as an (i+1)-th data table; determining an (i+2)-th data table according to the (i+1)-th data table and the first data table, wherein the (i+2)-th data table comprises connection relationship information of first (K+1)-order nodes; determining connection relationship information of second (K+1)-order nodes according to the (i+2)-th data table and a second preset screening condition, and storing the connection relationship information of the first K nodes of the second (K+1)-order nodes as an (i+3)-th data table, wherein any two of the first K nodes are connected; and, when the (i+3)-th data table exists, setting i = i+4 and K = K+1. In this way, complete subgraphs can be acquired rapidly.

Description

Complete subgraph data searching method, device, equipment and medium

Technical Field

The present invention relates to the field of graph theory, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for searching complete sub-graph data.

Background

A complex network generally refers to a network with high complexity, such as an interpersonal relationship network, a traffic network, or a communication network. A complete subgraph in a complex network is a sub-network in which any two nodes are connected. Complete subgraphs in a complex network have important practical significance. For example, in an interpersonal relationship network, people are taken as nodes, and a complete subgraph, in which every pair of people is related, represents a closely related group of friends; in a product part relationship network, parts are taken as nodes, any two connected nodes form an edge, and the nodes and edges form a complete subgraph. Complete subgraphs can effectively cluster a complex network, and searching for complete subgraphs is the basis of clustering a complex network.

At present, conventional complete subgraph search methods need to traverse all nodes twice per iteration, so the time complexity of the search grows rapidly as the number of nodes increases.

Disclosure of Invention

The embodiment of the invention provides a method, a device and equipment for searching complete subgraph data and a computer-readable storage medium, which can quickly search out complete subgraphs and reduce the time for finding out all complete subgraphs of a complex network.

In a first aspect, the present invention provides a method for searching complete subgraph data, including: acquiring connection relationship information between each node and its adjacent nodes in a network, and storing the connection relationship information as a first data table; determining an i-th data table according to the first data table, wherein the i-th data table is an extended data table of the first data table and comprises connection relationship information of first K-order nodes, i = 2, 3, 4, …, M with an initial value of 2, and K = 3, 4, 5, …, N with an initial value of 3; determining connection relationship information of second K-order nodes in the i-th data table according to the i-th data table and a first preset screening condition, and storing it as an (i+1)-th data table; determining an (i+2)-th data table according to the (i+1)-th data table and the first data table, wherein the (i+2)-th data table comprises connection relationship information of first (K+1)-order nodes; determining connection relationship information of second (K+1)-order nodes in the (i+2)-th data table according to the (i+2)-th data table and a second preset screening condition, and storing the connection relationship information of the first K nodes of the second (K+1)-order nodes as an (i+3)-th data table, wherein any two of the first K nodes of the second (K+1)-order nodes are connected with each other; and, when the (i+3)-th data table exists, setting i = i+4 and K = K+1.

In some implementations of the first aspect, acquiring the connection relationship information between each node and its adjacent nodes in the network and storing it as the first data table includes: extracting each node, its adjacent nodes, and the edge weight between each node and each adjacent node to form the connection relationship information, and storing the connection relationship information as the first data table.

In some implementations of the first aspect, when i is 2, determining the i-th data table according to the first data table includes: self-joining the first data table to determine the i-th data table.

In some implementations of the first aspect, the first preset screening condition is to retain the connection relationship information of those first K-order nodes whose K-th node differs from the (K-2)-th node as the connection relationship information of the second K-order nodes.

In some implementations of the first aspect, determining the (i+2)-th data table according to the (i+1)-th data table and the first data table includes: outer-joining the (i+1)-th data table with the first data table to determine the (i+2)-th data table.

In some implementations of the first aspect, the second preset screening condition is as follows: when there are K-2 pieces of connection relationship information of first (K+1)-order nodes whose first K nodes are identical and in the same order, and the (K+1)-th node of each of them is the same as one of the first K-2 nodes, the K-2 pieces of connection relationship information of first (K+1)-order nodes are retained as the connection relationship information of the second (K+1)-order nodes.

In some implementations of the first aspect, after determining the connection relationship information of the second (K+1)-order nodes in the (i+2)-th data table according to the (i+2)-th data table and the second preset screening condition, and storing the connection relationship information of the first K nodes of the second (K+1)-order nodes as the (i+3)-th data table, the method further includes: deduplicating the connection relationship information of K-order nodes in the (i+3)-th data table that consists of the same nodes, so that each group of K-order nodes keeps unique connection relationship information; determining common nodes according to the deduplicated connection relationship information of K-order nodes in the (i+3)-th data table; calculating the compactness of the deduplicated connection relationship information of K-order nodes in the (i+3)-th data table; and allocating the common nodes according to that compactness.

In a second aspect, the present invention provides a complete subgraph data searching apparatus, including: an acquisition module, configured to acquire connection relationship information between each node and its adjacent nodes in a network and store the connection relationship information as a first data table; a determining module, configured to determine an i-th data table according to the first data table, wherein the i-th data table is an extended data table of the first data table and comprises connection relationship information of first K-order nodes, i = 2, 3, 4, …, M with an initial value of 2, and K = 3, 4, 5, …, N with an initial value of 3; the determining module being further configured to determine connection relationship information of second K-order nodes in the i-th data table according to the i-th data table and a first preset screening condition, and store it as an (i+1)-th data table; the determining module being further configured to determine an (i+2)-th data table according to the (i+1)-th data table and the first data table, wherein the (i+2)-th data table comprises connection relationship information of first (K+1)-order nodes; the determining module being further configured to determine connection relationship information of second (K+1)-order nodes in the (i+2)-th data table according to the (i+2)-th data table and a second preset screening condition, and store the connection relationship information of the first K nodes of the second (K+1)-order nodes as an (i+3)-th data table, wherein any two of the first K nodes of the second (K+1)-order nodes are connected with each other; and a processing module, configured to set i = i+4 and K = K+1 when the (i+3)-th data table exists.

In some implementations of the second aspect, the acquisition module is specifically configured to: extract each node, its adjacent nodes, and the edge weight between each node and each adjacent node to form the connection relationship information between each node and its adjacent nodes, and store the connection relationship information as the first data table.

In some implementations of the second aspect, the determining module is further configured to: when i is 2, self-join the first data table to determine the i-th data table.

In some implementations of the second aspect, the first preset screening condition is to retain the connection relationship information of those first K-order nodes whose K-th node differs from the (K-2)-th node as the connection relationship information of the second K-order nodes.

In some implementations of the second aspect, the determining module is further configured to: outer-join the (i+1)-th data table with the first data table to determine the (i+2)-th data table.

In some implementations of the second aspect, the second preset screening condition is as follows: when there are K-2 pieces of connection relationship information of first (K+1)-order nodes whose first K nodes are identical and in the same order, and the (K+1)-th node of each of them is the same as one of the first K-2 nodes, the K-2 pieces of connection relationship information of first (K+1)-order nodes are retained as the connection relationship information of the second (K+1)-order nodes.

In some implementations of the second aspect, the determining module is further configured to: after determining the connection relationship information of the second (K+1)-order nodes in the (i+2)-th data table according to the (i+2)-th data table and the second preset screening condition and storing the connection relationship information of the first K nodes of the second (K+1)-order nodes as the (i+3)-th data table, deduplicate the connection relationship information of K-order nodes in the (i+3)-th data table that consists of the same nodes, so that each group of K-order nodes keeps unique connection relationship information; determine common nodes according to the deduplicated connection relationship information of K-order nodes in the (i+3)-th data table; calculate the compactness of the deduplicated connection relationship information of K-order nodes in the (i+3)-th data table; and allocate the common nodes according to that compactness.

In a third aspect, the present invention provides a complete subgraph data searching device, including: a processor and a memory storing computer program instructions; the processor, when executing the computer program instructions, implements the method for searching complete subgraph data described in the first aspect or in any implementation of the first aspect.

In a fourth aspect, the present invention provides a computer-readable storage medium on which computer program instructions are stored; the computer program instructions, when executed by a processor, implement the method for searching complete subgraph data described in the first aspect or in any implementation of the first aspect.

According to the method, apparatus, device, and computer-readable storage medium for searching complete subgraph data of the embodiments of the present invention, the connection relationship information between each node and its adjacent nodes in the network is stored in the first data table, the stored connection relationship information is extended on the basis of the first data table to obtain the connection relationship information of each node, and the connection relationship information that satisfies the complete subgraph condition is screened out according to the first preset screening condition and the second preset screening condition, so that node groups capable of forming complete subgraphs are obtained directly. Complete subgraphs can therefore be searched out quickly, the iteration efficiency is improved, and the time needed to find all complete subgraphs of a complex network is reduced.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic flowchart of a method for searching complete sub-graph data according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a complete sub-graph data search apparatus according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a complete subgraph data searching device according to an embodiment of the present invention.

Detailed Description

Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The term "and/or" herein merely describes an association between associated objects, meaning that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone.

At present, searching a network for complete subgraphs generally requires traversing the nodes. However, when the number of nodes is large and the network structure is complex, the time cost of traversing the nodes is very high, and the traversal may even be impractical. Therefore, searching for complete subgraphs by node traversal is not suitable for large-scale node networks.

In view of the above, embodiments of the present invention provide a method, an apparatus, a device, and a computer-readable storage medium for searching complete subgraph data, which are capable of obtaining connection relationship information of each node by storing the connection relationship information of each node and adjacent nodes in a network node in a data table, expanding the stored connection relationship information of the nodes based on the data table, and screening connection relationship information of nodes meeting complete subgraph conditions according to preset screening conditions to directly obtain a node group capable of forming a complete subgraph, so that a complete subgraph can be quickly searched, and time required for finding all complete subgraphs of a complex network is reduced.

The following describes a complete sub-graph data search method provided by the embodiment of the present invention with reference to the accompanying drawings.

Fig. 1 is a schematic flowchart of a method for searching complete sub-graph data according to an embodiment of the present invention. As shown in fig. 1, the method 100 for searching complete subgraph data may include S110 to S160.

S110, acquiring connection relation information between each node and adjacent nodes in the network nodes, and storing the connection relation information as a first data table.

Specifically, each node and adjacent nodes in the network nodes and the edge weights of each node and adjacent nodes may be extracted to form connection relationship information between each node and adjacent nodes, and the connection relationship information may be stored as the first data table. The network nodes can be from a complex network and are large in number. Also, the first data table may be a Hive table.

As a specific example, the first data table may be as shown in Table 1. For example, in the second row, the node is A, the neighbor node is B, and edge weight 1 is P_AB. The connection relationship information of the two nodes A and B can be represented by A-B, and the connection relationship information of higher-order nodes can be represented by X-X-X…. Since the connection relationship information in the first data table is between 2 nodes, it can be regarded as connection relationship information of 2-order nodes; the same naming applies to connection relationship information of higher-order nodes.

TABLE 1

Node	Neighbor node	Edge weight 1
A	B	P_AB
A	C	P_AC
B	D	P_BD
C	A	P_CA
…	…	…
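The layout of Table 1 can be illustrated with a small in-memory sketch. This is not the Hive table of the embodiment: the names `EdgeRow` and `build_first_table` are hypothetical, and the sketch assumes an undirected network in which both directions of an edge carry the same weight (Table 1 lists P_AC and P_CA separately, so real data may differ).

```python
from typing import NamedTuple


class EdgeRow(NamedTuple):
    """One row of the first data table: node, neighbor node, edge weight 1."""
    node: str
    neighbor: str
    weight: float


def build_first_table(weighted_edges):
    """Store every edge in both directions, one row per direction.

    ASSUMPTION: the weight is symmetric; this is a simplification for
    illustration and is not stated in the source.
    """
    rows = set()
    for u, v, w in weighted_edges:
        rows.add(EdgeRow(u, v, w))
        rows.add(EdgeRow(v, u, w))
    return sorted(rows)


# Illustrative weights only; they stand in for P_AB, P_AC, P_BD.
edges = [("A", "B", 0.9), ("A", "C", 0.8), ("B", "D", 0.7)]
first_table = build_first_table(edges)
```

Each row is a 2-order record such as A-B, which is the form the later join steps extend.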

S120, determining the i-th data table according to the first data table.

The i-th data table is an extended data table of the first data table and comprises connection relationship information of first K-order nodes. Further, i = 2, 3, 4, …, M with an initial value of 2, and K = 3, 4, 5, …, N with an initial value of 3.

When i takes its initial value of 2 and K takes its initial value of 3, the first data table may be self-joined to extend the connection relationship information of the nodes and determine the 2nd data table, where the 2nd data table includes connection relationship information of first 3-order nodes.

As a specific example, the 2nd data table may be as shown in Table 2. For example, in the second row, the node is A, the neighbor node is B, the extension node is D, edge weight 1 is P_AB, and edge weight 2 is P_BD. The connection relationship information of the three nodes A, B, and D can be represented by A-B-D, and the same applies to the other rows.

TABLE 2

Node	Neighbor node	Extension node	Edge weight 1	Edge weight 2
A	B	D	P_AB	P_BD
A	C	A	P_AC	P_CA
B	D	E	P_BD	P_DE
C	A	B	P_CA	P_AB
…	…	…	…	…
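The self-join that produces Table 2 can be sketched as follows. This is a minimal in-memory illustration rather than the Hive query of the embodiment; the function name `self_join` and the tuple layout are assumptions. Each row of the first table is matched with every row whose node equals its neighbor node.

```python
from collections import defaultdict


def self_join(first_table):
    """Join the first table with itself on neighbor == node.

    Input rows:  (node, neighbor, weight)
    Output rows: (node, neighbor, extension node, edge weight 1, edge weight 2)
    """
    by_node = defaultdict(list)
    for node, neighbor, weight in first_table:
        by_node[node].append((neighbor, weight))

    second_table = []
    for node, neighbor, w1 in first_table:
        for extension, w2 in by_node[neighbor]:
            second_table.append((node, neighbor, extension, w1, w2))
    return second_table


# Illustrative data: edges A-B and B-D stored in both directions.
first_table = [
    ("A", "B", 0.9), ("B", "A", 0.9),
    ("B", "D", 0.7), ("D", "B", 0.7),
]
second_table = self_join(first_table)
```

The output contains useful rows such as A-B-D alongside back-tracking rows such as A-B-A, which is why a screening step follows.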

S130, determining the connection relationship information of the second K-order nodes in the i-th data table according to the i-th data table and the first preset screening condition, and storing it as an (i+1)-th data table.

The first preset screening condition may be to retain the connection relationship information of those first K-order nodes whose K-th node differs from the (K-2)-th node as the connection relationship information of the second K-order nodes.

Because invalid connection relationship information of first K-order nodes is generated when the i-th data table is determined according to the first data table, the invalid information needs to be removed according to the first preset screening condition. The valid connection relationship information of first K-order nodes is retained as the connection relationship information of second K-order nodes, which ensures that each node in the connection relationship information of second K-order nodes is unique, and is stored as the (i+1)-th data table. Connection relationship information of first K-order nodes in which the K-th node is the same as the (K-2)-th node may be regarded as invalid. That is to say, the connection relationship information of all first K-order nodes in the i-th data table is screened according to the first preset screening condition to obtain the connection relationship information of second K-order nodes.

As a specific example, among the connection relationship information of first K-order nodes in the i-th data table, when K is 4, the connection relationship information A-B-C-B of a 4-order node is invalid connection relationship information; connection relationship information of 5-order nodes can be invalid in the same way. If the invalid connection relationship information is not removed, more invalid connection relationship information will be generated in subsequent extensions.
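The first preset screening condition amounts to a single comparison per record. A minimal sketch, assuming each K-order record is held as a tuple of node names (the function name `first_screen` is hypothetical):

```python
def first_screen(records, k):
    """Keep K-order records whose K-th node differs from the (K-2)-th node.

    Positions are 1-based in the text, so the K-th node is records[...][k - 1]
    and the (K-2)-th node is records[...][k - 3].
    """
    return [r for r in records if r[k - 1] != r[k - 3]]


# K = 3: A-C-A steps straight back to A and is dropped.
third_order = [("A", "B", "D"), ("A", "C", "A")]
kept_3 = first_screen(third_order, 3)

# K = 4: A-B-C-B is the invalid record named in the text.
fourth_order = [("A", "B", "C", "B"), ("A", "B", "C", "D")]
kept_4 = first_screen(fourth_order, 4)
```

Only the immediate back-step is tested here; other repeated nodes are handled by the second screening condition later in the flow.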

For example, when i is 2 and K is 3, the connection relationship information of the second 3-order nodes in the 2nd data table may be determined according to the 2nd data table and the first preset screening condition and stored as the 3rd data table.

S140, determining the (i+2)-th data table according to the (i+1)-th data table and the first data table.

Specifically, the (i+1)-th data table and the first data table may be outer-joined, for example left-joined, to determine the (i+2)-th data table. The (i+2)-th data table comprises connection relationship information of first (K+1)-order nodes, which is obtained by extending the connection relationship information of second K-order nodes in the (i+1)-th data table with the connection relationship information of nodes stored in the first data table.

For example, when i is 2 and K is 3, a 4 th data table may be determined from the 3 rd data table and the first data table, the 4 th data table including connection relationship information of the first 4 th-order node.
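The extension from K-order to (K+1)-order records can be sketched as a join on the last node of each record. This is an in-memory illustration only; the name `extend_records` is hypothetical, and as a simplification records whose last node has no row in the first table are dropped, whereas a left join would keep them with an empty extension.

```python
from collections import defaultdict


def extend_records(records, first_table):
    """Append every neighbor of the last node to each K-order record."""
    neighbors = defaultdict(list)
    for node, neighbor, _weight in first_table:
        neighbors[node].append(neighbor)

    extended = []
    for record in records:
        for nxt in neighbors.get(record[-1], []):
            extended.append(record + (nxt,))
    return extended


# A triangle A-B-C stored in both directions (weights are placeholders).
first_table = [
    ("A", "B", 1.0), ("B", "A", 1.0),
    ("B", "C", 1.0), ("C", "B", 1.0),
    ("C", "A", 1.0), ("A", "C", 1.0),
]
fourth_order = extend_records([("A", "B", "C")], first_table)
```

The record A-B-C-A produced here is the one that closes the triangle, which the next screening step looks for.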

S150, determining the connection relation information of the second K + 1-order node in the (i + 2) th data table according to the (i + 2) th data table and a second preset screening condition, and storing the connection relation information of the previous K nodes in the second K + 1-order node as an (i + 3) th data table.

Any two of the first K nodes of the second (K+1)-order nodes are connected with each other, that is, a relationship exists between them. In other words, the connection relationship information of the first K nodes of the second (K+1)-order nodes satisfies the complete subgraph condition and can form a complete subgraph.

Here, the second preset screening condition may be as follows: when there are K-2 pieces of connection relationship information of first (K+1)-order nodes whose first K nodes are identical and in the same order, and the (K+1)-th node of each of them is the same as one of the first K-2 nodes, the K-2 pieces of connection relationship information of first (K+1)-order nodes are retained as the connection relationship information of the second (K+1)-order nodes. That is to say, the connection relationship information of all first (K+1)-order nodes in the (i+2)-th data table is screened according to the second preset screening condition to obtain the connection relationship information of the second (K+1)-order nodes.

As a specific example, when K is 5 and the connection relationship information of first (K+1)-order nodes in the (i+2)-th data table simultaneously contains the three pieces of 6-order connection relationship information A-B-F-S-D-F, A-B-F-S-D-B, and A-B-F-S-D-A, these three are retained as the connection relationship information of second (K+1)-order nodes, and the 5-order connection relationship information A-B-F-S-D is considered to form a 5-order complete subgraph.

For example, when i is 2 and K is 3, the connection relationship information of the second 4 th-order node in the 4 th data table may be determined according to the 4 th data table and the second preset screening condition, and the connection relationship information of the first 3 nodes in the second 4 th-order node may be stored as the 5 th data table. Any two nodes in the first 3 nodes in the second 4-order node are connected with each other, that is, the connection relation information of the first 3 nodes in the second 4-order node can form a 3-order complete subgraph.
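The second preset screening condition can be sketched as a group-by on the first K nodes. The sketch below is one reading of the condition, stated as an assumption: a K-node prefix is kept when the closing (K+1)-th nodes found for it cover all of its first K-2 nodes, which yields exactly K-2 records per kept prefix. The function name `second_screen` is hypothetical.

```python
from collections import defaultdict


def second_screen(records, k):
    """Return the K-node prefixes that satisfy the second screening condition.

    records: (K+1)-order records as tuples of node names.
    A prefix is kept when, across the records sharing it, the (K+1)-th
    node takes each of the first K-2 nodes of the prefix.
    """
    closing = defaultdict(set)
    for record in records:
        prefix, last = record[:k], record[k]
        if last in prefix[:k - 2]:
            closing[prefix].add(last)
    return [prefix for prefix, seen in closing.items() if len(seen) == k - 2]


# The K = 5 example from the text: three 6-order records share A-B-F-S-D.
six_order = [tuple("ABFSDF"), tuple("ABFSDB"), tuple("ABFSDA")]
fifth_table = second_screen(six_order, 5)
```

The returned prefixes correspond to what the text stores as the (i+3)-th data table; with only two of the three records present, the prefix would not be kept.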

S160, when the (i+3)-th data table exists, setting i = i+4 and K = K+1.

Specifically, when the (i + 3) th data table exists, on the basis of the (i + 3) th data table, i is equal to i +4, and K is equal to K +1, and S120-S160 are repeatedly executed until a higher-order complete sub-graph cannot be obtained, in other words, when the (i + 3) th data table does not exist, the process is ended.

For example, when i is 2, K is 3, and the 5th data table exists, i is set to 6 and K is set to 4 on the basis of the 5th data table. In this case, determining the i-th data table according to the first data table in S120 may be outer-joining the 5th data table with the first data table to determine the 6th data table, where the 6th data table includes connection relationship information of first 4-order nodes. That is, S120 to S160 are repeated until no higher-order complete subgraph can be obtained.
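The loop over S120 to S160 can be sketched end to end as below. This is a minimal in-memory Python sketch, not the Hive-table implementation of the embodiment. The function name `find_complete_subgraphs`, the tuple-based records, and tracking only K rather than the table index i are assumptions for illustration; the network is assumed undirected and free of self-loops, and edge weights are omitted.

```python
from collections import defaultdict


def find_complete_subgraphs(edges):
    """Return {K: set of frozensets of nodes forming K-order complete subgraphs}."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ASSUMPTION: self-loops are ignored
            neighbors[u].add(v)
            neighbors[v].add(u)

    # First data table: 2-order records, both directions.
    records = [(u, v) for u in neighbors for v in neighbors[u]]
    found = {}
    k = 3
    while True:
        # S120: extend (K-1)-order records to K-order records.
        k_order = [r + (n,) for r in records for n in neighbors[r[-1]]]
        # S130: drop records whose K-th node equals the (K-2)-th node.
        k_order = [r for r in k_order if r[k - 1] != r[k - 3]]
        # S140: extend to (K+1)-order records.
        k1_order = [r + (n,) for r in k_order for n in neighbors[r[-1]]]
        # S150: keep prefixes whose closing nodes cover the first K-2 nodes.
        closing = defaultdict(set)
        for r in k1_order:
            if r[k] in r[:k - 2]:
                closing[r[:k]].add(r[k])
        kept = [p for p, seen in closing.items() if len(seen) == k - 2]
        # S160: stop when no K-order complete subgraph was found.
        if not kept:
            break
        found[k] = {frozenset(p) for p in kept}
        records = kept
        k += 1
    return found


# A 4-clique on A, B, C, D plus a pendant edge D-E.
sample_edges = [("A", "B"), ("A", "C"), ("A", "D"),
                ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
result = find_complete_subgraphs(sample_edges)
```

On the sample network the loop reports the four triangles at K = 3 and the single 4-node group at K = 4, then stops because no 5-order record passes the screening.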

According to the method for searching the complete subgraph data, the connection relation information of each node and adjacent nodes in the network nodes is stored in the first data table, the stored connection relation information of the nodes is expanded on the basis of the first data table to obtain the connection relation information of each node, the connection relation information of the nodes meeting the complete subgraph conditions is screened out according to the first preset screening condition and the second preset screening condition, and the node group capable of forming the complete subgraph is directly obtained, so that the complete subgraph can be quickly searched out, the iteration efficiency is improved, and the time required for finding out all the complete subgraphs of the complex network is reduced.

In some embodiments, after determining the connection relationship information of the second K + 1-order node in the i + 2-th data table according to the i + 2-th data table and the second preset screening condition and storing the connection relationship information of the previous K nodes in the second K + 1-order node as the i + 3-th data table, the connection relationship information of the K-order nodes with the same node in the i + 3-th data table may be deduplicated to obtain the connection relationship information of the unique K-order node corresponding to each group of K-order nodes.

As a specific example, the same group of nodes forms different connection relationship information when arranged in different orders, but the complete subgraph formed by the same group of nodes is the same. If the repeated connection relationship information were retained, a large amount of unnecessary computation would be added in subsequent iterations, so the repeated connection relationship information needs to be deduplicated so that each group of nodes retains unique connection relationship information. For example, when K is 3 and the connection relationship information of K-order nodes in the (i+3)-th data table contains the 3-order connection relationship information A-B-C, B-C-A, and C-A-B, these three consist of the same nodes, and only the 3-order connection relationship information A-B-C of nodes A, B, and C may be retained.
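The deduplication step can be sketched by using the unordered node set as the key. The function name `deduplicate` is hypothetical, and keeping the first record seen for each node set is an assumption; the text only requires that one record per group remains.

```python
def deduplicate(records):
    """Keep one record per group of nodes, regardless of node order."""
    seen = set()
    unique = []
    for record in records:
        key = frozenset(record)  # same nodes in any order give the same key
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


third_order = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B"), ("A", "B", "D")]
unique_records = deduplicate(third_order)
```

With the K = 3 example from the text, only A-B-C survives out of A-B-C, B-C-A, and C-A-B.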

In addition, the common node can be determined according to the connection relationship information of the K-order node in the i +3 th data table after the duplication is removed, the compactness of the connection relationship information of the K-order node in the i +3 th data table after the duplication is removed is calculated, and the common node is distributed according to the compactness of the connection relationship information of the K-order node in the i +3 th data table after the duplication is removed.

As a specific example, in the connection relationship information of the K-th order node in the i +3 th data table after the duplication removal, a situation that the same node belongs to a plurality of connection relationship information may occur, and at this time, the common node needs to be reallocated. The redistribution of the common nodes needs to calculate the closeness Y of the connection relationship information of each K-th order node in the i +3 th data table after the deduplication, in other words, the closeness Y of each complete subgraph is calculated, and the calculation formula may include (1), (2), and (3), specifically as follows:

where P_j is the edge weight between each pair of nodes in each piece of deduplicated connection relationship information of K-order nodes in the (i+3)-th data table, n is the number of nodes in that connection relationship information, P̄ is the average of its edge weights, σ is the standard deviation of its edge weights, and α is a weighting coefficient that can be used to adjust the relative emphasis on the average and the standard deviation according to the practical application scenario.

In other words, P_j may be the edge weight between each pair of nodes in each deduplicated complete subgraph, n is the number of nodes in each deduplicated complete subgraph, P̄ is the average of the edge weights of each deduplicated complete subgraph, and σ is the standard deviation of the edge weights of each deduplicated complete subgraph.

The compactness Y jointly considers the mean and the standard deviation of the edge weights among all nodes: the larger the mean and the smaller the standard deviation, the tighter the complete subgraph. When σ ≠ 0, the common node can be preferentially allocated to the complete subgraph with the higher compactness Y; when σ = 0, the common node can be allocated to the complete subgraph with the higher mean edge weight.
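The allocation rule can be illustrated with a short Python sketch. The formulas (1) to (3) are not reproduced in this text, so the expression used for Y below (mean divided by σ raised to the power α) is purely a hypothetical stand-in that grows with the mean and shrinks with the standard deviation; the use of the population standard deviation, the function names, and the fallback to the mean whenever any candidate has σ = 0 are likewise assumptions made for illustration only.

```python
from statistics import fmean, pstdev


def closeness(weights, alpha=1.0):
    """Hypothetical compactness Y = mean / sigma**alpha (an assumed form).

    Returns None when sigma == 0, where this assumed Y is undefined.
    """
    mean = fmean(weights)
    sigma = pstdev(weights)  # population standard deviation (assumption)
    if sigma == 0:
        return None
    return mean / sigma ** alpha


def assign_common_node(candidates, alpha=1.0):
    """Pick the complete subgraph that should keep a common node.

    candidates maps a subgraph identifier to the list of its edge weights.
    """
    scores = {g: closeness(w, alpha) for g, w in candidates.items()}
    if all(score is not None for score in scores.values()):
        # sigma != 0 for every candidate: prefer the higher compactness Y.
        return max(scores, key=scores.get)
    # sigma == 0 for some candidate: fall back to the higher mean weight.
    return max(candidates, key=lambda g: fmean(candidates[g]))


if __name__ == "__main__":
    print(assign_common_node({"g1": [0.9, 0.8, 0.9], "g2": [0.6, 0.4, 0.5]}))
```

Any other monotone combination of the mean and σ could be substituted for the assumed expression without changing the allocation logic.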

The detailed search process for complete subgraph data is described below with reference to a specific embodiment. The purpose of this embodiment is to identify family groups that live together, based on 11.45 million family relationship pairs, identified in a certain province, whose places of residence are consistent. Because living together is a strong relationship, the members of a family group can be considered to live together only if every pair of members in the group has this relationship. The problem can therefore be viewed as searching the 11.45 million user relationship pairs for groups of user nodes that satisfy the complete subgraph condition. The specific implementation process is as follows:

Step 1: take any two users as a node and its neighbor node, take the probability that a real family relationship exists between the two users as the edge weight, extract all user relationship pairs, and store them in a Hive table to form a first user family relationship table.

Step 2: expand the first user family relationship table into a second user family relationship table by self-joining it; the second user family relationship table contains 3-member family relationship chains.

Step 3: screen the 3-member family relationship chains in the second user family relationship table, eliminate the chains that contain a repeated family relationship (the invalid chains), retain the valid chains, and store them as a third user family relationship table, so that every user within a 3-member chain in the third user family relationship table is unique.

Step 4: outer-join the third user family relationship table with the first user family relationship table to expand it into a fourth user family relationship table, which contains 4-member family relationship chains.

Step 5: screen the 4-member family relationship chains in the fourth user family relationship table and retain those whose 4th user is the same as the 1st user, which ensures that the retained chains satisfy the complete subgraph condition, that is, any two users have a family relationship. Take the 3-member chain formed by the first 3 users as a 3-member family group and store it as a fifth user family relationship table.

Step 6: deduplicate the 3-member family groups in the fifth user family relationship table so that only one 3-member chain is retained for each family group, calculate the closeness of each family group from the probabilities that its users form real family relationships, and allocate common users to the family group with the higher closeness.

Step 7: repeat the chain expansion, screening, and deduplication processes to iterate over family groups until no new family group can be found.
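Steps 1 to 6 can be sketched for the 3-member case as follows. This is only an illustrative sketch under stated assumptions: Python's built-in sqlite3 module stands in for Hive, each relationship pair is stored in both directions so that chains can be followed either way, and the table names t1 to t5, the column names, and the function find_three_member_groups are hypothetical. The closeness-based allocation of common users in step 6 is omitted.

```python
import sqlite3


def find_three_member_groups(pairs):
    """pairs: iterable of (user_a, user_b, weight), treated as undirected."""
    con = sqlite3.connect(":memory:")
    cur = con.cursor()

    # Step 1: first table, one row per direction (assumption).
    cur.execute("CREATE TABLE t1 (a TEXT, b TEXT, w REAL)")
    rows = []
    for a, b, w in pairs:
        rows.append((a, b, w))
        rows.append((b, a, w))
    cur.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)

    # Step 2: self-join to build 3-member chains n1 - n2 - n3.
    cur.execute("""
        CREATE TABLE t2 AS
        SELECT x.a AS n1, x.b AS n2, y.b AS n3
        FROM t1 AS x JOIN t1 AS y ON x.b = y.a
    """)

    # Step 3: drop chains whose 3rd user repeats the 1st user.
    cur.execute("CREATE TABLE t3 AS SELECT n1, n2, n3 FROM t2 WHERE n3 <> n1")

    # Step 4: outer-join with the first table to build 4-member chains.
    cur.execute("""
        CREATE TABLE t4 AS
        SELECT c.n1 AS n1, c.n2 AS n2, c.n3 AS n3, e.b AS n4
        FROM t3 AS c LEFT JOIN t1 AS e ON c.n3 = e.a
    """)

    # Step 5: keep chains that close back on the 1st user.
    cur.execute("CREATE TABLE t5 AS SELECT n1, n2, n3 FROM t4 WHERE n4 = n1")

    # Step 6 (deduplication only): one entry per set of three users.
    groups = {tuple(sorted(r)) for r in cur.execute("SELECT n1, n2, n3 FROM t5")}
    con.close()
    return sorted(groups)


if __name__ == "__main__":
    demo = [("A", "B", 0.9), ("B", "C", 0.8), ("A", "C", 0.7), ("C", "D", 0.6)]
    print(find_three_member_groups(demo))
```

With a left outer join, a chain that cannot be extended carries a NULL fourth user and is discarded by the step 5 condition, so an inner join would yield the same fifth table in this sketch.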

It can be understood that the complete subgraph data searching method of the embodiment of the invention can be applied to the identification of family groups, and also to the identification of other groups in which every two members are connected, such as groups of friends or groups of work partners, so that the time needed to search for and identify such groups can be effectively reduced and the efficiency improved.

Fig. 2 is a schematic structural diagram of a complete subgraph data search apparatus according to an embodiment of the present invention. As shown in Fig. 2, the complete subgraph data search apparatus 200 may include: an acquisition module 210, a determination module 220, and a processing module 230.

The acquisition module 210 is configured to obtain connection relationship information between each node in the network nodes and its adjacent nodes, and to store the connection relationship information as a first data table. The determining module 220 is configured to determine an i-th data table according to the first data table, where the i-th data table is an extended data table of the first data table and includes connection relationship information of first K-order nodes, i = 2, 3, 4, …, M with an initial value of 2, and K = 3, 4, 5, …, N with an initial value of 3. The determining module 220 is further configured to determine connection relationship information of second K-order nodes in the i-th data table according to the i-th data table and a first preset screening condition, and to store it as an (i + 1)-th data table. The determining module 220 is further configured to determine an (i + 2)-th data table according to the (i + 1)-th data table and the first data table, where the (i + 2)-th data table includes connection relationship information of first (K + 1)-order nodes. The determining module 220 is further configured to determine connection relationship information of second (K + 1)-order nodes in the (i + 2)-th data table according to the (i + 2)-th data table and a second preset screening condition, and to store the connection relationship information of the first K nodes of the second (K + 1)-order nodes as an (i + 3)-th data table, where any two of those first K nodes are connected to each other. The processing module 230 is configured to, when the (i + 3)-th data table exists, set i = i + 4 and K = K + 1.

In some embodiments, the acquisition module 210 is specifically configured to extract each node in the network nodes, its adjacent nodes, and the edge weight between each node and its adjacent nodes, to form the connection relationship information of each node and its adjacent nodes, and to store the connection relationship information as the first data table.

In some embodiments, the determining module 220 is further configured to self-join the first data table to determine the ith data table when i is 2.

In some embodiments, the first preset screening condition is to retain, as the connection relationship information of the second K-order nodes, the connection relationship information of those first K-order nodes whose K-th node is different from the (K - 2)-th node.
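As a minimal illustration of this condition, assume that each piece of connection relationship information of a K-order node is held as a tuple of K node identifiers; the tuple representation and the function name first_screening are assumptions made only for this sketch.

```python
def first_screening(chains, k):
    """Keep K-node chains whose K-th node differs from the (K-2)-th node.

    With 0-based indexing, the K-th node is chain[k - 1] and the
    (K-2)-th node is chain[k - 3].
    """
    return [chain for chain in chains if chain[k - 1] != chain[k - 3]]


if __name__ == "__main__":
    # For K = 3 the chain A-B-A doubles back on itself and is dropped.
    print(first_screening([("A", "B", "A"), ("A", "B", "C")], 3))
```

For K = 3 this removes chains that simply return along the edge they arrived on, which matches the elimination of invalid 3-member chains in the embodiment above.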

In some embodiments, the determining module 220 is further configured to outer-join the (i + 1)-th data table with the first data table to determine the (i + 2)-th data table.

In some embodiments, the second preset screening condition is as follows: when, among K - 2 pieces of connection relationship information of first (K + 1)-order nodes, any two pieces have the same first K nodes in the same order, and the (K + 1)-th node of each piece is the same as one of the first K - 2 nodes, the K - 2 pieces of connection relationship information are retained as the connection relationship information of the second (K + 1)-order nodes.
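One possible reading of this condition is sketched below in Python. It assumes that each (K + 1)-node chain is a tuple, that chains are grouped by their ordered first K nodes, and that a group is retained only when its closing nodes are distinct and together cover all of the first K - 2 nodes; this covering interpretation, the tuple representation, and the function name second_screening are assumptions rather than statements of the claimed method.

```python
from collections import defaultdict


def second_screening(chains, k):
    """Return the ordered K-node prefixes that pass the assumed condition.

    chains: iterable of (K+1)-tuples of node identifiers.
    """
    closing = defaultdict(set)
    for chain in chains:
        prefix, last = chain[:k], chain[k]
        # The (K+1)-th node must coincide with one of the first K-2 nodes.
        if last in prefix[:k - 2]:
            closing[prefix].add(last)
    # Keep prefixes closed by K-2 distinct nodes (assumed interpretation).
    return [p for p, lasts in closing.items() if len(lasts) == k - 2]


if __name__ == "__main__":
    four_chains = [("A", "B", "C", "A"), ("B", "C", "D", "C")]
    print(second_screening(four_chains, 3))
```

For K = 3 the condition reduces to the 4th node being the same as the 1st node, as in step 5 of the embodiment above.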

In some embodiments, the determining module 220 is further configured to, after determining the connection relationship information of the second (K + 1)-order nodes in the (i + 2)-th data table according to the (i + 2)-th data table and the second preset screening condition and storing the connection relationship information of the first K nodes of the second (K + 1)-order nodes as the (i + 3)-th data table: deduplicate the connection relationship information of K-order nodes in the (i + 3)-th data table that consist of the same nodes, to obtain one piece of connection relationship information for each group of K-order nodes; determine the common nodes according to the deduplicated connection relationship information of the K-order nodes in the (i + 3)-th data table; calculate the compactness of the deduplicated connection relationship information of the K-order nodes in the (i + 3)-th data table; and allocate the common nodes according to that compactness.
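The deduplication and common-node determination can be illustrated as follows, assuming that each row of the (i + 3)-th data table is a tuple of K node identifiers and that two rows describe the same group when they contain the same set of nodes; the function names dedupe_groups and common_nodes are hypothetical.

```python
from collections import Counter


def dedupe_groups(chains):
    """Keep the first chain seen for each distinct set of nodes."""
    seen = set()
    unique = []
    for chain in chains:
        key = frozenset(chain)
        if key not in seen:
            seen.add(key)
            unique.append(chain)
    return unique


def common_nodes(groups):
    """Return the nodes that belong to more than one deduplicated group."""
    counts = Counter(node for group in groups for node in set(group))
    return {node for node, count in counts.items() if count > 1}


if __name__ == "__main__":
    rows = [("A", "B", "C"), ("B", "C", "A"), ("C", "D", "E")]
    groups = dedupe_groups(rows)
    print(groups, common_nodes(groups))
```

Nodes returned by common_nodes are the ones that would then be allocated according to compactness.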

According to the searching device of the complete subgraph data, the connection relation information of each node and adjacent nodes in the network nodes is stored in the first data table, the stored connection relation information of the nodes is expanded on the basis of the first data table to obtain the connection relation information of each node, the connection relation information of the nodes meeting the complete subgraph conditions is screened out according to the first preset screening condition and the second preset screening condition, and the node group capable of forming the complete subgraph is directly obtained, so that the complete subgraph can be quickly searched out, the iteration efficiency is improved, and the time required for finding out all the complete subgraphs of the complex network is reduced.

It can be understood that the complete sub-graph data search apparatus 200 according to the embodiment of the present invention may correspond to the execution main body of the complete sub-graph data search method according to the embodiment of the present invention in fig. 1, and specific details of operations and/or functions of each module/unit of the complete sub-graph data search apparatus 200 may refer to the description of the corresponding part in the complete sub-graph data search method according to the embodiment of the present invention in fig. 1, and for brevity, no further description is provided here.

Fig. 3 is a schematic diagram of a hardware structure of a complete subgraph data search device according to an embodiment of the present invention.

As shown in fig. 3, the search device 300 of the complete subgraph data in the present embodiment includes an input device 301, an input interface 302, a central processor 303, a memory 304, an output interface 305, and an output device 306. The input interface 302, the central processing unit 303, the memory 304, and the output interface 305 are connected to each other through a bus 310, and the input device 301 and the output device 306 are connected to the bus 310 through the input interface 302 and the output interface 305, respectively, and further connected to other components of the complete subgraph data search device 300.

Specifically, the input device 301 receives input information from the outside and transmits the input information to the central processor 303 through the input interface 302; the central processor 303 processes the input information based on computer-executable instructions stored in the memory 304 to generate output information, stores the output information temporarily or permanently in the memory 304, and then transmits the output information to the output device 306 through the output interface 305; the output device 306 outputs the output information to the outside of the complete subgraph data search device 300 for use by the user.

That is, the complete subgraph data search device shown in Fig. 3 may also be implemented to include: a memory storing computer-executable instructions; and a processor that, when executing the computer-executable instructions, may implement the complete subgraph data searching method described in connection with the example shown in Fig. 1.

In one embodiment, the complete subgraph data search device 300 shown in Fig. 3 includes: a memory 304 for storing programs; and a processor 303 configured to execute a program stored in the memory to perform the complete subgraph data searching method provided in the embodiment shown in Fig. 1.

An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement the method for searching complete subgraph data provided by the embodiment shown in fig. 1.

It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.

The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, a functional block may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, read-only memories (ROMs), flash memories, erasable ROMs (EROMs), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the Internet or an intranet.

It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.

As described above, only the specific embodiments of the present invention are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present invention, and these modifications or substitutions should be covered within the scope of the present invention.

Claims

1. A method for searching complete subgraph data, the method comprising:

acquiring connection relation information between each node and adjacent nodes in the network nodes, and storing the connection relation information as a first data table;

determining an i-th data table according to the first data table, wherein the i-th data table is an extended data table of the first data table and comprises connection relation information of first K-order nodes, wherein i = 2, 3, 4, …, M, the initial value of i is 2, K = 3, 4, 5, …, N, and the initial value of K is 3;

determining connection relation information of a second K-order node in the ith data table according to the ith data table and a first preset screening condition, and storing the connection relation information as an (i + 1) th data table;

determining an i +2 data table according to the i +1 data table and the first data table, wherein the i +2 data table comprises connection relation information of a first K + 1-order node;

determining connection relation information of second K + 1-order nodes in the i +2 data table according to the i +2 data table and a second preset screening condition, and storing the connection relation information of first K nodes in the second K + 1-order nodes as an i +3 data table, wherein any two nodes in the first K nodes in the second K + 1-order nodes are connected with each other;

when the (i + 3)-th data table exists, setting i = i + 4 and K = K + 1.

2. The method according to claim 1, wherein the obtaining and storing connection relation information of each node in the network nodes and the neighboring nodes as a first data table comprises:

and extracting each node and adjacent nodes in the network nodes and the edge weight of each node and adjacent nodes to form the connection relation information of each node and adjacent nodes, and storing the connection relation information as the first data table.

3. The method of claim 1, wherein when i is 2, determining an ith data table from the first data table comprises:

and self-connecting the first data table, and determining the ith data table.

4. The method according to claim 1, wherein the first preset screening condition is to retain, as the connection relation information of the second K-order nodes, the connection relation information of those first K-order nodes whose K-th node is different from the (K - 2)-th node.

5. The method of claim 1, wherein determining the (i + 2) th data table from the (i + 1) th data table and the first data table comprises:

and outer-joining the (i + 1)-th data table with the first data table to determine the (i + 2)-th data table.

6. The method according to claim 1, wherein the second preset screening condition is that when, among K - 2 pieces of connection relation information of first (K + 1)-order nodes, any two pieces have the same first K nodes in the same order, and the (K + 1)-th node of each piece is the same as one of the first K - 2 nodes, the K - 2 pieces of connection relation information are retained as the connection relation information of the second (K + 1)-order nodes.

7. The method according to claim 1, wherein after determining the connection relationship information of the second K +1 th node in the i +2 th data table according to the i +2 th data table and the second preset screening condition, and storing the connection relationship information of the first K nodes in the second K +1 th node as the i +3 th data table, the method further comprises:

removing duplication of the connection relation information of the K-order nodes with the same node in the (i + 3) th data table to obtain the connection relation information of the K-order nodes corresponding to each group of K-order nodes;

determining a common node according to the connection relation information of the K-order node in the i +3 th data table after the duplication removal;

calculating the compactness of the connection relation information of the K-order node in the i +3 th data table after the duplication is removed;

and distributing the common nodes according to the compactness of the connection relation information of the K-order nodes in the i +3 th data table after the duplication is removed.

8. An apparatus for searching complete sub-graph data, the apparatus comprising:

the acquisition module is used for acquiring the connection relation information between each node and the adjacent node in the network nodes and storing the connection relation information as a first data table;

the determining module is used for determining an i-th data table according to the first data table, wherein the i-th data table is an extended data table of the first data table and comprises connection relation information of first K-order nodes, i = 2, 3, 4, …, M, the initial value of i is 2, K = 3, 4, 5, …, N, and the initial value of K is 3;

the determining module is further used for determining connection relation information of a second K-order node in the ith data table according to the ith data table and a first preset screening condition, and storing the connection relation information as an i +1 th data table;

the determining module is further configured to determine an i +2 th data table according to the i +1 th data table and the first data table, where the i +2 th data table includes connection relationship information of a first K +1 order node;

the determining module is further configured to determine connection relationship information of a second K + 1-order node in the i +2 data table according to the i +2 data table and a second preset screening condition, and store the connection relationship information of first K nodes in the second K + 1-order node as an i +3 data table, where any two nodes in the first K nodes in the second K + 1-order node are connected with each other;

and the processing module is used for setting i = i + 4 and K = K + 1 when the (i + 3)-th data table exists.

9. The apparatus according to claim 8, wherein the first preset screening condition is to retain, as the connection relation information of the second K-order nodes, the connection relation information of those first K-order nodes whose K-th node is different from the (K - 2)-th node.

10. The apparatus according to claim 8, wherein the second preset screening condition is that when, among K - 2 pieces of connection relation information of first (K + 1)-order nodes, any two pieces have the same first K nodes in the same order, and the (K + 1)-th node of each piece is the same as one of the first K - 2 nodes, the K - 2 pieces of connection relation information are retained as the connection relation information of the second (K + 1)-order nodes.

11. The apparatus of claim 8, wherein the determining module is further configured to:

after determining the connection relation information of the second (K + 1)-order nodes in the (i + 2)-th data table according to the (i + 2)-th data table and the second preset screening condition and storing the connection relation information of the first K nodes of the second (K + 1)-order nodes as the (i + 3)-th data table, deduplicating the connection relation information of K-order nodes in the (i + 3)-th data table that consist of the same nodes, to obtain the connection relation information of the K-order nodes corresponding to each group of K-order nodes;

12. A device for searching complete subgraph data, characterized in that it comprises: a processor and a memory storing computer program instructions;

the processor, when executing the computer program instructions, implements a method of searching full sub-graph data according to any of claims 1-7.

13. A computer-readable storage medium, having stored thereon computer program instructions, which when executed by a processor, implement the method of searching full sub-graph data according to any of claims 1-7.