CN109543077B - Community search method - Google Patents

Publication number
CN109543077B
Authority
CN
China
Prior art keywords
community
nodes
node
search
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811205006.7A
Other languages
Chinese (zh)
Other versions
CN109543077A (en
Inventor
王朝坤
竺俊超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201811205006.7A priority Critical patent/CN109543077B/en
Publication of CN109543077A publication Critical patent/CN109543077A/en
Priority to PCT/CN2019/111419 priority patent/WO2020078370A1/en
Application granted granted Critical
Publication of CN109543077B publication Critical patent/CN109543077B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying

Abstract

An embodiment of the invention discloses a community search method comprising: mapping nodes to node variables according to the user's community search requirements and writing the corresponding search condition; converting the search condition into a plurality of search terms; performing a single-condition community search for each search term; and merging the results of the single-condition community searches, returning the union of the community results. The invention expresses search conditions uniformly as Boolean expressions, which makes it easy for users to express personalized search requirements and to execute community searches under complex conditions. Because nodes that the user does not want in the community are taken into account during the search, the results better match the user's expectations. Because a condition may require only that the community contain at least one of several given nodes, a single search condition can yield several different communities that all satisfy it, giving the user a richer set of results to choose from. Several implementation modes are provided and can be selected according to actual needs.

Description

Community search method
Technical Field
The invention relates to the technical field of search, in particular to a community search method under a complex condition.
Background
Network structures formed by a large number of nodes and the connections among them are widespread in computer science, biology, sociology and other fields. In network-related research, communities have received sustained attention. A community generally refers to a subgraph whose internal nodes are more densely connected to one another than to nodes outside it. Community structures mined from a network are useful for friend recommendation, criminal group identification and protein function prediction. Community search (local community discovery) refers to finding, given one or more nodes, the communities that contain them; compared with global community discovery, it focuses on local network structure and returns more personalized results.
Current community search methods are mainly based on specific topological structures such as k-clique, k-core and k-truss; some methods additionally consider node attributes together with the topological structure.
Communities found by methods based on the k-truss structure must satisfy the following properties: 1. every edge lies in at least k-2 triangles; 2. any two edges are reachable from each other through a series of adjacent triangles. A typical method records the truss values of all edges incident to each node and organizes the neighbors of each node into a tree-structured index, called the TCP-index, according to those truss values. Given the query nodes and the value k, it repeatedly finds expandable neighbor nodes from the index until no further expansion is possible, yielding a k-truss community containing the given nodes.
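Property 1 above can be checked by counting, for every edge, the common neighbors of its two endpoints. The following is a minimal Python sketch of that check only; the adjacency-dictionary representation and the function names are assumptions made for illustration and do not come from the patent, and the TCP-index itself is not reproduced here.

```python
def edge_support(adj):
    """Return {frozenset({u, v}): number of triangles containing edge (u, v)}."""
    support = {}
    for u in adj:
        for v in adj[u]:
            edge = frozenset((u, v))
            if edge not in support:
                # Each common neighbor of u and v closes one triangle on the edge.
                support[edge] = len(adj[u] & adj[v])
    return support


def satisfies_truss_support(adj, k):
    """Check property 1 only: every edge lies in at least k - 2 triangles."""
    return all(s >= k - 2 for s in edge_support(adj).values())


# Illustrative graph: a 4-clique, in which every edge lies in exactly 2 triangles.
clique4 = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
}
```

Property 2 (triangle connectivity) would need a separate traversal over adjacent triangles and is left out of this sketch.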
A typical method that considers both topological structure and node attributes, such as the AGAR method, adds edges to the original graph according to the attribute similarity between nodes to construct a TA-graph, and then performs community search on the TA-graph based on the k-truss structure to obtain a community containing the given nodes.
The disadvantage of current community search methods is that they can only find communities containing a given set of nodes. In practice, community search often involves the following requirements: 1. the community must contain some given nodes and must not contain certain other given nodes; 2. the community must contain at least one of several given nodes. Existing community search methods cannot meet these requirements.
Accordingly, there is a need in the art for improvements.
Disclosure of Invention
The technical problem to be solved by the embodiments of the invention is to provide a community search method that addresses the problems of the prior art. The community search method comprises:
mapping nodes to node variables according to the user's community search requirements, and writing the corresponding search condition;
converting the search condition into a plurality of search terms;
performing a single-condition community search for each search term;
and merging the results of the single-condition community searches, that is, returning the union of the community results.
In another embodiment of the above community search method according to the present invention, mapping nodes to node variables according to the user's community search requirements and writing the corresponding search condition comprises:
mapping the nodes mentioned in the user's requirements to Boolean variables;
modifying nodes that are not allowed to appear in the community with logical NOT, and leaving nodes that the community must contain unmodified;
connecting nodes that must appear in the community together with logical AND, and likewise connecting nodes that the community must contain and nodes that it must not contain with logical AND;
connecting with logical OR the nodes of which the community must contain at least one.
In another embodiment of the above community search method according to the present invention, converting the search condition into a plurality of search terms comprises:
enumerating the combinations of node variable values that satisfy the search condition, thereby obtaining the principal disjunctive normal form equivalent to the search condition;
simplifying the principal disjunctive normal form into a minimal AND-OR expression by means of the Quine-McCluskey algorithm;
taking each conjunct of the minimal AND-OR expression as a search term;
and extracting the variable that is not modified by logical NOT and occurs in the largest number of different conjuncts; if more than one conjunct contains this variable, merging those conjuncts into a new search term; and repeating this step until no more conjuncts can be merged.
In another embodiment of the above community search method according to the present invention, performing a single-condition community search for each search term comprises:
for a search term in the form of a conjunct, arranging the nodes that the community must contain and the nodes that it must not contain into a necessary node set and a forbidden node set, respectively, as the input of the single-condition community search;
for a search term obtained by merging several conjuncts, arranging the one or more extracted common node variables into the necessary node set and the forbidden node set according to whether they may appear in the community, as the input of the single-condition community search, and using the remaining part to judge the output results;
and performing the single-condition community search, which searches the network graph using the necessary node set and the forbidden node set so that the resulting community contains the necessary node set and contains no node of the forbidden node set.
In another embodiment of the community search method according to the present invention, the single-condition community search has three implementation modes: a filtered community search mode, a weighted filtering mode, and a filter-while-searching mode.
In another embodiment of the community search method according to the present invention, the filtered community search mode includes:
deleting the forbidden node set from the network graph to obtain the network graph without the forbidden nodes;
and carrying out community search on the new network graph by using the necessary node set as input.
In another embodiment of the above community search method according to the present invention, the weighted filtering mode comprises:
assigning a numerical weight to every node in the network graph, with necessary nodes set to 1, forbidden nodes set to -1 and all other nodes set to 0;
iteratively updating the weight of every node other than the necessary nodes and forbidden nodes to the mean of the weights of all its neighbor nodes, namely:
w(v) = \frac{1}{|N(v)|} \sum_{u \in N(v)} w(u), where w(v) denotes the weight of node v and N(v) denotes the set of neighbor nodes of v;
setting a node weight threshold λ, retaining the nodes whose weight is greater than or equal to λ, and extracting the subgraph induced by these nodes in the original network graph as a new network graph;
performing a community search on the new network graph with the necessary node set as input, the community search comprising:
a01, initializing the community result C with the necessary node set;
a02, obtaining the induced subgraph of the community result C in the given network graph; if this induced subgraph has exactly one connected component and every node in it has degree greater than or equal to a given threshold k, stopping and returning the community result C;
a03, grouping the nodes of C according to the connected components of the induced subgraph, so that nodes in the same component belong to the same group;
a04, adding the neighbor nodes of all nodes in the community result C to a candidate node set Candidate, excluding nodes already in C;
a05, if Candidate is empty, setting the community result C to the empty set and going to step a08;
a06, for each node in Candidate, recording the number a of distinct connected components of C to which it is connected, the number b of nodes in C to which it is connected, and its degree d in the given network graph, and then sorting the nodes of Candidate in descending order by the keys a, b and d;
a07, if the degree of the first-ranked candidate node c is less than the threshold k, removing c from Candidate and going to step a06; otherwise, adding c to the community result C, adding the neighbor nodes of c to Candidate, removing c from Candidate, and going to step a02;
a08, assigning all nodes of the network graph to the community result C;
a09, if the induced subgraph of C has more than one connected component, stopping and returning the empty set; if the minimum node degree in the induced subgraph of C is greater than or equal to the threshold k, stopping and returning C;
a10, deleting from C the nodes whose degree in the induced subgraph of C is less than k; if any deleted node belongs to the necessary node set, stopping and returning the empty set; otherwise, going to step a09.
In another embodiment of the above community search method according to the present invention, performing the community search on the new network graph with the necessary node set as input comprises:
b01, initializing the community result C with the necessary node set;
b02, obtaining the induced subgraph of the community result C in the given network graph; if this induced subgraph has exactly one connected component and every node in it has degree greater than or equal to a given threshold k, stopping and returning the community result C;
b03, grouping the nodes of C according to the connected components of the induced subgraph, so that nodes in the same component belong to the same group;
b04, adding the neighbor nodes of all nodes in the community result C to a candidate node set Candidate, excluding nodes already in C;
b05, if Candidate is empty, setting the community result C to the empty set and going to step b08;
b06, for each node in Candidate, recording the number a of distinct connected components of C to which it is connected, the number b of nodes in C to which it is connected, and its degree d in the given network graph, and then sorting the nodes of Candidate in descending order by the keys a, b and d;
b07, if the degree of the first-ranked candidate node c is less than the threshold k, removing c from Candidate and going to step b06; otherwise, adding c to the community result C, adding the neighbor nodes of c to Candidate, removing c from Candidate, and going to step b02;
b08, assigning all nodes of the network graph to the community result C;
b09, if the induced subgraph of C has more than one connected component, stopping and returning the empty set; if the minimum node degree in the induced subgraph of C is greater than or equal to the threshold k, stopping and returning C;
b10, deleting from C the nodes whose degree in the induced subgraph of C is less than k; if any deleted node belongs to the necessary node set, stopping and returning the empty set; otherwise, going to step b09.
In another embodiment of the above community search method according to the present invention, the filter-while-searching mode comprises:
c01, initializing the community result C with the necessary node set;
c02, obtaining the induced subgraph of the community result C in the given network graph; if this induced subgraph has exactly one connected component and every node in it has degree greater than or equal to a given threshold k, stopping and returning the community result C;
c03, grouping the nodes of C according to the connected components of the induced subgraph, so that nodes in the same component belong to the same group;
c04, adding the neighbor nodes of all nodes in the community result C to a candidate node set Candidate, excluding forbidden nodes and nodes already in C;
c05, if Candidate is empty, setting the community result C to the empty set and going to step c08;
c06, for each node in Candidate, recording the number a of distinct connected components of C to which it is connected, the number b of nodes in C to which it is connected, and the number d of non-forbidden nodes to which it is connected in the given network graph, and then sorting the nodes of Candidate in descending order by the keys a, b and d;
c07, if the degree of the first-ranked candidate node c is less than the threshold k, removing c from Candidate and going to step c06; otherwise, adding c to the community result C, adding the neighbor nodes of c to Candidate, removing c from Candidate, and going to step c02;
c08, assigning all nodes of the network graph to the community result C and deleting the forbidden nodes;
c09, if the induced subgraph of C has more than one connected component, stopping and returning the empty set; if the minimum node degree in the induced subgraph of C is greater than or equal to the threshold k, checking whether C contains forbidden nodes at this point;
c10, deleting from C the nodes whose degree in the induced subgraph of C is less than k; if any deleted node belongs to the necessary node set, stopping and returning the empty set; otherwise, going to step c09.
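The candidate ranking used by the filter-while-searching mode (candidate selection and the keys a, b and d) can be illustrated with a short Python sketch. It assumes the graph is stored as a dictionary of neighbor sets and that the current community is passed in as a list of its connected components; the function name `rank_candidates` and the example graph are hypothetical.

```python
def rank_candidates(adj, community_components, forbidden):
    """Rank frontier nodes by the keys (a, b, d), excluding forbidden nodes.

    a: number of distinct community components the candidate touches
    b: number of community nodes the candidate is adjacent to
    d: number of non-forbidden neighbors of the candidate in the whole graph
    """
    community = set().union(*community_components)
    frontier = set().union(*(adj[u] for u in community)) - community - set(forbidden)

    def key(c):
        a = sum(1 for comp in community_components if adj[c] & comp)
        b = len(adj[c] & community)
        d = len(adj[c] - set(forbidden))
        return (a, b, d)

    return sorted(frontier, key=key, reverse=True)


# Illustrative graph: "x" bridges the two components {s} and {t}; "f" is forbidden.
graph = {
    "s": {"x", "y", "f"},
    "t": {"x"},
    "x": {"s", "t"},
    "y": {"s", "f"},
    "f": {"s", "y"},
}
ranking = rank_candidates(graph, [{"s"}, {"t"}], {"f"})
```

Here `x` ranks first because it connects two components of the current community, which is what helps the search reach a single connected component quickly.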
Compared with the prior art, the invention has the following advantages:
The invention provides a community search method that expresses search conditions uniformly as Boolean expressions, which makes it easy for users to express personalized search requirements and to execute community searches under complex conditions. Because nodes that the user does not want in the community are taken into account during the search, the resulting communities better match the user's expectations and are more personalized. Because a condition may require only that the community contain at least one of several given nodes, a single search condition can yield several different communities that all satisfy it, giving the user a richer set of results to choose from. Several implementation modes are provided and can be selected according to actual needs.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
The invention will be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
FIG. 1 is a flow diagram of one embodiment of a community search method of the present invention;
FIG. 2 is a flow diagram of another embodiment of a community search method of the present invention;
FIG. 3 is a flow diagram of yet another embodiment of a community search method of the present invention;
FIG. 4 is a flow diagram of yet another embodiment of a community search method of the present invention;
FIG. 5 is a flow diagram of yet another embodiment of a community search method of the present invention;
FIG. 6 is a flow diagram of yet another embodiment of a community search method of the present invention;
FIG. 7 is a flow diagram of yet another embodiment of a community search method of the present invention;
FIG. 8 is a flow chart of yet another embodiment of the community search method of the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
Fig. 1 is a flowchart of an embodiment of the community search method of the present invention. As shown in Fig. 1, the community search method comprises:
10, mapping nodes to node variables according to the user's community search requirements and writing the corresponding search condition, wherein the node variables are Boolean variables and the search condition is expressed as a Boolean expression;
20, converting the search condition into a plurality of search terms;
30, performing a single-condition community search for each search term;
40, merging the results of the single-condition community searches, that is, returning the union of the community results.
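The last two steps of this flow (one search per term, then a union of the results) can be sketched as follows. This is only an illustration under stated assumptions: each search term is assumed to be already reduced to a (necessary, forbidden) pair of node sets, and `toy_search` is a hypothetical stand-in for a real single-condition community search.

```python
def search_all(search_terms, single_condition_search):
    """Run one single-condition search per term and return the distinct communities."""
    communities = []
    for necessary, forbidden in search_terms:
        community = frozenset(single_condition_search(necessary, forbidden))
        # Skip empty results and communities already returned by another term.
        if community and community not in communities:
            communities.append(community)
    return communities


# Hypothetical stand-in for a real single-condition search: a fixed lookup table.
def toy_search(necessary, forbidden):
    table = {
        (frozenset({"A"}), frozenset({"C"})): {"A", "x", "y"},
        (frozenset({"B"}), frozenset()): {"B", "z"},
    }
    return table.get((frozenset(necessary), frozenset(forbidden)), set())


terms = [({"A"}, {"C"}), ({"B"}, set()), ({"D"}, set())]
results = search_all(terms, toy_search)
```

The third term finds nothing in the toy table, so only two communities are returned.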
Fig. 2 is a flowchart of another embodiment of the community search method of the present invention. As shown in Fig. 2, mapping nodes to node variables according to the user's community search requirements and writing the corresponding search condition comprises:
101, mapping the nodes mentioned in the user's requirements to Boolean variables;
102, modifying nodes that are not allowed to appear in the community with logical NOT and leaving nodes that the community must contain unmodified, the logical NOT being represented by the symbol "¬";
103, connecting nodes that must appear in the community together with logical AND, and likewise connecting nodes that the community must contain and nodes that it must not contain with logical AND, the logical AND being represented by the symbol "∧"; for example, the Boolean expression A ∧ B ∧ ¬C indicates that the user wants the community to contain node A and node B and not to contain node C;
104, connecting with logical OR the nodes of which the community must contain at least one, the logical OR being represented by the symbol "∨"; for example, the Boolean expression A ∨ B ∨ C indicates that the user wants the community to contain at least one of node A, node B and node C.
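The two example conditions can be read as predicates over the node set of a community, where a node variable is true exactly when the node is a member. A minimal Python sketch, with function names chosen here for illustration:

```python
def must_a_and_b_not_c(members):
    """Search condition A AND B AND (NOT C), evaluated on a community's node set."""
    a, b, c = ("A" in members), ("B" in members), ("C" in members)
    return a and b and not c


def at_least_one_of_abc(members):
    """Search condition A OR B OR C, evaluated on a community's node set."""
    return any(node in members for node in ("A", "B", "C"))
```

For instance, a community {A, B, X} satisfies the first condition, while {A, B, C} does not because it contains the excluded node C.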
Fig. 3 is a flowchart of another embodiment of the community search method of the present invention. As shown in Fig. 3, converting the search condition into a plurality of search terms comprises:
201, enumerating the combinations of node variable values that satisfy the search condition, thereby obtaining the principal disjunctive normal form equivalent to the search condition;
202, simplifying the principal disjunctive normal form into a minimal AND-OR expression by means of the Quine-McCluskey algorithm;
203, taking each conjunct of the minimal AND-OR expression as a search term; for example, the minimal AND-OR expression (A ∧ B) ∨ (C ∧ D) contains two search terms, (A ∧ B) and (C ∧ D). If several search terms share the same node variable, they can be merged into a new search term; for example, in the minimal AND-OR expression (A ∧ B) ∨ (A ∧ ¬D), the two search terms (A ∧ B) and (A ∧ ¬D) contain the same node variable A, so the common node variable A can be extracted and the two conjuncts merged into A ∧ (B ∨ ¬D). This reduces the number of search terms and therefore the number of subsequent single-condition community searches, saving time;
204, extracting the variable that is not modified by logical NOT and occurs in the largest number of different conjuncts; if more than one conjunct contains this variable, merging those conjuncts into a new search term; and repeating this step until no more conjuncts can be merged.
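Steps 201 and 204 can be sketched in Python as below. The sketch makes several assumptions: a conjunct is represented as a frozenset of (variable, is_positive) literals, ties between equally frequent variables are broken alphabetically, and the merge is applied once rather than repeated. The Quine-McCluskey simplification of step 202 is not implemented here; the example conjuncts are written out by hand.

```python
from collections import Counter
from itertools import product


def principal_dnf(variables, predicate):
    """Step 201: list every assignment (as a dict) that satisfies the condition."""
    rows = []
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if predicate(assignment):
            rows.append(assignment)
    return rows


def merge_on_common_variable(conjuncts):
    """Step 204, applied once: factor out the most frequent un-negated variable.

    Returns (merged, untouched). merged is None when nothing can be merged,
    otherwise (variable, [remaining literals of each merged conjunct]).
    """
    counts = Counter(var for term in conjuncts for var, positive in term if positive)
    if not counts:
        return None, list(conjuncts)
    # Assumption: alphabetical order breaks ties between equally frequent variables.
    var, freq = max(sorted(counts.items()), key=lambda item: item[1])
    if freq < 2:
        return None, list(conjuncts)
    literal = (var, True)
    remainders = [term - {literal} for term in conjuncts if literal in term]
    untouched = [term for term in conjuncts if literal not in term]
    return (var, remainders), untouched


# (A AND B) OR (A AND NOT D), the example from step 203.
rows = principal_dnf(
    ["A", "B", "D"],
    lambda v: (v["A"] and v["B"]) or (v["A"] and not v["D"]),
)
terms = [
    frozenset({("A", True), ("B", True)}),
    frozenset({("A", True), ("D", False)}),
]
merged, untouched = merge_on_common_variable(terms)
```

The merged result pairs the common variable A with the remainders B and ¬D, which corresponds to the merged search term A ∧ (B ∨ ¬D).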
Fig. 4 is a flowchart of a further embodiment of the community search method of the present invention. As shown in Fig. 4, performing a single-condition community search for each search term comprises:
301, for a search term in the form of a conjunct, which contains only nodes that must appear in the community and nodes that are not allowed to appear, arranging the nodes that the community must contain and the nodes that it must not contain into a necessary node set and a forbidden node set, respectively, as the input of the single-condition community search;
302, for a search term obtained by merging several conjuncts, arranging the one or more extracted common node variables into the necessary node set and the forbidden node set according to whether they may appear in the community, as the input of the single-condition community search, and using the remaining part to judge the output results; for example, for the search term A ∧ (B ∨ ¬D) obtained by merging the two search terms (A ∧ B) and (A ∧ ¬D), the necessary node set {A} is used as the input of the single-condition community search, that is, to find communities containing node A, and (B ∨ ¬D) serves as the discriminant for the output results, that is, it is used to judge whether a community result contains node B or does not contain node D;
303, performing the single-condition community search, which searches the network graph using the necessary node set and the forbidden node set so that the resulting community contains the necessary node set and contains no node of the forbidden node set.
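A short Python sketch of steps 301 and 302 follows. It assumes a conjunct is given as a collection of (variable, is_positive) literals and that the discriminant of a merged term is a list of such conjuncts, any one of which may hold; both function names are hypothetical.

```python
def split_conjunct(literals):
    """Step 301: turn a conjunct into (necessary, forbidden) node sets."""
    necessary = {var for var, positive in literals if positive}
    forbidden = {var for var, positive in literals if not positive}
    return necessary, forbidden


def passes_discriminant(community, remainders):
    """Step 302: accept a community if it satisfies at least one remaining conjunct."""
    def holds(term):
        # A positive literal needs the node present, a negated one needs it absent.
        return all((var in community) == positive for var, positive in term)

    return any(holds(term) for term in remainders)


# A AND B AND NOT C as a plain conjunct.
necessary, forbidden = split_conjunct([("A", True), ("B", True), ("C", False)])

# Discriminant (B OR NOT D) of the merged term A AND (B OR NOT D).
discriminant = [[("B", True)], [("D", False)]]
```

With this discriminant, a community {A, D} is rejected because it contains D but not B, whereas {A} and {A, B, D} are accepted.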
The single-condition community search, which searches the network graph using the necessary node set and the forbidden node set so that the resulting community contains the necessary node set and contains no node of the forbidden node set, has three implementation modes: a filtered community search mode, a weighted filtering mode, and a filter-while-searching mode.
The filtered community search mode comprises the following steps:
deleting the forbidden node set from the network graph to obtain the network graph without the forbidden nodes;
and carrying out community search on the new network graph by using the necessary node set as input.
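The deletion step of the filtered community search mode amounts to taking the subgraph induced by the non-forbidden nodes. A minimal Python sketch, assuming the graph is a dictionary of neighbor sets (the function name `remove_nodes` is illustrative):

```python
def remove_nodes(adj, forbidden):
    """Return a copy of the graph without the forbidden nodes and their edges."""
    forbidden = set(forbidden)
    return {
        u: {v for v in neighbors if v not in forbidden}
        for u, neighbors in adj.items()
        if u not in forbidden
    }


# Illustrative triangle A-B-C with C forbidden.
triangle = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
filtered = remove_nodes(triangle, {"C"})
```

Any community search run on `filtered` can then no longer return a forbidden node, because those nodes are absent from its input.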
Fig. 5 is a flowchart of another embodiment of the community search method of the present invention. As shown in Fig. 5, the weighted filtering mode comprises:
401, assigning a numerical weight to every node in the network graph, with necessary nodes set to 1, forbidden nodes set to -1 and all other nodes set to 0, that is:
w(v) = \begin{cases} 1, & \text{if } v \text{ is a necessary node} \\ -1, & \text{if } v \text{ is a forbidden node} \\ 0, & \text{otherwise} \end{cases}
where w(v) denotes the weight of node v;
402, iteratively updating the weight of every node other than the necessary nodes and forbidden nodes to the mean of the weights of all its neighbor nodes, namely:
w(v) = \frac{1}{|N(v)|} \sum_{u \in N(v)} w(u)
where N(v) denotes the set of neighbor nodes of v;
403, setting a node weight threshold λ, retaining the nodes whose weight is greater than or equal to λ, and extracting the subgraph induced by these nodes in the original network graph as a new network graph;
404, performing a community search on the new network graph with the necessary node set as input.
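Steps 401 to 403 can be sketched in Python as follows. The text here does not state how many update rounds are run or whether updates are applied synchronously, so the fixed `iterations` count and the synchronous update are assumptions of this sketch, as are the function name and the dictionary-of-sets graph representation.

```python
def weighted_filter(adj, necessary, forbidden, lam, iterations=10):
    """Propagate weights from necessary (+1) and forbidden (-1) nodes, then threshold."""
    weights = {v: 0.0 for v in adj}
    for v in necessary:
        weights[v] = 1.0
    for v in forbidden:
        weights[v] = -1.0
    fixed = set(necessary) | set(forbidden)

    for _ in range(iterations):
        updated = dict(weights)
        for v, neighbors in adj.items():
            # Necessary and forbidden nodes keep their weights; isolated nodes stay at 0.
            if v not in fixed and neighbors:
                updated[v] = sum(weights[u] for u in neighbors) / len(neighbors)
        weights = updated

    # Keep nodes at or above the threshold and take the subgraph they induce.
    keep = {v for v, w in weights.items() if w >= lam}
    return {v: adj[v] & keep for v in keep}


# Illustrative path A - x - y - F with A necessary and F forbidden.
path = {"A": {"x"}, "x": {"A", "y"}, "y": {"x", "F"}, "F": {"y"}}
new_graph = weighted_filter(path, {"A"}, {"F"}, lam=0.0)
```

On this path the weight of `x` settles near 1/3 and that of `y` near -1/3, so with a threshold of 0 only `A` and `x` survive into the new network graph.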
Fig. 6 is a flowchart of another embodiment of the community search method of the present invention. As shown in Fig. 6, performing the community search on the new network graph with the necessary node set as input comprises:
501, initializing the community result C with the necessary node set;
502, obtaining the induced subgraph of the community result C in the given network graph; if this induced subgraph has exactly one connected component and every node in it has degree greater than or equal to a given threshold k, stopping and returning the community result C;
503, grouping the nodes of C according to the connected components of the induced subgraph, so that nodes in the same component belong to the same group;
504, adding the neighbor nodes of all nodes in the community result C to a candidate node set Candidate, excluding nodes already in C;
505, if Candidate is empty, setting the community result C to the empty set and going to step 508;
506, for each node in Candidate, recording the number a of distinct connected components of C to which it is connected, the number b of nodes in C to which it is connected, and its degree d in the given network graph, and then sorting the nodes of Candidate in descending order by the keys a, b and d;
507, if the degree of the first-ranked candidate node c is less than the threshold k, removing c from Candidate and going to step 506; otherwise, adding c to the community result C, adding the neighbor nodes of c to Candidate, removing c from Candidate, and going to step 502;
508, assigning all nodes of the network graph to the community result C;
509, if the induced subgraph of C has more than one connected component, stopping and returning the empty set; if the minimum node degree in the induced subgraph of C is greater than or equal to the threshold k, stopping and returning C;
510, deleting from C the nodes whose degree in the induced subgraph of C is less than k; if any deleted node belongs to the necessary node set, stopping and returning the empty set; otherwise, going to step 509.
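The procedure of steps 501 to 510 can be approximated by the Python sketch below. It is a simplified reading rather than a literal transcription: candidates with degree below k are filtered out up front instead of being popped one at a time, an exhausted candidate set is treated as the empty-Candidate case of step 505, and ties in the (a, b, d) ordering are broken by node name. The graph representation and all function names are assumptions.

```python
def components(adj, nodes):
    """Connected components of the subgraph induced by `nodes`."""
    nodes, seen, comps = set(nodes), set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend((adj[u] & nodes) - comp)
        seen |= comp
        comps.append(comp)
    return comps


def peel(adj, necessary, k):
    """Steps 508-510: start from the whole graph and delete low-degree nodes."""
    community = set(adj)
    while community:
        if len(components(adj, community)) > 1:
            return set()
        low = {u for u in community if len(adj[u] & community) < k}
        if not low:
            return community
        if low & set(necessary):
            return set()
        community -= low
    return set()


def search(adj, necessary, k):
    """Steps 501-507: greedy expansion, falling back to peeling."""
    community = set(necessary)
    while True:
        comps = components(adj, community)
        if len(comps) == 1 and all(len(adj[u] & community) >= k for u in community):
            return community
        frontier = set().union(*(adj[u] for u in community)) - community
        candidates = [c for c in frontier if len(adj[c]) >= k]
        if not candidates:
            return peel(adj, necessary, k)

        def key(c):
            a = sum(1 for comp in comps if adj[c] & comp)
            b = len(adj[c] & community)
            return (a, b, len(adj[c]), str(c))  # str(c) is only a tie-breaker

        community.add(max(candidates, key=key))


# Illustrative graph: triangle a-b-c with a pendant node d attached to c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
found = search(graph, {"a"}, 2)
```

Starting from `a` with k = 2, the expansion adds `c` and then `b`, and stops once the triangle forms a single component in which every node has degree at least 2.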
FIG. 7 is a flowchart of another embodiment of the community search method of the present invention. As shown in FIG. 7, performing community search on the new network graph with the necessary node set as input includes:
601, assigning the necessary node set to a community result C;
602, obtaining the derived subgraph corresponding to the community result C according to the given network graph, and stopping and returning the community result C if the derived subgraph of the community result C has only one connected component and the degree of every node in the derived subgraph is greater than or equal to the given threshold k;
603, according to the connected components of the derived subgraph, dividing the nodes in the same component into the same group;
604, assigning the neighbor nodes of all nodes in the community result C to a candidate node set Candidate, excluding nodes already in the community result C;
605, if Candidate is empty, setting the community result C to the empty set and going to step 608;
606, for each node in the candidate node set Candidate, recording the number a of different connected components connected to the node, the number b of nodes in the community result C connected to the node, and the degree d of the node in the given network graph, and then sorting the nodes in Candidate in descending order by the multiple keys a, b and d;
607, if the degree of the top-ranked candidate node c is smaller than the threshold k, removing node c from the candidate node set Candidate and going to step 606; otherwise, adding node c to the community result C, adding the neighbor nodes of node c to the candidate node set Candidate, removing node c from Candidate, and going to step 602;
608, assigning all nodes of the whole network graph to the community result C;
609, if the derived subgraph of the community result C has more than one connected component, stopping and returning the empty set; if the minimum node degree in the derived subgraph of the community result C is greater than or equal to the threshold k, stopping and returning the community result C;
610, deleting from the community result C the nodes whose degree in the derived subgraph of the community result C is lower than k; if any deleted node is a member of the necessary node set, stopping and returning the empty set; otherwise, going to step 609.
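The greedy expansion in steps 601 to 607 might look like the Python sketch below. It is an illustration under stated assumptions: the graph is an undirected adjacency dictionary of sets, d is read as the degree of the candidate in the given graph, ties in the ordering are broken arbitrarily, and the function returns `None` where the text would continue with step 608 (including the case, not spelled out in the text, where every candidate has been discarded in step 607). The names `label_components` and `expand_community` are hypothetical.

```python
def label_components(nodes, graph):
    """Give every node in `nodes` the id of its component in the derived subgraph."""
    label = {}
    for start in nodes:
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if v in nodes and v not in label:
                    label[v] = start
                    stack.append(v)
    return label


def expand_community(graph, required, k):
    """Steps 601-607. Returns None when the fallback of step 608 would be needed."""
    community = set(required)                                    # step 601
    while True:
        label = label_components(community, graph)               # steps 602-603
        connected = len(set(label.values())) == 1
        dense = all(sum(v in community for v in graph[u]) >= k for u in community)
        if connected and dense:
            return community

        # Step 604: neighbors of the community that are not yet members.
        candidates = {v for u in community for v in graph[u]} - community

        def key(c):
            inside = [v for v in graph[c] if v in community]
            a = len({label[v] for v in inside})   # components the candidate touches
            b = len(inside)                       # neighbors already in C
            d = len(graph[c])                     # degree in the given graph
            return (a, b, d)

        chosen = None
        while candidates and chosen is None:                     # steps 605-607
            best = max(candidates, key=key)       # head of the descending order
            candidates.discard(best)
            if len(graph[best]) >= k:
                chosen = best
        if chosen is None:
            return None                           # assumed: continue at step 608
        community.add(chosen)                     # back to step 602


# A triangle 1-2-3 with a pendant node 4 attached to node 1.
graph = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
print(expand_community(graph, required={1}, k=2))
```

Because the candidate set is rebuilt from the current community on every pass through step 604, the sketch does not separately add the neighbors of the chosen node as step 607 describes; the effect on the next iteration is the same.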
FIG. 8 is a flowchart of a community search method according to another embodiment of the present invention. As shown in FIG. 8, the filtering-while-searching mode includes:
701, assigning the necessary node set to a community result C;
702, obtaining the derived subgraph corresponding to the community result C according to the given network graph, and stopping and returning the community result C if the derived subgraph of the community result C has only one connected component and the degree of every node in the derived subgraph is greater than or equal to the given threshold k;
703, according to the connected components of the derived subgraph, dividing the nodes in the same component into the same group;
704, assigning the neighbor nodes of all nodes in the community result C to a candidate node set Candidate, excluding forbidden nodes and nodes already in the community result C;
705, if Candidate is empty, setting the community result C to the empty set and going to step 708;
706, for each node in the candidate node set Candidate, recording the number a of different connected components connected to the node, the number b of nodes in the community result C connected to the node, and the number d of non-forbidden nodes connected to the node in the given network graph, and then sorting the nodes in Candidate in descending order by the multiple keys a, b and d;
707, if the degree of the top-ranked candidate node c is smaller than the threshold k, removing node c from the candidate node set Candidate and going to step 706; otherwise, adding node c to the community result C, adding the neighbor nodes of node c to the candidate node set Candidate, removing node c from Candidate, and going to step 702;
708, assigning all nodes of the whole network graph to the community result C and deleting the forbidden nodes;
709, if the derived subgraph of the community result C has more than one connected component, stopping and returning the empty set; if the minimum node degree in the derived subgraph of the community result C is greater than or equal to the threshold k, stopping and returning the community result C;
710, deleting from the community result C the nodes whose degree in the derived subgraph of the community result C is lower than k; if any deleted node is a member of the necessary node set, stopping and returning the empty set; otherwise, going to step 709.
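What distinguishes this mode from the previous one is how forbidden nodes enter steps 704 and 706. The Python sketch below shows only that candidate collection and ranking. It assumes an undirected adjacency dictionary of sets, and the function name `rank_candidates` is hypothetical.

```python
def rank_candidates(graph, community, forbidden):
    """Steps 704 and 706: collect and rank candidates, skipping forbidden nodes."""
    community, forbidden = set(community), set(forbidden)

    # Step 703: label each community node with the component it belongs to.
    label = {}
    for start in community:
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if v in community and v not in label:
                    label[v] = start
                    stack.append(v)

    # Step 704: neighbors of C, minus C itself and minus forbidden nodes.
    candidates = {v for u in community for v in graph[u]} - community - forbidden

    def key(c):
        inside = [v for v in graph[c] if v in community]
        a = len({label[v] for v in inside})              # components touched
        b = len(inside)                                  # neighbors inside C
        d = sum(v not in forbidden for v in graph[c])    # non-forbidden neighbors
        return (a, b, d)

    # Step 706: multi-key descending sort (ties keep an arbitrary order).
    return sorted(candidates, key=key, reverse=True)


graph = {
    1: {3, 4, 6}, 2: {3}, 3: {1, 2, 5}, 4: {1, 6, 7},
    5: {3}, 6: {1, 4}, 7: {4},
}
print(rank_candidates(graph, community={1, 2}, forbidden={6}))
```

In this example node 3 ranks first because it joins the two separate components of the community, and the forbidden node 6 never becomes a candidate.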
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Since the system embodiment basically corresponds to the method embodiment, its description is relatively brief, and for relevant points reference may be made to the corresponding parts of the description of the method embodiment.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. A community search method, comprising:
mapping, according to a user's requirement for community search, the nodes mentioned in the user requirement to node variables, and writing a corresponding search condition;
converting the search condition into a plurality of search terms;
performing a single conditional community search for each search term;
and merging the results of the single conditional community searches, namely returning the union of the results of the single conditional community searches.
2. The community search method according to claim 1, wherein mapping the nodes to node variables according to the user's requirement for community search comprises:
mapping the nodes mentioned in the user requirement to Boolean variables;
modifying nodes that are not allowed to appear in the community with logical NOT, and leaving nodes that the community must contain unmodified;
connecting nodes that must be simultaneously present in the community with logical AND, and likewise connecting nodes that the community must contain and nodes that the community is not allowed to contain with logical AND;
where the community must contain at least one of several nodes, connecting those nodes with logical OR.
3. The community search method of claim 2, wherein converting the search condition into a plurality of search terms comprises:
enumerating the combinations of node variable values that satisfy the search condition, so as to obtain a principal disjunctive normal form equivalent to the search condition;
simplifying the principal disjunctive normal form into a simplest AND-OR expression by the Quine-McCluskey algorithm;
setting each conjunctive term of the simplest AND-OR expression as a search term;
and extracting the variable that is not modified by logical NOT and occurs most frequently across different conjunctive terms, combining the conjunctive terms containing that variable into a new search term if there is more than one such term, and repeating this step until no conjunctive terms can be combined.
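The first step of claim 3, enumerating the satisfying value combinations to obtain the principal disjunctive normal form, can be illustrated with the Python sketch below. It is only an illustration: the search condition is assumed to be given as a Python callable over a dictionary of Boolean node variables, the function name `principal_dnf_terms` is hypothetical, and the Quine-McCluskey simplification and the merging of conjunctive terms are deliberately left out.

```python
from itertools import product


def principal_dnf_terms(variables, condition):
    """Enumerate satisfying assignments; each one is a conjunctive term.

    Every term is returned as (nodes set to True, nodes set to False), which
    corresponds to a necessary node set and a forbidden node set.
    """
    terms = []
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if condition(assignment):
            required = {v for v in variables if assignment[v]}
            forbidden = {v for v in variables if not assignment[v]}
            terms.append((required, forbidden))
    return terms


# "The community must contain A and at least one of B, C, and must not contain D."
def condition(x):
    return x["A"] and (x["B"] or x["C"]) and not x["D"]


for required, forbidden in principal_dnf_terms(["A", "B", "C", "D"], condition):
    print(sorted(required), sorted(forbidden))
```

For this condition the enumeration yields three terms. Simplifying them by hand gives the two conjunctive terms A AND B AND NOT D and A AND C AND NOT D, which is the form the later steps of the claim operate on.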
4. The community search method according to claim 3, wherein performing a single conditional community search for each search term comprises:
for a search term in the form of a conjunctive expression, arranging the nodes that the community must contain and the nodes that the community is not allowed to contain into a necessary node set and a forbidden node set, respectively, as input of the single conditional community search process;
for a search term obtained by combining a plurality of conjunctive terms, arranging the one or more extracted common node variables into the necessary node set and the forbidden node set according to whether they may appear in the community, as input of the single conditional community search process, and using the remaining part of the search term to judge the output result;
and carrying out the single conditional community search, searching for a community result in the network graph by using the necessary node set and the forbidden node set, so that the obtained community contains the necessary node set and does not contain any node in the forbidden node set.
5. The community search method according to claim 4, wherein carrying out the single conditional community search and searching for a community result in the network graph by using the necessary node set and the forbidden node set, so that the obtained community contains the necessary node set and does not contain any node in the forbidden node set, is implemented in one of three modes: a filtered community search mode, a weighted filtering mode, and a filtering-while-searching mode.
6. The community search method of claim 5, wherein the filtered community search mode comprises:
deleting the forbidden node set from the network graph to obtain the network graph without the forbidden nodes;
and carrying out community search on the new network graph by using the necessary node set as input.
7. The community search method according to claim 5, wherein the weighted filtering mode comprises:
assigning numerical weights to all nodes in the network graph, setting the weight of necessary nodes to 1, of forbidden nodes to -1, and of all other nodes to 0;
iteratively updating the weight of each node other than the necessary nodes and the forbidden nodes, assigning it the average of the weights of all its neighbor nodes in the network graph, namely:
Figure FDA0002532128380000031
setting a node weight threshold lambda, retaining the nodes whose weight is greater than or equal to lambda, and extracting the derived subgraph of these nodes in the original network graph as a new network graph;
and carrying out community search on the new network graph by using the necessary node set as input.
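The weighted filtering of claim 7 can be sketched in Python as below. Several details are assumptions, since the claim does not fix them: the update is applied synchronously to all free nodes, it runs for a fixed number of rounds given by the hypothetical parameter `iterations`, isolated nodes keep weight 0, and the graph is an undirected adjacency dictionary of sets. The name `weighted_filter` and the parameter `lam` (the threshold lambda) are illustrative.

```python
def weighted_filter(graph, required, forbidden, lam, iterations=10):
    """Propagate +1/-1 weights and keep nodes whose weight is at least lam."""
    required, forbidden = set(required), set(forbidden)
    weight = {u: 0.0 for u in graph}
    weight.update({u: 1.0 for u in required})
    weight.update({u: -1.0 for u in forbidden})

    for _ in range(iterations):
        updated = dict(weight)
        for u in graph:
            if u in required or u in forbidden or not graph[u]:
                continue  # fixed weights stay fixed; isolated nodes keep 0
            # New weight: average of the neighbors' current weights.
            updated[u] = sum(weight[v] for v in graph[u]) / len(graph[u])
        weight = updated

    kept = {u for u in graph if weight[u] >= lam}
    # Derived (induced) subgraph on the retained nodes.
    return {u: graph[u] & kept for u in kept}


# A path 1-2-3-4-5 with node 1 necessary and node 5 forbidden.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(weighted_filter(path, required={1}, forbidden={5}, lam=0.0))
```

On this path the weights settle at 1, 0.5, 0, -0.5 and -1, so a threshold of 0 keeps nodes 1 to 3; the returned dictionary is the new network graph on which the community search would then run.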
8. The community search method of claim 6, wherein performing community search on the new network graph with the necessary node set as input comprises:
a01, assigning the necessary node set to a community result C;
a02, obtaining the derived subgraph corresponding to the community result C according to the given network graph, and stopping and returning the community result C if the derived subgraph of the community result C has only one connected component and the degree of every node in the derived subgraph is greater than or equal to the given threshold k;
a03, according to the connected components of the derived subgraph, dividing the nodes in the same component into the same group;
a04, assigning the neighbor nodes of all nodes in the community result C to a candidate node set Candidate, excluding nodes already in the community result C;
a05, if Candidate is empty, setting the community result C to the empty set and going to step a08;
a06, for each node in the candidate node set Candidate, recording the number a of different connected components connected to the node, the number b of nodes in the community result C connected to the node, and the degree d of the node in the given network graph, and then sorting the nodes in Candidate in descending order by the multiple keys a, b and d;
a07, if the degree of the top-ranked candidate node c is smaller than the threshold k, removing node c from the candidate node set Candidate and going to step a06; otherwise, adding node c to the community result C, adding the neighbor nodes of node c to the candidate node set Candidate, removing node c from Candidate, and going to step a02;
a08, assigning all nodes of the whole network graph to the community result C;
a09, if the derived subgraph of the community result C has more than one connected component, stopping and returning the empty set; if the minimum node degree in the derived subgraph of the community result C is greater than or equal to the threshold k, stopping and returning the community result C;
a10, deleting from the community result C the nodes whose degree in the derived subgraph of the community result C is lower than k; if any deleted node is a member of the necessary node set, stopping and returning the empty set; otherwise, going to step a09.
9. The community search method of claim 7, wherein performing community search on the new network graph with the necessary node set as input comprises:
b01, assigning the necessary node set to a community result C;
b02, obtaining the derived subgraph corresponding to the community result C according to the given network graph, and stopping and returning the community result C if the derived subgraph of the community result C has only one connected component and the degree of every node in the derived subgraph is greater than or equal to the given threshold k;
b03, according to the connected components of the derived subgraph, dividing the nodes in the same component into the same group;
b04, assigning the neighbor nodes of all nodes in the community result C to a candidate node set Candidate, excluding nodes already in the community result C;
b05, if Candidate is empty, setting the community result C to the empty set and going to step b08;
b06, for each node in the candidate node set Candidate, recording the number a of different connected components connected to the node, the number b of nodes in the community result C connected to the node, and the degree d of the node in the given network graph, and then sorting the nodes in Candidate in descending order by the multiple keys a, b and d;
b07, if the degree of the top-ranked candidate node c is smaller than the threshold k, removing node c from the candidate node set Candidate and going to step b06; otherwise, adding node c to the community result C, adding the neighbor nodes of node c to the candidate node set Candidate, removing node c from Candidate, and going to step b02;
b08, assigning all nodes of the whole network graph to the community result C;
b09, if the derived subgraph of the community result C has more than one connected component, stopping and returning the empty set; if the minimum node degree in the derived subgraph of the community result C is greater than or equal to the threshold k, stopping and returning the community result C;
b10, deleting from the community result C the nodes whose degree in the derived subgraph of the community result C is lower than k; if any deleted node is a member of the necessary node set, stopping and returning the empty set; otherwise, going to step b09.
10. The community search method according to claim 5, wherein the filtering-while-searching mode comprises:
c01, assigning the necessary node set to a community result C;
c02, obtaining the derived subgraph corresponding to the community result C according to the given network graph, and stopping and returning the community result C if the derived subgraph of the community result C has only one connected component and the degree of every node in the derived subgraph is greater than or equal to the given threshold k;
c03, according to the connected components of the derived subgraph, dividing the nodes in the same component into the same group;
c04, assigning the neighbor nodes of all nodes in the community result C to a candidate node set Candidate, excluding forbidden nodes and nodes already in the community result C;
c05, if Candidate is empty, setting the community result C to the empty set and going to step c08;
c06, for each node in the candidate node set Candidate, recording the number a of different connected components connected to the node, the number b of nodes in the community result C connected to the node, and the number d of non-forbidden nodes connected to the node in the given network graph, and then sorting the nodes in Candidate in descending order by the multiple keys a, b and d;
c07, if the degree of the top-ranked candidate node c is smaller than the threshold k, removing node c from the candidate node set Candidate and going to step c06; otherwise, adding node c to the community result C, adding the neighbor nodes of node c to the candidate node set Candidate, removing node c from Candidate, and going to step c02;
c08, assigning all nodes of the whole network graph to the community result C and deleting the forbidden nodes;
c09, if the derived subgraph of the community result C has more than one connected component, stopping and returning the empty set; if the minimum node degree in the derived subgraph of the community result C is greater than or equal to the threshold k, checking whether the community result C contains forbidden nodes at this point;
c10, deleting from the community result C the nodes whose degree in the derived subgraph of the community result C is lower than k; if any deleted node is a member of the necessary node set, stopping and returning the empty set; otherwise, going to step c09.
CN201811205006.7A 2018-10-16 2018-10-16 Community search method Active CN109543077B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811205006.7A CN109543077B (en) 2018-10-16 2018-10-16 Community search method
PCT/CN2019/111419 WO2020078370A1 (en) 2018-10-16 2019-10-16 Community search method

Publications (2)

Publication Number Publication Date
CN109543077A CN109543077A (en) 2019-03-29
CN109543077B true CN109543077B (en) 2020-07-31

Family

ID=65843813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811205006.7A Active CN109543077B (en) 2018-10-16 2018-10-16 Community search method

Country Status (2)

Country Link
CN (1) CN109543077B (en)
WO (1) WO2020078370A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543077B (en) * 2018-10-16 2020-07-31 清华大学 Community search method
CN113254797B (en) * 2021-04-19 2022-09-20 江汉大学 Searching method, device and processing equipment for social network community
CN116485587B (en) * 2023-04-21 2024-04-09 深圳润高智慧产业有限公司 Community service acquisition method, community service providing method, electronic device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8972557B2 (en) * 2012-02-28 2015-03-03 Samsung Electronics Co., Ltd. Topic-based community index generation apparatus and method and topic-based community searching apparatus and method
US9652875B2 (en) * 2012-10-29 2017-05-16 Yahoo! Inc. Systems and methods for generating a dense graph
CN108268603A (en) * 2017-12-22 2018-07-10 中国电子科技集团公司第三十研究所 A kind of community discovery method based on core member's identification
CN108319728A (en) * 2018-03-15 2018-07-24 深圳大学 A kind of frequent community search method and system based on k-star

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170032044A1 (en) * 2006-11-14 2017-02-02 Paul Vincent Hayes System and Method for Personalized Search While Maintaining Searcher Privacy
US9613164B2 (en) * 2009-09-11 2017-04-04 University Of Maryland, College Park System and method for data management in large data networks
CN103425662B (en) * 2012-05-16 2017-08-25 腾讯科技(深圳)有限公司 Information search method and device in a kind of Web Community
US9461876B2 (en) * 2012-08-29 2016-10-04 Loci System and method for fuzzy concept mapping, voting ontology crowd sourcing, and technology prediction
CN105224555B (en) * 2014-06-12 2019-12-10 北京搜狗科技发展有限公司 Searching method, device and system
US20160063110A1 (en) * 2014-08-29 2016-03-03 Matthew David Shoup User interface for generating search queries
CN104636978B (en) * 2015-02-12 2017-11-14 西安电子科技大学 A kind of overlapping community detection method propagated based on multi-tag
JP6697247B2 (en) * 2015-11-18 2020-05-20 カシオ計算機株式会社 Information processing apparatus, program, and search display method
JP6332243B2 (en) * 2015-11-18 2018-05-30 カシオ計算機株式会社 Information processing apparatus, electronic device, and program
CN106530039A (en) * 2016-10-26 2017-03-22 深圳市亿家信息科技有限公司 Data processing realization method and system of intelligent community
CN109543077B (en) * 2018-10-16 2020-07-31 清华大学 Community search method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Personal Web Revisitation by Context and Content Keywords with Relevance Feedback;Li Jin,Ling Feng,Gangli Liu,Chaokun Wang;《IEEE Transactions on Knowledge and Data Engineering》;20170701;full text *
Recommendation for Repeat Consumption from;Jun Chen,Chaokun Wang,Jianmin Wang;《2017 IEEE 33rd International Conference on Data Engineering (ICDE)》;20170422;full text *
Community-based algorithm for updating node betweenness centrality in dynamic networks;钱珺;《Tsinghua University, Journal of Software》;20180331;full text *

Also Published As

Publication number Publication date
WO2020078370A1 (en) 2020-04-23
CN109543077A (en) 2019-03-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant