CN110119462B - Community search method of attribute network - Google Patents


Info

Publication number
CN110119462B
Authority
CN
China
Prior art keywords
vertex
circle
node
core
vertices
Legal status
Active
Application number
CN201910266196.1A
Other languages
Chinese (zh)
Other versions
CN110119462A (en)
Inventor
曲强
罗捷桓
Current Assignee
Hangzhou Zhongke advanced technology development Co.,Ltd.
Original Assignee
Hangzhou Zhongke Advanced Technology Research Institute Co ltd
Application filed by Hangzhou Zhongke Advanced Technology Research Institute Co ltd
Priority to CN201910266196.1A
Publication of CN110119462A
Application granted
Publication of CN110119462B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/909 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/954 Navigation, e.g. using categorised browsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Remote Sensing (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Library & Information Science (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a community search method for an attribute network. The method comprises the following steps: defining a search area according to the spatial positions of network users; and searching for a target community according to the connection closeness among network users in the attribute network, wherein the spatial positions of the users in the target community lie within the defined search area. The method can effectively find a target community that satisfies both structural cohesion and spatial cohesion, and can be used for behavior analysis, recommendation, disease prediction and similar tasks involving social network users.

Description

Community search method of attribute network
Technical Field
The invention relates to the technical field of community search, in particular to a community search method of an attribute network.
Background
Attribute networks are used to model a variety of networks, including social networks, knowledge graphs and protein interaction networks. The growing volume of data and the richness of these networks pose enormous challenges for community search, which has attracted much attention in recent years. Research on finding communities can be divided into community detection and community search. Community detection methods typically discover communities in a social network based on predefined implicit criteria. Community search, by contrast, is an online approach that finds cohesive communities meeting a given set of explicit criteria, as in k-core-based and k-truss-based community search.
Spatial attributes are among the most important and useful features of attribute networks. In a spatially aware network, each node carries spatial information. For example, social networks such as Twitter and Foursquare can be modeled as networks in which each node (i.e., user) has one or more locations (e.g., a current location or historical check-in locations). Taking users' locations into account during community search extends the understanding of user behavior from the virtual world to the real world.
However, the prior art generally considers only non-attribute networks and ignores the rich information attached to vertices in attribute networks. In addition, existing research on searching communities in spatially aware networks uses various structural cohesion measures as query constraints. For example, with the k-core or k-truss measure, users must specify the value of k, and the spatial closeness between users is not considered.
Therefore, the prior art needs to be improved so as to find network communities that account for both structural cohesion and spatial compactness, and to further improve the efficiency of community search.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a community search method for an attribute network.
According to a first aspect of the invention, a community search method for an attribute network is provided. The method comprises the following steps:
step S1: defining a search area range according to the spatial position of a network user;
step S2: searching a target community according to the connection closeness among network users in the attribute network, wherein the spatial position of the users in the target community is within the range of the defined search area.
In one embodiment, step S1 includes the following sub-steps:
the attribute network is represented by an undirected connected graph G = (V, E, S), where V is the vertex set, E is the edge set, S is the set of spatial positions, and each vertex represents a network user;
in the undirected connected graph G, a target community represented by a connected subgraph is searched for, such that the positions of the subgraph's vertices can be enclosed by a circle of diameter D and, compared with the other such subgraphs of G, the vertices of the subgraph form the k-core of highest order.
In one embodiment, in step S2, the target communities represented in the connected subgraph are searched according to the following steps:
step S21: constructing a quadtree index structure for the undirected connected graph G, wherein a root node corresponds to the whole space of G;
step S22: traversing the quadtree index structure to obtain all nodes whose side length is smaller than D and whose parent node's side length is larger than D, and storing these nodes in a node list nodeList;
step S23: obtaining the maximum core number k_cur over the nodes in nodeList;
step S24: pruning from nodeList every node N with N.DistMap[k_cur] > D, where N.DistMap denotes the distance map of node N;
step S25: sorting the remaining nodes in nodeList in ascending order of their core-number upper bounds and verifying them in turn, so as to find the vertices that form the highest-order k-core and can be enclosed by a circle of diameter D.
In one embodiment, in step S25, for a node N in the node list nodeList, the following steps are used for verification:
expanding N by length D, performing core decomposition in the expanded square region, and ignoring vertices whose core number is less than k_cur;
verifying whether the remaining vertices in the expanded square region contain a k-core of order higher than k_cur, and if so, recording that k-core and updating k_cur.
In one embodiment, whether the remaining vertices in the expanded square region contain a k-core of order higher than k_cur is verified as follows:
for a vertex in node N, placing it on the boundary of a circle of diameter D and rotating the circle;
whenever a new vertex enters the circle, checking whether there is a k-core of order higher than k_cur.
In one embodiment, whether the remaining vertices in the expanded square region contain a k-core of order higher than k_cur is verified as follows:
dividing the expanded square region into m × m cells, and searching for k-cores in the expanded square region using a square of s × s cells that can enclose a circle of diameter D, where s and m are positive integers and s is smaller than m.
In one embodiment, whether the remaining vertices in the expanded square region contain a k-core of order higher than k_cur is verified as follows:
for a vertex in node N, placing it on the boundary of a circle of diameter D and rotating the circle;
while rotating the circle, stopping the rotation when a new vertex enters the circle and the k_c-core condition is satisfied, where k_c denotes the core number currently being verified.
In one embodiment, the target communities represented in the connected subgraph are searched according to the following steps:
searching for all circles of diameter D in the undirected connected graph G;
for all circles found, checking the maximum core number of the vertices each circle encloses, and taking the vertices enclosed by the circle with the maximum core number as the target community.
Compared with the prior art, the invention has the following advantages: it provides a solution for co-located community search with structural cohesion; and during the search, the constructed index structure integrates spatial information with local structural information, which improves both the efficiency and the effectiveness of community search.
Drawings
The following drawings merely illustrate and explain the invention by way of example and do not limit its scope, in which:
FIG. 1 is a flow diagram of a community search method for an attribute network according to one embodiment of the present invention;
FIG. 2 is a schematic diagram of an attribute network and co-located communities according to one embodiment of the present invention;
FIG. 3 is a schematic diagram of a distance-aware k-core quadtree, according to one embodiment of the present invention;
FIG. 4 is a diagram of constructing a distance map, according to one embodiment of the present invention;
FIG. 5 is a diagram of a quadtree-based co-located community search, according to one embodiment of the invention;
FIG. 6 is a schematic diagram of a quadtree-based co-located community search according to another embodiment of the present invention;
FIGS. 7(a)-7(c) are schematic diagrams of the relationship between the diameter and the community search time according to one embodiment of the present invention;
FIGS. 8(a)-8(b) are schematic diagrams of the relationship between the number of user locations and the community search time according to one embodiment of the present invention;
FIGS. 9(a)-9(b) are schematic diagrams of the relationship between the location distribution of users and the community search time according to one embodiment of the present invention;
FIGS. 10(a)-10(b) are diagrams of the scalability results according to one embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, design methods, and advantages of the present invention more apparent, the present invention will be further described in detail by specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not as a limitation. Thus, other examples of the exemplary embodiments may have different values.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
One research goal of the present invention is to solve the search problem for the most cohesive co-located community (referred to herein as MC3, the Most Cohesive Co-located Community). The community found satisfies two properties. The first is structural cohesion, meaning that the members of the community are most closely connected. The second is spatial co-location, meaning that the members are geographically close to each other, i.e., the community is spatially cohesive.
According to one embodiment of the invention, a community search method for an attribute network is provided. Briefly, an undirected connected graph is used to represent the attribute network. The target community is then determined by searching that graph for a connected subgraph that satisfies both the structural cohesion and the spatial cohesion criteria. Specifically, referring to FIG. 1, the method comprises the following steps:
Step S110: represent the attribute network using an undirected connected graph.
In the embodiment of the present invention, an undirected connected graph G = (V, E, S) is taken as an example to represent an undirected attribute network, where G has a vertex set V, an edge set E and a spatial position set S. The degree of a vertex v in G (e.g., a user in a social network) is denoted deg_G(v). Each vertex v has a spatial position v.l = (x, y) ∈ S (e.g., the user's check-in location), where x and y are the coordinates along the x-axis and y-axis of two-dimensional space.
For convenience of description, the definitions of the symbols involved in the present invention are summarized as follows:
G(V, E, S): a geo-social graph with vertex set V, edge set E and spatial position set S;
(v.x, v.y): the coordinates of a vertex v ∈ V along the x-axis and y-axis;
deg_G(v): the degree of a vertex v in G;
γ(N): the side length of node N in the index structure.
The research goal of the invention is to find, in an undirected connected graph G, a community represented by a connected subgraph that meets the following conditions. The first is structural cohesion, i.e., the vertices of the connected subgraph are the most densely connected. The second is spatial cohesion, i.e., the vertices of the connected subgraph are spatially compact.
In the embodiment of the present invention, k-core is used as the example measure of structural cohesion, but it should be understood that the method of the present invention can also be extended to other structural cohesion measures, such as k-truss and clique.
For ease of explanation, the following concepts are first introduced:
1) Definition of k-core
For k-core, given a non-negative integer k, the k-core of G is the largest subgraph of G, where the degree of each vertex v in the subgraph is no less than k.
Specifically, in this embodiment, a connected k-core of G, denoted G_k, is used to represent a community, and k is called the order of G_k. Given a graph, its k-cores can be obtained with an existing algorithm, e.g., the linear-time core decomposition algorithm, whose complexity is O(|E|) (see Batagelj, V., Zaveršnik, M., "An O(m) algorithm for cores decomposition of networks", arXiv preprint cs/0310049, 2003).
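The linear-time core decomposition referred to above can be sketched as follows. This is a minimal Python illustration of the bucket-based peeling idea behind the Batagelj-Zaveršnik algorithm. It is not the patent's implementation. It assumes the graph is a dictionary that maps each vertex to a set of neighbours, with no self-loops. The names `core_numbers` and `k_core` are illustrative only.

```python
def core_numbers(adj):
    """Return {vertex: core number} for an undirected graph.

    adj maps each vertex to the set of its neighbours (no self-loops).
    Vertices are peeled in order of current degree using degree buckets,
    which gives O(|V| + |E|) time overall.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    max_deg = max(degree.values(), default=0)
    buckets = [set() for _ in range(max_deg + 1)]  # buckets[d]: vertices of current degree d
    for v, d in degree.items():
        buckets[d].add(v)

    core = {}
    k = 0
    for _ in range(len(adj)):
        while not buckets[k]:      # the lowest non-empty bucket never drops below k
            k += 1
        v = buckets[k].pop()
        core[v] = k                # v is peeled at level k
        for u in adj[v]:
            if u not in core and degree[u] > k:
                buckets[degree[u]].remove(u)
                degree[u] -= 1
                buckets[degree[u]].add(u)
    return core


def k_core(adj, k):
    """Vertices of the (possibly disconnected) k-core: core number >= k."""
    return {v for v, c in core_numbers(adj).items() if c >= k}


# A triangle {1, 2, 3} with a pendant vertex 4 attached to 3.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(core_numbers(graph).items()))  # [(1, 2), (2, 2), (3, 2), (4, 1)]
print(k_core(graph, 2))                     # {1, 2, 3}
```

A connected community G_k as defined here would be one connected component of the vertex set returned by `k_core`.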
2) Definition of core number
For a vertex v in a given G, its core number is the highest order of any k-core containing v, denoted C_G[v].
3) Definition of co-located communities
In the embodiment of the invention, a co-located community is a connected k-core G_k whose vertex positions can be enclosed by a circle of a predetermined diameter D. The vertices of a co-located community are expected to lie close to one another, which reflects the "co-location" of the community.
In an embodiment of the invention, given an undirected attribute graph G and a diameter D, the most cohesive co-located community search (MC3) returns a group of vertices, together with their positions, that satisfies the following constraints: the positions of the vertices can be enclosed by a circle of diameter D; and the vertices form the k-core of highest order.
FIG. 2 is a schematic diagram of an attribute network and its co-located communities. C_1 and C_2 are two co-located communities in the attribute network whose members can each be enclosed by a circle of diameter D. The members of C_1 are A, B and C, and the members of C_2 are D, G, H, F and E. C_2 is a 3-core, the higher order of the two co-located communities (with respect to diameter D), so C_2 is the MC3 of the attribute network, i.e., the target community to be searched.
Step S120: search the undirected connected graph for a connected subgraph that satisfies the structural cohesion and spatial cohesion criteria.
The invention aims to find the most structurally cohesive community that can be enclosed by a circle of diameter D. Various embodiments can be used to search the attribute network for communities meeting the structural cohesion and spatial cohesion criteria.
Example 1: space-first approach
In this embodiment, all possible circles (of diameter D) in the space are first enumerated for the attribute network. Then the maximum core number of the vertices each circle encloses is computed. Finally, the largest core number over all circles is returned.
Specifically, all possible circles are enumerated as follows. Two positions in the spatial position set S whose distance is at most D are fixed, and these two positions determine at most two circles of diameter D passing through both. Then the vertices located at the positions inside each circle are obtained, and the maximum core number is computed with a known linear core decomposition algorithm. In the worst case this approach must check O(|V|²) circles and is therefore very time-consuming.
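The two MC3 constraints can be checked directly for a given vertex set. The following Python sketch is illustrative only. It assumes one (x, y) position per vertex, so multi-location vertices and the vertex map are not modelled. It also assumes an adjacency dictionary of neighbour sets, and all function names are hypothetical. The sketch relies on a geometric fact: if two or more distinct points fit in a circle of diameter D, they also fit in some circle of diameter D that passes through two of them. Only finitely many candidate circles therefore need to be tried.

```python
import math
from itertools import combinations


def candidate_centres(points, D):
    """Centres of circles of diameter D through two distinct points,
    plus one circle centred on each point (covers single-location sets)."""
    r = D / 2.0
    pts = list(set(points))
    centres = list(pts)
    for (x1, y1), (x2, y2) in combinations(pts, 2):
        d = math.hypot(x2 - x1, y2 - y1)
        if d > D:
            continue  # no circle of diameter D passes through both points
        mx, my = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        h = math.sqrt(max(r * r - (d / 2.0) ** 2, 0.0))
        ux, uy = (y1 - y2) / d, (x2 - x1) / d  # unit normal to the chord
        centres.append((mx + h * ux, my + h * uy))
        centres.append((mx - h * ux, my - h * uy))
    return centres


def enclosable(points, D, eps=1e-9):
    """True if every point lies inside one circle of diameter D."""
    r = D / 2.0
    return any(all(math.hypot(x - cx, y - cy) <= r + eps for x, y in points)
               for cx, cy in candidate_centres(points, D))


def community_order(adj, pos, members, D):
    """Return k if `members` is connected, fits in a circle of diameter D and
    every member has at least k neighbours inside the set; otherwise None."""
    members = set(members)
    if not members or not enclosable([pos[v] for v in members], D):
        return None
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:  # DFS restricted to the induced subgraph
        v = stack.pop()
        for u in adj[v] & members:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    if seen != members:
        return None
    return min(len(adj[v] & members) for v in members)


pos = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (10, 10)}
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(community_order(adj, pos, {"A", "B", "C"}, 2.0))       # 2
print(community_order(adj, pos, {"A", "B", "C", "D"}, 2.0))  # None
```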
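The space-first approach can be sketched in Python as below. This is an illustration under stated assumptions, not the patent's code: each vertex has a single (x, y) position, the graph is an adjacency dictionary of neighbour sets, and the function names are hypothetical. Every circle of diameter D through two positions at distance at most D is tried, plus one circle centred on each position. The core decomposition of the vertices inside each circle is then computed. The block includes its own core-decomposition helper so that it runs on its own.

```python
import math
from itertools import combinations


def core_numbers(adj):
    """Bucket-based core decomposition: {vertex: core number}."""
    degree = {v: len(n) for v, n in adj.items()}
    buckets = [set() for _ in range(max(degree.values(), default=0) + 1)]
    for v, d in degree.items():
        buckets[d].add(v)
    core, k = {}, 0
    for _ in range(len(adj)):
        while not buckets[k]:
            k += 1
        v = buckets[k].pop()
        core[v] = k
        for u in adj[v]:
            if u not in core and degree[u] > k:
                buckets[degree[u]].remove(u)
                degree[u] -= 1
                buckets[degree[u]].add(u)
    return core


def candidate_centres(points, D):
    """Centres of the (at most two) circles of diameter D through each pair
    of points at distance <= D, plus one circle centred on each point."""
    r = D / 2.0
    pts = list(set(points))
    centres = list(pts)
    for (x1, y1), (x2, y2) in combinations(pts, 2):
        d = math.hypot(x2 - x1, y2 - y1)
        if d > D:
            continue
        mx, my = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        h = math.sqrt(max(r * r - (d / 2.0) ** 2, 0.0))
        ux, uy = (y1 - y2) / d, (x2 - x1) / d  # unit normal to the chord
        centres += [(mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)]
    return centres


def space_first(adj, pos, D, eps=1e-9):
    """Return (best order, vertices of the best k-core found inside one circle)."""
    r = D / 2.0
    best_k, best = 0, set()
    for cx, cy in candidate_centres(pos.values(), D):
        inside = {v for v, (x, y) in pos.items() if math.hypot(x - cx, y - cy) <= r + eps}
        core = core_numbers({v: adj[v] & inside for v in inside})  # induced subgraph
        k = max(core.values(), default=0)
        if k > best_k:
            best_k, best = k, {v for v, c in core.items() if c == k}
    return best_k, best


# A triangle A, B, C and a 4-clique D, E, F, G far away, linked by edge C-D.
pos = {"A": (0, 0), "B": (1, 0), "C": (0, 1),
       "D": (10, 10), "E": (11, 10), "F": (10, 11), "G": (11, 11)}
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")] + list(combinations("DEFG", 2))
adj = {v: set() for v in pos}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
k, members = space_first(adj, pos, 2.0)
print(k, sorted(members))  # 3 ['D', 'E', 'F', 'G']
```

The quadratic number of candidate circles, each followed by a fresh core decomposition, is what makes this baseline slow.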
Example 2: structure-first approach
In this embodiment, a structure-first search is performed, the idea being to use the network structure to speed up the search.
Specifically, core decomposition is first performed to compute the core number of every vertex. Then the maximum value of k, denoted k_max, is found, and the positions of the vertices in the k_max-core are obtained. Next, all possible circles at these positions are enumerated (as in the space-first approach), which yields the current best core number, denoted k_cur. The (k_max − 1)-core is then examined in the same way, and this process is repeated until the k_cur-core is reached. In this way, fewer circles need to be verified. However, the efficiency of this approach is still limited, because a globally cohesive subgraph may not be locally (spatially) cohesive.
Example 3: distance-aware k-core quadtree approach
Neither the space-first approach nor the structure-first approach performs well. The MC3 problem involves both spatial and structural cohesion, but each approach ignores either the spatial characteristics or the structural characteristics of the data.
In a preferred embodiment, a distance-aware k-core quadtree, referred to herein as DkQ-TREE (Distance-aware k-core Quadtree), is used to search for the target community. The quadtree serves as an index that pre-computes local structural cohesion, which speeds up the search and prunes the search space.
The following describes the quadtree-based spatial index structure and the community search method built on it to solve the MC3 problem.
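The structure-first idea can be sketched as follows, under the same assumptions as a plain space-first search: one position per vertex, an adjacency dictionary, and hypothetical function names. The levels run from k_max downwards. At level k, only the vertices of the global k-core are used. This is safe because any community of order j ≥ k inside a circle is a subgraph of the global j-core, which is contained in the global k-core. The search stops once k ≤ k_cur, since lower levels can no longer improve the result.

```python
import math
from itertools import combinations


def core_numbers(adj):
    """Bucket-based core decomposition: {vertex: core number}."""
    degree = {v: len(n) for v, n in adj.items()}
    buckets = [set() for _ in range(max(degree.values(), default=0) + 1)]
    for v, d in degree.items():
        buckets[d].add(v)
    core, k = {}, 0
    for _ in range(len(adj)):
        while not buckets[k]:
            k += 1
        v = buckets[k].pop()
        core[v] = k
        for u in adj[v]:
            if u not in core and degree[u] > k:
                buckets[degree[u]].remove(u)
                degree[u] -= 1
                buckets[degree[u]].add(u)
    return core


def candidate_centres(points, D):
    """Circles of diameter D through two points at distance <= D, plus point-centred ones."""
    r, pts = D / 2.0, list(set(points))
    centres = list(pts)
    for (x1, y1), (x2, y2) in combinations(pts, 2):
        d = math.hypot(x2 - x1, y2 - y1)
        if d > D:
            continue
        mx, my = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        h = math.sqrt(max(r * r - (d / 2.0) ** 2, 0.0))
        ux, uy = (y1 - y2) / d, (x2 - x1) / d
        centres += [(mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)]
    return centres


def structure_first(adj, pos, D, eps=1e-9):
    """Search level by level from the global k_max-core downwards."""
    r = D / 2.0
    glob = core_numbers(adj)                      # global core numbers
    best_k, best = 0, set()
    for k in range(max(glob.values(), default=0), 0, -1):
        if k <= best_k:
            break                                 # lower levels cannot beat k_cur
        cand = {v for v in adj if glob[v] >= k}   # vertices of the global k-core
        for cx, cy in candidate_centres([pos[v] for v in cand], D):
            inside = {v for v in cand
                      if math.hypot(pos[v][0] - cx, pos[v][1] - cy) <= r + eps}
            core = core_numbers({v: adj[v] & inside for v in inside})
            kk = max(core.values(), default=0)
            if kk > best_k:
                best_k, best = kk, {v for v, c in core.items() if c == kk}
    return best_k, best


# Triangle A, B, C plus a 4-clique D..G that is globally denser but spread out.
pos = {"A": (0, 0), "B": (1, 0), "C": (0, 1),
       "D": (10, 10), "E": (20, 10), "F": (10, 20), "G": (20, 20)}
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")] + list(combinations("DEFG", 2))
adj = {v: set() for v in pos}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
k, members = structure_first(adj, pos, 2.0)
print(k, sorted(members))  # 2 ['A', 'B', 'C']
```

In the example the global 3-core (the clique) is spatially spread out, so the level-3 pass finds nothing. This illustrates the limitation noted above: a globally cohesive subgraph need not be locally cohesive.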
1) Index structure of quadtree
Known linear core decomposition algorithms compute only the global core numbers of vertices, so local cohesion information is unavailable at query time. The quadtree-based index structure integrates structural and spatial information in order to compute local cohesion (with respect to diameter D).
The quadtree structure is shown in FIG. 3. Briefly, DkQ-TREE is constructed as follows. The root node corresponds to the whole space, which is divided into four subspaces, each corresponding to one child of the root node. Each node is then recursively subdivided into four children. For example, in FIG. 3 the whole space is the root node, whose four children correspond to {A, B, C}, {K, J}, {L} and {D, E, F, G, H, I}, respectively. These children can be subdivided further in the same way.
In this embodiment, the quadtree is used to pre-compute the local cohesion and other useful information for each tree node, based on the spatial monotonicity of local cohesion. Spatial monotonicity means the following: given a spatial region R (e.g., a square), if the vertices in R can form a k-core of order at most h, then for any region R' inside R, the k-cores formed by the vertices in R' have order no greater than h. This property holds because a smaller region contains only a subset of the vertices of the larger region.
In each node N of DkQ-TREE, the core numbers of the node's vertices in the subgraph induced by the node's region are pre-computed. The maximum core number among these vertices is recorded and denoted LC_N. This is justified by the following principle (referred to herein as Lemma 1): given a query diameter D and a tree node N that can be enclosed by a circle of diameter D, the order of MC3 is not less than LC_N.
Since N can be enclosed by a circle of diameter D, the LC_N-core formed by the vertices of N is also enclosed by that circle, which proves the principle above. Thus, a lower bound on the order of MC3 can be obtained from the pre-computed information in DkQ-TREE.
However, recording only the maximum core number of each node is not sufficient to capture local cohesion. As can be seen from FIG. 3, the vertices of a node may form a k-core together with vertices outside the node. Therefore, for a given diameter D, the node alone gives no upper bound on the core numbers its vertices can reach. For this reason, a distance map DistMap is additionally computed for each tree node. The idea is as follows: given a node N, for each value k > LC_N, the node is extended by the minimum distance d such that the vertices covered during the extension can form a k-core. This distance d and the corresponding k are recorded in the distance map.
The distance map helps prune the search space according to the following principle (referred to herein as Lemma 2): suppose the current order of MC3 is k_cur; given a query diameter D and a node N, if N.DistMap[k_cur] > D, then N cannot contribute any vertex to MC3, where N.DistMap[k_cur] is the entry of N's distance map for k = k_cur.
This principle can also be proved using spatial monotonicity. N.DistMap[k_cur] > D means that even when N is extended by D on each side, no k_cur-core can be found in the extended region. Hence no vertex of N can belong to a k_cur-core enclosed by a circle of diameter D, and N can be pruned.
In addition, a vertex map can be used to organize the mapping between locations and vertices, so that the vertex at a given location can be found quickly even when a vertex has multiple locations.
In summary, in the embodiment of the present invention, each node of DkQ-TREE stores: the vertices in the node; the maximum core number in the node; the vertex map; and the distance map.
2) Index construction of quadtrees
Still referring to the quadtree structure shown in FIG. 3, the whole space is the root node, whose four children correspond to {A, B, C}, {K, J}, {L} and {D, E, F, G, H, I}, respectively. The node {A, B, C} is further subdivided into {A}, {B} and {C}, and the node {D, E, F, G, H, I} into {D}, {E}, {F} and {G, H, I}. Whenever a new node is obtained, core decomposition is performed on its vertices and the maximum core number is stored. If the maximum core number is less than a threshold k_ε, the node is not split further. For example, in FIG. 3 the vertices {A, B, C} form a 2-core, so this region is split into {A}, {B} and {C}. After splitting, none of the sub-regions can form a 2-core, so splitting of their nodes stops.
In addition, when a new node is obtained, its distance map (Distance Map) and vertex map (Vertex Map) are also constructed. Building the vertex map means recording the location of each vertex; e.g., in FIG. 3 the vertex map of vertex A records v_A's locations: A, and similarly for the other vertices. To build the distance map, a binary search is performed for each value k to extend the node to the vertex at the smallest distance such that the vertices covered by the extension can form a k-core. For example, referring to FIG. 4, a node contains only vertex C. When it is extended to vertex B, a 1-core is formed, and the extension distance is d1. When it is extended to vertex A, a 2-core is formed for the first time, and the extension distance is d2. The distances d1 and d2 are stored in the distance map, e.g., in the format 1-core: d1; 2-core: d2.
3) Quadtree-based community search method MC3Alg
In the embodiment of the invention, two algorithms based on the quadtree index structure are provided, referred to as MC3Alg and MC3Alg+, where MC3Alg+ is an improvement of MC3Alg.
Briefly, the MC3Alg algorithm alternates two steps: pruning DkQ-TREE nodes, and finding MC3 among the nodes that cannot be pruned. Specifically, MC3Alg comprises the following steps:
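The construction rule above (split a node only while its maximum core number LC_N is at least k_ε) can be sketched in Python. This sketch stores only the vertices and LC_N of each node; distance maps and vertex maps are omitted. It assumes one position per vertex. It adds a `min_side` guard, which is not in the text, so that coincident points cannot cause endless splitting. Empty quadrants are simply not created. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


def core_numbers(adj):
    """Bucket-based core decomposition: {vertex: core number}."""
    degree = {v: len(n) for v, n in adj.items()}
    buckets = [set() for _ in range(max(degree.values(), default=0) + 1)]
    for v, d in degree.items():
        buckets[d].add(v)
    core, k = {}, 0
    for _ in range(len(adj)):
        while not buckets[k]:
            k += 1
        v = buckets[k].pop()
        core[v] = k
        for u in adj[v]:
            if u not in core and degree[u] > k:
                buckets[degree[u]].remove(u)
                degree[u] -= 1
                buckets[degree[u]].add(u)
    return core


@dataclass
class QuadNode:
    x0: float                 # lower-left corner of the square region
    y0: float
    side: float               # gamma(N)
    vertices: frozenset
    lc: int = 0               # LC_N: max core number within the region
    children: list = field(default_factory=list)


def build_quadtree(adj, pos, k_eps=2, min_side=1e-6):
    xs = [x for x, _ in pos.values()]
    ys = [y for _, y in pos.values()]
    x0, y0 = min(xs), min(ys)
    side = max(max(xs) - x0, max(ys) - y0, min_side)
    root = QuadNode(x0, y0, side, frozenset(adj))
    stack = [root]
    while stack:
        node = stack.pop()
        vs = node.vertices
        node.lc = max(core_numbers({v: adj[v] & vs for v in vs}).values(), default=0)
        # stop when LC_N < k_eps (rule from the text) or the region is too small
        if node.lc < k_eps or len(vs) <= 1 or node.side / 2 < min_side:
            continue
        half = node.side / 2
        mx, my = node.x0 + half, node.y0 + half
        quads = {}
        for v in vs:
            quads.setdefault((pos[v][0] >= mx, pos[v][1] >= my), set()).add(v)
        for (right, top), qvs in quads.items():
            child = QuadNode(node.x0 + half * right, node.y0 + half * top, half, frozenset(qvs))
            node.children.append(child)
            stack.append(child)
    return root


pos = {"a": (0, 0), "b": (1, 0), "c": (0, 1), "d": (1, 1),
       "e": (8, 8), "f": (9, 8), "g": (8, 9)}
adj = {"a": {"b", "c", "d"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
       "d": {"a", "b", "c"}, "e": {"f", "g"}, "f": {"e", "g"}, "g": {"e", "f"}}
root = build_quadtree(adj, pos, k_eps=2)
print(root.lc)  # 3
```

Spatial monotonicity shows up here directly: a child's LC_N never exceeds its parent's, because the child's vertex set is a subset of the parent's.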
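The binary-search construction of the distance map can be sketched as follows. The sketch rests on several assumptions. A node is the square (x0, y0, side). "Extending by d" means growing the square by d on every side, which matches γ(N_ex) = 2D + l in the complexity analysis. The k-core found must contain at least one vertex of the node. The search runs over the sorted distinct extension distances of all vertices. Binary search is valid because the predicate "a k-core appears" is monotone in d: a larger square induces a supergraph. The sketch also stores entries with distance 0 for k ≤ LC_N, whereas the text records only k > LC_N. All names are hypothetical.

```python
def core_numbers(adj):
    """Bucket-based core decomposition: {vertex: core number}."""
    degree = {v: len(n) for v, n in adj.items()}
    buckets = [set() for _ in range(max(degree.values(), default=0) + 1)]
    for v, d in degree.items():
        buckets[d].add(v)
    core, k = {}, 0
    for _ in range(len(adj)):
        while not buckets[k]:
            k += 1
        v = buckets[k].pop()
        core[v] = k
        for u in adj[v]:
            if u not in core and degree[u] > k:
                buckets[degree[u]].remove(u)
                degree[u] -= 1
                buckets[degree[u]].add(u)
    return core


def extension_needed(square, p):
    """Per-side extension of the square (x0, y0, side) needed to cover point p."""
    x0, y0, side = square
    x, y = p
    return max(x0 - x, x - (x0 + side), y0 - y, y - (y0 + side), 0.0)


def build_distance_map(adj, pos, square, node_vertices):
    """k -> minimal extension d such that the extended square contains a
    k-core that includes at least one vertex of the node."""
    cands = sorted({extension_needed(square, pos[v]) for v in adj})
    if not cands:
        return {}

    def best_core(d):
        region = {v for v in adj if extension_needed(square, pos[v]) <= d}
        core = core_numbers({v: adj[v] & region for v in region})
        return max((core.get(v, 0) for v in node_vertices), default=0)

    dist_map = {}
    for k in range(1, best_core(cands[-1]) + 1):
        lo, hi = 0, len(cands) - 1        # invariant: best_core(cands[hi]) >= k
        while lo < hi:                    # binary search over candidate distances
            mid = (lo + hi) // 2
            if best_core(cands[mid]) >= k:
                hi = mid
            else:
                lo = mid + 1
        dist_map[k] = cands[lo]
    return dist_map


# A FIG. 4-like toy case: the node holds only C; B and A lie outside it.
pos = {"C": (0.5, 0.5), "B": (2.0, 0.5), "A": (0.5, 3.5)}
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
print(build_distance_map(adj, pos, (0.0, 0.0, 1.0), {"C"}))  # {1: 1.0, 2: 2.5}
```

In this toy case, d1 = 1.0 (reaching B yields a 1-core) and d2 = 2.5 (reaching A first yields a 2-core), mirroring the FIG. 4 description.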
step S211, pruning DkQ-TREE nodes
In this step, a lower bound on the order of MC3 is obtained according to Lemma 1 above.
Specifically, given a diameter D, DkQ-TREE is traversed top-down to obtain all nodes whose side length is less than D and whose parent node's side length is greater than D. These nodes are stored in a node list nodeList. Then the maximum core number over the nodes in the list is taken as the lower bound, denoted k_cur. Using this lower bound on the order of MC3, the nodes in nodeList are further pruned according to Lemma 2 (i.e., given a query diameter D and a node N, if N.DistMap[k_cur] > D, then N cannot contribute any vertex to MC3).
Step S212: search for the target community among the nodes remaining after pruning.
After pruning, the remaining nodes in nodeList are sorted by the upper bound of their core numbers obtained from the distance map, and verification starts with the best node N.
Specifically, given a node N, if N.DistMap[k_1] ≤ D ≤ N.DistMap[k_2], then k_1 is the upper bound on the core number of the vertices in N. First, N is extended by length D, and core decomposition is performed on the extended square region. Vertices whose core number is less than k_cur can then safely be ignored, because they cannot be included in MC3. Next, instead of checking all possible circles as in the space-first approach, the method verifies whether the remaining vertices in the extended region contain a k-core of higher order. In one embodiment, a rotating-circle method is used. Each vertex in node N is placed on the boundary of a circle of diameter D, and the circle is then rotated clockwise. Whenever a vertex enters the circle, the method checks whether there is a k-core of order higher than k_cur; if so, that k-core is recorded and k_cur is updated. For example, referring to FIG. 5, with vertex G on the boundary of the circle and the circle rotating clockwise, the 2-core formed by {G, F, H, I} is found when F enters the circle.
After N is verified, k_cur may be updated. More nodes in nodeList are then pruned based on the updated k_cur, and verification continues with the next best node. This process is repeated until all nodes in nodeList have been processed.
For further clarity, Algorithm 1 below describes the framework of MC3Alg in pseudocode. First, nodeList is obtained from DkQ-TREE (line 1). Then the lower bound on the order of MC3 is obtained, and φ is used to store N_max. For each node in nodeList, its distance map DistMap is obtained and checked for the distance by which the node must be extended to contain a k_cur-core, and nodes are safely deleted by Lemma 2 (lines 5-8). The upper bound on the core numbers of the vertices in each node is obtained (line 9). Next, nodeList is sorted in ascending order of the nodes' upper bounds (line 10), and each node is extended by length D and its vertices are pruned as described above. For each unpruned vertex in N, the rotating-circle method is used to check for k-cores and update φ (lines 11-15). Finally, the k-core of highest order is stored in φ (line 16).
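Node-list selection, the Lemma 1 lower bound and Lemma 2 pruning can be sketched on a hand-built tree, so that no spatial data is needed. This is a hypothetical illustration. Each node carries only γ(N), LC_N and a DistMap dictionary. Orders up to LC_N are treated as needing no extension, and orders missing from DistMap are treated as unreachable. Unsplit leaves larger than D are also kept as candidates, an assumption added so that no vertex is lost. The text says the survivors are sorted by their upper bound and that verification starts with the best node, so the sketch returns them in decreasing order of that bound.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    side: float                    # gamma(N)
    lc: int                        # LC_N
    dist_map: dict                 # k -> extension distance (DistMap)
    children: list = field(default_factory=list)


def dist(node, k):
    """DistMap lookup: no extension up to LC_N, unknown orders unreachable."""
    if k <= node.lc:
        return 0.0
    return node.dist_map.get(k, math.inf)


def collect_node_list(root, D):
    """Top-down traversal: keep nodes with side < D whose parent has side >= D."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.side < D or not node.children:   # unsplit large leaves kept too
            out.append(node)
        else:
            stack.extend(node.children)
    return out


def prune_and_rank(node_list, D):
    """Lemma 1 gives k_cur; Lemma 2 drops nodes with DistMap[k_cur] > D."""
    k_cur = max((n.lc for n in node_list), default=0)
    kept = [n for n in node_list if dist(n, k_cur) <= D]

    def upper_bound(n):
        # largest order whose required extension still fits within D
        return max([k for k in n.dist_map if dist(n, k) <= D] + [n.lc])

    ranked = sorted(((upper_bound(n), n) for n in kept), key=lambda t: t[0], reverse=True)
    return k_cur, ranked


leaf1 = Node(1.0, 2, {3: 1.5})
leaf2 = Node(1.0, 1, {2: 3.0})
leaf3 = Node(1.0, 1, {2: 1.0, 3: 1.8, 4: 1.9})
leaf4 = Node(1.0, 0, {})
root = Node(4.0, 2, {}, [Node(2.0, 2, {}, [leaf1, leaf2]), Node(2.0, 1, {}, [leaf3, leaf4])])
k_cur, ranked = prune_and_rank(collect_node_list(root, 2.0), 2.0)
print(k_cur, [ub for ub, _ in ranked])  # 2 [4, 3]
```

In the example, leaf2 (needs extension 3.0 > D for a 2-core) and leaf4 (can never reach a 2-core) are pruned by Lemma 2.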
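The rotating-circle check for one pivot vertex can be sketched as follows. The geometry works like this. With pivot p on the boundary, the centre of a circle of diameter D lies at p + (D/2)(cos θ, sin θ). A vertex q at distance d ≤ D and bearing φ from p is inside the circle exactly when cos(θ − φ) ≥ d/D. As θ increases, q therefore enters at θ = φ − arccos(d/D), and only these entry angles need to be tested. The sketch makes several assumptions. There is one position per vertex. `adj` is already the subgraph of the extended square after vertices with core number below k_cur were dropped. The rotation is counter-clockwise rather than clockwise, which does not change the result. For simplicity, core numbers are recomputed from scratch at each entry event instead of being maintained incrementally. All names are hypothetical.

```python
import math


def core_numbers(adj):
    """Bucket-based core decomposition: {vertex: core number}."""
    degree = {v: len(n) for v, n in adj.items()}
    buckets = [set() for _ in range(max(degree.values(), default=0) + 1)]
    for v, d in degree.items():
        buckets[d].add(v)
    core, k = {}, 0
    for _ in range(len(adj)):
        while not buckets[k]:
            k += 1
        v = buckets[k].pop()
        core[v] = k
        for u in adj[v]:
            if u not in core and degree[u] > k:
                buckets[degree[u]].remove(u)
                degree[u] -= 1
                buckets[degree[u]].add(u)
    return core


def rotating_circle(adj, pos, pivot, D, eps=1e-9):
    """Best k-core inside any circle of diameter D with `pivot` on its boundary."""
    r = D / 2.0
    px, py = pos[pivot]
    near = [q for q in adj if q != pivot
            and math.hypot(pos[q][0] - px, pos[q][1] - py) <= D + eps]
    events = [0.0]                                  # one arbitrary start angle
    for q in near:
        dx, dy = pos[q][0] - px, pos[q][1] - py
        alpha = math.acos(min(1.0, math.hypot(dx, dy) / D))
        events.append(math.atan2(dy, dx) - alpha)   # angle at which q enters
    best_k, best = 0, set()
    for theta in sorted(events):                    # sweep the circle around the pivot
        cx, cy = px + r * math.cos(theta), py + r * math.sin(theta)
        inside = {pivot} | {q for q in near
                            if math.hypot(pos[q][0] - cx, pos[q][1] - cy) <= r + eps}
        core = core_numbers({v: adj[v] & inside for v in inside})
        k = max(core.values())
        if k > best_k:                              # found a higher-order k-core
            best_k, best = k, {v for v, c in core.items() if c == k}
    return best_k, best


# Pivot P with a 4-clique {P, Q1, Q2, Q3} nearby and a pendant vertex X.
pos = {"P": (0.0, 0.0), "Q1": (1.0, 0.2), "Q2": (1.0, -0.2),
       "Q3": (1.5, 0.0), "X": (-1.9, 0.0)}
adj = {"P": {"Q1", "Q2", "Q3", "X"}, "Q1": {"P", "Q2", "Q3"},
       "Q2": {"P", "Q1", "Q3"}, "Q3": {"P", "Q1", "Q2"}, "X": {"P"}}
k, members = rotating_circle(adj, pos, "P", 2.0)
print(k, sorted(members))  # 3 ['P', 'Q1', 'Q2', 'Q3']
```

Testing only entry angles is sufficient because any set of vertices that can be enclosed together first becomes enclosed at the moment its last member enters the circle.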
[Example 1: MC3Alg pseudocode (image not reproduced)]
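The Example 1 pseudocode survives only as an image. The outer loop described above can be sketched in Python under several assumptions. `QuadNode`, `core_upper_bound`, `mc3_framework` and the `verify` callback are hypothetical names. `verify` stands in for the extend, core-decompose and rotate-circle step. Each `dist_map[k]` is assumed to give the distance by which a node must be extended to contain a k-core.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

@dataclass
class QuadNode:
    """Hypothetical DkQ-TREE node. dist_map[k] is the distance by which the
    node must be extended to contain a k-core (assumed non-decreasing in k)."""
    name: str
    dist_map: Dict[int, float] = field(default_factory=dict)

def core_upper_bound(node: QuadNode, D: float) -> int:
    """Largest k whose required extension distance is within D (0 if none)."""
    return max((k for k, dist in node.dist_map.items() if dist <= D), default=0)

Found = Optional[Tuple[int, FrozenSet[str]]]

def mc3_framework(node_list: List[QuadNode], D: float, k_lower: int,
                  verify: Callable[[QuadNode, int], Found]
                  ) -> Tuple[int, Optional[FrozenSet[str]]]:
    """Sketch of the MC3Alg outer loop; `best` plays the role of phi.

    verify(node, k_cur) returns (k, vertices) for a k-core with k > k_cur
    found in the node's extended region, or None.
    """
    k_cur: int = k_lower
    best: Optional[FrozenSet[str]] = None
    # Claim step S24: drop nodes with dist_map[k_cur] > D.
    candidates = [n for n in node_list
                  if n.dist_map.get(k_cur, float("inf")) <= D]
    # Sort by the core-number upper bound, ascending as the description states.
    candidates.sort(key=lambda n: core_upper_bound(n, D))
    for node in candidates:
        if core_upper_bound(node, D) <= k_cur:
            continue  # cannot beat the (possibly updated) current best
        found = verify(node, k_cur)
        if found is not None:
            k_cur, best = found
    return k_cur, best
```

Checking `core_upper_bound(node, D) <= k_cur` inside the loop reflects the re-pruning that follows each update of k_cur.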
Still referring to FIG. 5, consider a candidate node containing G, H and I. G is on the boundary of the circle, and I, H, F, E and D lie in the circle's rotation region. Ordering these vertices by when they enter the circle gives the list {I, H, F, E, D}. The circle is then rotated clockwise. Whenever a vertex in {I, H, F, E, D} enters the circle (i.e., lies on its boundary), the rotation stops and the circle is checked for a k-core. For example, when F enters the circle, the 2-core {G, I, H, F} is found in the circle. When the circle is rotated to vertex D, the 3-core {G, H, F, E, D} is found. After H and I are processed in the same manner, {G, H, F, E, D} is identified as the k-core with the highest order in the node.
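The entry order of vertices into the rotating circle, such as {I, H, F, E, D} above, can be computed geometrically. This sketch assumes Euclidean coordinates and a circle that keeps the pivot vertex on its boundary. Put the circle's center at pivot + (D/2)(cos t, sin t). A vertex at distance d ≤ D and bearing φ from the pivot is then inside the circle exactly when |t − φ| ≤ arccos(d/D). The function name `entry_order` and the `start` parameter are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def entry_order(pivot: Point, others: Dict[str, Point], D: float,
                start: float = 0.0) -> List[Tuple[str, float, float]]:
    """Order vertices by when they enter a circle of diameter D that keeps
    `pivot` on its boundary and rotates clockwise from centre angle `start`.

    Returns (vertex, enter_offset, exit_offset), with offsets measured
    clockwise from `start` in [0, 2*pi), sorted by enter_offset.
    """
    two_pi = 2.0 * math.pi
    events = []
    for name, (x, y) in others.items():
        dx, dy = x - pivot[0], y - pivot[1]
        d = math.hypot(dx, dy)
        if d == 0.0 or d > D:
            continue  # co-located with the pivot (always inside) or never inside
        phi = math.atan2(dy, dx)
        alpha = math.acos(d / D)  # half-width of the angular window
        # Clockwise means the centre angle t decreases: the vertex enters at
        # t = phi + alpha and leaves at t = phi - alpha.
        enter = (start - (phi + alpha)) % two_pi
        leave = (start - (phi - alpha)) % two_pi
        events.append((name, enter, leave))
    events.sort(key=lambda e: e[1])
    return events
```

With the pivot at the origin, D = 2 and the circle starting centred at (1, 0), the vertex (0, −1) enters after a clockwise turn of π/6 and (0, 1) after 7π/6. A vertex 5 units away never enters.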
The computational complexity of the MC3Alg algorithm is analyzed as follows.
Assume that each unit area of space contains, on average, n vertices and m edges, and that X nodes are obtained from DkQ-TREE for a given D.
First, the nodes are sorted by the upper bound on their core numbers, at a cost of O(X log X). Then each node N with γ(N) = l is extended by length D, i.e., γ(N_ex) = 2D + l, and core decomposition is performed in this square region. The expanded square contains (2D + l)²m edges, so the core decomposition costs O((2D + l)²m). Next, a circle is rotated on each vertex in N. Each circle of diameter D covers an area of πD²/4, so it contains about πD²n/4 vertices and πD²m/4 edges.
Note that the k-core verification performed in each circle can be divided into three steps:
the checking cost is [formula image not reproduced];
the cost of core decomposition is [formula image not reproduced];
the cost of the BFS (breadth-first search) check is [formula image not reproduced].
Therefore, the k-core verification cost is at most [formula image not reproduced].
In the worst case, at most πD²n rotations are performed for each vertex in N (N contains l²n vertices). Thus, the overall complexity of the MC3Alg algorithm is [formula image not reproduced].
4) Quadtree-based community search method MC3Alg+
MC3Alg is still not efficient enough for large-scale attribute networks, for two reasons. First, each node to be checked contains many vertices, and the rotating circle method must be applied to every one of them. Second, the extended area of a node contains many vertices, so the k-core must be verified many times while the circle rotates. To overcome these problems, a more efficient algorithm, referred to herein as MC3Alg+, is provided. MC3Alg+ differs from MC3Alg mainly in how nodes are verified. Node pruning in DkQ-TREE is the same as in MC3Alg.
In MC3Alg+, a binary search is performed for each node N to be examined, to find the maximum core number in that node. As in MC3Alg, the upper bound on the core number is obtained from the distance map of N. The lower bound is the current best order. During the binary search, the method checks whether the extended area of N contains a k-core with the current core number k_c. This way, a larger k_c is obtained quickly, which has two benefits. First, fewer vertices in N need to be examined. Second, fewer vertices from the extended region are brought in during circle rotation.
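The binary search over the core number can be sketched as below. The sketch assumes a hypothetical predicate `has_kcore(k)` that reports whether the extended area of N contains a k-core. Bisection is justified because every k-core is also a (k − 1)-core, so the predicate is monotone in k. With lower = 2 and upper = 3, as in the FIG. 6 example, the sketch first probes k_c = 2. If a 2-core is found, it then probes k_c = 3.

```python
from typing import Callable, Optional

def max_core_by_bisection(lower: int, upper: int,
                          has_kcore: Callable[[int], bool]) -> Optional[int]:
    """Largest k in [lower, upper] with has_kcore(k) True, or None.

    Relies on monotonicity: if a k-core exists, a (k-1)-core exists too.
    """
    best = None
    while lower <= upper:
        k_c = (lower + upper) // 2      # k_c = floor((upper + lower) / 2)
        if has_kcore(k_c):
            best = k_c                  # record this core number
            lower = k_c + 1             # then look for a higher-order core
        else:
            upper = k_c - 1
    return best
```

Only about log₂(upper − lower + 1) region checks are needed, instead of one check per candidate k.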
Next, to further reduce the number of vertices in N to be examined, the expanded square area is divided into m × m cells. A small square is then used to filter out vertices that cannot form a solution. The basic idea is to avoid checking vertices one by one. Instead, a square covering s × s cells, large enough to enclose a circle of diameter D, is used to search for all k-cores in the expanded square area. The (s × s) square is moved from the upper-left corner to the lower-right corner of the expanded square area (m × m cells). At each position, the method checks whether the square contains a k_c-core. All squares containing a k_c-core are recorded. The rotating circle is then applied only to vertices that lie both in N and in a recorded square. Here m and s are positive integers with s smaller than m. In practice, suitable values of m and s can be chosen according to the circle's diameter, the required search granularity and similar factors. Because verification works per cell rather than per vertex, it is faster.
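The cell-based filtering step can be sketched as follows. The sketch assumes that vertex coordinates lie inside the expanded square, that `side` is the square's side length, and that s cells are wide enough to enclose a circle of diameter D. `has_kcore(vertices)` is a hypothetical check for a k_c-core in the subgraph induced by a vertex set. The function name `filter_by_windows` is also illustrative. It returns the vertices of N that fall in at least one window containing a k_c-core. Only these vertices need the rotating circle.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

def filter_by_windows(points: Dict[str, Tuple[float, float]],
                      origin: Tuple[float, float], side: float, m: int, s: int,
                      node_vertices: Iterable[str],
                      has_kcore: Callable[[Set[str]], bool]) -> Set[str]:
    """Keep vertices of N that fall in some s x s window holding a k_c-core."""
    if not 0 < s < m:
        raise ValueError("require 0 < s < m")
    cell = side / m
    # Bucket every vertex of the expanded square into its grid cell.
    grid: Dict[Tuple[int, int], Set[str]] = {}
    for v, (x, y) in points.items():
        i = min(int((x - origin[0]) / cell), m - 1)
        j = min(int((y - origin[1]) / cell), m - 1)
        grid.setdefault((i, j), set()).add(v)
    in_node = set(node_vertices)
    keep: Set[str] = set()
    # Slide the s x s window over every placement in the m x m grid.
    for i0 in range(m - s + 1):
        for j0 in range(m - s + 1):
            window: Set[str] = set()
            for i in range(i0, i0 + s):
                for j in range(j0, j0 + s):
                    window |= grid.get((i, j), set())
            if window and has_kcore(window):
                keep |= window & in_node  # record this window's N-vertices
    return keep
```

This naive loop rebuilds each window from scratch. An incremental sliding update would be cheaper, but the simple form keeps the filtering rule easy to follow.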
Finally, a binary rotating circle method is proposed for examining candidate vertices, to reduce verification cost. Unlike MC3Alg, it does not stop the rotation every time a new vertex enters the circle. Instead, it uses a binary search strategy. Specifically, the rotation stops only at the first vertex for which the region swept from the start of the rotation up to that vertex contains a k_c-core. The circle with that vertex on its boundary is then checked. If it contains a k_c-core, the core is recorded and the rotation stops. Otherwise, the search resumes from the checked circle for the next vertex that can yield a k_c-core. This approach is very efficient because it skips large areas that contain no cores.
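The binary rotating circle step can be sketched as a bisection over the entry-ordered list (InAngleList, here the parameter `order`). This is a minimal sketch under several assumptions. `has_kcore(vertices)` is a hypothetical callback that tests for a k_c-core in the induced subgraph. `circle_members(i)` is a hypothetical callback returning the vertices inside the circle when the i-th vertex of `order` is on its boundary. Only the pivot is assumed to be inside the circle when rotation starts. The swept region consists of the last checked circle's contents plus every vertex that has entered since. It contains every later circle up to the current index, and adding vertices never destroys a k-core, so bisection cannot skip a circle that contains a k_c-core.

```python
from typing import Callable, Optional, Sequence, Set, Tuple

def first_sufficient_index(base: Set[str], order: Sequence[str], start: int,
                           has_kcore: Callable[[Set[str]], bool]) -> Optional[int]:
    """Smallest i >= start such that base | order[start..i] holds a k_c-core."""
    lo, hi, found = start, len(order) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if has_kcore(base | set(order[start:mid + 1])):
            found, hi = mid, mid - 1  # a core exists; try an earlier vertex
        else:
            lo = mid + 1
    return found

def binary_rotate(pivot: str, order: Sequence[str],
                  has_kcore: Callable[[Set[str]], bool],
                  circle_members: Callable[[int], Set[str]]
                  ) -> Optional[Tuple[int, Set[str]]]:
    """Jump the circle straight to candidate vertices instead of each entry."""
    base, start = {pivot}, 0
    while start < len(order):
        idx = first_sufficient_index(base, order, start, has_kcore)
        if idx is None:
            return None              # no k_c-core in the remaining sweep
        members = circle_members(idx)
        if has_kcore(members):
            return idx, members      # record this core and stop rotating
        base, start = set(members), idx + 1  # resume from the checked circle
    return None
```

On the FIG. 6 example with pivot G and order [I, H, F, E, D], a 2-core predicate stops the circle at H. A 3-core predicate jumps straight to D without stopping at F or E.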
Referring to the example of the binary search process shown in FIG. 6, given the same candidate node as in FIG. 4, a binary search is performed on the core number. There is an upper bound upper = 3 (from the distance map) and a lower bound lower = 2 (the current best value). The current core number is therefore k_c = ⌊(upper + lower)/2⌋ = ⌊(3 + 2)/2⌋ = 2.
Vertices G, H and I are then set as boundary vertices, and the bisection strategy is applied during rotation. First, the vertices are ordered by when they enter the search circle, giving {I, H, F, E, D}. This list is denoted InAngleList. Next, a binary search on InAngleList finds the first vertex at which the 2-core is satisfied. The rotated region {G, H, I} forms a 2-core, so vertex H is found first. The circle is rotated to H, and the 2-core {G, H, I} is found and recorded. lower is updated to 2 + 1 = 3, so k_c is now 3. With vertex G as the boundary vertex, the process is repeated. When vertex D is on the boundary of the search circle, the rotated region {G, D, E, F, H} forms a 3-core. The circle is therefore rotated directly to vertex D, where the 3-core is found within the circle. The rotation thus skips vertices F and E. Finally, {G, D, E, F, H} is found to be the best core in the node.
The computational complexity of the MC3Alg+ algorithm is analyzed as follows.
The assumptions are the same as for MC3Alg. A binary search is performed on each extended node N_ex (γ(N_ex) = 2D + l) that needs to be checked. Suppose the maximum core number obtained from the distance map is k_max. The binary search over k then runs at most log k_max times. The expanded square area is divided into T × T cells, and some vertices are filtered out using small squares covering s × s cells. Each small square covers about s²(2D + l)²n/T² vertices and s²(2D + l)²m/T² edges, and it must be moved (T − s)² times. Therefore, the overhead of the moving process is at most [formula image not reproduced].
For each vertex in N, the overhead of the binary rotating circle is at most [formula image not reproduced] (each circle covers about πD²n/4 vertices and πD²m/4 edges). Thus, in the worst case, the overall complexity of MC3Alg+ is [formula image not reproduced].
To further verify the effect of the present invention, simulation experiments were performed to evaluate the above embodiments. The quadtree-based MC3Alg and MC3Alg+ algorithms were compared with the structure-first and space-first approaches. The structure-first and space-first approaches run very slowly, so their performance is reported in only one set of the experiments below. The experimental conditions were as follows:
1) Data sets
The experiments used four data sets: three real data sets (Gowalla, FourSquare, Flickr) and one synthetic data set (YoutubeSyn).
In Gowalla, each vertex is a Gowalla user, and each edge represents a friendship between two users. Each user has many check-ins, and the most frequent check-in location is taken as the user's location. Experiments were also run on this data set for the case where a user has multiple check-in locations.
In FourSquare, each vertex is a user of the FourSquare website, and each edge represents a social relationship between two users. Each user's most frequent check-in is taken as the user's location.
In Flickr, vertices are users, and edges represent the "following" relationship between two users. Each user is assigned the location where that user has tagged the most photos.
In YoutubeSyn, each vertex represents a YouTube user, and each edge is a "following" relationship between two users. The YouTube data contain no user locations, so a location is generated for each user. Two distributions, a random distribution and a Gaussian distribution, were used to generate these locations.
Details of the data sets are shown in Table 1, which gives the average degree (symbol not reproduced) and max_k, the maximum number of locations on a node.
Table 1: Data set attributes (table image not reproduced)
2) Parameter settings
The parameter m (the number of grid cells in the extended search area) was set to 10. Experiments showed that m has little impact on performance and that the best running time is achieved at m = 10, so m = 10 was the default in all experiments. In the experiments with multiple user locations, a Gowalla user's locations are all of that user's check-ins, and a YoutubeSyn user's locations are randomly generated. In the experiments on location distributions, locations were generated following two distributions: a random distribution and a Gaussian distribution. For all data sets, locations are placed in the square [0,100] × [0,100].
3) Experimental environment
Experiments were run on a machine with an Intel i7-6700 3.40 GHz processor and 16 GB of memory, running Windows 10. All algorithms were implemented in Java.
The experimental results show how the performance of the embodiments is affected by factors such as the diameter D, multiple check-in locations per vertex, and the distribution of user locations.
FIGS. 7(a) to 7(c) show the relationship between the diameter and the running time. Changing the diameter D affects the search area and efficiency of the structure-first method, the space-first method, MC3Alg and MC3Alg+. The abscissa is the diameter D, which ranges from 2.5 to 12.5 in coordinates normalized to the [0,100] × [0,100] square search area. The ordinate is the running time in seconds. The figures show the running times of four algorithms: space-first (spatial), structure-first (structural), MC3Alg and MC3Alg+. FIG. 7(a) shows the results on Flickr, FIG. 7(b) on FourSquare, and FIG. 7(c) on Gowalla. MC3Alg+ consistently outperforms the other algorithms because it applies the most pruning and optimization strategies. The space-first and structure-first methods are very time-consuming and are therefore omitted from later experiments.
FIGS. 8(a) to 8(b) show the relationship between the number of locations and the running time. The abscissa is the number of locations, and the ordinate is the running time (sec). FIG. 8(a) shows the results on the YoutubeSyn data set and FIG. 8(b) on the Gowalla data set. When a vertex has multiple check-in locations, more locations mean more k-core checks. The number of check-ins can therefore affect the performance of both MC3Alg and MC3Alg+. MC3Alg+ is less affected by multiple check-ins because binary search speeds up its rotation process. In addition, MC3Alg+ runs about 7 times faster than MC3Alg.
FIGS. 9(a) to 9(b) show the relationship between the location distribution and the running time. The abscissa is the diameter, and the ordinate is the running time. FIG. 9(a) corresponds to the Gaussian distribution of the YoutubeSyn data set and FIG. 9(b) to its random distribution. MC3Alg+ consistently outperforms MC3Alg. Its advantage is larger under the Gaussian distribution, where some nodes contain a very large number of vertices and are therefore expensive for MC3Alg to search.
FIGS. 10(a) to 10(b) show scalability. The abscissa is the percentage of the data set's vertices used. For example, 20% means the experiment is run on a sub-data set containing 20% of the data set's vertices. The ordinate is the running time. FIG. 10(a) corresponds to the Flickr data set and FIG. 10(b) to the FourSquare data set. Scalability is verified by varying the size of these two data sets. Both algorithms scale well with data set size, and MC3Alg+ again runs fastest because it uses more pruning strategies.
In summary, the present invention provides various embodiments for searching for the most cohesive co-located community. Spatial information and local structural information are integrated into a quadtree-based index structure (DkQ-TREE) to speed up the search for the target community. Two efficient algorithms based on DkQ-TREE are provided, and extensive experiments on real and synthetic data sets demonstrate their efficiency and effectiveness. The community search method of the embodiments can be used for behavior analysis of social network users, recommendation, disease prediction and similar applications.
It should be noted that, although the steps are described in a specific order, the steps are not necessarily performed in the specific order, and in fact, some of the steps may be performed concurrently or even in a changed order as long as the required functions are achieved. Furthermore, those skilled in the art can make appropriate modifications to some embodiments, such as rotating a circle counterclockwise, setting an appropriate diameter D based on the scale of the attribute network, user requirements, query speed requirements, etc., without departing from the spirit of the invention.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer-readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer-readable storage medium may include, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (8)

1. A community search method for an attribute network, comprising the following steps:
step S1: defining a search area range according to the spatial position of an attribute network user, wherein the attribute network is a social network;
step S2: searching for a target community according to the closeness of connections among network users in the attribute network, wherein the spatial positions of the users in the target community lie within the defined search area;
wherein step S1 includes the following substeps:
the attribute network is represented by an undirected connected graph G = (V, E, S), wherein V represents a vertex set, E represents an edge set, S represents a spatial position set, and each vertex represents a network user;
searching for a target community represented by a connected subgraph of the undirected connected graph G, wherein the vertex positions of the subgraph can be enclosed by a circle of diameter D and the vertices of the subgraph form a k-core of higher order than any other subgraph of G;
wherein, in step S2, searching for a target community represented by a connected subgraph according to the following steps:
step S21: constructing a quadtree index structure for the undirected connected graph G, wherein a root node corresponds to the whole space of G;
step S22: traversing the quadtree index structure to obtain all nodes whose side length is smaller than D and whose parent node's side length is larger than D, and storing them in a node list nodeList;
step S23: for each node in the node list nodeList, obtaining the maximum core number k_cur;
step S24: pruning from nodeList every node N with N.DistMap[k_cur] > D, wherein N.DistMap denotes the distance map of node N;
step S25: sorting the remaining nodes in nodeList in ascending order of the upper bound on their core numbers and verifying them in sequence, to find the k-core of the highest order that can be enclosed by a circle of diameter D.
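Step S22 can be illustrated with a simple traversal. The sketch assumes a hypothetical `QNode` class in which each node stores the side length of its square and its children. The function name `collect_node_list` is also illustrative. It applies the claim's strict inequalities literally. A node whose side length equals D exactly is therefore not collected, and neither are its children.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class QNode:
    """Hypothetical quadtree node: side length of its square plus children."""
    side: float
    children: List["QNode"] = field(default_factory=list)

def collect_node_list(root: QNode, D: float) -> List[QNode]:
    """Nodes with side < D whose parent has side > D (step S22)."""
    node_list: List[QNode] = []
    stack: List[Tuple[QNode, Optional[float]]] = [(root, None)]
    while stack:
        node, parent_side = stack.pop()
        if node.side < D:
            if parent_side is not None and parent_side > D:
                node_list.append(node)
            continue  # descendants are smaller still; their parents are < D
        stack.extend((child, node.side) for child in node.children)
    return node_list
```

Only subtrees whose squares are at least D wide are expanded, so the traversal stops at the first level below D on every branch.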
2. The method according to claim 1, wherein in step S25, for a node N in the node list nodeList, the following steps are performed:
extending N by length D, performing core decomposition in the expanded square region, and ignoring vertices whose core number is less than k_cur;
verifying whether the remaining vertices in the expanded square region contain a k-core of order higher than k_cur, and if so, recording the k-core and updating k_cur.
3. The method according to claim 2, wherein whether the remaining vertices in the expanded square region contain a k-core of order higher than k_cur is verified by:
for a vertex in node N, placing it on the boundary of a circle of diameter D and rotating the circle;
when a new vertex enters the circle, checking whether the circle contains a k-core of order higher than k_cur.
4. The method according to claim 2, wherein whether the remaining vertices in the expanded square region contain a k-core of order higher than k_cur is verified by:
dividing the expanded square region into m × m cells, and searching for k-cores in the expanded square region using a square that covers s × s cells and can enclose a circle of diameter D, where s and m are positive integers and s is smaller than m.
5. The method according to claim 2, wherein whether the remaining vertices in the expanded square region contain a k-core of order higher than k_cur is verified by:
for a vertex in node N, placing it on the boundary of a circle of diameter D and rotating the circle;
while rotating the circle, stopping the rotation when the vertices that have entered the circle satisfy a k_c-core, where k_c denotes the core number currently being verified.
6. The method according to claim 1, wherein the target community represented by a connected subgraph is searched for by:
searching all circles of diameter D in the undirected connected graph G;
for all searched circles, checking the maximum core number of the vertices enclosed by each circle, and taking the vertices enclosed by the circle with the maximum core number as the target community.
7. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
8. A computer device comprising a memory and a processor, on which memory a computer program is stored which is executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 6 when executing the program.
CN201910266196.1A 2019-04-03 2019-04-03 Community search method of attribute network Active CN110119462B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910266196.1A CN110119462B (en) 2019-04-03 2019-04-03 Community search method of attribute network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910266196.1A CN110119462B (en) 2019-04-03 2019-04-03 Community search method of attribute network

Publications (2)

Publication Number Publication Date
CN110119462A CN110119462A (en) 2019-08-13
CN110119462B true CN110119462B (en) 2021-07-23

Family

ID=67520775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910266196.1A Active CN110119462B (en) 2019-04-03 2019-04-03 Community search method of attribute network

Country Status (1)

Country Link
CN (1) CN110119462B (en)

Also Published As

Publication number Publication date
CN110119462A (en) 2019-08-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 310000 Room 501, building 9, No. 20, kekeyuan Road, Baiyang street, Hangzhou Economic and Technological Development Zone, Zhejiang Province

Patentee after: Hangzhou Zhongke advanced technology development Co.,Ltd.

Address before: 310000 Room 501, building 9, No. 20, Science Park Road, Baiyang street, economic and Technological Development Zone, Jianggan District, Hangzhou City, Zhejiang Province

Patentee before: HANGZHOU ZHONGKE ADVANCED TECHNOLOGY RESEARCH INSTITUTE Co.,Ltd.
