CN113392143B

CN113392143B - Construction and processing method of reachability query index facing multiple relational graphs

Info

Publication number: CN113392143B
Application number: CN202110792966.3A
Authority: CN
Inventors: 王潇杨; 傅仙明; 吴艳萍; 陈晨; 卢旭峰; 张梦琪
Original assignee: Zhejiang Gongshang University
Current assignee: Zhejiang Gongshang University
Priority date: 2021-07-14
Filing date: 2021-07-14
Publication date: 2022-08-02
Anticipated expiration: 2041-07-14
Also published as: CN113392143A

Abstract

The invention discloses a construction and processing method for a reachability query index on multi-relation graphs. Conventional reachability query methods are not applicable to multi-relation graphs, in which a single edge may carry several relationships. The invention proposes two new reachability models for multi-relation graphs. Under the necessary relationship reachability model, two nodes are reachable if a path can be found between them in which every edge contains all the relationships in a given relationship set. Under the existence relationship reachability model, two nodes are reachable if a path can be found between them in which every edge contains at least one relationship in the given relationship set. Based on the properties of these two models, the invention provides corresponding index construction and query processing methods that effectively reduce query time. To build the indexes faster, the invention further proposes novel pruning strategies that speed up index construction.

Description

Construction and processing method of reachability query index facing multiple relational graphs

Technical Field

The invention belongs to the technical field of multimedia data mining, and particularly relates to a construction and processing method for reachability query indexes on multi-relation graphs.

Background

Graph structures have many applications: social networks, communication networks, road networks and the like can all be modeled as graphs. For example, in Twitter each user can be regarded as a node, and the relationship between two users as an edge between the corresponding nodes. An edge may involve different relationships, since two users may be connected by "follow", "comment", "forward" and so on. Similarly, in a scientific collaboration network (e.g., DBLP), each node represents an author and an edge between two nodes represents a collaboration between those authors; the edge may record the research areas of the collaboration, and because an author's research is usually not confined to a single area, an edge may carry multiple relationships. In graph analysis, the reachability query is a basic operation: given a graph (typically a directed graph) and two nodes v and u, it checks whether there is a path from the start node v to the end node u. Recently, many studies have addressed reachability between nodes under constraints. Jin et al. combine a chain structure with the hop idea and propose a 3-hop structure to reduce the index size for compact graphs. Zhou et al. propose a DAG reduction method to accelerate reachability queries on large graphs. Valstar et al. propose a landmark-based index for large graphs: a small number of landmarks are selected and an index is precomputed for each of them; at query time, a BFS is first run from the source node until a landmark is reached, and the landmark's index then provides a shortcut to the target node. However, previous studies on reachability consider only a single relationship per edge and do not address edges that carry multiple relationships.
Because the edges of a multi-relation graph can carry various relationships, the invention proposes the necessary relationship reachability and existence relationship reachability models, which aim to efficiently determine whether nodes in a multi-relation graph are connected under specific relationships.

Disclosure of Invention

To determine the relationship between two nodes, the invention proposes two new reachability models on multi-relation graphs: necessary relationship reachability and existence relationship reachability. Two nodes satisfy necessary relationship reachability if a path can be found between them in which every edge contains all the relationships in a given relationship set R; they satisfy existence relationship reachability if a path can be found between them in which every edge contains at least one relationship in R.
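The two definitions can be checked directly, without any index, by a breadth-first search that only follows edges admissible under the chosen model. The following minimal Python sketch assumes a hypothetical representation in which each node maps to a list of (neighbor, frozenset of relationships) pairs; the names `edge_admissible` and `reachable_online` are illustrative and not taken from the invention.

```python
from collections import deque

# Hypothetical representation: node -> list of (neighbor, frozenset of relationships on the edge).
Graph = dict[str, list[tuple[str, frozenset[str]]]]


def edge_admissible(labels: frozenset[str], R: frozenset[str], model: str) -> bool:
    """Check a single edge against the query relationship set R."""
    if model == "necessary":
        return R <= labels          # the edge carries every relationship in R
    if model == "existence":
        return bool(labels & R)     # the edge carries at least one relationship in R
    raise ValueError(f"unknown model: {model}")


def reachable_online(g: Graph, v: str, u: str, R: frozenset[str], model: str) -> bool:
    """Breadth-first search from v that only follows admissible edges."""
    seen, queue = {v}, deque([v])
    while queue:
        x = queue.popleft()
        if x == u:
            return True
        for y, labels in g.get(x, []):
            if y not in seen and edge_admissible(labels, R, model):
                seen.add(y)
                queue.append(y)
    return False


g: Graph = {
    "a": [("b", frozenset({"r1", "r2"}))],
    "b": [("c", frozenset({"r2"}))],
    "c": [],
}
print(reachable_online(g, "a", "c", frozenset({"r2"}), "necessary"))        # True
print(reachable_online(g, "a", "c", frozenset({"r1", "r2"}), "necessary"))  # False: b->c lacks r1
print(reachable_online(g, "a", "c", frozenset({"r1"}), "existence"))        # False: b->c has no r1
```

A search of this kind traverses the graph again for every query, which is the cost that the index structures of the invention are meant to avoid.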

Based on the idea of 2-hop cover, the invention constructs one index structure for necessary relationship reachability queries and one for existence relationship reachability queries, which effectively reduces the time required to answer a query. For index construction, the invention develops three novel pruning strategies that avoid a large amount of invalid computation and markedly reduce construction time.

The three pruning strategies filter out unnecessary nodes and index entries during index construction, as follows:

Node-based pruning strategy: once the index of a node v has been computed, node v is skipped directly whenever it is encountered while computing the indexes of other nodes;

Index-entry-based pruning strategy: if the reachability between the pair of nodes corresponding to an index entry can already be derived from the current index, the index entry is skipped directly;

Index-based pruning strategy: when the necessary relationship index is constructed, the candidate index entry with the largest number of relationships is always processed first, and when the existence relationship index is constructed, the candidate index entry with the smallest number of relationships is always processed first; therefore, if the reachability between the nodes of an index entry cannot be derived by querying the existing index, the entry is inserted into the index directly, without checking whether any existing entry must first be removed.
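One way to combine the three pruning strategies is sketched below for the necessary relationship index, in the style of pruned landmark labeling. This is an illustrative reconstruction under assumptions, not the implementation of the invention: the graph format, the function names, the heap-based search and the restriction to non-empty query sets are all assumptions. Nodes are processed in a given priority order (highest first). From each node (the hub), a forward and a backward search propagate the intersection of the relationship sets along the path, always popping the largest set first. `out_idx[x]` holds entries (hub, S) meaning that x reaches the hub whenever the query set R is a subset of S, and `in_idx[x]` holds entries meaning that the hub reaches x under the same condition.

```python
import heapq
from itertools import count

# Hypothetical graph format: node -> list of (neighbor, frozenset of relationships).
# Index entries are (hub, relationship_set); each list stays in hub-priority order.


def query(out_idx, in_idx, s, t, R):
    """Necessary-relationship query (non-empty R) answered from the current index."""
    if s == t:
        return True
    if any(h == t and R <= S for h, S in out_idx[s]):     # t is a hub of s
        return True
    if any(h == s and R <= S for h, S in in_idx[t]):      # s is a hub of t
        return True
    hubs = {h for h, S in out_idx[s] if R <= S}           # common hub: s -> h -> t
    return any(h in hubs and R <= S for h, S in in_idx[t])


def build_necessary_index(graph, order):
    nodes = set(graph) | {y for edges in graph.values() for y, _ in edges}
    reverse = {x: [] for x in nodes}
    for x, edges in graph.items():
        for y, labels in edges:
            reverse[y].append((x, labels))
    out_idx = {x: [] for x in nodes}
    in_idx = {x: [] for x in nodes}
    done = set()
    for hub in order:                                      # highest priority first
        for adj, forward in ((graph, True), (reverse, False)):
            tie = count()                                  # tie-breaker so sets are never compared
            heap = [(-len(lab), next(tie), y, lab) for y, lab in adj.get(hub, []) if lab]
            heapq.heapify(heap)
            while heap:
                _, _, x, S = heapq.heappop(heap)           # largest relationship set first
                if x == hub or x in done:                  # node-based pruning
                    continue
                s, t = (hub, x) if forward else (x, hub)
                if query(out_idx, in_idx, s, t, S):        # index-entry-based pruning
                    continue
                # Index-based pruning: sets arrive in non-increasing size, so no stored
                # entry can be dominated by S and S is appended without any removal.
                (in_idx if forward else out_idx)[x].append((hub, S))
                for y, lab in adj.get(x, []):
                    if S & lab:
                        heapq.heappush(heap, (-len(S & lab), next(tie), y, S & lab))
        done.add(hub)
    return out_idx, in_idx
```

After construction, `query(out_idx, in_idx, s, t, R)` answers necessary relationship queries from the index alone. The existence relationship index would process the smallest sets first instead, and its relationship sets do not combine by simple intersection, so it is not covered by this sketch.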

The structure of the index is specifically as follows:

For every node v, the necessary relationship index contains two sets, an out-set and an in-set of v.

Each index entry in the out-set of v consists of a node u and a relationship set R_u, and indicates that node v can reach node u whenever the query relationship set R is a subset of R_u.

Each index entry in the in-set of v consists of a node u and a relationship set R_u, and indicates that node u can reach node v whenever R is a subset of R_u. Owing to the three pruning strategies above, the necessary relationship index is minimal: deleting any index entry would leave the index unable to answer some reachability query correctly.

For every node v, the existence relationship index likewise contains two sets, an out-set and an in-set of v.

Each index entry in the out-set of v consists of a node u and a relationship set R_u, and indicates that node v can reach node u whenever the query relationship set R is a superset of R_u.

Each index entry in the in-set of v consists of a node u and a relationship set R_u, and indicates that node u can reach node v whenever R is a superset of R_u. Owing to the three pruning strategies above, the existence relationship index is also minimal: deleting any index entry would leave the index unable to answer some reachability query correctly.
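Each index entry is a pair of a node and a relationship set, and the two indexes differ only in the direction of the containment test applied to R_u. A minimal Python container expressing this is sketched below; the class name `RelationIndex`, its fields and the helper `direct_hit` (which tests the entries of the first set of v for node u, as in step (a) of a query) are hypothetical.

```python
from dataclasses import dataclass, field

Entry = tuple[str, frozenset[str]]  # (node u, relationship set R_u)


@dataclass
class RelationIndex:
    """Hypothetical container for one 2-hop index (necessary or existence model)."""
    model: str                                                      # "necessary" or "existence"
    out_sets: dict[str, list[Entry]] = field(default_factory=dict)  # v -> entries (u, R_u): v reaches u
    in_sets: dict[str, list[Entry]] = field(default_factory=dict)   # v -> entries (u, R_u): u reaches v

    def covers(self, R_u: frozenset[str], R: frozenset[str]) -> bool:
        # Necessary model: R must be a subset of R_u.
        # Existence model: R must be a superset of R_u.
        return R <= R_u if self.model == "necessary" else R_u <= R

    def direct_hit(self, v: str, u: str, R: frozenset[str]) -> bool:
        """True if some entry (u, R_u) in the out-set of v alone proves that v reaches u under R."""
        return any(w == u and self.covers(S, R) for w, S in self.out_sets.get(v, []))


nec = RelationIndex("necessary", out_sets={"a": [("b", frozenset({"r1", "r2"}))]})
ex = RelationIndex("existence", out_sets={"a": [("c", frozenset({"r1"}))]})
print(nec.direct_hit("a", "b", frozenset({"r1"})))        # True: {r1} is a subset of {r1, r2}
print(ex.direct_hit("a", "c", frozenset({"r1", "r2"})))   # True: {r1, r2} is a superset of {r1}
```

Storing only maximal sets (necessary model) or minimal sets (existence model) is what lets a single containment test per entry decide a whole family of query sets.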

During construction of the index structures, each node is assigned a priority according to its degree, i.e., its number of neighbors: the higher the degree, the higher the priority. Nodes are processed in descending order of priority, and the entries of each index set are ordered by the priority of their nodes, so no additional sorting of index entries is needed when answering reachability queries.
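The priority assignment can be sketched as follows. The text defines degree as the number of neighbors; this sketch assumes that in a directed graph both in-neighbors and out-neighbors count, each only once, and it breaks ties by node name, an arbitrary choice made here only for determinism. `priority_order` is a hypothetical name.

```python
def priority_order(graph):
    """Nodes sorted by degree (number of distinct neighbors), highest priority first."""
    neighbors = {}
    for x, edges in graph.items():
        neighbors.setdefault(x, set())
        for y, _ in edges:
            neighbors[x].add(y)                     # out-neighbor of x
            neighbors.setdefault(y, set()).add(x)   # x is an in-neighbor of y
    # Ties broken by node name (assumption) so that the order is deterministic.
    return sorted(neighbors, key=lambda x: (-len(neighbors[x]), x))


g = {
    "a": [("b", frozenset({"r1"})), ("c", frozenset({"r2"}))],
    "b": [("c", frozenset({"r1", "r2"}))],
    "c": [("d", frozenset({"r1"}))],
}
order = priority_order(g)
rank = {x: i for i, x in enumerate(order)}  # smaller rank means higher priority
print(order)  # ['c', 'a', 'b', 'd']
```

If entries are appended to each index set while nodes are processed in this order, every set is automatically sorted by rank, which is all the merge step of a query needs to compare two entries.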

In the above technical solution, a necessary relationship reachability query proceeds as follows. Given the query start point v, end point u and relationship set R, traverse the out-set of v and the in-set of u in the necessary relationship index:

(a) if the current entry of the out-set of v has node u and a relationship set that is a superset of R, a necessary relationship reachable path from v to u exists;

(b) if the current entry of the in-set of u has node v and a relationship set that is a superset of R, a necessary relationship reachable path from v to u exists;

(c) compare the priority of the node in the current entry of the out-set of v with that of the node in the current entry of the in-set of u. If the former is higher, move to the next entry of the out-set of v; if the former is lower, move to the next entry of the in-set of u; if both entries refer to the same node and both relationship sets are supersets of R, a necessary relationship reachable path from v to u exists;

(d) if none of (a), (b) and (c) occurs by the end of the traversal, no necessary relationship reachable path from v to u exists.
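Steps (a) to (d) amount to a merge-join of two lists that are already sorted by node priority. In the Python sketch below, `out_v` stands for the entries of the first index set of v (nodes that v can reach) and `in_u` for the entries of the second index set of u (nodes that can reach u); both names, the `rank` map and the sample entries are hypothetical. Because one node may appear in several entries with different relationship sets, the sketch gathers all entries for a shared node before testing them, a detail the text leaves implicit; steps (a) and (b) are done as separate scans for readability.

```python
def necessary_query(v, u, R, out_v, in_u, rank):
    """Necessary-relationship query over two priority-sorted entry lists.

    out_v, in_u: lists of (node, frozenset) sorted by descending priority,
    i.e. ascending rank; rank maps node -> position in the priority order.
    """
    # (a) / (b): an entry that names the other endpoint directly
    if any(w == u and R <= S for w, S in out_v):
        return True
    if any(w == v and R <= S for w, S in in_u):
        return True
    # (c): merge-join on node priority
    i = j = 0
    while i < len(out_v) and j < len(in_u):
        wi, wj = out_v[i][0], in_u[j][0]
        if rank[wi] < rank[wj]:        # out-entry node has higher priority
            i += 1
        elif rank[wi] > rank[wj]:      # out-entry node has lower priority
            j += 1
        else:                          # same node: collect all of its entries on both sides
            i2, j2 = i, j
            while i2 < len(out_v) and out_v[i2][0] == wi:
                i2 += 1
            while j2 < len(in_u) and in_u[j2][0] == wi:
                j2 += 1
            if any(R <= S for _, S in out_v[i:i2]) and any(R <= S for _, S in in_u[j:j2]):
                return True
            i, j = i2, j2
    return False                       # (d)


rank = {"h1": 0, "h2": 1, "h3": 2}
out_v = [("h1", frozenset({"r1", "r2"})), ("h3", frozenset({"r1"}))]
in_u = [("h2", frozenset({"r1"})), ("h3", frozenset({"r1", "r2"}))]
print(necessary_query("v", "u", frozenset({"r1"}), out_v, in_u, rank))        # True via h3
print(necessary_query("v", "u", frozenset({"r1", "r2"}), out_v, in_u, rank))  # False
```

Each list is scanned at most once, so the cost of the merge is linear in the combined size of the two index sets.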

An existence relationship reachability query proceeds as follows. Given the query start point v, end point u and relationship set R, traverse the out-set of v and the in-set of u in the existence relationship index:

(a) if the current entry of the out-set of v has node u and a relationship set that is a subset of R, an existence relationship reachable path from v to u exists;

(b) if the current entry of the in-set of u has node v and a relationship set that is a subset of R, an existence relationship reachable path from v to u exists;

(c) compare the priority of the node in the current entry of the out-set of v with that of the node in the current entry of the in-set of u. If the former is higher, move to the next entry of the out-set of v; if the former is lower, move to the next entry of the in-set of u; if both entries refer to the same node and both relationship sets are subsets of R, an existence relationship reachable path from v to u exists;

(d) if none of (a), (b) and (c) occurs by the end of the traversal, no existence relationship reachable path from v to u exists.
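The existence relationship query follows the same merge-join as the necessary relationship query and differs only in the containment test, so both can share one routine that takes the test as a parameter. The sketch below is a hypothetical generalization for illustration; `index_query`, the two covering functions and the sample entries are assumptions, with `out_v` holding the entries for nodes that v can reach and `in_u` the entries for nodes that can reach u.

```python
from typing import Callable

Entry = tuple[str, frozenset[str]]
Covers = Callable[[frozenset[str], frozenset[str]], bool]


def necessary_covers(R_u: frozenset[str], R: frozenset[str]) -> bool:
    return R <= R_u                      # R must be a subset of R_u


def existence_covers(R_u: frozenset[str], R: frozenset[str]) -> bool:
    return R_u <= R                      # R must be a superset of R_u


def index_query(v: str, u: str, R: frozenset[str], out_v: list[Entry],
                in_u: list[Entry], rank: dict[str, int], covers: Covers) -> bool:
    if any(w == u and covers(S, R) for w, S in out_v):       # step (a)
        return True
    if any(w == v and covers(S, R) for w, S in in_u):        # step (b)
        return True
    i = j = 0
    while i < len(out_v) and j < len(in_u):                  # step (c)
        wi, wj = out_v[i][0], in_u[j][0]
        if rank[wi] < rank[wj]:
            i += 1
        elif rank[wi] > rank[wj]:
            j += 1
        else:                                                # shared node: test all its entries
            i2, j2 = i, j
            while i2 < len(out_v) and out_v[i2][0] == wi:
                i2 += 1
            while j2 < len(in_u) and in_u[j2][0] == wi:
                j2 += 1
            if (any(covers(S, R) for _, S in out_v[i:i2])
                    and any(covers(S, R) for _, S in in_u[j:j2])):
                return True
            i, j = i2, j2
    return False                                             # step (d)


rank = {"h1": 0, "h2": 1}
out_v = [("h1", frozenset({"r1"})), ("h2", frozenset({"r2"}))]
in_u = [("h2", frozenset({"r3"}))]
print(index_query("v", "u", frozenset({"r2", "r3"}), out_v, in_u, rank, existence_covers))  # True
print(index_query("v", "u", frozenset({"r2"}), out_v, in_u, rank, existence_covers))        # False
```

In the second call the shared node h2 is reachable from v under R, but its entry towards u requires r3, which R does not contain.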

The invention also provides a computer device comprising a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the above construction and processing method of the reachability query index for multi-relation graphs.

The invention also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above construction and processing method of the reachability query index for multi-relation graphs.

Beneficial effects of the invention: building an index to answer reachability queries between nodes is a common solution to the reachability problem. However, in a multi-relation graph the number of relationship combinations between nodes grows exponentially with the number of relationships and with the path length, so a suitable index structure must be designed. The invention provides index structures based on 2-hop cover that answer whether nodes are reachable under a given relationship set. In addition, to keep the index minimal and to accelerate its construction, the invention provides three novel pruning strategies so that the index can be obtained quickly. The construction and processing method of the reachability query index for multi-relation graphs therefore offers considerable benefit for determining the relationships between nodes.

Drawings

FIG. 1 is a flow chart of the method for constructing and processing a reachability query index for multi-relation graphs according to the present invention;

FIG. 2 is an example of a practical application scenario of the present invention;

FIG. 3 is an example multi-relation graph according to an embodiment of the present invention;

FIG. 4 shows the necessary relationship reachability query index constructed for the multi-relation graph of FIG. 3;

FIG. 5 shows the existence relationship reachability query index constructed for the multi-relation graph of FIG. 3.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.

In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. However, the invention may also be practiced in ways other than those described here, and those skilled in the art may make similar generalizations without departing from its spirit; the invention is therefore not limited to the specific embodiments disclosed below.

The method answers necessary relationship reachability queries and existence relationship reachability queries. Two nodes satisfy necessary relationship reachability if a path can be found between them in which every edge contains all the relationships in the given relationship set R; they satisfy existence relationship reachability if a path can be found between them in which every edge contains at least one relationship in R. The method comprises three novel pruning strategies and an efficient index-based reachability query algorithm, each of which is described in detail below.

The three novel pruning strategies skip invalid nodes and index entries, which significantly reduces the time and space required to construct the index. They are as follows:

Node-based pruning strategy: once the index of a node v has been computed, node v is skipped directly whenever it is encountered while computing the indexes of other nodes.

Proof: when the index is built, for each processed node, all the nodes it can reach and all the nodes that can reach it, together with the corresponding path information, are recorded in the index; therefore, skipping nodes whose indexes have already been computed does not affect the correctness of the index.

Index-entry-based pruning strategy: if the reachability between the pair of nodes corresponding to an index entry can already be derived from the current index, the index entry is skipped directly.

Proof: if the reachability of an index entry can already be derived from the current index, the entry is redundant, so skipping it does not affect the correctness of the index.

Index-based pruning strategy: when the necessary relationship index is constructed, the candidate index entry with the largest number of relationships is always processed first, and when the existence relationship index is constructed, the candidate index entry with the smallest number of relationships is always processed first; therefore, if the reachability between the nodes of an index entry cannot be derived by querying the existing index, the entry is inserted into the index directly.

Proof: when the necessary relationship index is constructed, candidate entries with the largest number of relationships are processed first, so every entry that enters the index later has no more relationships than the entries already in it. A later entry therefore cannot replace an earlier one (a strict superset must contain more relationships), and it can be inserted directly without checking whether some existing entries must first be removed. When the existence relationship index is constructed, candidate entries with the smallest number of relationships are processed first, so every later entry has no fewer relationships than the entries already in the index. A later entry therefore cannot replace an earlier one (a strict subset must contain fewer relationships), and it too can be inserted directly without any removal check.
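The argument can be illustrated on the candidate relationship sets for a single pair of nodes. In the minimal Python sketch below (function names and candidate sets are hypothetical), candidates are inserted largest first for the necessary index and smallest first for the existence index; a candidate already covered by a stored set is skipped, and the assertion inside each function encodes the claim of the proof that no stored set ever has to be removed.

```python
def insert_necessary(entries: list[frozenset], S: frozenset) -> bool:
    """Necessary index: assumes candidates arrive in non-increasing size."""
    if any(S <= T for T in entries):        # already derivable from the index: skip
        return False
    assert not any(T < S for T in entries)  # no stored set is a strict subset of S
    entries.append(S)                       # so S is inserted with no removal pass
    return True


def insert_existence(entries: list[frozenset], S: frozenset) -> bool:
    """Existence index: assumes candidates arrive in non-decreasing size."""
    if any(T <= S for T in entries):        # already derivable from the index: skip
        return False
    assert not any(S < T for T in entries)  # no stored set is a strict superset of S
    entries.append(S)
    return True


candidates = [frozenset({"r1"}), frozenset({"r1", "r2"}), frozenset({"r3"}), frozenset({"r2"})]
maximal, minimal = [], []
for S in sorted(candidates, key=len, reverse=True):   # largest first
    insert_necessary(maximal, S)
for S in sorted(candidates, key=len):                 # smallest first
    insert_existence(minimal, S)
print(sorted(sorted(S) for S in maximal))  # [['r1', 'r2'], ['r3']]
print(sorted(sorted(S) for S in minimal))  # [['r1'], ['r2'], ['r3']]
```

The stored sets end up being exactly the maximal (respectively minimal) candidates, without any entry ever being removed.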

Index structures for necessary relationship and existence relationship reachability queries are constructed on the basis of 2-hop cover, as follows.

For every node v, the necessary relationship index contains an out-set and an in-set of v. Each entry in the out-set of v consists of a node u and a relationship set R_u, indicating that node v can reach node u whenever R is a subset of R_u; each entry in the in-set of v consists of a node u and a relationship set R_u, indicating that node u can reach node v whenever R is a subset of R_u. Owing to the three pruning strategies, the necessary relationship index is minimal: deleting any entry would leave the index unable to answer some reachability query correctly.

For every node v, the existence relationship index likewise contains an out-set and an in-set of v, whose entries (u, R_u) indicate reachability from v to u, or from u to v respectively, whenever R is a superset of R_u; this index is minimal as well.

A necessary relationship reachability query is performed on the index as follows: given the query start point v, end point u and relationship set R, traverse the out-set of v and the in-set of u in the necessary relationship index.

(a) If the current entry of the out-set of v has node u and a relationship set that is a superset of R, a necessary relationship reachable path from v to u exists.

(b) If the current entry of the in-set of u has node v and a relationship set that is a superset of R, such a path exists.

(c) Compare the priority of the node in the current entry of the out-set of v with that of the node in the current entry of the in-set of u. If the former is higher, move to the next entry of the out-set of v; if it is lower, move to the next entry of the in-set of u; if both entries refer to the same node and both relationship sets are supersets of R, such a path exists.

(d) If none of (a), (b) and (c) occurs by the end of the traversal, no necessary relationship reachable path from v to u exists.

An existence relationship reachability query is performed on the existence relationship index in the same way, given the start point v, end point u and relationship set R, except that in (a), (b) and (c) each relationship set must be a subset of R rather than a superset.

Fig. 1 is a flowchart of a method for constructing and processing a reachability query index for a multiple relationship graph in an embodiment of the present invention.

Figure 2 shows a Twitter network with six users. The like and forward relationships on the path from Tom to Bob mean that Tom likes and forwards Bob's tweets. The follow and comment relationships from Bob to Mick mean that Bob follows Mick and comments on Mick's tweets. For a necessary relationship reachability query, Jack can reach Alice under the relationship constraint {like, follow}, because there is a path (Jack, Mick, Alice) from Jack to Alice in which every edge contains both like and follow. Similarly, for an existence relationship reachability query, Tom can reach Alice under the relationship constraint {follow, forward}. As the example illustrates, the two proposed models are designed for different scenarios, and the necessary relationship model imposes much stricter constraints.
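The example can be checked edge by edge. The Python sketch below models only the four edges the text describes, taking their relationship sets to be exactly the ones named (an assumption: the figure may carry further relationships and edges, and the sixth user is not modeled), labels the tweet-sharing relationship `forward`, and assumes Tom reaches Alice through Bob and Mick, since the text does not name that path. `path_satisfies` is a hypothetical helper.

```python
# Only the edges named in the text are modeled (assumption); other edges of Fig. 2 are omitted.
edges = {
    ("Tom", "Bob"): frozenset({"like", "forward"}),
    ("Bob", "Mick"): frozenset({"follow", "comment"}),
    ("Jack", "Mick"): frozenset({"like", "follow"}),
    ("Mick", "Alice"): frozenset({"like", "follow"}),
}


def path_satisfies(path, R, model):
    """Check every edge of an explicit path against R under the given model."""
    for x, y in zip(path, path[1:]):
        labels = edges.get((x, y))
        if labels is None:
            return False                      # not an edge of the modeled subgraph
        ok = R <= labels if model == "necessary" else bool(labels & R)
        if not ok:
            return False
    return True


print(path_satisfies(["Jack", "Mick", "Alice"], frozenset({"like", "follow"}), "necessary"))           # True
print(path_satisfies(["Tom", "Bob", "Mick", "Alice"], frozenset({"follow", "forward"}), "existence"))  # True
print(path_satisfies(["Tom", "Bob", "Mick", "Alice"], frozenset({"follow", "forward"}), "necessary"))  # False
```

The last call fails on the first edge, whose relationships include forward but not follow, which illustrates why the necessary relationship model is the stricter of the two.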

Fig. 3 is a multi-relation graph simulating the real world. Given the relationship constraint {r₁, r₂}, for necessary relationship reachability the path <v₁, v₃, v₄> can be found from the start point v₁ to the end point v₄; for existence relationship reachability the path <v₁, v₃, v₄, v₅> can be found from the start point v₁ to the end point v₅.

Using the method of the invention, a necessary relationship reachability query index and an existence relationship reachability query index are constructed for the multi-relation graph of FIG. 3, as shown in FIG. 4 and FIG. 5 respectively.

Furthermore, extensive experiments were performed on multi-relation graphs to evaluate the effectiveness and efficiency of the proposed method, varying the input node sets and relationship sets. Algorithm running time was used to measure effectiveness and efficiency. Each setting was run 100,000 times and the results were averaged. All programs were implemented in standard C++, and all experiments were performed on a server with an Intel Xeon 2.2 GHz CPU and 128 GB of main memory. The experiments show that, for both necessary relationship and existence relationship reachability queries, the proposed construction and processing method is 25 times faster than the basic online algorithm on the largest dataset.

In one embodiment, a computer device is provided that includes a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the construction and processing method of the reachability query index for multi-relation graphs in the foregoing embodiments.

In one embodiment, a storage medium storing computer-readable instructions is provided. The computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the construction and processing method of the reachability query index for multi-relation graphs in the foregoing embodiments. The storage medium may be a non-volatile storage medium.

Those skilled in the art will appreciate that all or part of the steps of the methods in the above embodiments may be carried out by relevant hardware instructed by a program, which may be stored in a computer-readable storage medium such as Read-Only Memory (ROM), Random Access Memory (RAM), a magnetic disk or an optical disk.

The foregoing is only a preferred embodiment of the present invention. Although the invention has been disclosed by way of preferred embodiments, they are not intended to limit it. Using the methods and technical content disclosed above, those skilled in the art can make many possible variations and modifications to the technical solution of the invention, or modify it into equivalent embodiments, without departing from the scope of that technical solution. Therefore, any simple modification, equivalent change or adaptation made to the above embodiments according to the technical essence of the invention, without departing from the content of its technical solution, still falls within the protection scope of the technical solution of the invention.

Claims

1. A construction and processing method of a reachability query index for multi-relation graphs, characterized in that the method is used for necessary relationship reachability queries and existence relationship reachability queries, wherein necessary relationship reachability requires that a path can be found between the two nodes in which every edge contains all the relationships in a given relationship set R, and existence relationship reachability requires that a path can be found between the two nodes in which every edge contains at least one relationship in the given relationship set R; the method comprises the following steps:

(1) constructing, based on 2-hop cover, an index structure for necessary relationship reachability queries and an index structure for existence relationship reachability queries, the index structures encoding the reachability between any two nodes under any relationship set;

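(2) filtering out unnecessary nodes and index entries during index construction by means of three pruning strategies, namely:

a node-based pruning strategy: once the index of a node v has been computed, node v is skipped directly whenever it is encountered while computing the indexes of other nodes;

an index-entry-based pruning strategy: if the reachability between the pair of nodes corresponding to an index entry can already be derived from the current index, the index entry is skipped directly;

an index-based pruning strategy: when the necessary relationship index is constructed, the candidate index entry with the largest number of relationships is always processed first, and when the existence relationship index is constructed, the candidate index entry with the smallest number of relationships is always processed first; if the reachability between the nodes of an index entry cannot be derived by querying the existing index, the entry is inserted into the index directly;

the index structures are specifically as follows:

for every node v, the necessary relationship index contains an out-set and an in-set of v; each index entry in the out-set of v consists of a node u and a relationship set R_u and indicates that node v can reach node u whenever R is a subset of R_u; each index entry in the in-set of v consists of a node u and a relationship set R_u and indicates that node u can reach node v whenever R is a subset of R_u; owing to the three pruning strategies of step (2), the necessary relationship index is minimal, and deleting any index entry would leave the index unable to answer some reachability query correctly;

for every node v, the existence relationship index contains an out-set and an in-set of v; each index entry in the out-set of v consists of a node u and a relationship set R_u and indicates that node v can reach node u whenever R is a superset of R_u; each index entry in the in-set of v consists of a node u and a relationship set R_u and indicates that node u can reach node v whenever R is a superset of R_u; owing to the three pruning strategies of step (2), the existence relationship index is minimal, and deleting any index entry would leave the index unable to answer some reachability query correctly;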

(3) based on the constructed index structures, given a start point and an end point, determining whether a reachable path satisfying the necessary relationship or the existence relationship exists.

2. The method according to claim 1, characterized in that, during construction of the index structures, each node is assigned a priority according to its degree, i.e., its number of neighbors, a higher degree giving a higher priority; the nodes are processed in descending order of priority; and the entries in the index are ordered by node priority, so that no additional sorting of index entries is required when reachability queries are performed.

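3. The method according to claim 1, characterized in that the necessary relationship reachability query according to the index in step (3) is specifically: given the query start point v, end point u and relationship set R, traversing the out-set of v and the in-set of u in the necessary relationship index;

(a) if the current entry of the out-set of v has node u and a relationship set that is a superset of R, determining that a necessary relationship reachable path from v to u exists;

(b) if the current entry of the in-set of u has node v and a relationship set that is a superset of R, determining that a necessary relationship reachable path from v to u exists;

(c) comparing the priority of the node in the current entry of the out-set of v with that of the node in the current entry of the in-set of u; if the former is higher, accessing the next entry of the out-set of v; if the former is lower, accessing the next entry of the in-set of u; if both entries refer to the same node and both relationship sets are supersets of R, determining that a necessary relationship reachable path from v to u exists;

(d) if none of (a), (b) and (c) occurs by the end of the traversal, determining that no necessary relationship reachable path exists.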

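4. The method according to claim 1, characterized in that the existence relationship reachability query according to the index in step (3) is specifically: given the query start point v, end point u and relationship set R, traversing the out-set of v and the in-set of u in the existence relationship index;

(a) if the current entry of the out-set of v has node u and a relationship set that is a subset of R, determining that an existence relationship reachable path from v to u exists;

(b) if the current entry of the in-set of u has node v and a relationship set that is a subset of R, determining that an existence relationship reachable path from v to u exists;

(c) comparing the priority of the node in the current entry of the out-set of v with that of the node in the current entry of the in-set of u; if the former is higher, accessing the next entry of the out-set of v; if the former is lower, accessing the next entry of the in-set of u; if both entries refer to the same node and both relationship sets are subsets of R, determining that an existence relationship reachable path from v to u exists;

(d) if none of (a), (b) and (c) occurs by the end of the traversal, determining that no existence relationship reachable path exists.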

5. A computer device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the construction and processing method of a reachability query index for multi-relation graphs according to any one of claims 1-4.

6. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the construction and processing method of a reachability query index for multi-relation graphs according to any one of claims 1-4.