CN113282415B - Method for matching patterns of labeled graph in distributed environment - Google Patents

Method for matching patterns of labeled graph in distributed environment

Info

Publication number
CN113282415B
Authority
CN
China
Prior art keywords
node
graph
data
slave
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110570428.XA
Other languages
Chinese (zh)
Other versions
CN113282415A (en)
Inventor
李靖东
王晓玲
卢兴见
张吉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Zhejiang Lab
Original Assignee
East China Normal University
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University, Zhejiang Lab filed Critical East China Normal University
Priority to CN202110570428.XA priority Critical patent/CN113282415B/en
Publication of CN113282415A publication Critical patent/CN113282415A/en
Application granted granted Critical
Publication of CN113282415B publication Critical patent/CN113282415B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a pattern matching method for labeled graphs in a distributed environment. A master node in the distributed environment partitions the data graph and sends each piece of node data to a slave node, and at the same time distributes the labeled pattern graph to every slave node. Each slave node dynamically selects a matching path according to the storage status of its local data and the communication conditions, obtains graph pattern matching results, and feeds them back to the master node, which aggregates and outputs all graph pattern matching results. By using a task-centric graph computing model while also taking load balancing in the distributed environment into account, the invention makes full use of the CPU computing power of each machine and effectively improves the efficiency of graph pattern matching.

Description

Method for matching patterns of labeled graph in distributed environment
Technical Field
The invention belongs to the technical field of graph data mining, and particularly relates to a pattern matching method for labeled graphs in a distributed environment.
Background
As a general-purpose data structure, a graph is commonly used to represent complex structured data. Compared with other data structures, it stores and expresses entities and their associations more naturally. In the real world, graphs are widely applied in social network analysis, web network analysis, traffic network optimization, knowledge graph construction, computational chemistry, computational biology, and other fields. Given graph data that is semantically rich, varied in form, and huge in volume, how to quickly and accurately extract valuable information from it has become a very active research direction.
With the continued development of emerging technologies such as the Internet of Things and cloud computing, the rapid rise of new Internet applications such as social networks, and the wide adoption of wearable electronic devices, the scale and complexity of graph data keep increasing. Existing graph computing methods therefore face great challenges in performance and efficiency, particularly for computationally intensive tasks such as graph pattern mining on large-scale graph data. An intuitive way to tackle these high-complexity computations is to execute them in parallel on multiple CPU cores. However, existing big data frameworks mainly target data-intensive graph mining tasks, in which data transmission often becomes the bottleneck and CPU utilization is low. Solving computationally intensive graph mining tasks with these frameworks typically results in poor performance.
Graph pattern matching is a computationally intensive graph mining task, and the large scale of the data graph means that the storage space and computing power of a single machine can hardly meet its requirements. Existing distributed graph pattern matching methods, however, are mainly based on MapReduce or on a vertex-centric design similar to the Pregel model, and neither of these well-known distributed architectures suits computationally intensive problems. The MapReduce-based graph computing system NScale does not begin the computationally intensive processing of the decomposed subgraphs until all of them have been constructed synchronously, which leaves the CPU underutilized. Likewise, the bulk synchronous parallel model used in Pregel has a barrier synchronization phase, so machines must spend a long time waiting for one another to synchronize, and this design also has difficulty making full use of CPU computing power.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a pattern matching method for labeled graphs in a distributed environment that adopts a task-centric graph computing model, so as to make full use of the CPU computing power of each machine in the distributed environment and improve the efficiency of graph pattern matching.
To achieve the above object, the pattern matching method for labeled graphs in a distributed environment of the present invention comprises the following steps:
S1: Obtain the graph data on which labeled graph pattern matching is to be performed. The graph data comprises a data graph and a labeled pattern graph; the data graph is an undirected graph containing node IDs, node label information, and node association relations, and the labeled pattern graph contains node labels and node association relations.
S2: Denote the master node in the distributed environment as $M$ and the slave nodes as $S_n$, $n = 1, 2, \dots, N$, where $N$ is the number of slave nodes. The master node $M$ divides the data graph obtained in step S1 into $N$ pieces of node data. Each piece of node data contains several pieces of node information, and each piece of node information includes a node ID, node label information, and node association relations. The master node then distributes each piece of node data to the corresponding slave node $S_n$; at the same time, it sends the labeled pattern graph to every slave node $S_n$.
S3: After each slave node $S_n$ receives its node data and the labeled pattern graph, denote the labels in the labeled pattern graph as $f_k$, $k = 1, 2, \dots, K$, where $K$ is the number of labels in the labeled pattern graph. From the node data it received, slave node $S_n$ collects the node set $\phi_{n,k}$ of nodes whose label is $f_k$, and selects the node set with the fewest nodes as the root node set $R_n$. Let $D_n$ be the number of nodes in $R_n$. Slave node $S_n$ then executes $D_n$ graph pattern matching tasks, each taking one node of $R_n$ as its root node, and feeds the resulting graph pattern matching results back to the master node $M$.
S4: The master node $M$ aggregates the graph pattern matching results fed back by the slave nodes and removes duplicate results to obtain the final set of graph pattern matching results.
In the pattern matching method for labeled graphs in a distributed environment of the present invention, a master node in the distributed environment partitions the data graph and sends each piece of node data to a slave node, and at the same time distributes the labeled pattern graph to every slave node. Each slave node dynamically selects a matching path according to the storage status of its local data and the communication conditions, obtains graph pattern matching results, and feeds them back to the master node, which aggregates and outputs all graph pattern matching results. By using a task-centric graph computing model while also taking load balancing in the distributed environment into account, the invention makes full use of the CPU computing power of each machine and effectively improves the efficiency of graph pattern matching.
Drawings
FIG. 1 is a flow chart of an embodiment of the pattern matching method for labeled graphs in a distributed environment of the present invention;
FIG. 2 is the data graph in the present embodiment;
FIG. 3 is the labeled pattern graph in the present embodiment;
FIG. 4 is a flow chart of the execution of a graph pattern matching task in the present invention;
FIG. 5 is the graph pattern matching result obtained by applying the present invention to the data graph of FIG. 2 and the labeled pattern graph of FIG. 3.
Detailed Description
Specific embodiments of the invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note that in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the main content of the present invention.
Examples
FIG. 1 is a flow chart of an embodiment of the pattern matching method for labeled graphs in a distributed environment of the present invention. As shown in FIG. 1, the specific steps of the method include:
S101: Obtain graph data:
Obtain the graph data on which labeled graph pattern matching is to be performed. The graph data comprises a data graph and a labeled pattern graph; the data graph is an undirected graph containing node IDs, node label information, and node association relations, and the labeled pattern graph contains node labels and node association relations.
That is, each node in the data graph carries at least three fields (node ID, node label, and association relations). If the original data graph contains other node attributes that do not appear in the labeled pattern graph, they can be removed to simplify the data. The labeled pattern graph data can be stored one edge per line in the format (start point label, end point label). FIG. 2 is the data graph in the present embodiment, and FIG. 3 is the labeled pattern graph in the present embodiment. As shown in FIG. 2 and FIG. 3, the data graph in this embodiment is a personnel relationship graph with 14 person nodes. The person nodes carry 6 kinds of labels in total; each label represents a different personnel identity and is drawn with a different shape.
The information contained in FIG. 2 and FIG. 3 can be represented by tables. Table 1 is the data table of the data graph shown in FIG. 2.

Node ID    Node label    Association relation (adjacent nodes)
1          A             2, 3, 5
2          B             1, 3, 4
3          D             1, 2, 12
4          C             2
5          E             1
6          A             7, 8
7          B             6, 8, 9, 12
8          D             6, 7
9          C             7
10         B             11, 12, 13, 14
11         A             10, 12
12         D             3, 7, 10, 11
13         C             10
14         F             10

Table 1

Table 2 is the data table of the labeled pattern graph shown in FIG. 3 in this embodiment.

Start point label    End point label
A                    B
A                    D
B                    C
B                    D
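The one-edge-per-line pattern format can be loaded into an adjacency structure with a few lines of code. The following is a minimal sketch, not the patent's implementation: the function name `parse_pattern`, the whitespace-separated line layout, and the treatment of pattern edges as undirected are all assumptions made for illustration.

```python
def parse_pattern(lines):
    """Build a label -> set-of-adjacent-labels map from 'start end' lines.

    Assumes (hypothetically) one whitespace-separated edge per line and
    treats every pattern edge as undirected.
    """
    adjacency = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        start, end = line.split()
        adjacency.setdefault(start, set()).add(end)
        adjacency.setdefault(end, set()).add(start)
    return adjacency


# The four edges of Table 2.
adjacency = parse_pattern(["A B", "A D", "B C", "B D"])
```

With the Table 2 edges, label B ends up adjacent to A, C, and D, which is the neighbourhood used when a later matching step looks for candidate labels.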
S102: graph data distribution:
A distributed environment generally includes one master node and several slave nodes. In real applications the data graph is often larger than the storage space of a single machine, and for a graph pattern matching task it is unnecessary to keep a full copy of the graph data on every machine; doing so may also produce duplicate matching results. The data graph is therefore first partitioned and distributed by the master node. The pattern graph entered by the user, by contrast, tends to be much smaller than the data graph, so it can be sent directly to every slave node. The specific process of graph data distribution is as follows: denote the master node in the distributed environment as $M$ and the slave nodes as $S_n$, $n = 1, 2, \dots, N$, where $N$ is the number of slave nodes. The master node $M$ divides the data graph obtained in step S101 into $N$ pieces of node data. Each piece of node data contains several pieces of node information, and each piece of node information includes a node ID, node label information, and node association relations. The master node then distributes each piece of node data to the corresponding slave node $S_n$; at the same time, it sends the labeled pattern graph to every slave node $S_n$.
To distribute the data graph evenly and prevent any single slave node from holding so much data that it becomes a communication bottleneck, this embodiment distributes the data graph according to a hash of the node ID. Specifically, a hash value is computed for each node ID in the data graph with the hash function hash(ID) % N, and nodes with the same hash value are placed in the same piece of node data.
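The hash-based distribution can be sketched as follows. This is an illustrative sketch only: the function name `partition_by_hash` is hypothetical, and Python's built-in `hash` stands in for whatever hash function an implementation would use (for small integers it returns the integer itself, whereas string hashes vary between interpreter runs unless the hash seed is fixed).

```python
from collections import defaultdict


def partition_by_hash(node_ids, num_slaves):
    """Group node IDs into pieces of node data keyed by hash(ID) % N."""
    pieces = defaultdict(list)
    for node_id in node_ids:
        pieces[hash(node_id) % num_slaves].append(node_id)
    return dict(pieces)


# The 14 node IDs of Table 1, split across N = 3 slave nodes.
pieces = partition_by_hash(range(1, 15), 3)
```

For the 14 nodes of Table 1 and three slave nodes, the pieces hold 4, 5, and 5 nodes respectively, which illustrates the even spread the embodiment is after.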
S103: pattern matching of the graph:
After each slave node $S_n$ receives its node data and the labeled pattern graph, denote the labels in the labeled pattern graph as $f_k$, $k = 1, 2, \dots, K$, where $K$ is the number of labels in the labeled pattern graph. From the node data it received, slave node $S_n$ collects the node set $\phi_{n,k}$ of nodes whose label is $f_k$, and selects the node set with the fewest nodes as the root node set $R_n$. Let $D_n$ be the number of nodes in $R_n$. Slave node $S_n$ then executes $D_n$ graph pattern matching tasks, each taking one node of $R_n$ as its root node, and feeds the resulting graph pattern matching results back to the master node $M$.
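Root set selection on one slave node might look like the sketch below. The function name `select_root_set` and the sample local data are hypothetical. The source does not say how ties between equally small node sets are broken or how an empty node set is handled; this sketch breaks ties by the order of the pattern labels and treats an empty set like any other, which would yield zero tasks.

```python
def select_root_set(local_labels, pattern_labels):
    """Return (label, node IDs) for the pattern label with the fewest local nodes.

    local_labels maps node ID -> label for the nodes held by this slave node.
    Ties go to the label that comes first in pattern_labels (an assumption).
    """
    node_sets = {label: [] for label in pattern_labels}
    for node_id, label in sorted(local_labels.items()):
        if label in node_sets:  # labels absent from the pattern are ignored
            node_sets[label].append(node_id)
    root_label = min(pattern_labels, key=lambda label: len(node_sets[label]))
    return root_label, node_sets[root_label]


# Hypothetical local data: a subset of the nodes of Table 1.
local = {1: "A", 2: "B", 3: "D", 4: "C", 5: "E", 7: "B", 12: "D"}
root_label, root_set = select_root_set(local, ["A", "B", "C", "D"])
```

Here labels A and C each have one local node, so A is chosen by the tie-breaking assumption and the slave node would run a single task rooted at node 1.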
FIG. 4 is a flow chart of the execution of a graph pattern matching task in the present invention. As shown in FIG. 4, the specific steps of a graph pattern matching task in the present invention include:
S401: Initialize the root node:
Let the node serial number $i = 0$, determine the root node label from the root node of the current graph pattern matching task, and let node set $B_0$ be the set consisting of the root node of the current graph pattern matching task.
S402: Determine the next node set:
Because the node data is distributed over different slave nodes in the invention, the selection of the subsequent nodes to match (that is, the determination of the matching path) must take into account that each slave node holds different local data and must pull a different amount of data from other machines, so the slave nodes differ in how a graph matching task proceeds and the communication conditions are affected as well. The invention therefore proposes a cost estimation model based on candidate sets, which generates an adaptive graph matching path from the cost function of each candidate label. The design of the model is based on two observations: (1) different matching orders lead to different communication costs and thus affect the time of the whole matching computation; (2) considering the size of the candidate sets present in the data graph when determining the matching order can effectively reduce the subsequent search space. Specifically, the cost estimation model takes three factors into account: (1) the structural information of the current matching node (node degree, candidate set size); (2) the matching probability of the current matching path (the likelihood of early termination); (3) the data storage status and workload of the current slave node (the number of tasks and of nodes in the cache). The cost function is calculated as follows:
Obtain the candidate label set $A_{i+1}$, consisting of the labels of the nodes that are adjacent, in the pattern graph, to the node carrying the current label on the matching path (the root node label when $i = 0$). Denote the $p$-th label in $A_{i+1}$ as $u_p$, $p = 1, 2, \dots, P_{i+1}$, where $P_{i+1}$ is the number of candidate labels in $A_{i+1}$. First, calculate the communication cost $\mathrm{cost}_{\mathrm{pull}}(u_p)$ of obtaining all nodes in the data graph that correspond to candidate label $u_p$:
$$\mathrm{cost}_{\mathrm{pull}}(u_p) = [C(u_p) - C_{\mathrm{local}}(u_p)] \times W_{\mathrm{remote}} + C_{\mathrm{local}}(u_p) \times W_{\mathrm{local}}$$
where $C(u_p)$ is the number of all nodes in the data graph that correspond to label $u_p$, which the slave node obtains by querying the master node; $C_{\mathrm{local}}(u_p)$ is the number of nodes with label $u_p$ stored locally on the current slave node; $W_{\mathrm{local}}$ is the unit communication cost of obtaining node data local to the slave node; and $W_{\mathrm{remote}}$ is the unit communication cost of obtaining node data from other slave nodes.
Next, calculate the communication cost $\mathrm{cost}_{\mathrm{stop}}(u_p)$ that is saved because a mismatch occurs later when candidate label $u_p$ is taken as the label of the next node on the matching path:
[formula not preserved in the source text]
where $\rho_p$ is the probability that a mismatch occurs later when candidate label $u_p$ is taken as the label of the next node on the matching path, calculated as follows:
[formula not preserved in the source text]
The cost function $\mathrm{cost}(u_p)$ of taking candidate label $u_p$ as the next label on the matching path is then calculated as:
$$\mathrm{cost}(u_p) = \mathrm{cost}_{\mathrm{pull}}(u_p) - \mathrm{cost}_{\mathrm{stop}}(u_p)$$
Select the label with the smallest cost function in candidate label set $A_{i+1}$ as the label of the next node. By querying the master node, the slave node obtains the nodes in the data graph that carry this label and are adjacent to a node in node set $B_i$; these nodes form node set $B_{i+1}$.
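The label selection can be illustrated with a short sketch. It implements only the two formulas that are legible in this text, the pull cost and the difference cost = pull cost minus stop cost. Since the expression for the stop cost is not available here, the sketch takes it as an input value. The function names and the unit costs 1.0 and 10.0 are hypothetical.

```python
def cost_pull(total, local, w_local, w_remote):
    """[C(u) - C_local(u)] * W_remote + C_local(u) * W_local."""
    return (total - local) * w_remote + local * w_local


def pick_next_label(candidates, w_local=1.0, w_remote=10.0):
    """Pick the candidate label with the smallest cost.

    candidates maps label -> (C, C_local, cost_stop). cost_stop is supplied
    by the caller because its formula is not given in this text.
    """
    costs = {
        label: cost_pull(total, local, w_local, w_remote) - stop
        for label, (total, local, stop) in candidates.items()
    }
    return min(costs, key=costs.get), costs


# One local node for label B, one remote node for label D, no savings.
label, costs = pick_next_label({"B": (1, 1, 0.0), "D": (1, 0, 0.0)})
```

With one local candidate for B and one remote candidate for D, the costs are 1.0 and 10.0, so B is selected. Any choice of unit costs with the remote cost above the local cost gives the same ordering.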
S403: judging whether i is less than T-1, wherein T represents the number of nodes in the pattern diagram with labels, if so, proceeding to step S404, otherwise proceeding to step S405.
S404: let i=i+1, return to step S402.
S405: backtracking to obtain a matching path:
with node set B T-1 Each node in the graph is used as a final node, and a matching path corresponding to the final node is obtained by backtracking, so that a graph pattern matching result is obtained.
Taking FIG. 2 as an example, assume that the root node of a graph pattern matching task executed by some slave node is node 1, whose label is A. In the pattern graph, the nodes adjacent to the node labeled A are the nodes labeled B and D, so it must be decided which of these two labels to match first.
For label B, the set of nodes in the data graph that are adjacent to node 1 and labeled B contains only node 2. Assuming node 2 is stored on the current slave node, the communication cost of taking label B as the next label is $\mathrm{cost}_{\mathrm{pull}}(B) = W_{\mathrm{local}}$.
For label D, the set of nodes in the data graph that are adjacent to node 1 and labeled D contains only node 3. Assuming node 3 is not stored on the current slave node, the communication cost of taking label D as the next label is $\mathrm{cost}_{\mathrm{pull}}(D) = W_{\mathrm{remote}}$.
In this embodiment, nodes matching labels B and D both exist, so there is no possibility of early termination of the matching and the mismatch probability for both labels is 0. Hence, with label B or D as the label of the next node on the matching path, the communication cost saved by a later mismatch is $\mathrm{cost}_{\mathrm{stop}}(B) = \mathrm{cost}_{\mathrm{stop}}(D) = 0$.
In summary, the cost functions of labels B and D are $\mathrm{cost}(B) = W_{\mathrm{local}}$ and $\mathrm{cost}(D) = W_{\mathrm{remote}}$. In general $W_{\mathrm{remote}} > W_{\mathrm{local}}$, so label B is chosen as the next label, and the corresponding node set is the set consisting of node 2. In the same way, the 3rd label is label D with the node set consisting of node 3, and the 4th label is label C with the node set consisting of node 4. The current graph pattern matching result can then be obtained by backtracking.
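The outcome of this worked example can be checked with a small matcher over the data of Table 1 and Table 2. Note that the sketch below swaps in plain recursive backtracking with a label order supplied by the caller; it is not the cost-driven, node-set-based procedure of steps S401 to S405, and it ignores distribution entirely. It also assumes each label occurs once in the pattern graph, as in Table 2. The names `DATA`, `PATTERN`, and `match_from_root` are hypothetical.

```python
# Table 1: node ID -> (label, adjacent node IDs)
DATA = {
    1: ("A", {2, 3, 5}), 2: ("B", {1, 3, 4}), 3: ("D", {1, 2, 12}),
    4: ("C", {2}), 5: ("E", {1}), 6: ("A", {7, 8}),
    7: ("B", {6, 8, 9, 12}), 8: ("D", {6, 7}), 9: ("C", {7}),
    10: ("B", {11, 12, 13, 14}), 11: ("A", {10, 12}),
    12: ("D", {3, 7, 10, 11}), 13: ("C", {10}), 14: ("F", {10}),
}
# Table 2: pattern edges between labels
PATTERN = [("A", "B"), ("A", "D"), ("B", "C"), ("B", "D")]


def match_from_root(root, order, data=DATA, pattern=PATTERN):
    """Enumerate matches that map order[0] to `root`, one label at a time."""
    if data[root][0] != order[0]:
        return []
    neighbours = {label: set() for label in order}
    for a, b in pattern:
        neighbours[a].add(b)
        neighbours[b].add(a)
    results = []

    def extend(assignment):
        if len(assignment) == len(order):
            results.append(dict(assignment))
            return
        label = order[len(assignment)]
        # Data nodes already matched to pattern neighbours of `label`.
        anchors = [assignment[n] for n in neighbours[label] if n in assignment]
        for node, (node_label, adjacent) in data.items():
            if (node_label == label
                    and node not in assignment.values()
                    and all(anchor in adjacent for anchor in anchors)):
                assignment[label] = node
                extend(assignment)
                del assignment[label]  # backtrack

    extend({order[0]: root})
    return results


matches = match_from_root(1, ["A", "B", "D", "C"])
```

Rooted at node 1 with the label order A, B, D, C chosen in the example, the sketch yields the single match that maps A, B, D, C to nodes 1, 2, 3, 4.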
In step S103, the $D_n$ graph pattern matching tasks on each slave node can be executed in parallel to improve graph pattern matching efficiency.
S104: Result aggregation:
The master node $M$ aggregates the graph pattern matching results fed back by the slave nodes and removes duplicate results to obtain the final set of graph pattern matching results.
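Aggregation with duplicate removal can be sketched as below. The representation of a match as a dictionary from pattern label to data node ID, and the rule that two matches are duplicates when they map every label to the same node, are assumptions made for this illustration; the function name `aggregate` is hypothetical.

```python
def aggregate(results_per_slave):
    """Merge per-slave match lists, keeping the first copy of each match."""
    seen = set()
    merged = []
    for matches in results_per_slave:
        for match in matches:
            key = frozenset(match.items())  # hashable, order-independent
            if key not in seen:
                seen.add(key)
                merged.append(match)
    return merged


# Two slave nodes report overlapping results.
final = aggregate([
    [{"A": 1, "B": 2, "D": 3, "C": 4}],
    [{"C": 4, "D": 3, "B": 2, "A": 1}, {"A": 6, "B": 7, "D": 8, "C": 9}],
])
```

Using a frozen set of (label, node) pairs as the key makes the comparison independent of the order in which a slave node happened to fill in the match.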
FIG. 5 is the graph pattern matching result obtained by applying the present invention to the data graph of FIG. 2 and the labeled pattern graph of FIG. 3. As shown in FIG. 5, the invention obtains accurate graph pattern matching results.
In addition, the master node $M$ may schedule tasks across the slave nodes to achieve load balancing. After a slave node has executed all of its graph pattern matching tasks, it sends a schedulable message to the master node $M$. Upon receiving the schedulable message, the master node $M$ queries the slave nodes that have not completed their graph pattern matching tasks to obtain the tasks that have not yet been executed, and then reassigns those tasks to the schedulable slave node.
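The rescheduling step can be simulated in a few lines. The source does not state how many pending tasks are moved or from which slave node, so the policy in this sketch (take the later half, rounded up, of the pending tasks of the busiest slave node) is purely an assumption, as are the function name `reschedule` and the in-memory task lists that stand in for the query and reassignment messages.

```python
def reschedule(pending, idle_slave):
    """Move not-yet-executed tasks from the busiest slave to idle_slave.

    pending maps slave name -> list of root nodes whose tasks have not run.
    Returns the list of tasks that were moved (empty if none are pending).
    """
    busy = {s: tasks for s, tasks in pending.items() if s != idle_slave and tasks}
    if not busy:
        return []
    donor = max(busy, key=lambda s: len(busy[s]))
    count = (len(pending[donor]) + 1) // 2  # at least one task
    moved = pending[donor][-count:]
    del pending[donor][-count:]
    pending.setdefault(idle_slave, []).extend(moved)
    return moved


# S1 has finished and reports itself schedulable.
pending = {"S1": [], "S2": [5, 6, 7, 8], "S3": [9]}
moved = reschedule(pending, "S1")
```

After the call, S1 holds tasks 7 and 8 while S2 keeps 5 and 6. A real implementation would have to choose its own policy for how much work to move.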
To better illustrate the technical effect of the invention, it was verified experimentally on data sets from practical applications. The data graphs used in the test are the Email data set and the DBLP data set: the Email data set is a data graph representing communication relations among people, and the DBLP data set is a data graph representing paper citation relations among people. Both data sets are openly available on SNAP and can be downloaded from http:// SNAP. The pattern graphs used in the test were constructed with a general pattern graph generation method at three successively increasing scales; the generation rules are described in the paper "Han M, Kim H, Gu G, et al. Efficient subgraph matching: Harmonizing dynamic programming, adaptive matching order, and failing set together [C]// Proceedings of the 2019 International Conference on Management of Data. 2019: 1429-1446".
In the experimental verification, the best-performing distributed graph pattern matching method in the prior art (BENU) was selected as the comparison method, and its matching time was compared with that of the invention. Details of the BENU method can be found in the paper "Wang, Zhaokang, et al. BENU: Distributed subgraph enumeration with backtracking-based framework. 2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE, 2019".
Table 3 compares the matching time of the present invention and the BENU method on the two personnel relationship graph data sets in this experimental verification.
Table 3
As shown in Table 3, the invention performs labeled graph pattern matching more efficiently than the BENU method: the matching time is shortened by 26% on average, a substantial improvement in efficiency.
Table 4 compares the memory overhead and communication cost of the present invention and the BENU method on the two personnel relationship graph data sets in this experimental verification.
Table 4
As shown in Table 4, the memory overhead and communication cost of labeled graph pattern matching with the invention are both lower than with the BENU method: memory overhead is reduced by 14% on average and communication cost by 30% on average, so resource consumption is considerably lower.
In summary, by using a task-centric computing model and an adaptive matching path selection technique, the invention effectively improves three performance indicators (matching efficiency, memory overhead, and communication cost) and has good application prospects.
While illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, it should be understood that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined by the appended claims, and all inventions that make use of the inventive concept are protected.

Claims (4)

1. The pattern matching method for the labeled graph in the distributed environment is characterized by comprising the following steps of:
s1: obtaining graph data needing to be subjected to pattern matching of a labeled graph, wherein the graph data comprises a data graph and a labeled pattern graph, the data graph is an undirected graph containing node IDs, node label information and node association relations, and the labeled pattern graph comprises node labels and the node association relations;
s2: note that the master node in the distributed environment is M, and the slave node is S n N=1, 2, …, N represents the number of slave nodes, the master node M divides the data graph obtained in step S1 into N pieces of node data, each piece of node data contains a plurality of pieces of node information, each piece of node information includes a node ID, node tag information and a node association relationship, and then distributes each piece of node data to a corresponding slave node S n The method comprises the steps of carrying out a first treatment on the surface of the At the same time, the master node M respectively transmits the pattern diagram with the label to each slave node S n
S3: each slave node S n After receiving the node data and the labeled pattern diagram, the label in the labeled pattern diagram is marked as f k K=1, 2, …, K representing the number of labels in the labeled pattern graph, slave node S n In the received node data, the statistics results in a label f k Node set phi of (2) n,k Then, a node set with the least number of nodes is selected as a root node set R n The method comprises the steps of carrying out a first treatment on the surface of the Root node set R n The number of the nodes is D n Then slave node S n Respectively by root node set R n One of the nodes is used as a root node to execute D n The graph pattern matching task feeds back the obtained graph pattern matching result to the master node M;
s4: the master node M gathers the graph pattern matching results fed back by each slave node, eliminates repeated graph pattern matching results, and obtains a final graph pattern matching result set.
2. The method for matching patterns with labels in distributed environment according to claim 1, wherein in the step S2, when node data is distributed, a hash calculation is performed on each node ID in the data graph, a hash function is hash (ID)% N, a hash value corresponding to each node ID is obtained, and nodes with the same hash value are divided into the same piece of node data.
3. The method for pattern matching of a tagged graph in a distributed environment according to claim 1, wherein the execution flow of the pattern matching task in step S3 comprises the steps of:
s3.1: let node serial number i=0, determine the root node label according to the root node of the current pattern matching taskLet node set B 0 A set formed by the root nodes of the current graph pattern matching task;
s3.2: acquiring labels in a pattern diagram asAdjacent node candidate label set a of nodes of (a) i+1 Record candidate tag set A i+1 The p-th tag in (b) is u p ,p=1,2,…,P i+1 ,P i+1 Representing candidate tag set A i+1 Number of candidate tags; firstly, calculating to obtain candidate label u in the obtained data graph p Communication cost of all corresponding nodes pull (u p ):
cost pull (u p )=[C(u p )-C local (u p )]×W remote +C local (u p )×W local
Wherein C (u) p ) Tag u in data graph representing slave node queried by master node p The number of all corresponding nodes, C local (u p ) The representation is located at the present timeThe label local to the former slave node is u p Is the number of nodes, W local Representing the unit communication cost, W, required to obtain the local node data of the slave node remote Representing a unit communication cost required for acquiring node data from other slave nodes;
calculate the candidate tag u p Traffic cost reduced by subsequent unmatched occurrence of next node label on matched path stop (u p ):
Wherein ρ is p Representing candidate tag u p The probability of unmatched subsequent occurrence when the next node label on the matching path is used as a calculation formula is as follows:
the next label on the matching path is calculated by adopting the following calculation formulaCost function cost (u) p ):
cost(u p )=cost pull (u p )-cost stop (u p )
Selecting candidate tag set A i+1 The label with the smallest cost function as the label of the next nodeThe slave node obtains the label of +.A. in the data graph through the inquiry of the master node>And the adjacent node belongs to node set B i Form node set B i+1
S3.3: judging whether i is less than T-1, wherein T represents the number of nodes in the pattern diagram with the label, if so, entering a step S3.4, otherwise, entering a step S3.5;
s3.4: let i=i+1, return to step S3.2;
s3.5: with node set B T-1 Each node in the graph is used as a final node, and a matching path corresponding to the final node is obtained by backtracking, so that a graph pattern matching result is obtained.
4. The method according to claim 1, wherein in step S4 the master node M schedules tasks among the slave nodes to achieve load balancing as follows: after a slave node has executed all of its graph pattern matching tasks, it sends a schedulable message to the master node M; upon receiving the schedulable message, the master node M queries the slave nodes that have not yet completed their graph pattern matching tasks, obtains the graph pattern matching tasks that have not yet been executed, and reassigns those tasks to the schedulable slave node.
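The scheduling described in claim 4 can be sketched with a toy, single-process model. The claim does not state how many pending tasks are moved or from which slave node, so the policy below (take half of the longest pending queue, at least one task) is an assumption made purely for illustration; the class name, method name and task identifiers are likewise hypothetical.

```python
from collections import deque


class Master:
    """Toy master node that rebalances pending tasks between slave queues."""

    def __init__(self, queues):
        # queues: dict slave_id -> deque of tasks not yet executed
        self.queues = queues

    def on_schedulable(self, idle_id):
        """Handle a 'schedulable' message from a slave with no tasks left."""
        # Query the slaves that still hold unexecuted tasks.
        busy = {s: q for s, q in self.queues.items() if s != idle_id and q}
        if not busy:
            return []
        # Assumed policy: move half of the longest queue (at least one task).
        donor = max(busy, key=lambda s: len(busy[s]))
        n = max(1, len(busy[donor]) // 2)
        moved = [busy[donor].pop() for _ in range(n)]
        self.queues[idle_id].extend(moved)
        return moved


queues = {
    "s1": deque(),
    "s2": deque(["t1", "t2", "t3", "t4"]),
    "s3": deque(["t5"]),
}
master = Master(queues)
moved = master.on_schedulable("s1")
```

In this example slave "s1" reports that it is schedulable, the master finds "s2" holding the most pending tasks, and tasks "t4" and "t3" are reassigned to "s1". A real deployment would exchange these messages over the network and would need to guard against a task being moved while it is starting to execute.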
CN202110570428.XA 2021-05-25 2021-05-25 Method for matching patterns of labeled graph in distributed environment Active CN113282415B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110570428.XA CN113282415B (en) 2021-05-25 2021-05-25 Method for matching patterns of labeled graph in distributed environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110570428.XA CN113282415B (en) 2021-05-25 2021-05-25 Method for matching patterns of labeled graph in distributed environment

Publications (2)

Publication Number Publication Date
CN113282415A (en) 2021-08-20
CN113282415B (en) 2023-10-31

Family

ID=77281415

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110570428.XA Active CN113282415B (en) 2021-05-25 2021-05-25 Method for matching patterns of labeled graph in distributed environment

Country Status (1)

Country Link
CN (1) CN113282415B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110727760A (en) * 2019-09-08 2020-01-24 天津大学 Method for carrying out distributed regular path query on large-scale knowledge graph

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7174328B2 (en) * 2003-09-02 2007-02-06 International Business Machines Corp. Selective path signatures for query processing over a hierarchical tagged data structure
GB2541231A (en) * 2015-08-13 2017-02-15 Fujitsu Ltd Hybrid data storage system and method and program for storing hybrid data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Distributed subgraph matching algorithm for large-scale graph data; Xu Wen; Song Wenai; Fu Lizhen; Lü Wei; Computer Science (04); full text *

Also Published As

Publication number Publication date
CN113282415A (en) 2021-08-20

Similar Documents

Publication Publication Date Title
Guo et al. Community discovery by propagating local and global information based on the MapReduce model
Chen et al. Parallel DBSCAN with priority r-tree
CN102915347A (en) Distributed data stream clustering method and system
Liu et al. Resource-constrained federated edge learning with heterogeneous data: Formulation and analysis
US9922133B2 (en) Live topological query
Du Energy analysis of Internet of things data mining algorithm for smart green communication networks
Li et al. Scalable Graph500 design with MPI-3 RMA
CN106599190A (en) Dynamic Skyline query method based on cloud computing
Li et al. Parallel skyline queries over uncertain data streams in cloud computing environments
Xu The analytics and applications on supporting big data framework in wireless surveillance networks
Souravlas et al. Hybrid CPU-GPU community detection in weighted networks
CN113282415B (en) Method for matching patterns of labeled graph in distributed environment
Chen et al. Targeted influence maximization based on cloud computing over big data in social networks
Faysal et al. Hypc-map: A hybrid parallel community detection algorithm using information-theoretic approach
Wan et al. Dgs: Communication-efficient graph sampling for distributed gnn training
Jin et al. Mpmatch: a multi-core parallel subgraph matching algorithm
CN109254844B (en) Triangle calculation method of large-scale graph
Li et al. Parallel k-dominant skyline queries over uncertain data streams with capability index
CN113240089B (en) Graph neural network model training method and device based on graph retrieval engine
Yuan et al. Gcache: neighborhood-guided graph caching in a distributed environment
CN105354243B (en) The frequent probability subgraph search method of parallelization based on merger cluster
Jin et al. A data-locality-aware task scheduler for distributed social graph queries
Awekar et al. Parallel all pairs similarity search
Zhang et al. A dynamic management method of domestic internet of things based on cloud computing architecture
Niblack et al. Generating connected skeletons for exact and approximate reconstruction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant