CN113505278A - Graph matching method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN113505278A
- Authority
- CN
- China
- Prior art keywords
- graph
- node
- matching
- subgraph
- initial pattern
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the present application provide a graph matching method, a graph matching apparatus, an electronic device and a storage medium. The method comprises: acquiring a graph matching statement and parsing it to obtain an initial pattern graph; determining a root node from the initial pattern graph and decomposing the initial pattern graph based on the root node to obtain a target star subgraph; accessing nodes in a data graph through an iterator interface provided by a storage engine and matching the target star subgraph in the data graph to obtain a compressed matching result corresponding to the target star subgraph; and processing the matching result corresponding to the target star subgraph with a parallel pipelined multi-way join algorithm to obtain the matching result of the initial pattern graph. Because the matching statement is parsed and matching is performed on the target star subgraph, and because the node-centric storage mode avoids random disk access during matching, matching efficiency is improved and efficient graph matching is achieved.
Description
Technical Field
The present application relates to the field of graph data technologies, and in particular, to a graph matching method and apparatus, an electronic device, and a storage medium.
Background
With the development of large-scale graph data represented by social networks, efficient management and analysis of large-scale graph data has become a common challenge for both the industry and academia. In the graph model, because the attribute graph has good expression capability, the actual problem can be conveniently modeled, and the attribute graph model becomes the de facto standard of a graph database. In graph computation problems, graph matching requires finding all eligible subgraphs that can be matched with a pattern graph in a large data graph. Many scenes such as recommendation systems, electronic circuit computer aided design, protein relationship network analysis and the like need to solve the problem of graph matching, and graph matching is also the basis for graph databases to complete other complex operations, and the efficient processing of the problem of attribute graph matching is particularly important.
However, current graph matching involves complex computation and is difficult to perform efficiently.
Disclosure of Invention
The embodiment of the application provides a graph matching method, a graph matching device, electronic equipment and a storage medium, and can effectively solve the problem that graph matching is difficult to achieve efficiently.
According to a first aspect of the embodiments of the present application, a graph matching method is provided, comprising: acquiring a graph matching statement, and parsing the graph matching statement to obtain an initial pattern graph; determining a root node from the initial pattern graph, and decomposing the initial pattern graph based on the root node to obtain a target star subgraph, wherein the target star subgraph comprises the root node, the leaf nodes adjacent to the root node in the initial pattern graph, and push-down constraint conditions; accessing nodes in a data graph through an iterator interface provided by a storage engine, and matching the target star subgraph in the data graph to obtain a compressed matching result corresponding to the target star subgraph, wherein the storage engine is obtained by storing the data graph in external memory in advance in a node-centric storage mode; and processing the matching result corresponding to the target star subgraph with a parallel pipelined multi-way join algorithm to obtain the matching result of the initial pattern graph.
According to a second aspect of the embodiments of the present application, a graph matching apparatus is provided, comprising: an analysis module, configured to acquire a graph matching statement and parse the graph matching statement to obtain an initial pattern graph; a decomposition module, configured to determine a root node from the initial pattern graph and decompose the initial pattern graph based on the root node to obtain a target star subgraph, wherein the target star subgraph comprises the root node, the leaf nodes adjacent to the root node in the initial pattern graph, and push-down constraint conditions; a matching module, configured to access nodes in a data graph through an iterator interface provided by a storage engine and match the target star subgraph in the data graph to obtain a compressed matching result corresponding to the target star subgraph, wherein the storage engine is obtained by storing the data graph in external memory in advance in a node-centric storage mode; and a processing module, configured to process the matching result corresponding to the target star subgraph with a parallel pipelined multi-way join algorithm to obtain the matching result of the initial pattern graph.
According to a third aspect of the embodiments of the present application, an electronic device is provided, comprising: one or more processors; a memory; and one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more application programs are configured to perform the method described above as applied to the electronic device.
According to a fourth aspect of the embodiments of the present application, there is provided a computer-readable storage medium having a program code stored therein, wherein the method described above is performed when the program code runs.
With the graph matching method provided by the embodiments of the present application, a graph matching statement is acquired and parsed to obtain an initial pattern graph; a root node is determined from the initial pattern graph, and the initial pattern graph is decomposed based on the root node to obtain a target star subgraph; nodes in a data graph are accessed through an iterator interface provided by a storage engine, and the target star subgraph is matched in the data graph to obtain a compressed matching result corresponding to the target star subgraph, the storage engine being obtained by storing the data graph in external memory in advance in a node-centric storage mode; and the matching result corresponding to the target star subgraph is processed with a parallel pipelined multi-way join algorithm to obtain the matching result of the initial pattern graph. Because the matching statement is parsed and matching is performed on the target star subgraph, and because the node-centric storage mode avoids random disk access during matching, matching efficiency is improved and efficient graph matching is achieved. In addition, the matching results are compressed and the compressed results are processed by the parallel pipelined join algorithm to obtain the final result, which effectively reduces memory consumption.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a process flow diagram of a graph matching method provided by an embodiment of the present application;
FIG. 2 is a flow chart of a graph matching method provided by an embodiment of the present application;
FIG. 3 is a flow chart of a graph matching method provided by another embodiment of the present application;
FIG. 4 is a schematic diagram of an initial pattern graph obtained by parsing a graph matching statement according to an embodiment of the present application;
FIG. 5 is a flow chart of a graph matching method according to yet another embodiment of the present application;
FIG. 6 is a schematic diagram of a target star subgraph obtained by decomposing the initial pattern graph according to an embodiment of the present application;
FIG. 7 is a flow chart of a graph matching method provided in accordance with yet another embodiment of the present application;
FIG. 8 is a schematic diagram illustrating a data organization format on a disk of a node-centric storage method according to an embodiment of the present application;
FIG. 9 is a schematic illustration of a data graph provided in accordance with an embodiment of the present application;
FIG. 10 is a schematic diagram of the logical structure of the data graph provided in FIG. 9 under a node-centric storage approach;
FIG. 11 is a diagram illustrating intermediate (uncompressed, compressed) results of the target star subgraph matching and results obtained by using a join algorithm according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a test provided by one embodiment of the present application;
FIG. 13 shows test results for different data sets on Graphflow for the pattern graphs provided in FIG. 12;
FIG. 14 shows test results for different data sets on Neo4j for the pattern graphs provided in FIG. 12;
FIG. 15 is a graph of compression ratios for intermediate results for different data sets and patterns provided by one embodiment of the present application;
FIG. 16 is a functional block diagram of a graph matching apparatus provided in accordance with one embodiment of the present application;
FIG. 17 is a block diagram of an electronic device for executing a graph matching method according to an embodiment of the present application.
Detailed Description
With the development of large-scale graph data represented by social networks, efficient management and analysis of large-scale graph data has become a common challenge for both industry and academia. Among graph models, the attribute graph has good expressive power and makes it convenient to model practical problems, so the attribute graph model has become the de facto standard for graph databases. Among graph computation problems, graph matching requires finding, in a large data graph, all subgraphs that match a pattern graph. Many scenarios, such as recommendation systems, computer-aided design of electronic circuits and protein relationship network analysis, require solving graph matching problems, and graph matching is also the basis on which graph databases perform other complex operations. Handling the attribute graph matching problem efficiently is therefore particularly important.
However, existing graph matching systems have a number of problems: 1) an industrial-grade graph database such as Neo4j can easily describe the attribute graph matching problem through the Cypher language, but its performance is poor; 2) academia has proposed many graph matching systems that optimize performance, but because of the complexity of storing and computing on attribute graphs, these systems over-simplify the model, and optimizations aimed at a simplified model are difficult to apply to the attribute graph matching problem; 3) because of the complexity of the graph matching problem, existing systems need a large amount of memory, which hinders their application.
To solve these problems, the inventors found that a node-centric storage mode avoids the random disk access caused by the existing storage mode in which outgoing and incoming edges are stored separately, and thus solves the storage problem of the attribute graph. Star subgraph decomposition and optimization of the graph matching statement reduce intermediate results as early as possible, and compression and pipelined joins reduce the memory overhead of graph matching.
Therefore, the embodiments of the present application provide a graph matching method in which a graph matching statement is acquired and parsed to obtain an initial pattern graph; a root node is determined from the initial pattern graph, and the initial pattern graph is decomposed based on the root node to obtain a target star subgraph; nodes in a data graph are accessed through an iterator interface provided by a storage engine, and the target star subgraph is matched in the data graph to obtain a compressed matching result corresponding to the target star subgraph, the storage engine being obtained by storing the data graph in external memory in advance in a node-centric storage mode; and the matching result corresponding to the target star subgraph is processed with a parallel pipelined multi-way join algorithm to obtain the matching result of the initial pattern graph. Because the matching statement is parsed and matching is performed on the target star subgraph, and because the node-centric storage mode avoids random disk access during matching, matching efficiency is improved and efficient graph matching is achieved. In addition, the matching results are compressed and the compressed results are processed by the parallel pipelined join algorithm to obtain the final result, which effectively reduces memory consumption.
In order to make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application and are not exhaustive. It should be noted that the embodiments of the present application and the features of those embodiments may be combined with each other where no conflict arises.
Generally, graph matching involves a data graph and a pattern graph. The pattern graph contains a small number of nodes and edges and is described by the user through an attribute graph matching statement; the data graph generally refers to data generated and collected in production and daily life, such as large-scale data collected from a social network. Graph matching finds all subgraphs of the data graph that match the pattern graph. Referring to FIG. 1, a process flow diagram of a graph matching method according to an embodiment of the present application is shown. First, a graph matching statement is acquired and parsed by a parser, and predicates are pushed down to a planner; the planner decomposes the initial pattern graph into star subgraphs; star subgraph matching is performed using a node-centric data graph storage engine to obtain compressed matching results; and parallel pipelined joining and decompression are then performed to obtain the final matching result.
Referring to FIG. 2, a graph matching method provided in an embodiment of the present application is shown. The method may be applied to an electronic device, such as a smart phone, a computer or a server, and may specifically include the following steps.
Step 110: acquiring a graph matching statement, and parsing the graph matching statement to obtain an initial pattern graph.
When performing graph matching, an initial pattern graph needs to be obtained first, and it is obtained by parsing a graph matching statement. The graph matching statement is input by a user; after it is acquired, it is parsed to obtain the initial pattern graph.
Generally, the graph matching statement comprises a pattern graph description part and a constraint condition description part. When the graph matching statement is parsed, the constraint condition description part can be converted into predicate expressions by De Morgan's laws; the predicate expressions are classified into node constraint conditions, edge constraint conditions and multi-node constraint conditions; and the predicate expressions corresponding to the node constraint conditions and the edge constraint conditions are taken as push-down constraint conditions, the initial pattern graph being constructed based on the push-down constraint conditions and the pattern graph description part.
Step 120: determining a root node from the initial pattern graph, and decomposing the initial pattern graph based on the root node to obtain a target star subgraph.
After obtaining the initial pattern graph, a root node may be determined from the initial pattern graph, and the initial pattern graph may be decomposed into a target star subgraph based on the root node, where the target star subgraph includes the root node, leaf nodes adjacent to the root node in the initial pattern graph, and a push-down constraint.
When the initial pattern graph is decomposed into target star subgraphs, the initial pattern graph may first be cloned to obtain a copy pattern graph. The following operations are then repeated: determining a root node from the initial pattern graph; acquiring the leaf nodes adjacent to the root node in the initial pattern graph and the push-down constraint conditions, and constructing a star subgraph based on the root node, the leaf nodes and the push-down constraint conditions; and deleting the root node and the edges adjacent to the root node from the copy pattern graph. The operations are repeated until the copy pattern graph contains no node whose out-degree or in-degree is non-zero (that is, no edge remains) or contains no node at all, and the star subgraphs obtained are taken as the target star subgraphs.
In some embodiments, when the number of root nodes determined from the initial pattern graph is greater than 1, decomposing the initial pattern graph based on each root node may obtain a target star subgraph corresponding to each root node. For example, the number of root nodes determined from the initial pattern graph is 5, and a target star subgraph corresponding to the 5 root nodes can be obtained.
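The decomposition loop described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name `decompose`, the edge-list representation and the `constraints` mapping are hypothetical, and the root is chosen by the remaining degree in the copy pattern graph as a stand-in for the node value that the method actually uses.

```python
def decompose(edges, constraints):
    """Decompose a pattern graph into star subgraphs.

    edges: list of (src, dst) pairs of the initial pattern graph.
    constraints: dict mapping a root node to its push-down predicates
                 (a hypothetical, pre-computed assignment).
    """
    initial = list(edges)   # the initial pattern graph is never modified
    copy = list(edges)      # the cloned copy pattern graph
    stars = []
    while copy:  # stop once no edge remains in the copy
        # Assumption: rank candidate roots by remaining in-degree + out-degree.
        degree = {}
        for src, dst in copy:
            degree[src] = degree.get(src, 0) + 1
            degree[dst] = degree.get(dst, 0) + 1
        root = max(sorted(degree), key=degree.get)
        # Leaves are the neighbours of the root in the INITIAL pattern graph.
        leaves = sorted({d for s, d in initial if s == root} |
                        {s for s, d in initial if d == root})
        stars.append({"root": root, "leaves": leaves,
                      "constraints": constraints.get(root, [])})
        # Delete the root and its adjacent edges from the copy.
        copy = [(s, d) for s, d in copy if root not in (s, d)]
    return stars


# Edges of the example pattern used in this description (u1..u4).
pattern = [("u1", "u2"), ("u2", "u1"), ("u1", "u3"), ("u3", "u1"),
           ("u1", "u4"), ("u2", "u4"), ("u3", "u4")]
stars = decompose(pattern, {"u1": ["u2 > u1", "u3 > u1"], "u4": ["u4 < 8"]})
roots = [s["root"] for s in stars]
```

For this pattern the sketch picks `u1` first and `u4` second, after which no edge remains in the copy, so two star subgraphs are produced.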
Step 130: accessing nodes in a data graph through an iterator interface provided by a storage engine, and matching the target star subgraph in the data graph to obtain a compressed matching result corresponding to the target star subgraph, wherein the storage engine is obtained by storing the data graph in external memory in advance in a node-centric storage mode.
After the initial pattern diagram is decomposed to obtain star subgraphs, the star subgraphs can be matched with the data diagram to obtain a compressed matching result corresponding to the star subgraphs.
The storage engine is obtained by storing the data graph in external memory in advance in a node-centric storage mode. In this mode, the data graph is first acquired; each node is stored under a global index according to its node label; and, for each node, the nodes adjacent to it and the edges adjacent to it are determined from the data graph and stored under a local index according to the node labels of the adjacent nodes. Nodes with the same node label are therefore stored contiguously, and, given a node label, all nodes carrying that label can be found through the global index. For each node in the data graph, information such as the number of bytes occupied by the node information, the node ID, the in-degree and the out-degree is recorded. The adjacent nodes of each node are grouped by node label, so the adjacent nodes with a given label can be looked up quickly through the local index, and the information of the edge to each adjacent node is stored directly with that adjacent node. In this way both outgoing and incoming edges are recorded with the node, and more topology information is retained.
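The logical layout of the global and local indexes can be illustrated with a small in-memory model. This is a sketch only: the class `NodeCentricStore`, its two iterator methods and the tiny data graph are hypothetical, and the on-disk byte layout (record sizes, contiguous placement) is not modelled.

```python
from collections import defaultdict


class NodeCentricStore:
    """In-memory stand-in for a node-centric storage layout."""

    def __init__(self, nodes, edges):
        # nodes: dict node_id -> node label
        # edges: list of (src_id, dst_id, edge_label)
        self.global_index = defaultdict(list)  # node label -> node ids
        for node_id, label in sorted(nodes.items()):
            self.global_index[label].append(node_id)
        # Local index per node: neighbour label -> (neighbour, direction, edge label).
        # Edge information is kept together with the neighbour entry.
        self.local_index = {node_id: defaultdict(list) for node_id in nodes}
        for src, dst, edge_label in edges:
            self.local_index[src][nodes[dst]].append((dst, "out", edge_label))
            self.local_index[dst][nodes[src]].append((src, "in", edge_label))

    def nodes_with_label(self, label):
        """Iterate over all node ids carrying the given node label."""
        return iter(self.global_index.get(label, []))

    def neighbours(self, node_id, label):
        """Iterate over neighbours of node_id that carry the given label."""
        return iter(self.local_index[node_id].get(label, []))


store = NodeCentricStore(
    nodes={1: "Person", 2: "Person", 3: "Media"},
    edges=[(1, 2, "FOLLOWS"), (2, 1, "FOLLOWS"),
           (1, 3, "REPOSTS"), (2, 3, "LIKES")],
)
people = list(store.nodes_with_label("Person"))
around_media = list(store.neighbours(3, "Person"))
```

Because each node keeps both its incoming and outgoing neighbours, matching a star rooted at a node needs only that node's own record, which is the property the node-centric layout relies on to avoid scattered lookups.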
The storage engine provides two iterator interfaces through which it can be accessed; that is, the nodes in the data graph are accessed through these interfaces, the target star subgraph is matched in the data graph, and the compressed matching result corresponding to the target star subgraph is obtained. Specifically, the matching result obtained by matching the target star subgraph can be compressed and stored by delaying the Cartesian product and merging equivalent nodes, which reduces the intermediate results of graph matching and the memory overhead.
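The effect of delaying the Cartesian product can be illustrated as follows. This is a sketch under assumptions: the dictionary layout of a compressed match and the names `compressed_size` and `expand` are hypothetical, the node ids are made up, merging of equivalent nodes is not shown, and the check that different pattern nodes map to different data nodes is omitted.

```python
from itertools import product


def compressed_size(match):
    """Number of node ids stored: one root plus each leaf's candidate list."""
    return 1 + sum(len(cands) for cands in match["candidates"].values())


def expand(match):
    """Lazily enumerate full matches by taking the delayed Cartesian product."""
    leaves = list(match["candidates"])
    for combo in product(*(match["candidates"][leaf] for leaf in leaves)):
        row = {match["root_var"]: match["root"]}
        row.update(zip(leaves, combo))
        yield row


# One compressed match of a star rooted at pattern node u1 (ids are made up).
match = {
    "root_var": "u1",
    "root": 10,
    "candidates": {"u2": [11, 12, 13], "u3": [11, 12, 13], "u4": [20, 21]},
}
rows = list(expand(match))
```

Here the compressed form stores 9 node ids, while full expansion yields 3 x 3 x 2 = 18 rows of 4 ids each; the gap grows multiplicatively with the number of leaves and candidates.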
Step 140: processing the matching result corresponding to the target star subgraph with a parallel pipelined multi-way join algorithm to obtain the matching result of the initial pattern graph.
After the matching result corresponding to the target star subgraph is obtained, it can be processed with a parallel pipelined multi-way join algorithm to obtain the matching result of the initial pattern graph.
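The pipelined character of the join can be sketched with Python generators. This is a simplified illustration and not the algorithm of the embodiment: it chains binary joins instead of performing a true multi-way join, it runs sequentially rather than in parallel, and the function names and sample bindings are hypothetical.

```python
def join(left_rows, right_rows):
    """Join two streams of variable bindings on their shared variables."""
    right = list(right_rows)       # build side is materialised once
    for left in left_rows:         # probe side is streamed row by row
        for r in right:
            shared = left.keys() & r.keys()
            if all(left[k] == r[k] for k in shared):
                yield {**left, **r}


def pipelined_join(streams):
    """Chain joins so that rows flow through without storing intermediate tables."""
    result = streams[0]
    for stream in streams[1:]:
        result = join(result, stream)
    return result


# Expanded matches of two star subgraphs (bindings are made up).
star_u1 = [{"u1": 1, "u2": 2, "u4": 9}, {"u1": 1, "u2": 3, "u4": 9}]
star_u4 = [{"u4": 9, "u2": 2}, {"u4": 9, "u2": 4}]
final = list(pipelined_join([iter(star_u1), iter(star_u4)]))
```

Only the bindings that agree on the shared pattern nodes `u2` and `u4` survive, and each joined row is produced as soon as its inputs are available rather than after a full intermediate table has been built.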
According to the graph matching method provided by this embodiment, a graph matching statement is acquired and parsed to obtain an initial pattern graph; a root node is determined from the initial pattern graph, and the initial pattern graph is decomposed based on the root node to obtain a target star subgraph; nodes in a data graph are accessed through an iterator interface provided by a storage engine, and the target star subgraph is matched in the data graph to obtain a compressed matching result corresponding to the target star subgraph, the storage engine being obtained by storing the data graph in external memory in advance in a node-centric storage mode; and the matching result corresponding to the target star subgraph is processed with a parallel pipelined multi-way join algorithm to obtain the matching result of the initial pattern graph. Because the matching statement is parsed and matching is performed on the target star subgraph, and because the node-centric storage mode avoids random disk access during matching, matching efficiency is improved and efficient graph matching is achieved. In addition, the matching results are compressed and the compressed results are processed by the parallel pipelined join algorithm to obtain the final result, which effectively reduces memory consumption.
Referring to fig. 3, another embodiment of the present application provides a graph matching method, which focuses on the process of analyzing graph matching statements to obtain an initial pattern graph based on the foregoing embodiments.
Step 210: converting the constraint condition description part into predicate expressions by De Morgan's laws.
A user can interact with the electronic equipment through the graph matching sentences to realize the function of graph matching. The graph matching statement refers to a statement that can describe a pattern graph that a user wants to match, and for example, the graph matching statement may be a Cypher graph query statement. After receiving the graph matching statement, the constraint condition description part in the graph matching statement may be processed after performing necessary syntax check and type check on the graph matching statement. Specifically, the syntax check and the type check may be performed in a conventional manner in the prior art, and are not particularly limited herein.
Taking a Cypher graph query statement as an example, the statement generally includes a pattern graph description part (the MATCH clause) and a constraint condition description part (the WHERE clause). After the syntax check and the type check pass, the WHERE clause is parsed and converted into predicate expressions by De Morgan's laws, where a predicate expression is composed of expressions, operators and values. The predicate expressions may be connected by AND or by other symbols, which is not specifically limited herein; the following description takes predicate expressions connected by AND as an example. For example, a graph matching statement is:
MATCH(u1:Person)-[:FOLLOWS]->(u2:Person)-[:FOLLOWS]->(u1),
(u1)-[:FOLLOWS]->(u3:Person)-[:FOLLOWS]->(u1),
(u1)-[:REPOSTS]->(u4:Media),
(u2)-[:LIKES]->(u4)<-(u3)
WHERE u2>u1 AND NOT(u3<=u1 OR u4>=8)
where the WHERE clause is "WHERE u2 > u1 AND NOT (u3 <= u1 OR u4 >= 8)", which, after conversion by De Morgan's laws, becomes "WHERE u2 > u1 AND u3 > u1 AND u4 < 8".
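The conversion of this WHERE clause can be reproduced with a short sketch. The expression classes (`Cmp`, `Not`, `And`, `Or`) and the helper names are hypothetical and are introduced only to illustrate how De Morgan's laws push NOT down to the comparisons; the parser of the embodiment is not described at this level of detail.

```python
from dataclasses import dataclass
from typing import Union

# Each comparison operator mapped to its logical negation.
NEGATED = {">": "<=", "<=": ">", "<": ">=", ">=": "<", "=": "<>", "<>": "="}


@dataclass(frozen=True)
class Cmp:
    left: str
    op: str
    right: str


@dataclass(frozen=True)
class Not:
    arg: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Cmp, Not, And, Or]


def push_not(expr, negate=False):
    """Eliminate NOT by pushing it down to the comparisons (De Morgan's laws)."""
    if isinstance(expr, Cmp):
        return Cmp(expr.left, NEGATED[expr.op], expr.right) if negate else expr
    if isinstance(expr, Not):
        return push_not(expr.arg, not negate)
    if isinstance(expr, (And, Or)):
        left = push_not(expr.left, negate)
        right = push_not(expr.right, negate)
        # Under negation AND becomes OR and OR becomes AND.
        becomes_and = isinstance(expr, And) != negate
        return And(left, right) if becomes_and else Or(left, right)
    raise TypeError(f"unsupported expression: {expr!r}")


def conjuncts(expr):
    """Flatten nested ANDs into a list; any remaining OR stays one conjunct."""
    if isinstance(expr, And):
        return conjuncts(expr.left) + conjuncts(expr.right)
    return [expr]


# WHERE u2 > u1 AND NOT (u3 <= u1 OR u4 >= 8)
where = And(Cmp("u2", ">", "u1"),
            Not(Or(Cmp("u3", "<=", "u1"), Cmp("u4", ">=", "8"))))
predicates = conjuncts(push_not(where))
```

The resulting list holds the three comparisons `u2 > u1`, `u3 > u1` and `u4 < 8`, matching the converted clause above.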
Step 220: classifying the predicate expressions into node constraint conditions, edge constraint conditions and multi-node constraint conditions.
After the constraint condition description part has been converted into predicate expressions, each predicate expression can be classified as a node constraint, an edge constraint or a multi-node constraint. A node constraint is a predicate expression involving only one node; an edge constraint is a predicate expression involving only one edge (two adjacent nodes); and a multi-node constraint is a predicate expression involving a plurality of nodes. In the foregoing example, the predicate expressions in the converted WHERE clause are "u2 > u1", "u3 > u1" and "u4 < 8", where the node constraint is "u4 < 8" and the edge constraints are "u2 > u1" and "u3 > u1".
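The classification rule can be sketched as below. The function `classify`, the string form of the predicates and the regular-expression tokenisation are assumptions made for illustration; the fourth predicate, `u2 > u3`, does not appear in the example statement and is added only to show the multi-node case.

```python
import re


def classify(predicate, node_names, edges):
    """Classify a predicate as a 'node', 'edge' or 'multi-node' constraint."""
    tokens = re.findall(r"[A-Za-z_]\w*", predicate)
    used = {t for t in tokens if t in node_names}
    if not used:
        raise ValueError(f"predicate mentions no pattern node: {predicate}")
    if len(used) == 1:
        return "node"
    if len(used) == 2:
        a, b = sorted(used)
        # Two nodes joined by a pattern edge (in either direction).
        if (a, b) in edges or (b, a) in edges:
            return "edge"
    return "multi-node"


nodes = {"u1", "u2", "u3", "u4"}
edges = {("u1", "u2"), ("u2", "u1"), ("u1", "u3"), ("u3", "u1"),
         ("u1", "u4"), ("u2", "u4"), ("u3", "u4")}
predicates = ["u2 > u1", "u3 > u1", "u4 < 8", "u2 > u3"]
kinds = {p: classify(p, nodes, edges) for p in predicates}
# Only node and edge constraints are pushed down.
push_down = [p for p in predicates if kinds[p] != "multi-node"]
```

In this sketch `u2 > u3` is not pushed down because `u2` and `u3` are not adjacent in the pattern, so it could only be checked once matches of several star subgraphs have been combined.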
Step 230: taking the predicate expressions corresponding to the node constraint conditions and the edge constraint conditions as push-down constraint conditions, and constructing the initial pattern graph based on the push-down constraint conditions and the pattern graph description part.
The predicate expressions corresponding to the node constraints and the edge constraints are taken as the push-down constraints; that is, "u2 > u1", "u3 > u1" and "u4 < 8" in the foregoing example are all push-down constraints, so the initial pattern graph can be constructed based on the push-down constraints and the pattern graph description part. Referring to FIG. 4, a schematic diagram of an initial pattern graph obtained by parsing a graph matching statement is shown. In FIG. 4, Person and Media are node labels, and FOLLOWS, LIKES and REPOSTS are edge labels, all of which appear in the pattern graph description part of the graph matching statement.
Step 240: determining a root node from the initial pattern graph, and decomposing the initial pattern graph based on the root node to obtain a target star subgraph.
Step 250: accessing nodes in a data graph through an iterator interface provided by a storage engine, and matching the target star subgraph in the data graph to obtain a compressed matching result corresponding to the target star subgraph, wherein the storage engine is obtained by storing the data graph in external memory in advance in a node-centric storage mode.
Step 260: processing the matching result corresponding to the target star subgraph with a parallel pipelined multi-way join algorithm to obtain the matching result of the initial pattern graph.
The steps 240 to 260 can refer to the corresponding parts of the previous embodiments, and are not described herein again.
According to the graph matching method provided by this embodiment, the constraint condition description part is converted into predicate expressions by De Morgan's laws; the predicate expressions are classified into node constraint conditions, edge constraint conditions and multi-node constraint conditions; and the predicate expressions corresponding to the node constraint conditions and the edge constraint conditions are taken as push-down constraint conditions, the initial pattern graph being constructed based on the push-down constraint conditions and the pattern graph description part. Parsing the graph matching statement in this way supports its efficient execution. Whereas the prior art over-simplifies the graph model and ignores elements such as labels, edge directions or multiple edges, the present system fully supports the attribute graph model. The initial pattern graph is then decomposed to obtain a target star subgraph, and matching is performed on the target star subgraph; the node-centric storage mode avoids random disk access during matching and improves matching efficiency; and the matching results are compressed and the compressed results are processed by a parallel pipelined join algorithm to obtain the final result, which effectively reduces memory consumption.
Referring to FIG. 5, a further embodiment of the present application provides a graph matching method which, on the basis of the foregoing embodiments, focuses on the process of decomposing the initial pattern graph to obtain a star subgraph.
Step 310 may refer to corresponding parts of the foregoing embodiments, and will not be described herein.
Step 320: clone the initial pattern graph to obtain a copy pattern graph.
After the initial pattern graph is obtained, it may be cloned to obtain a copy pattern graph for subsequent use.
The initial pattern graph includes a plurality of nodes, from which a root node can be determined. A node is preferred as the root node when the sum of its out-degree and in-degree is larger, when it has more push-down constraints, and when its node label occurs less frequently in the data graph. Specifically, a node value may be calculated for each node in the initial pattern graph, and the node with the maximum node value is determined as the root node. The larger the node value, the better the node satisfies the conditions for a root node.
The node value corresponding to each node in the initial pattern graph may be calculated using the following formula:
[formula not reproduced in this text]
where u represents a node of the initial pattern graph; freq(u.label) represents the frequency with which the label of node u occurs among the node labels of the data graph; d+ represents the out-degree of node u; d- represents the in-degree of node u; |ψ(u)| represents the number of push-down constraints relating to node u; m represents the number of edges in the initial pattern graph; V(P) represents all nodes in the initial pattern graph; u_i represents any node in the initial pattern graph; and |ψ(u_i)| represents the number of push-down constraints relating to node u_i.
As shown in fig. 4, the obtained initial pattern graph has 4 nodes in total, and the node value of each of the 4 nodes can be calculated according to the formula. In this way, the nodes with the maximum node value can be determined; that is, the root nodes are node u1 and node u4.
After a root node is determined from the initial pattern graph, the leaf nodes adjacent to the root node and the push-down constraints may be obtained from the initial pattern graph, and a star subgraph may be constructed based on the root node, the leaf nodes, and the push-down constraints.
Step 350: delete the root node and the edges adjacent to the root node from the copy pattern graph, until the copy pattern graph contains no node whose out-degree or in-degree is 0 or contains no nodes at all, and take the star subgraph as a target star subgraph.
After the star subgraph is constructed, the root node and the edges adjacent to it can be deleted from the copy pattern graph; when the copy pattern graph contains no node with an out-degree and in-degree of 0, or contains no nodes at all, the previously obtained star subgraph is taken as a target star subgraph. After the root node and its adjacent edges are deleted, the nodes u2, u3, and u4, the edge between node u2 and node u4, and the edge between node u3 and node u4 remain in the copy pattern graph, and at this point the copy pattern graph contains no node with an out-degree and in-degree of 0. Referring to fig. 6, a schematic diagram of the target star subgraphs obtained by decomposing the initial pattern graph is shown. In the initial pattern graph P in fig. 6, the leaf nodes adjacent to node u1 are u2, u3, and u4, and the push-down constraints are "u2 > u1", "u3 > u1", and "u4 < 8", so that a target star subgraph S1 corresponding to the root node u1 can be constructed.
In some embodiments, if a node whose out-degree or in-degree is 0 exists in the copy pattern graph after the deletion operation, that node is deleted, and the star subgraph is taken as a target star subgraph.
Because the root node and its edges are deleted from the copy pattern graph, the topology information in the initial pattern graph is retained to the greatest extent, so that the target star subgraphs obtained by decomposition contain more topology information; more topology information means fewer invalid matches in the intermediate results.
It should be noted that the root nodes determined in the initial pattern graph include u1 and u4, and one root node is selected at a time when the initial pattern graph is decomposed into target star subgraphs. That is, after the target star subgraph of root node u1 is obtained, u4 may be selected and steps 340 to 350 repeated to obtain the target star subgraph of root node u4; see the target star subgraph S2 shown in fig. 6, which corresponds to root node u4. The target star subgraphs obtained contain all the necessary topology information and push-down constraints of the initial pattern graph.
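The decomposition loop can be sketched as follows, under stated assumptions. The node values are passed in as a plain dictionary rather than computed, since the formula is not reproduced in this text; the edge directions and the numeric values in the example are hypothetical; ties between equal node values are broken by node name; and push-down constraints are omitted for brevity. The function name `decompose` is illustrative only.

```python
def decompose(edges, node_value):
    """Split a pattern graph into star subgraphs.

    edges: list of (src, dst) directed pattern edges (the initial pattern graph).
    node_value: dict mapping each node to its precomputed node value.
    Returns a list of (root, leaves) pairs.
    """
    copy = list(edges)  # the copy pattern graph; `edges` itself is never modified
    stars = []
    while copy:
        # Nodes left without any edge no longer appear here, which has the
        # same effect as deleting nodes whose degree dropped to 0.
        remaining = sorted({node for edge in copy for node in edge})
        root = max(remaining, key=lambda node: node_value[node])
        # Leaves are taken from the INITIAL pattern graph, so each star keeps
        # as much topology information as possible.
        leaves = sorted({d for s, d in edges if s == root} |
                        {s for s, d in edges if d == root})
        stars.append((root, leaves))
        # Delete the root and its adjacent edges from the copy only.
        copy = [(s, d) for s, d in copy if root not in (s, d)]
    return stars

# Hypothetical directions for the 4-node example; only adjacency is given in the text.
pattern_edges = [("u1", "u2"), ("u1", "u3"), ("u1", "u4"), ("u4", "u2"), ("u4", "u3")]
values = {"u1": 1.0, "u4": 1.0, "u2": 0.5, "u3": 0.5}  # assumed, u1 and u4 maximal
stars = decompose(pattern_edges, values)
```

With these inputs the sketch selects u1 first and u4 second. Whether S2 in fig. 6 lists u1 among its leaves cannot be confirmed from the text; the sketch includes it because leaves are read from the initial pattern graph.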
Step 360: access nodes in a data graph through an iterator interface provided by a storage engine, and match the target star subgraph in the data graph to obtain a compressed matching result corresponding to the target star subgraph, where the storage engine is obtained by storing the data graph on an external memory in advance in a node-centric storage manner.
Step 370: process the matching results corresponding to the target star subgraphs through a parallel pipelined multi-way join (multi-path connection) algorithm to obtain the matching result of the initial pattern graph.
According to the graph matching method provided by this embodiment of the application, the initial pattern graph is cloned to obtain a copy pattern graph, and the target star subgraphs are obtained by decomposition based on the copy pattern graph. This retains the topology information of the initial pattern graph to the greatest extent, so that the decomposed target star subgraphs contain more topology information, which in turn reduces invalid matches in the intermediate results of the subsequent matching process and improves graph matching efficiency.
Referring to fig. 7, another embodiment of the present application provides a graph matching method, which focuses on the process of matching a target star subgraph and obtaining a matching result of an initial pattern graph based on the foregoing embodiment, and specifically includes the following steps.
Steps 410 to 420 may refer to the corresponding parts of the foregoing embodiments and are not described again here.
When the target star subgraph is matched, nodes in the data graph can be accessed through an iterator interface provided by the storage engine, and the target star subgraph is matched in the data graph to obtain a compressed matching result corresponding to the target star subgraph. The storage engine is obtained by storing the data graph on an external memory in advance in a node-centric storage manner. Specifically, to store the data graph, the data graph is first acquired; each node is stored on the external memory under a global index according to its node label; and, for each node, the adjacent nodes of the node and the edges adjacent to the node are determined from the data graph and stored on the external memory under a local index according to the node labels of the adjacent nodes.
Referring to fig. 8, a schematic diagram of the on-disk data organization format of the node-centric storage method is shown. In fig. 8, vid denotes a node ID, nid denotes an adjacent node ID, pos denotes a node address, npos denotes an adjacent node address, vlabel denotes a node label, nlabel denotes an adjacent node label, and elabel denotes an edge label; num_n_to_v represents the number of directed edges from the adjacent node n to the node v, and num_v_to_n represents the number of directed edges from the node v to the adjacent node n.
As shown in fig. 8, the storage engine obtained by storing the data graph on the external memory in the node-centric storage manner includes a global index and every node in the data graph; each node has a local index, and the edges adjacent to a node are stored together with the adjacent nodes of that node. In this storage manner, nodes with the same node label are stored contiguously, so that, given a node label, all nodes with that label can be found through the global index. For each node in the data graph, the number of bytes occupied by the node information, the node ID, the in-degree information, and the out-degree information are recorded. The adjacent nodes of each node are also grouped by node label, so that the adjacent nodes with a given label can be found quickly through the local index. For each adjacent node, the edge information is stored directly with the adjacent node, and the label information of the related incoming and outgoing edges is recorded.
Continuing to refer to fig. 9 and 10, fig. 9 shows a schematic diagram of a data graph, and fig. 10 shows a schematic diagram of a logical structure under the node-centric storage method of the data graph provided in fig. 9.
After the data graph is stored according to the node-centric storage method, two iterator interfaces are provided externally, namely a first iterator interface and a second iterator interface, for example VertexIter and NeighborIter. Given a node label, VertexIter finds the corresponding nodes through the global index and then traverses them sequentially. Given a node label and a node of the data graph accessed by VertexIter, NeighborIter looks up the corresponding adjacent nodes through the local index and then traverses them sequentially, so that the related outgoing-edge and incoming-edge information can be obtained directly. Sequential access avoids random access to the external memory and improves graph matching performance.
In some embodiments, the on-disk data organization format of the node-centric storage method is not limited to the format shown in fig. 8; other data organization formats may also be adopted as long as VertexIter and NeighborIter can be implemented efficiently. The first iterator interface may also be any interface having the same function as VertexIter, and the second iterator interface may also be any interface having the same function as NeighborIter, which is not specifically limited here.
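An in-memory analogue of this layout may make the two indexes easier to picture. The sketch below is illustrative only: it keeps everything in Python dictionaries instead of the on-disk format of fig. 8, and the class name `NodeCentricStore` and the method names `vertex_iter` and `neighbor_iter` (with their signatures) are assumptions that merely mirror VertexIter and NeighborIter.

```python
from collections import defaultdict

class NodeCentricStore:
    """Toy node-centric store: a global index by node label, and for every
    node a local index that groups its neighbours by their label."""

    def __init__(self, labels, edges):
        # labels: dict vid -> vlabel; edges: list of (src, dst, elabel).
        self.labels = dict(labels)
        self.global_index = defaultdict(list)  # vlabel -> [vid, ...]
        for vid in sorted(self.labels):
            self.global_index[self.labels[vid]].append(vid)
        # vid -> nlabel -> nid -> {"out": [elabel, ...], "in": [elabel, ...]}
        self.local_index = {vid: defaultdict(dict) for vid in self.labels}
        for src, dst, elabel in edges:
            self._entry(src, dst)["out"].append(elabel)  # edge v -> n
            self._entry(dst, src)["in"].append(elabel)   # edge n -> v

    def _entry(self, vid, nid):
        # Edge information lives next to the neighbour it leads to.
        group = self.local_index[vid][self.labels[nid]]
        return group.setdefault(nid, {"out": [], "in": []})

    def vertex_iter(self, vlabel):
        """Sequentially yield the IDs of all nodes carrying `vlabel`."""
        yield from self.global_index.get(vlabel, [])

    def neighbor_iter(self, vid, nlabel):
        """Yield (nid, out_edge_labels, in_edge_labels) for the neighbours
        of `vid` that carry `nlabel`."""
        group = self.local_index[vid].get(nlabel, {})
        for nid in sorted(group):
            yield nid, group[nid]["out"], group[nid]["in"]

# Small property graph with a multi-edge between nodes 1 and 2.
store = NodeCentricStore(
    labels={1: "A", 2: "B", 3: "B", 4: "A"},
    edges=[(1, 2, "x"), (1, 2, "y"), (3, 1, "x"), (1, 4, "z")],
)
a_nodes = list(store.vertex_iter("A"))
b_neighbours_of_1 = list(store.neighbor_iter(1, "B"))
```

In this toy version the lengths of the two label lists correspond to num_v_to_n and num_n_to_v, so multiple edges and edge directions are preserved.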
Based on the storage engine, the target star subgraph can be matched in the following manner.
Step 440: when the root node in the target star subgraph has a node constraint, filter the nodes by using the node constraint to obtain candidate nodes.
In this embodiment of the application, the first iterator interface is taken to be the VertexIter interface and the second iterator interface the NeighborIter interface by way of example. First, VertexIter may be used to traverse, through the global index, the nodes in the storage engine that carry the root node label; these nodes are candidates that may match the root node. When the root node in the target star subgraph has a node constraint, the node constraint can be used to filter the nodes carrying the root node label to obtain the candidate nodes.
Step 450: use the second iterator interface to access, through the local index, the adjacent nodes of the candidate nodes in the target star subgraph; when the adjacent nodes have node constraints or edge constraints, filter the adjacent nodes by those constraints; and compress the filtered adjacent nodes to obtain the matching result corresponding to the target star subgraph.
After the candidate nodes are obtained, NeighborIter is used to access, through the local index, the adjacent nodes of the candidate nodes in the target star subgraph. Because the storage engine stores edge information together with adjacent node information, checking the adjacent nodes does not cause random disk access. If the adjacent nodes have node constraints or edge constraints, the adjacent nodes are further filtered by those constraints, and the filtered adjacent nodes are compressed to obtain the matching result corresponding to the target star subgraph. Referring to fig. 11, a schematic diagram of the intermediate results (uncompressed and compressed) of target star subgraph matching and of the results obtained after applying the join algorithm is shown.
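The two filtering steps can be sketched as below. This is a simplified illustration under assumptions, not the method of the application: plain dictionaries stand in for the storage engine and its two iterators, constraints are passed as Python callables, and the function name `match_star` and the field names ("label", "elabel", "direction", "where") are hypothetical.

```python
def match_star(nodes, neighbors, root, leaves):
    """Match one star subgraph and return a compressed result.

    nodes:     vid -> {"label": ..., plus properties}
    neighbors: vid -> nlabel -> [(nid, elabel, direction)],
               where direction "out" means an edge vid -> nid
    root:      {"label": ..., "where": callable or None}
    leaves:    leaf name -> {"label", "elabel", "direction", "where"}
    Returns {root match: {leaf name: [candidate nids]}}.
    """
    result = {}
    for vid in sorted(nodes):  # stands in for VertexIter over the root label
        data = nodes[vid]
        if data["label"] != root["label"]:
            continue
        if root.get("where") and not root["where"](data):
            continue  # node constraint on the root removes this candidate
        columns = {}
        for name, leaf in leaves.items():
            # Stands in for NeighborIter: neighbours grouped by label.
            group = neighbors.get(vid, {}).get(leaf["label"], [])
            column = sorted({
                nid for nid, elabel, direction in group
                if elabel == leaf["elabel"]
                and direction == leaf["direction"]
                and (leaf.get("where") is None or leaf["where"](nodes[nid]))
            })
            if not column:
                break  # a leaf without candidates rules the root out
            columns[name] = column
        else:
            result[vid] = columns  # one column per leaf, no Cartesian product
    return result

nodes = {
    1: {"label": "A", "age": 30}, 2: {"label": "B", "age": 40},
    3: {"label": "B", "age": 20}, 4: {"label": "A", "age": 10},
}
neighbors = {
    1: {"B": [(2, "knows", "out"), (3, "knows", "out")]},
    4: {"B": [(2, "knows", "out")]},
}
root = {"label": "A", "where": lambda v: v["age"] > 18}
leaves = {"u2": {"label": "B", "elabel": "knows", "direction": "out",
                 "where": lambda v: v["age"] > 25}}
matches = match_star(nodes, neighbors, root, leaves)
```

Node 4 is dropped by the root constraint and node 3 by the leaf constraint, so only one candidate with one leaf column survives.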
The matching results of a target star subgraph are stored column by column according to the nodes, as shown in fig. 11(b); here, because the nodes u2 and u3 are equivalent, the matching results of these two nodes are the same. It can be seen that the matching result record of the target star subgraph avoids the Cartesian product calculation shown in fig. 11(a) and is stored in compressed form. T1 and T2 are the compressed matching results obtained after matching the target star subgraphs, which may also be referred to as compressed intermediate results.
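The saving from column-wise storage can be worked through on a small, made-up example. The sketch below assumes one root match, a shared column for the equivalent leaves u2 and u3, and a separate column for u4; the candidate values are hypothetical, sizes are counted in stored cells, and the expansion does not enforce that different pattern nodes map to different data nodes.

```python
from itertools import product
from math import prod

def expand(root_match, columns):
    """Lazily enumerate the uncompressed rows of one compressed record.

    columns: {tuple of equivalent leaf names: list of candidates};
    equivalent leaves share a single stored column.
    """
    per_leaf = [column for group, column in columns.items() for _ in group]
    for combo in product(*per_leaf):
        yield (root_match,) + combo

def cell_counts(columns):
    """Return (compressed cells, uncompressed cells) for one root match."""
    compressed = 1 + sum(len(column) for column in columns.values())
    rows = prod(len(column) ** len(group) for group, column in columns.items())
    width = 1 + sum(len(group) for group in columns)
    return compressed, rows * width

# Hypothetical candidates: u2 and u3 are equivalent and share one column.
columns = {("u2", "u3"): ["a", "b", "c"], ("u4",): ["d", "e"]}
compressed, uncompressed = cell_counts(columns)
ratio = uncompressed / compressed  # compression ratio for this record
row_count = sum(1 for _ in expand("v1", columns))
```

Here 6 stored cells stand for 18 rows of 4 cells each, so the ratio of uncompressed to compressed size is 12 for this record.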
Step 460: process the matching results corresponding to the target star subgraphs through a parallel pipelined multi-way join (multi-path connection) algorithm to obtain the matching result of the initial pattern graph.
After the matching results of all target star subgraphs are obtained, they are processed by a parallel pipelined multi-way join to obtain the matching result of the initial pattern graph. Assuming that the initial pattern graph is decomposed into k target star subgraphs, the first k columns of the final result are the matches of the root nodes of the target star subgraphs, followed by the results obtained by joining the other leaf nodes. As shown in fig. 11(c), in the final result obtained by joining the matching results of the 2 target star subgraphs, the first two columns are the matches of u1 and u4, respectively, followed by those of u2 and u3. Unlike the traditional pairwise join approach, a pipelined multi-way join avoids materializing intermediate results. Assuming that the root nodes of k target star subgraphs obtained by decomposition form a set Vc, the compressed representation of the intermediate results of these target star subgraphs is T1, T2.
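The idea of joining compressed results without materializing intermediate tables can be sketched with Python generators. The following is an assumption-laden illustration rather than the join algorithm of the application, whose details are not included in this text: it runs sequentially (no parallelism), represents each compressed table as a nested dictionary, intersects leaf columns across stars, and does not enforce that different pattern nodes map to different data nodes. The data values are made up.

```python
from itertools import product

def multiway_join(tables):
    """Lazily join compressed star results.

    tables: {root name: {root match: {leaf name: set of candidates}}}
    Yields tuples whose first k columns are the root matches (in sorted
    root-name order), followed by the remaining leaves (sorted by name).
    """
    roots = sorted(tables)
    leaves = sorted({leaf for table in tables.values()
                     for cols in table.values() for leaf in cols} - set(roots))

    def consistent(chosen):
        # A chosen root that is also a leaf of another chosen star must
        # appear in that star's column for it.
        for root, match in chosen.items():
            for leaf, column in tables[root][match].items():
                if leaf in chosen and chosen[leaf] not in column:
                    return False
        return True

    def extend(i, chosen):
        # Depth-first enumeration of root combinations; nothing is stored.
        if i == len(roots):
            yield chosen
            return
        for match in sorted(tables[roots[i]]):
            trial = {**chosen, roots[i]: match}
            if consistent(trial):
                yield from extend(i + 1, trial)

    for chosen in extend(0, {}):
        columns = []
        for leaf in leaves:
            sets = [tables[r][m][leaf] for r, m in chosen.items()
                    if leaf in tables[r][m]]
            columns.append(sorted(set.intersection(*sets)))
        for combo in product(*columns):
            yield tuple(chosen[r] for r in roots) + combo

# Hypothetical compressed results T1 (root u1) and T2 (root u4).
T1 = {"a": {"u2": {"b", "g"}, "u3": {"c"}, "u4": {"d"}}}
T2 = {"d": {"u1": {"a"}, "u2": {"b", "g"}, "u3": {"c", "f"}},
      "e": {"u1": {"x"}, "u2": {"b"}, "u3": {"c"}}}
rows = list(multiway_join({"u1": T1, "u4": T2}))
```

Because every stage is a generator, each final row is produced on demand; a consumer that needs only the first few matches never forces the full result to be built.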
According to the graph matching method provided by this embodiment of the application, the storage engine stores the data graph using the node-centric storage method and provides two iterator interfaces for access, so that random disk access can be avoided during matching. Because the intermediate results of target star subgraph matching are compressed and a parallel pipelined multi-way join is adopted, matching efficiency can be effectively improved and memory overhead reduced.
To verify that the method in the foregoing embodiments achieves the corresponding effects, a machine with 64 GB of memory and an 800 GB solid-state drive may be used to test the method on 5 data sets: soc-Epinions (EP), web-Google (GO), web-BerkStan (BS), soc-LiveJournal (LJ), and com-Orkut (OK). Referring to fig. 12, a schematic diagram of the pattern graphs used in the test is shown. In fig. 12, the first 8 pattern graphs do not contain multiple edges, and the last 4 contain multiple edges.
The test results were compared with Graphflow and with Neo4j, the graph database most widely used in industry, with the time limit for each test set to 25 minutes. Because Graphflow does not support property graphs with multiple edges, only the first 8 pattern graphs were run on Graphflow; the test results are shown in fig. 13. It can be seen that the graph matching method provided in the embodiments of the application generally runs faster than Graphflow, with performance improved by up to 26 times, and that it can handle matching tasks that Graphflow cannot complete due to insufficient memory (pattern graphs 4, 6, and 8 on the LJ data set, and pattern graph 8 on the OK data set).
Neo4j runs much more slowly, so only the last 4 pattern graphs, which Graphflow does not support, were run on Neo4j; the experimental results are shown in fig. 14 (the OK data set does not contain multiple edges and is therefore not listed in the figure). The graph matching method provided in the embodiments of the application achieves a speedup of up to 2100 times and can handle matching tasks that Neo4j fails to run due to insufficient memory (pattern graphs 11 and 12 on BS, and pattern graphs 10, 11, and 12 on LJ).
Furthermore, the graph matching method provided by the embodiments of the application saves memory overhead. When a target star subgraph is matched, the matching result obtained is compressed by delaying the Cartesian product calculation and merging equivalent nodes, which saves the storage overhead of intermediate results; and no new intermediate results need to be stored in the subsequent join stage, which greatly saves memory. To verify this effect, the compression ratio achieved by the graph matching method was tested. The compression ratio is the ratio of the size of the uncompressed data to the size of the compressed data, and reflects the factor by which memory space is saved. Fig. 15 shows the compression ratios of the intermediate results for the different data sets and pattern graphs of the preceding experiment: more than 93% of the compression ratios are greater than 10, more than 73% are greater than 100, and more than 59% are greater than 1000. The graph matching method provided by the embodiments of the application therefore greatly reduces memory usage.
Referring to fig. 16, an embodiment of the present application provides a graph matching apparatus 500, which can be applied to an electronic device, where the graph matching apparatus 500 includes an analysis module 510, a decomposition module 520, a matching module 530, and a processing module 540. The analysis module 510 is configured to obtain a graph matching statement, and analyze the graph matching statement to obtain an initial pattern graph; the decomposition module 520 is configured to determine a root node from the initial pattern graph, and decompose the initial pattern graph based on the root node to obtain a target star subgraph, where the target star subgraph includes the root node, a leaf node adjacent to the root node in the initial pattern graph, and a push-down constraint condition; the matching module 530 is configured to access a node in a data graph through an iterator interface provided by a storage engine, match the target star sub-graph in the data graph, and obtain a compressed matching result corresponding to the target star sub-graph, where the storage engine stores the data graph in an external memory in a storage manner with the node as a center in advance; the processing module 540 is configured to process the matching result corresponding to the target star sub-graph through a parallel pipeline type multi-path connection algorithm, so as to obtain the matching result of the initial pattern graph.
Further, the graph matching statement includes a pattern graph description part and a constraint description part, and the analysis module 510 is further configured to convert the constraint description part into predicate expressions through De Morgan's laws; classify the predicate expressions into node constraints, edge constraints, and multi-node constraints; and take the predicate expressions corresponding to the node constraints and the edge constraints as push-down constraints, and construct the initial pattern graph based on the push-down constraints and the pattern graph description part.
Further, the decomposition module 520 is further configured to clone the initial pattern graph to obtain a copy pattern graph; determine a root node from the initial pattern graph; acquire the leaf nodes adjacent to the root node and the push-down constraints in the initial pattern graph, and construct a star subgraph based on the root node, the leaf nodes, and the push-down constraints; and delete the root node and the edges adjacent to the root node from the copy pattern graph, until the copy pattern graph contains no node whose out-degree or in-degree is 0 or contains no nodes at all, and take the star subgraph as a target star subgraph.
Further, the decomposition module 520 is further configured to calculate a node value corresponding to each node in the initial pattern graph, and determine the node corresponding to the maximum node value as the root node.
Further, the node value corresponding to each node in the initial pattern graph is calculated by the following formula:
[formula not reproduced in this text]
where u represents a node of the initial pattern graph; freq(u.label) represents the frequency with which the label of node u occurs among the node labels of the data graph; d+ represents the out-degree of node u; d- represents the in-degree of node u; |ψ(u)| represents the number of push-down constraints relating to node u; m represents the number of edges in the initial pattern graph; V(P) represents all nodes in the initial pattern graph; u_i represents any node in the initial pattern graph; and |ψ(u_i)| represents the number of push-down constraints relating to node u_i.
Further, the matching module 530 is further configured to traverse nodes having root node labels in the storage engine through a global index using a first iterator interface; when the root node in the target star subgraph has a node constraint condition, filtering the node by using the node constraint condition to obtain a candidate node; and accessing adjacent nodes of the candidate nodes in the target star subgraph through a local index by using a second iterator interface, filtering the adjacent nodes through node constraint conditions or edge constraint conditions when the adjacent nodes have the node constraint conditions or the edge constraint conditions, and compressing the filtered adjacent nodes to obtain a matching result corresponding to the target star subgraph.
Further, the first iterator interface is a VertexIter interface, and the second iterator interface is a NeighborIter interface.
Further, the storage engine is obtained by: acquiring a data graph; storing each node on an external memory under a global index according to the node label of each node in the data graph; and, for each node, determining the adjacent nodes of the node and the edges adjacent to the node from the data graph, and storing the adjacent nodes of the node and the edges adjacent to the node on the external memory under a local index according to the node labels of the adjacent nodes.
The graph matching apparatus provided by this embodiment of the application acquires a graph matching statement and analyzes it to obtain an initial pattern graph; determines a root node from the initial pattern graph and decomposes the initial pattern graph based on the root node to obtain a target star subgraph; accesses nodes in a data graph through an iterator interface provided by a storage engine and matches the target star subgraph in the data graph to obtain a compressed matching result corresponding to the target star subgraph, where the storage engine is obtained by storing the data graph on an external memory in advance in a node-centric storage manner; and processes the matching results corresponding to the target star subgraphs through a parallel pipelined multi-way join algorithm to obtain the matching result of the initial pattern graph. Because the matching statement is analyzed and the target star subgraphs are used for matching, and because the node-centric storage manner avoids random disk access during matching, matching efficiency is improved and efficient graph matching is achieved. In addition, because the matching results are compressed and the compressed results are processed by the parallel pipelined join algorithm to obtain the final result, memory consumption can be effectively reduced.
It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description, the specific working process of the above-described apparatus may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
Referring to fig. 17, an embodiment of the present application provides a block diagram of an electronic device, where the electronic device 600 includes a processor 610, a memory 620, and one or more applications, where the one or more applications are stored in the memory 620 and configured to be executed by the one or more processors 610, and the one or more programs are configured to perform the graph matching method.
The electronic device 600 may be a terminal device capable of running an application, such as a smart phone or a tablet computer, or may be a server. The electronic device 600 in the present application may include one or more of the following components: a processor 610, a memory 620, and one or more applications, wherein the one or more applications may be stored in the memory 620 and configured to be executed by the one or more processors 610, the one or more programs configured to perform the methods as described in the aforementioned method embodiments.
The processor 610 may include one or more processing cores. The processor 610 connects the various components of the electronic device 600 through various interfaces and circuitry, and performs the various functions of the electronic device 600 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 620 and invoking data stored in the memory 620. Alternatively, the processor 610 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 610 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is used for rendering and drawing display content; and the modem is used to handle wireless communications. It is understood that the modem may also not be integrated into the processor 610 and may instead be implemented by a separate communication chip.
The memory 620 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 620 may refer to internal memory or to external memory such as a hard disk. The memory 620 may be used to store instructions, programs, code sets, or instruction sets. The memory 620 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the various method embodiments, and the like. The data storage area may store data created by the electronic device 600 during use (e.g., phone books, audio-visual data, chat log data), and so forth.
The electronic device provided by this embodiment of the application acquires a graph matching statement and analyzes it to obtain an initial pattern graph; determines a root node from the initial pattern graph and decomposes the initial pattern graph based on the root node to obtain a target star subgraph; accesses nodes in a data graph through an iterator interface provided by a storage engine and matches the target star subgraph in the data graph to obtain a compressed matching result corresponding to the target star subgraph, where the storage engine is obtained by storing the data graph on an external memory in advance in a node-centric storage manner; and processes the matching results corresponding to the target star subgraphs through a parallel pipelined multi-way join algorithm to obtain the matching result of the initial pattern graph. Because the matching statement is analyzed and the target star subgraphs are used for matching, and because the node-centric storage manner avoids random disk access during matching, matching efficiency is improved and efficient graph matching is achieved. In addition, because the matching results are compressed and the compressed results are processed by the parallel pipelined join algorithm to obtain the final result, memory consumption can be effectively reduced.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
Claims (11)
1. A method of graph matching, the method comprising:
acquiring a graph matching statement, and parsing the graph matching statement to obtain an initial pattern graph;
determining a root node from the initial pattern graph, and decomposing the initial pattern graph based on the root node to obtain a target star subgraph, wherein the target star subgraph comprises the root node, the leaf nodes adjacent to the root node in the initial pattern graph, and push-down constraint conditions;
accessing nodes in a data graph through an iterator interface provided by a storage engine, and matching the target star subgraph in the data graph to obtain a compressed matching result corresponding to the target star subgraph, wherein the storage engine is obtained by storing the data graph on an external memory in advance in a node-centric storage mode;
and processing the matching result corresponding to the target star subgraph with a parallel pipelined multi-way join algorithm to obtain the matching result of the initial pattern graph.
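As an illustration of the final step of claim 1, the sketch below combines per-star matching results into matches of the whole pattern graph. It is a plain sequential hash join, not the parallel pipelined algorithm of the claim, and it assumes each star's result has already been expanded into a list of bindings (a dict from pattern node to data node) that all share the same keys. The names `hash_join` and `join_stars` are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def hash_join(left, right):
    """Join two lists of bindings on the pattern nodes they share.

    Assumption: every binding in one list has the same set of keys.
    """
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    # Build phase: bucket the left bindings by their values on the shared nodes.
    buckets = defaultdict(list)
    for binding in left:
        buckets[tuple(binding[k] for k in shared)].append(binding)
    # Probe phase: merge each right binding with the compatible left bindings.
    joined = []
    for binding in right:
        for match in buckets.get(tuple(binding[k] for k in shared), []):
            joined.append({**match, **binding})
    return joined


def join_stars(star_results):
    """Fold the per-star results together, one star at a time."""
    return reduce(hash_join, star_results)


star_a = [{"a": 1, "b": 2}, {"a": 1, "b": 3}]
star_b = [{"b": 2, "c": 5}, {"b": 4, "c": 6}]
print(join_stars([star_a, star_b]))
```

Only the binding that agrees on the shared pattern node `b` survives the join. The sketch does not check that distinct pattern nodes map to distinct data nodes; whether that is required depends on the matching semantics, which the claims do not state.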
2. The method of claim 1, wherein the graph matching statement comprises a pattern graph description part and a constraint condition description part, and wherein acquiring the graph matching statement and parsing the graph matching statement to obtain an initial pattern graph comprises:
converting the constraint condition description part into the form of predicate expressions by applying De Morgan's laws;
classifying the predicate expressions into node constraint conditions, edge constraint conditions and multi-node constraint conditions;
and taking the predicate expressions corresponding to the node constraint conditions and the edge constraint conditions as push-down constraint conditions, and constructing the initial pattern graph based on the push-down constraint conditions and the pattern graph description part.
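The conversion and classification steps of claim 2 can be illustrated with a minimal sketch. It assumes constraint expressions are nested tuples of the form `("and", l, r)`, `("or", l, r)`, `("not", e)` and `("atom", text, vars)`; this representation and all function names are hypothetical and do not come from the application. Negations are pushed down to the atoms with De Morgan's laws, the top-level conjunction is split into conjuncts, and each conjunct is classified by the variables it mentions.

```python
def push_not(expr, negate=False):
    """Push negations down to the atoms using De Morgan's laws."""
    kind = expr[0]
    if kind == "atom":
        return ("not", expr) if negate else expr
    if kind == "not":
        return push_not(expr[1], not negate)
    # NOT (a AND b) == (NOT a) OR (NOT b), and symmetrically for OR.
    op = {"and": "or", "or": "and"}[kind] if negate else kind
    return (op, push_not(expr[1], negate), push_not(expr[2], negate))


def conjuncts(expr):
    """Split a top-level conjunction into its conjuncts."""
    if expr[0] == "and":
        return conjuncts(expr[1]) + conjuncts(expr[2])
    return [expr]


def variables(expr):
    """Collect the pattern variables mentioned anywhere in an expression."""
    if expr[0] == "atom":
        return set(expr[2])
    return set().union(*(variables(e) for e in expr[1:]))


def classify(expr, edge_vars):
    """Return 'node', 'edge' or 'multi' depending on the variables involved."""
    used = variables(expr)
    if len(used) == 1:
        return "edge" if used <= edge_vars else "node"
    return "multi"


A = ("atom", "a.age < 18", ("a",))
E = ("atom", "e.since > 2010", ("e",))
M = ("atom", "a.id != b.id", ("a", "b"))
where = ("and", ("not", ("or", A, ("not", E))), M)
for part in conjuncts(push_not(where)):
    print(classify(part, edge_vars={"e"}), part)
```

In this sketch the conjuncts classified as `node` or `edge` would be the candidates for push-down, while `multi` conjuncts would have to be evaluated after the star results are joined.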
3. The method of claim 1, wherein determining a root node from the initial pattern graph and decomposing the initial pattern graph based on the root node to obtain a target star subgraph comprises:
cloning the initial pattern graph to obtain a copy pattern graph;
determining a root node from the initial pattern graph;
acquiring the leaf nodes adjacent to the root node and the push-down constraint conditions in the initial pattern graph, and constructing a star subgraph based on the root node, the leaf nodes and the push-down constraint conditions;
and deleting the root node and the edges adjacent to the root node from the copy pattern graph, until no node with an out-degree or in-degree of 0 exists in the copy pattern graph or no node exists, and taking the star subgraph as the target star subgraph.
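A minimal sketch of a star decomposition in the spirit of claim 3 follows. It rests on several assumptions that the claim text does not settle: the loop is taken to repeat until the copy graph has no edges left, leaves are read from the edges still present in the copy so that each pattern edge lands in exactly one star, push-down constraints are omitted, and the root is chosen by a caller-supplied `score` function. The usage example scores nodes by their degree purely as a stand-in; it is not the node value formula of claim 5.

```python
def decompose(edges, score):
    """Split a directed pattern graph into star subgraphs.

    edges: set of (src, dst) pairs; score: function mapping a node to a number.
    Returns a list of (root, leaves) tuples in extraction order.
    """
    remaining = set(edges)  # the copy graph; the caller's set is left untouched
    stars = []
    while remaining:
        nodes = {n for edge in remaining for n in edge}
        # Sorting first makes tie-breaking deterministic.
        root = max(sorted(nodes), key=score)
        incident = {edge for edge in remaining if root in edge}
        leaves = sorted({n for edge in incident for n in edge if n != root})
        stars.append((root, leaves))
        # Remove the root's edges from the copy graph before the next round.
        remaining -= incident
    return stars


pattern = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")}
degree = {}
for src, dst in pattern:
    degree[src] = degree.get(src, 0) + 1
    degree[dst] = degree.get(dst, 0) + 1

print(decompose(pattern, score=degree.get))
```

With the degree-based stand-in score, node `c` is extracted first with leaves `a`, `b` and `d`, and the remaining edge between `a` and `b` forms the second star.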
4. The method of claim 3, wherein determining a root node from the initial pattern graph comprises:
calculating a node value corresponding to each node in the initial pattern graph;
and determining the node corresponding to the maximum node value as the root node.
5. The method of claim 4, wherein the node value corresponding to each node in the initial pattern graph is calculated by the following formula:
wherein $u$ represents a node of the initial pattern graph; $freq(u.label)$ represents the frequency with which the label of node $u$ occurs among the node labels of the data graph; $d^{+}$ represents the out-degree of node $u$; $d^{-}$ represents the in-degree of node $u$; $|\psi(u)|$ represents the number of push-down constraint conditions relating to node $u$; $m$ represents the number of edges in the initial pattern graph; $V(P)$ represents all nodes in the initial pattern graph; $u_i$ represents any node in the initial pattern graph; and $|\psi(u_i)|$ represents the number of push-down constraint conditions associated with node $u_i$.
6. The method of claim 1, wherein accessing nodes in a data graph through an iterator interface provided by a storage engine, and matching the target star subgraph in the data graph to obtain a compressed matching result corresponding to the target star subgraph comprises:
traversing, through a global index and by using a first iterator interface, the nodes in the storage engine that carry the label of the root node;
when the root node in the target star subgraph has a node constraint condition, filtering the nodes by using the node constraint condition to obtain candidate nodes;
and accessing the adjacent nodes of the candidate nodes in the target star subgraph through a local index by using a second iterator interface, filtering the adjacent nodes by the node constraint conditions or edge constraint conditions when the adjacent nodes have node constraint conditions or edge constraint conditions, and compressing the filtered adjacent nodes to obtain the matching result corresponding to the target star subgraph.
7. The method of claim 6, wherein the first iterator interface is a VertexIter interface and the second iterator interface is a Neighboriter interface.
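The matching flow of claims 6 and 7 can be sketched with two Python generators standing in for the VertexIter and Neighboriter interfaces. Everything here is an assumption made for illustration: the indexes are in-memory dicts rather than structures on external memory, edge direction is ignored, the layout of the `star` dict is hypothetical, and "compressed" is read as keeping one candidate list per leaf for each root instead of expanding their Cartesian product. The claims do not define the compression format.

```python
def vertex_iter(global_index, label):
    """Stand-in for VertexIter: yield the data nodes that carry `label`."""
    yield from global_index.get(label, ())


def neighbor_iter(local_index, node, label):
    """Stand-in for Neighboriter: yield (edge, neighbor) pairs of `node`
    whose neighbor carries `label`."""
    yield from local_index.get(node, {}).get(label, ())


def match_star(star, global_index, local_index):
    """Return {root_node: {leaf_name: [candidate neighbors]}}."""
    result = {}
    for v in vertex_iter(global_index, star["root_label"]):
        # Push-down node constraint on the root, if any.
        if star["root_pred"] is not None and not star["root_pred"](v):
            continue
        groups = {}
        for leaf, (label, pred) in star["leaves"].items():
            candidates = [
                n for e, n in neighbor_iter(local_index, v, label)
                if pred is None or pred(e, n)
            ]
            if not candidates:
                break  # one leaf has no match, so this root is discarded
            groups[leaf] = candidates
        else:
            result[v] = groups
    return result


global_index = {"Person": [1, 2], "City": [10, 11]}
local_index = {1: {"City": [("lives_in", 10), ("lives_in", 11)]}, 2: {}}
age = {1: 30, 2: 15}
star = {
    "root_label": "Person",
    "root_pred": lambda v: age[v] >= 18,
    "leaves": {"c": ("City", lambda e, n: n != 11)},
}
print(match_star(star, global_index, local_index))
```

Keeping the leaf candidates grouped per root defers the combinatorial expansion to the join stage, which is one plausible motivation for compressing the star results.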
8. The method according to any of claims 1-7, wherein the storage engine is obtained by:
acquiring a data graph;
storing each node on an external memory in the form of a global index according to the node label of each node in the data graph;
and, for each node, determining the adjacent nodes of the node and the edges adjacent to the node from the data graph, and storing the adjacent nodes of the node and the edges adjacent to the node on the external memory in the form of a local index according to the node labels of the adjacent nodes of the node.
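The node-centric layout of claim 8 can be pictured with the following sketch, which builds both indexes as in-memory dictionaries. This is only an illustration: the claim stores them on external memory, and the function name `build_indexes`, the input shapes and the decision to record each edge on both endpoints without its direction are all assumptions.

```python
from collections import defaultdict


def build_indexes(node_labels, edges):
    """Build a global and a local index for a labeled data graph.

    node_labels: {node: label}
    edges: iterable of (src, edge_label, dst)
    """
    # Global index: node label -> nodes carrying that label.
    global_index = defaultdict(list)
    for node, label in node_labels.items():
        global_index[label].append(node)

    # Local index: node -> neighbor label -> [(edge_label, neighbor)].
    local_index = {node: defaultdict(list) for node in node_labels}
    for src, edge_label, dst in edges:
        local_index[src][node_labels[dst]].append((edge_label, dst))
        local_index[dst][node_labels[src]].append((edge_label, src))
    return global_index, local_index


node_labels = {1: "Person", 2: "Person", 10: "City"}
edges = [(1, "lives_in", 10), (2, "lives_in", 10), (1, "knows", 2)]
global_index, local_index = build_indexes(node_labels, edges)
print(dict(global_index))
print({n: dict(groups) for n, groups in local_index.items()})
```

Grouping each node's neighbors by label means a matcher looking for, say, the `City` neighbors of a `Person` node can read one group instead of scanning the full adjacency list.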
9. A graph matching apparatus, characterized in that the apparatus comprises:
a parsing module, configured to acquire a graph matching statement and parse the graph matching statement to obtain an initial pattern graph;
a decomposition module, configured to determine a root node from the initial pattern graph and decompose the initial pattern graph based on the root node to obtain a target star subgraph, wherein the target star subgraph comprises the root node, the leaf nodes adjacent to the root node in the initial pattern graph, and push-down constraint conditions;
a matching module, configured to access nodes in a data graph through an iterator interface provided by a storage engine and match the target star subgraph in the data graph to obtain a compressed matching result corresponding to the target star subgraph, wherein the storage engine is obtained by storing the data graph on an external memory in advance in a node-centric storage mode;
and a processing module, configured to process the matching result corresponding to the target star subgraph with a parallel pipelined multi-way join algorithm to obtain the matching result of the initial pattern graph.
10. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a memory electrically connected with the one or more processors;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-8.
11. A computer-readable storage medium, having stored thereon program code that can be invoked by a processor to perform the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110727102.3A CN113505278B (en) | 2021-06-29 | 2021-06-29 | Graph matching method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113505278A (en) | 2021-10-15 |
CN113505278B (en) | 2024-08-20 |
Family
ID=78010870
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110727102.3A Active CN113505278B (en) | 2021-06-29 | 2021-06-29 | Graph matching method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113505278B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114625811A (en) * | 2022-05-16 | 2022-06-14 | 支付宝(杭州)信息技术有限公司 | Method and system for improving sub-graph matching efficiency |
CN115018280A (en) * | 2022-05-24 | 2022-09-06 | 支付宝(杭州)信息技术有限公司 | Risk graph pattern mining method, risk identification method and corresponding devices |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107193882A (en) * | 2017-04-27 | 2017-09-22 | 东南大学 | Why not query answer methods based on figure matching on RDF data |
CN108520035A (en) * | 2018-03-29 | 2018-09-11 | 天津大学 | SPARQL parent map pattern query processing methods based on star decomposition |
KR101945406B1 (en) * | 2018-06-08 | 2019-02-08 | 한국과학기술정보연구원 | Real-relationships based similar sub-graph matching method |
CN112667860A (en) * | 2020-12-30 | 2021-04-16 | 海南普适智能科技有限公司 | Sub-graph matching method, device, equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
HAIDA ZHANG et al.: "Efficient and High-Quality Seeded Graph Matching: Employing High Order Structural Information", arXiv:1810.11152v1, 26 October 2018 (2018-10-26), pages 1 - 14 * |
LAN CHAO; ZHANG YONG; XING CHUNXIAO: "Distributed top-k subgraph matching technique", Journal of Tsinghua University (Science and Technology), no. 08, 15 August 2016 (2016-08-15), pages 871 - 877 * |
Also Published As
Publication number | Publication date |
---|---|
CN113505278B (en) | 2024-08-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10430469B2 (en) | Enhanced document input parsing | |
US10747958B2 (en) | Dependency graph based natural language processing | |
US11281864B2 (en) | Dependency graph based natural language processing | |
EP3198478A1 (en) | Method and system for implementing efficient classification and exploration of data | |
US20180253653A1 (en) | Rich entities for knowledge bases | |
CN113505278B (en) | Graph matching method and device, electronic equipment and storage medium | |
CN111475588B (en) | Data processing method and device | |
CN106682514B (en) | System calling sequence feature pattern set generation method based on subgraph mining | |
US9706005B2 (en) | Providing automatable units for infrastructure support | |
US11288266B2 (en) | Candidate projection enumeration based query response generation | |
CN111444220A (en) | Cross-platform SQL query optimization method combining rule driving and data driving | |
CN114817243A (en) | Method, device and equipment for establishing database joint index and storage medium | |
CN115358397A (en) | Parallel graph rule mining method and device based on data sampling | |
US8650180B2 (en) | Efficient optimization over uncertain data | |
CN106599122B (en) | Parallel frequent closed sequence mining method based on vertical decomposition | |
Löhnertz et al. | Steinmetz: Toward Automatic Decomposition of Monolithic Software Into Microservices. | |
CN112970011A (en) | Recording pedigrees in query optimization | |
CN110765100B (en) | Label generation method and device, computer readable storage medium and server | |
CN115186738B (en) | Model training method, device and storage medium | |
Vázquez-Barreiros et al. | Enhancing discovered processes with duplicate tasks | |
CN115809294A (en) | Rapid ETL method based on Spark SQL temporary view | |
CN111159203B (en) | Data association analysis method, platform, electronic equipment and storage medium | |
CN113962549A (en) | Business process arrangement method and system based on power grid operation knowledge | |
CN112750047A (en) | Behavior relation information extraction method and device, storage medium and electronic equipment | |
EP3944127A1 (en) | Dependency graph based natural language processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||