CN112559807B

CN112559807B - Graph pattern matching method based on multi-source point parallel exploration

Info

Publication number: CN112559807B
Application number: CN202011410948.6A
Authority: CN
Inventors: 黄文杰; 高杨; 陈伟; 王新根; 黄滔
Original assignee: Zhejiang Bangsheng Technology Co ltd
Current assignee: Zhejiang Bangsheng Technology Co ltd
Priority date: 2020-12-03
Filing date: 2020-12-03
Publication date: 2022-06-21
Anticipated expiration: 2040-12-03
Also published as: CN112559807A

Abstract

The invention discloses a graph pattern matching method based on multi-source point parallel exploration, which can be used for fuzzy queries of graph patterns that start from a given point in a graph database. The pattern graph to be queried is decomposed into hierarchical layers, and the graph traversal query is carried out one layer at a time, which significantly reduces the exploration depth and improves parallel exploration performance. The invention introduces the concepts of a central pattern set and an edge pattern set. These control the exploration process, turn the exploration task from subgraph-centric into vertex-centric, and allow the algorithm to be implemented on a general distributed graph computing platform. The invention also provides a refinement method that combines the search results of multiple source points: several secondary source points are designated for repeated searches, and the layer differences produced by these different perspectives tighten the constraints on the matching result, which improves matching precision.

Description

Graph pattern matching method based on multi-source point parallel exploration

Technical Field

The invention relates to distributed computing in the big data field, and in particular to a fuzzy graph pattern matching method for labeled directed graphs based on a distributed graph computing framework.

Background

With the development of the internet, social and economic activities increasingly depend on it, and data volumes are growing rapidly. Graphs can model socio-economic activities that involve association relations. Searching a graph for complex association patterns has become a common way to analyze relationships between entities, but it also consumes a large amount of computing resources. As graphs keep growing, reducing the computational resources required for graph pattern matching becomes important.

Take an enterprise trade network as an example: transaction entities are nodes and transactions are directed edges. Each enterprise is a node in the network, labeled with the products or services it can provide. A directed edge between two nodes indicates that a trade transaction occurred between the enterprises, and the edge weight indicates the transaction amount. Patterns in a trade network are sensitive to the graph topology, so graph simulation algorithms suited to social network analysis cannot be applied to such networks directly. Subgraph isomorphism, which constrains the topology strictly, is NP-complete, and its exponential time complexity makes it unsuitable for querying large graphs.

Disclosure of Invention

The purpose of the invention is to provide a fuzzy graph pattern matching method for labeled directed graphs that balances matching efficiency and precision while keeping polynomial time complexity. The method is characterized by layer decomposition, multi-source point exploration and result fusion.

The purpose of the invention is achieved by the following technical scheme: a graph pattern matching method based on multi-source point parallel exploration, comprising the following steps:

(a) Let the pattern graph to be matched be $Q = (V_q, E_q, L_q)$. Here $V_q$ is the node set of the pattern graph, $E_q$ is the set of edges connecting the nodes, and $L_q$ is the set of labels of the pattern graph nodes. Select a pattern graph node $s \in V_q$ as the exploration source point, perform a depth traversal centered on $s$, and mark the level of each pattern graph node with its depth value. This decomposes the pattern graph into several layers, each containing the nodes of the same level and the edges connecting them.

(b) Compute the central pattern sets from the structure and node semantics of the pattern graph to be matched, and compute the edge pattern sets from the dependencies between adjacent layers. Each node of $V_q$ has its own central pattern set, which contains the node's label and the ids of its parent and child nodes. For a node at level $d > 0$, the edge pattern set is a subset of the central pattern set that contains only the ids of the parent and child nodes at level $d - 1$. Together, the central and edge pattern sets hold all structural constraints used in the layer-by-layer exploration.

Let the depth of the layers be $D$. Exploration runs at least $D$ times and at most $2D$ times. The first $D$ iterations form the expansion stage, in which new matchable nodes are added while mismatched nodes are removed. After $D$ iterations the convergence stage begins, in which no new nodes are added and only mismatched nodes are removed.

(c) Obtain the data graph to be matched and select a starting point in it from which to explore neighbor nodes. Each data graph node maintains only its own matching set, which contains all nodes of the pattern graph $V_q$ that the node can match. Each data graph node may also maintain its own state set, used to evaluate the constraints of the edge pattern set or the central pattern set.

During the first $D$ (expansion) iterations, exploring from layer $d$ to layer $d + 1$ proceeds as follows:
- Traverse the data graph nodes whose matching sets are non-empty, select those whose matching sets contain a layer-$d$ pattern graph node, and add the neighbors of the selected nodes to the exploration queue.
- Each node in the exploration queue collects the matching sets of its neighbors and merges the non-empty ones into its state set.
- For each node in the exploration queue, traverse the nodes of $V_q$ at level $d + 1$. Check whether the data graph node and the pattern graph node have the same label, and whether the data graph node's state set is a superset of the pattern graph node's edge pattern set.
- If both checks pass, add the pattern graph node to the data graph node's matching set.

Meanwhile, throughout the matching process, every data graph node with a non-empty matching set also collects its neighbors' matching sets into a state set. It then traverses the pattern graph nodes in its matching set that were not updated in the current iteration, and checks whether the current state set is a superset of each such node's central pattern set. If not, that pattern graph node is removed from the matching set. The exploration step is repeated, matching only one additional adjacent layer each time, until all layers have been explored.

After the exploration step ends, the nodes with non-empty matching sets keep collecting their neighbors' states and verifying whether their matching sets contain mismatched nodes, until no matching set changes any more.

(d) Collect the data graph nodes whose matching sets are non-empty and extract the corresponding subgraph as a new data graph. The subgraph is a subset of the data graph. It contains the nodes whose matching sets are non-empty in one exploration started from $s$, together with their edges. One data graph matching task yields several subgraphs, each uniquely identified by the id of its data graph starting point from step (c). For a data graph $G = (V, E, L)$, $V$ is the node set of the data graph, $E \subseteq V \times V$ is its edge set, and $L$ is its label set; in general, …

Select several secondary source points in the pattern graph and repeat the exploration of step (c) on the subgraph. A secondary source point is any pattern graph node other than the source point $s$. Finally, intersect the results of the multi-source point explorations and remove the nodes whose matching sets are empty to obtain the refined matching result.

Further, the pattern graph $Q = (V_q, E_q, L_q)$ and the data graph $G = (V, E, L)$ are directed graphs with node labels. A node label may be any symbol that distinguishes nodes with different roles. Each pattern graph node may additionally carry an explicit tag. The tag indicates that a data graph node matching this pattern graph node must not match any other pattern graph node. Specifically, a data graph node that matches a pattern graph node $x$ carrying the explicit tag cannot match other pattern graph nodes at the same time. That is, its matching set $M$ satisfies either $x \in M$ and $|M| \le 1$, or $x \notin M$ and $|M| < |V_q|$. Marking explicit nodes makes the matched pattern more specific.
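The explicit rule can be applied to a single matching set, as the minimal sketch below shows. The helper names `enforce_explicit` and `satisfies_explicit` and the dict encoding of the explicit flags $d_q$ are illustrative assumptions, not part of the filed method. A data graph node whose matching set contains an explicit pattern node keeps only that node. If several explicit nodes appear, the sketch arbitrarily keeps the smallest id.

```python
def enforce_explicit(match_set, explicit):
    """Hypothetical helper: apply the explicit-node rule to one matching set.

    match_set: set of pattern graph node ids matched by one data graph node.
    explicit:  dict mapping pattern node id -> 1 if explicit (d_q = 1), else 0.
    """
    # Pattern nodes in the matching set that carry the explicit tag.
    tagged = sorted(v for v in match_set if explicit.get(v, 0) == 1)
    if tagged:
        # Assumption: with several explicit candidates, keep the smallest id.
        return {tagged[0]}
    return set(match_set)


def satisfies_explicit(match_set, explicit):
    """True if no explicit pattern node shares the matching set with another node."""
    has_tag = any(explicit.get(v, 0) == 1 for v in match_set)
    return len(match_set) <= 1 if has_tag else True


print(enforce_explicit({2, 4}, {2: 1}))  # {2}
```

For example, a matching set {2, 4} in which pattern node 2 is marked explicit collapses to {2}. A set with no explicit node is returned unchanged.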

Further, the exploration source point in step (a) can be selected in several ways, including:

(a1) Randomly selecting a node of the pattern graph.

(a2) Selecting the node with the smallest eccentricity in the pattern graph and, among those, the largest degree.

(a3) When the data graph is a dynamic graph, selecting a node whose label matches the label of an incremental (newly added) node of the data graph.

(a4) Selecting manually according to specific business logic.

Further, in (a2), the eccentricity of a node is its maximum undirected distance to the other nodes of the pattern graph. The undirected distance is the shortest-path distance computed after every directed edge in the graph is replaced by an undirected edge.

Further, in step (a), the pattern graph is traversed from the source point using a depth-first method. The layers are divided by marking each node's level with its traversal depth.

Further, in step (c), the graph is explored from lower levels to higher levels in increasing order of depth value. Nodes in a new graph layer do not depend on one another, so their visiting order is irrelevant during exploration, and the data graph nodes update their own matching sets independently and in parallel.

Further, in step (d), the secondary source points are selected in decreasing order of pattern graph node eccentricity. When eccentricities are equal, the node with the larger degree is preferred. A node's degree is computed by converting all directed edges into undirected edges and counting the edges incident to the node.
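The secondary source point ordering can be sketched as a single sort key: eccentricity descending, then degree descending. The function name `secondary_sources` and the final tie-break on the smaller node id are assumptions, since the text leaves full ties open. The eccentricity and degree values below are those of the FIG. 6 pattern graph, computed from its listed edge set with source point $s = 2$.

```python
def secondary_sources(ecc, deg, s, k):
    """Order pattern nodes other than s by eccentricity (descending), then by
    degree (descending); remaining ties fall back to the smaller node id."""
    candidates = [v for v in ecc if v != s]
    candidates.sort(key=lambda v: (-ecc[v], -deg[v], v))
    return candidates[:k]


# Eccentricities and degrees of the FIG. 6 pattern graph (source point s = 2).
ECC = {1: 3, 2: 2, 3: 3, 4: 4, 5: 3, 6: 4, 7: 4}
DEG = {1: 1, 2: 4, 3: 3, 4: 1, 5: 4, 6: 1, 7: 2}
print(secondary_sources(ECC, DEG, s=2, k=3))  # [7, 4, 6]
```

Preferring high-eccentricity nodes places the secondary sources far from $s$, so the layers they induce differ most from those of the original exploration.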

Further, the computation in step (c) can run either on a distributed graph computing framework or on a single machine. In step (d), selecting the secondary source points and computing their layers can proceed concurrently with step (c). The repeated exploration should run in parallel as single-machine multitasking, with each task processing one subgraph.

Further, in a pattern matching task that does not start from a fixed point, any data graph node whose label equals the label of the pattern graph source point $s$ can be chosen as a starting point. In a fixed-point pattern query task on a graph database, the starting point of the graph traversal query in the data graph can be chosen as the exploration starting point.

Further, the data graph can model interactions between real-world entities, and domain experts design the pattern graph from business experience to query the entities that conform to an interaction pattern. Specifically, trade relationships between enterprises can be modeled, with the data graph defined as $G = (V, E, L)$:
- $V$ is the set of unique enterprise identifiers.
- $E$ is the data graph edge set. If enterprise A purchases a service or commodity from enterprise B, an edge from A to B is created.
- $L$ represents node attributes and can mark information such as the product types an enterprise can provide.

The pattern graph $Q = (V_q, E_q, L_q)$ is a query pattern designed by a business expert:
- $V_q$ is the node set of the pattern graph.
- $E_q$ is its edge set, representing the associations being queried.
- $L_q$ uses the same label set as the label set $L$ of the data graph $G$.

The business expert designs the pattern graph from historical experience without knowing the structure of the data graph in advance. With the designed pattern graph, one can start from enterprises of a specified type and query the other enterprises that share a specific interaction pattern with them.

The beneficial effects of the invention are as follows. The method explores with the layer as the unit and uses central pattern sets and edge pattern sets as matching constraints. Compared with graph simulation algorithms, it finds structurally more accurate matches in polynomial time. The method refines the result through repeated exploration from multiple source points, which strengthens the structural constraints from multiple perspectives and allows a trade-off between computational efficiency and matching precision. It can be used to query fixed interaction patterns on naturally occurring graph structures such as enterprise trade networks, and gives business experts a degree of control over the query process.

Drawings

FIG. 1 is a general flow diagram of the present invention;

FIG. 2 is the pattern graph preprocessing flow;

FIG. 3 is the data graph exploration flow;

FIG. 4 is the iterative result refinement flow;

FIG. 5 is a sample data graph;

FIG. 6 is a sample pattern graph.

Detailed Description

In order that those skilled in the art will better understand the invention, the invention will now be described in further detail with reference to the accompanying drawings and specific examples.

As shown in FIG. 1, the overall process of the invention is as follows:
1. A source point is selected from the pattern graph in a suitable way, and the pattern graph is decomposed into several layers centered on it. The layers form a hierarchy.
2. After a starting point is selected on the data graph, the data graph is explored layer by layer, from the bottom layer upward. The pattern graph source point and the data graph starting point must satisfy certain constraints.
3. Optionally, one or more secondary source points are selected on the pattern graph, the exploration step is repeated on the explored subgraph, and the results are intersected to improve precision.

Each search result is represented as a connected subgraph. The set of all subgraphs obtained from the different data graph starting points is the graph pattern matching result.

The pattern graph is defined as $Q = (V_q, E_q, L_q, d_q)$, where:
- $V_q$ is the node set of the pattern graph.
- $E_q \subseteq V_q \times V_q$ is the pattern graph edge set.
- $T$ is a set of labels, and $L_q: V_q \to T$ maps a node id to a label.
- $d_q: V_q \to \{0, 1\}$ indicates whether a node is an explicit node.

A specific example is shown in FIG. 6. The pattern graph of a query defined by a business expert contains 7 nodes, with the following sets.

Node set:

$V_q = \{1, 2, 3, 4, 5, 6, 7\}$

Edge set:

$E_q = \{(1, 2), (2, 3), (3, 2), (3, 4), (5, 2), (5, 6), (5, 7), (7, 5)\}$

Label set:

$L_q = \{A, B, C, D, E\}$

Here $V_q$ represents 7 enterprises, $E_q$ represents trade between enterprises (such as purchases of goods or services), and $L_q$ specifies the goods or services each enterprise can provide.

As shown in FIG. 2, the pattern graph preprocessing part covers steps 1 and 2 of the overall process. First, the distance matrix $D$ is computed. It is defined as the undirected distance between any two nodes of the pattern graph $Q$. The computation is as follows:
1. Remove the direction information from the edges of $Q$, turning it into an undirected graph.
2. Set every edge weight to 1.
3. Compute the shortest distance between every pair of nodes with a shortest-path algorithm such as Dijkstra or Floyd, and fill the results into $D$.

$D$ is a symmetric $|V_q| \times |V_q|$ matrix, and $D_{i,j}$ is the shortest distance from node $i$ to node $j$. For the pattern graph shown in FIG. 6, $D_{4,7} = D_{7,4} = 4$ and $D_{6,6} = 0$. The distances between the other nodes are computed likewise, and $D$ is a symmetric matrix with zeros on its diagonal.
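The distance matrix step can be sketched with the Floyd algorithm (Floyd-Warshall) mentioned in the text, applied to the FIG. 6 edge set with unit weights after dropping edge direction. The function name and the 1-based list-of-lists layout are assumptions made for illustration; row and column 0 are unused padding.

```python
INF = float("inf")


def undirected_distance_matrix(n, edges):
    """Floyd-Warshall on the undirected version of a directed pattern graph.

    Nodes are numbered 1..n; every edge gets weight 1 regardless of direction.
    """
    D = [[INF] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][i] = 0
    for a, b in edges:
        D[a][b] = D[b][a] = 1  # drop the direction information
    for k in range(1, n + 1):
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D


E_Q = [(1, 2), (2, 3), (3, 2), (3, 4), (5, 2), (5, 6), (5, 7), (7, 5)]
D = undirected_distance_matrix(7, E_Q)
print(D[2][1:])  # [1, 0, 1, 2, 1, 2, 2]
```

Floyd-Warshall takes $O(|V_q|^3)$ time. That cost is negligible here because pattern graphs are small and preprocessing is independent of the data graph.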

The node eccentricity $E$ can be computed from the distance matrix $D$. For node $i$, its eccentricity $E_i$ satisfies

$$E_i = \max_{j \in V_q} D_{i,j},$$

that is, the maximum distance from node $i$ to the other nodes. For node 2 in FIG. 6, the eccentricity is the maximum of the second row of $D$, $[1, 0, 1, 2, 1, 2, 2]$, i.e. $E_2 = 2$. Likewise, $E_1 = 3$, $E_3 = 3$, $E_4 = 4$, $E_5 = 3$, $E_6 = 4$ and $E_7 = 4$.
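The eccentricity formula can also be evaluated without materializing the full matrix $D$. The sketch below runs one breadth-first search per node on the undirected version of the FIG. 6 edge set and takes the largest hop distance. Replacing the shortest-path algorithm with BFS is a choice made here, valid only because every edge has weight 1. The sketch assumes the pattern graph is connected.

```python
from collections import deque

E_Q = [(1, 2), (2, 3), (3, 2), (3, 4), (5, 2), (5, 6), (5, 7), (7, 5)]


def eccentricities(edges):
    """E_i = max_j D_ij, with D the undirected hop distance (one BFS per node)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    ecc = {}
    for src in sorted(adj):
        dist, queue = {src: 0}, deque([src])
        while queue:
            x = queue.popleft()
            for w in adj[x]:
                if w not in dist:
                    dist[w] = dist[x] + 1
                    queue.append(w)
        ecc[src] = max(dist.values())
    return ecc


print(eccentricities(E_Q))  # {1: 3, 2: 2, 3: 3, 4: 4, 5: 3, 6: 4, 7: 4}
```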

There are several ways to select the source point $s$ on the pattern graph. The simplest is random selection. A better method is to select the node with the smallest eccentricity $E$ and, among those, the largest degree. If several candidates have the same degree, one of them may be chosen at random. The degree of a node is the number of undirected edges incident to it after every directed edge is converted to an undirected one, which can also be written as

$$\deg_i = \left|\{(u, v) \in E_q : u = i \lor v = i\}\right|.$$

For FIG. 6, $\deg_1 = 1$, $\deg_2 = 4$, $\deg_3 = 3$, $\deg_4 = 1$, $\deg_5 = 4$, $\deg_6 = 1$ and $\deg_7 = 2$. The source point $s$ can then be node 2, whose eccentricity $E_2 = 2$ is the minimum. The choice of source point affects the matching result to some extent, and a business expert may choose another node as the source point based on practical experience.
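The degree formula and the source selection rule can be sketched as follows. Each directed edge of the FIG. 6 pattern graph is counted once at each endpoint, which reproduces the degrees listed above. The source is then chosen by the key (eccentricity ascending, degree descending). The helper names are illustrative. The final tie-break on the smaller id is an assumption standing in for the random choice the text allows. Each `eccentricity` call rebuilds the adjacency map, which is acceptable only for small pattern graphs.

```python
from collections import Counter, deque

E_Q = [(1, 2), (2, 3), (3, 2), (3, 4), (5, 2), (5, 6), (5, 7), (7, 5)]
NODES = range(1, 8)


def degrees(edges):
    """Count every directed edge once at each endpoint, as if undirected."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def eccentricity(v, edges):
    """Largest undirected hop distance from v (BFS with unit weights)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    dist, queue = {v: 0}, deque([v])
    while queue:
        x = queue.popleft()
        for w in adj[x]:
            if w not in dist:
                dist[w] = dist[x] + 1
                queue.append(w)
    return max(dist.values())


def pick_source(nodes, edges):
    """Smallest eccentricity first, then largest degree, then smallest id."""
    deg = degrees(edges)
    return min(nodes, key=lambda v: (eccentricity(v, edges), -deg[v], v))


print(dict(sorted(degrees(E_Q).items())))  # {1: 1, 2: 4, 3: 3, 4: 1, 5: 4, 6: 1, 7: 2}
print(pick_source(NODES, E_Q))             # 2
```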

Combining the source point $s$ with the distance matrix $D$ gives the level of each node, which decomposes the pattern graph $Q$ into several layers. For node $i$, its level is defined as $G_i = D_{s,i}$. Once the levels are known, the node set $V_q$ is partitioned by level into the hierarchical sets $Y_s = \{y_0, y_1, \ldots, y_m\}$, with $y_0 = \{s\}$ and

$$y_k = \{i \in V_q : G_i = k\}, \qquad m = \max_{i \in V_q} G_i,$$

so the maximum level is $m$. For the pattern graph in FIG. 6 with source point $s = 2$, $y_0 = \{2\}$, $y_1 = \{1, 3, 5\}$ and $y_2 = \{4, 6, 7\}$, and the maximum level is $m = 2$.
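The layer decomposition $y_k = \{i \in V_q : G_i = k\}$ with $G_i = D_{s,i}$ can be sketched with a single breadth-first search from $s$. BFS is used because the level is defined here as the shortest undirected distance from the source point. The function name `decompose_layers` is hypothetical, and the pattern graph is assumed to be connected.

```python
from collections import deque

E_Q = [(1, 2), (2, 3), (3, 2), (3, 4), (5, 2), (5, 6), (5, 7), (7, 5)]


def decompose_layers(edges, s):
    """Split pattern nodes into layers y_0..y_m by undirected hop distance from s."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    level, queue = {s: 0}, deque([s])
    while queue:
        x = queue.popleft()
        for w in adj[x]:
            if w not in level:
                level[w] = level[x] + 1
                queue.append(w)
    m = max(level.values())  # maximum level
    return [{v for v, g in level.items() if g == k} for k in range(m + 1)]


layers = decompose_layers(E_Q, s=2)  # [{2}, {1, 3, 5}, {4, 6, 7}]
```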

The final step of pattern graph preprocessing computes the central pattern sets $C$ and the edge pattern sets $P$. $C$ and $P$ hold all the constraint information used during exploration and can easily be serialized to the many nodes of a distributed computation. For node $i$ of the pattern graph $Q$, its central pattern set is

$$C_i = (\mathrm{Pa}_i, \mathrm{Ch}_i, t_i, d_i).$$

Here $\mathrm{Pa}_i$ is the set of parent nodes of node $i$ and $\mathrm{Ch}_i$ is the set of its child nodes:

$$\mathrm{Pa}_i = \{j \in V_q : (j, i) \in E_q\}, \qquad \mathrm{Ch}_i = \{j \in V_q : (i, j) \in E_q\}.$$

$t_i = L_q(i)$ is the node label. $d_i$ is a binary flag: node $i$ is explicit when $d_i = 1$, and otherwise $d_i = 0$. The business expert marks this attribute from experience.

Similarly, the edge pattern set is defined as

$$P_i = (\mathrm{Pa}'_i, \mathrm{Ch}'_i, t_i, d_i),$$

where $t_i$ and $d_i$ are defined exactly as in the central pattern set. $\mathrm{Pa}'_i$ is the subset of $\mathrm{Pa}_i$ given by

$$\mathrm{Pa}'_i = \{j \in \mathrm{Pa}_i : y(j) = y(i) - 1\},$$

where the function $y: V_q \to \mathbb{N}$ returns the level of a node. Similarly, $\mathrm{Ch}'_i$ satisfies

$$\mathrm{Ch}'_i = \{j \in \mathrm{Ch}_i : y(j) = y(i) - 1\}.$$

Take node 3 in FIG. 6 as an example. Its central pattern set $C_3$ contains $\mathrm{Pa}_3 = \{2\}$, $\mathrm{Ch}_3 = \{2, 4\}$, $t_3 = B$ and $d_3 = 0$. Node 3 belongs to layer 1, so its edge pattern set $P_3$ has $\mathrm{Pa}'_3 = \{2\}$ and $\mathrm{Ch}'_3 = \{2\}$, i.e. it contains only layer-0 nodes.
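The central and edge pattern sets can be computed in a single pass over the pattern nodes, as sketched below. Each set is stored as a tuple (parents, children, label, explicit flag). The levels are those given for FIG. 6 with $s = 2$ ($y_0 = \{2\}$, $y_1 = \{1, 3, 5\}$, $y_2 = \{4, 6, 7\}$). Only $t_3 = B$ and $d_3 = 0$ come from the text; every other label and flag below is a hypothetical placeholder, as is the function name.

```python
E_Q = [(1, 2), (2, 3), (3, 2), (3, 4), (5, 2), (5, 6), (5, 7), (7, 5)]
# Levels from source s = 2: y_0 = {2}, y_1 = {1, 3, 5}, y_2 = {4, 6, 7}.
LEVEL = {2: 0, 1: 1, 3: 1, 5: 1, 4: 2, 6: 2, 7: 2}
# Hypothetical labels and flags: only t_3 = B and d_3 = 0 are given in the text.
LABEL = {1: "A", 2: "A", 3: "B", 4: "C", 5: "D", 6: "E", 7: "D"}
EXPLICIT = {v: 0 for v in LEVEL}


def pattern_sets(edges, level, label, explicit):
    """Return central sets C[v] = (Pa, Ch, t, d) and edge sets P[v] = (Pa', Ch', t, d)."""
    C, P = {}, {}
    for v in level:
        pa = frozenset(a for a, b in edges if b == v)  # parents: edges into v
        ch = frozenset(b for a, b in edges if a == v)  # children: edges out of v
        C[v] = (pa, ch, label[v], explicit[v])
        up = level[v] - 1  # the adjacent layer closer to the source point
        P[v] = (frozenset(w for w in pa if level[w] == up),
                frozenset(w for w in ch if level[w] == up),
                label[v], explicit[v])
    return C, P


C, P = pattern_sets(E_Q, LEVEL, LABEL, EXPLICIT)
print(sorted(C[3][0]), sorted(C[3][1]))  # [2] [2, 4]
print(sorted(P[3][0]), sorted(P[3][1]))  # [2] [2]
```

Frozensets make each constraint hashable and cheap to ship to workers. The edge pattern set of the source point itself comes out empty, because no node lies at level $-1$.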

In fixed-point queries on a graph database, the conventional approach is to query by graph traversal. Starting from a given point, the traversal keeps searching outward along adjacent edges until the specified condition is met. In this case the query request already contains the traversal starting point $v$, and the node of $V_q$ whose label equals the label of $v$ can be selected directly as the source point.

In a dynamic graph scenario, exploration is usually expected to start directly from the incremental part of the data graph. The source point is then not fixed: every pattern graph node is treated as a potential source point and preprocessed in turn. When an increment $\Delta G$ of the data graph is received, the label set $\Delta T$ of all nodes in $\Delta G$ is also obtained. The final source point is chosen only among the nodes of $V_q$ whose labels are in $\Delta T$, using low eccentricity and high node degree as the criteria. Because pattern graph preprocessing is independent of the data graph, preprocessing every node as a source point in advance does not increase the matching time.
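Source selection for a data graph increment can be sketched as a filter followed by the usual ranking. The filter keeps pattern nodes whose label occurs in $\Delta T$. The ranking then takes the lowest eccentricity and, among those, the highest degree. The helper name `dynamic_source` and the toy eccentricity, degree and label values are hypothetical. In practice these values would come from the per-node preprocessing done in advance.

```python
def dynamic_source(ecc, deg, pattern_label, delta_labels):
    """Hypothetical helper: pick the source among pattern nodes whose label is
    in the increment's label set (low eccentricity, then high degree)."""
    candidates = [v for v, t in pattern_label.items() if t in delta_labels]
    if not candidates:
        return None  # no pattern node can anchor this increment
    return min(candidates, key=lambda v: (ecc[v], -deg[v], v))


# Toy values (hypothetical): eccentricity, degree and label per pattern node.
ECC = {1: 3, 2: 2, 3: 3}
DEG = {1: 1, 2: 4, 3: 3}
LABEL = {1: "A", 2: "B", 3: "C"}
print(dynamic_source(ECC, DEG, LABEL, {"A", "C"}))  # 3
```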

As shown in FIG. 3, the data graph exploration flow covers steps 3 and 4 of the overall flow. Define the data graph $G = (V, E, L)$, where $V$ is the set of data graph nodes, $E$ is the set of data graph edges, and $L: V \to T$ maps a data graph node id to a label. The label set $T$ is defined as for the pattern graph. Taking the enterprise trade network shown in FIG. 5 as an example, the data graph contains 10 enterprises offering 5 kinds of commodities. $V$ is the set of unique enterprise identifiers, represented here by node ids 1, 2, …. In $E$, a directed edge from node 1 to node 2 is created if enterprise 1 purchases a commodity or service from enterprise 2. $L$ gives the products or services each enterprise can offer.

The exploration starting points in the data graph $G$ can be selected in several ways. For an indeterminate-point query, all data graph nodes with the same label as the source point are selected as exploration starting points, so the starting point set is $\{v \in V : L(v) = L_q(s)\}$. For a fixed-point query, a set of fixed points $V_e \subseteq V$ serves as the feasible starting points. $V_e$ can then be treated as the starting point set in the dynamic source point mode, where the source point is selected such that $s \in V_q \land x \in V_e \land L_q(s) = L(x)$. Here $s$ is the selected source point and $x$ the corresponding starting point. Exploration on a dynamic graph can be regarded as a fixed-point query with $V_e = \Delta V$, where $\Delta V$ is the incremental part of the dynamic graph. Once the pattern graph source point and the corresponding data graph starting point are determined, an exploration task starts. The exploration tasks started from different starting points are independent of one another and can be processed in parallel.
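The three starting point modes can be expressed as one helper that returns label-compatible (source point, starting point) pairs. Each pair is one independent exploration task. The function name, its keyword arguments and the toy labels are hypothetical.

```python
def exploration_tasks(pattern_label, data_label, source=None, fixed=None):
    """Return (source point, starting point) pairs with equal labels.

    Indeterminate-point query: pass `source` and leave `fixed` as None, so every
    data node labeled like the source becomes a starting point.
    Fixed-point or dynamic query: pass `fixed` (V_e, or the increment ΔV) and
    leave `source` as None, so every label-compatible pattern node is tried.
    """
    sources = list(pattern_label) if source is None else [source]
    starts = list(data_label) if fixed is None else list(fixed)
    return sorted((s, x) for s in sources for x in starts
                  if pattern_label[s] == data_label[x])


# Hypothetical toy labels.
PATTERN_LABEL = {1: "A", 2: "B", 3: "C"}
DATA_LABEL = {10: "A", 20: "B", 30: "C", 40: "B"}
print(exploration_tasks(PATTERN_LABEL, DATA_LABEL, source=2))        # [(2, 20), (2, 40)]
print(exploration_tasks(PATTERN_LABEL, DATA_LABEL, fixed={30, 40}))  # [(2, 40), (3, 30)]
```

Because the tasks share no state, the returned list can be handed directly to a pool of parallel workers.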

Before exploration starts from a data graph starting point, a matching set $M$ is created for each node. For node $i$ of the data graph $G$, the matching set $M_i$ contains the pattern graph nodes that match node $i$:

$$M_i \subseteq V_q.$$

Initially, $M_i = \emptyset$ for every $i \in V$. Each exploration task maintains a set of explored nodes, defined as

$$V_a = \{i \in V : M_i \neq \emptyset\}.$$

After the source point $s$ and the exploration starting point $u$ are selected, set $M_u = \{s\}$ and $t = 1$. Node $u$ then sends scheduling signals to its parent and child nodes, and the nodes that receive a scheduling signal are processed in parallel in the next iteration.

In each iteration, a node $i$ that receives a scheduling signal proceeds as follows:
1. Determine the level $t$ to be matched in this iteration.
2. Obtain from $V_a$ the matching sets of its parent nodes and of its child nodes. If there are several parents (children), their matching sets are merged into one set:

$$M_i^{\mathrm{pa}} = \bigcup_{j \in V_a,\, (j, i) \in E} M_j, \qquad M_i^{\mathrm{ch}} = \bigcup_{j \in V_a,\, (i, j) \in E} M_j.$$

3. Form the candidate set from the nodes of $y_t$ with label $L(i)$:

$$U_i = \{v \in y_t : L_q(v) = L(i)\} \subseteq y_t.$$

4. Check each node $v \in U_i$ in turn against its edge pattern set constraint. Here $\mathrm{Pa}'_v$ and $\mathrm{Ch}'_v$ are the parent and child ids in the edge pattern set of $v$, and $\mathrm{Pa}_v$ and $\mathrm{Ch}_v$ (used in step 5) are those in its central pattern set. The constraint is

$$\mathrm{Pa}'_v \subseteq M_i^{\mathrm{pa}} \land \mathrm{Ch}'_v \subseteq M_i^{\mathrm{ch}}.$$

If $v$ does not satisfy it, set $U_i = U_i - \{v\}$.

5. Check each node $v \in M_i$ in turn against its central pattern set constraint,

$$\mathrm{Pa}_v \subseteq M_i^{\mathrm{pa}} \land \mathrm{Ch}_v \subseteq M_i^{\mathrm{ch}}.$$

If $v$ does not satisfy it, set $M_i = M_i - \{v\}$. In particular, when $M_i$ contains a node $v$ marked as explicit, set $M_i = \{v\}$ directly.

6. Merge the matching sets: $M_i = M_i \cup U_i$.

7. If $M_i \neq \emptyset$ and $t < m$, where $m$ is the maximum level, node $i$ sends scheduling signals to its neighbors, sets $t = t + 1$, and proceeds to the next iteration.

After $m$ iterations the algorithm enters the convergence stage. All nodes in $V_a$ send scheduling signals to their neighbors. Each node $i$ that receives one collects the matching sets $M_i^{\mathrm{pa}}$ and $M_i^{\mathrm{ch}}$ of its parent and child nodes in the same way. For each node $v$ in its existing matching set $M_i$, it checks whether

$$\mathrm{Pa}_v \subseteq M_i^{\mathrm{pa}} \land \mathrm{Ch}_v \subseteq M_i^{\mathrm{ch}},$$

and if not, sets $M_i = M_i - \{v\}$. If $M_i = \emptyset$, then $V_a = V_a - \{i\}$. The convergence stage iterates until $V_a$ no longer changes, for at most $m$ iterations.

For the data graph in FIG. 5, taking node 5 as the exploration starting point gives the result set $V_a = \{2, 3, 4, 5, 6, 7, 8, 10\}$, with

$M_2 = \{3\}$, $M_3 = \{4\}$, $M_4 = \{1\}$, $M_5 = \{2, 4\}$, $M_6 = \{7\}$, $M_7 = \{5\}$, $M_8 = \{4, 6\}$, $M_{10} = \{6\}$.

Nodes 5 and 8 each satisfy two interaction relations at once, which introduces some ambiguity. If pattern graph node 2 is marked explicit ($d_2 = 1$), then $M_5 = \{2\}$, i.e. $M_5$ contains only the explicit pattern graph node, which removes the ambiguity to some extent. Whether this is needed depends on the specific business context and the business expert's experience.
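The two-stage exploration can be simulated on a single machine in synchronous rounds, as in the minimal sketch below. Each round reads a snapshot of the matching sets, which mimics the parallel vertex updates of a graph computing framework. It is a sketch under stated assumptions, not the method's reference implementation:
- the pattern graph is connected;
- the explicit-node rule is omitted;
- during expansion, the central check is applied only to pattern nodes at levels up to $t - 1$, whose neighbors have all been reached by round $t$;
- convergence repeats until no set changes.

The function name and both toy graphs are hypothetical and are not the graphs of FIG. 5 or FIG. 6.

```python
from collections import deque


def explore(p_edges, p_label, s, d_edges, d_label, u):
    """Layer-wise exploration from data node u with pattern source s.

    Returns {data node: set of matched pattern nodes} for non-empty sets.
    """
    # Pattern levels: undirected hop distance from s (pattern assumed connected).
    adj = {v: set() for v in p_label}
    for a, b in p_edges:
        adj[a].add(b)
        adj[b].add(a)
    level, queue = {s: 0}, deque([s])
    while queue:
        x = queue.popleft()
        for w in adj[x]:
            if w not in level:
                level[w] = level[x] + 1
                queue.append(w)
    m = max(level.values())
    pa = {v: {a for a, b in p_edges if b == v} for v in p_label}
    ch = {v: {b for a, b in p_edges if a == v} for v in p_label}
    # Edge pattern sets keep only parents/children one layer closer to s.
    epa = {v: {w for w in pa[v] if level[w] == level[v] - 1} for v in p_label}
    ech = {v: {w for w in ch[v] if level[w] == level[v] - 1} for v in p_label}

    d_pa = {i: set() for i in d_label}
    d_ch = {i: set() for i in d_label}
    for a, b in d_edges:
        d_ch[a].add(b)
        d_pa[b].add(a)
    M = {i: set() for i in d_label}
    M[u] = {s}

    def state(i, snap):
        # State sets: union of the parents' and of the children's matching sets.
        return (set().union(*(snap[j] for j in d_pa[i])),
                set().union(*(snap[j] for j in d_ch[i])))

    def prune(max_level):
        # Drop matches whose central pattern set is not covered by the state sets.
        snap = {i: set(ms) for i, ms in M.items()}
        changed = False
        for i, ms in snap.items():
            sp, sc = state(i, snap)
            for v in ms:
                if level[v] <= max_level and not (pa[v] <= sp and ch[v] <= sc):
                    M[i].discard(v)
                    changed = True
        return changed

    for t in range(1, m + 1):  # expansion stage: one new layer per round
        snap = {i: set(ms) for i, ms in M.items()}
        frontier = set()
        for i, ms in snap.items():
            if any(level[v] == t - 1 for v in ms):
                frontier |= d_pa[i] | d_ch[i]  # scheduling signals to neighbours
        for i in frontier:
            sp, sc = state(i, snap)
            for v in level:
                if (level[v] == t and p_label[v] == d_label[i]
                        and epa[v] <= sp and ech[v] <= sc):
                    M[i].add(v)
        prune(t - 1)  # earlier layers now have all their neighbours matched
    while prune(m):  # convergence stage: removals only, until nothing changes
        pass
    return {i: ms for i, ms in M.items() if ms}


# Hypothetical toy graphs: a labeled 3-cycle pattern, and a data graph whose
# node 40 fits the edge pattern of pattern node 2 but not its central pattern.
P_EDGES = [(1, 2), (2, 3), (3, 1)]
P_LABEL = {1: "A", 2: "B", 3: "C"}
D_EDGES = [(10, 20), (20, 30), (30, 10), (10, 40)]
D_LABEL = {10: "A", 20: "B", 30: "C", 40: "B"}
print(explore(P_EDGES, P_LABEL, 1, D_EDGES, D_LABEL, 10))
# {10: {1}, 20: {2}, 30: {3}}
```

In this toy run, node 40 is added during expansion because its parent matches pattern node 1. Convergence then removes it, because it has no child matching pattern node 3. This mirrors the expansion/convergence split described above.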

After $m$ layers have been explored, a subgraph $G_a$ of the data graph is generated from the explored node set $V_a$. $G_a$ consists of the nodes in $V_a$ and the edges between them, and serves as the first-stage result. If this result already meets the required matching precision, the subsequent refinement step can be skipped and the result returned directly.
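Extracting the first-stage subgraph amounts to keeping the data nodes with non-empty matching sets and the edges whose two endpoints are both kept, i.e. an induced subgraph. Reading "the edges connected to them" as induced edges is an interpretation. The helper name and the toy matching result below are hypothetical.

```python
def extract_subgraph(d_edges, matches):
    """Keep data nodes with non-empty matching sets and the edges between them."""
    kept = {i for i, ms in matches.items() if ms}
    return kept, [(a, b) for a, b in d_edges if a in kept and b in kept]


# Hypothetical toy result from one exploration task.
MATCHES = {10: {1}, 20: {2}, 30: {3}, 40: set()}
EDGES = [(10, 20), (20, 30), (30, 10), (10, 40)]
nodes, edges = extract_subgraph(EDGES, MATCHES)
print(sorted(nodes), edges)  # [10, 20, 30] [(10, 20), (20, 30), (30, 10)]
```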

As shown in FIG. 4, refining the preliminary exploration result (the subgraph $G_a$ generated from $V_a$) corresponds to steps 5 and 6 of the overall flow.

First, $K$ secondary source points are selected from the pattern graph $Q$. The nodes of $V_q$ are sorted by eccentricity $E$ in descending order, ties are broken by node degree in descending order, and the first $K$ nodes are taken as secondary source points. The pattern graph is again decomposed into layers using the node distance matrix $D$.

Using the data graph starting point selection for indeterminate-point queries on $G_a$, several parallel graph exploration tasks are started, with task numbers $X = \{1, 2, \ldots, n\}$. For node $i$ of $G_a$ and graph exploration task $x$, the resulting matching set is $M_i^x$. After all exploration tasks have finished, each node computes

$$M_i = \bigcap_{x \in X} M_i^x.$$

Finally, the nodes of $G_a$ whose matching sets are empty are removed, and the subgraph is regenerated as the final output. The refinement requires starting $n$ exploration tasks. However, $G_a$ is typically small, so it can be loaded directly onto a single computing node for processing, which avoids the overhead of distributed communication.
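The fusion step $M_i = \bigcap_{x \in X} M_i^x$ followed by removal of empty nodes can be sketched as follows. The function name and the toy task results are hypothetical. The sketch treats a node that is missing from one task's result as having an empty matching set in that task, which is an assumption about how absent nodes are recorded.

```python
def refine(task_results):
    """Intersect per-node matching sets over all exploration tasks x in X.

    task_results: one {data node: set of pattern nodes} dict per task; a node
    missing from a task's dict is treated as having an empty matching set.
    Nodes whose intersection is empty are dropped from the refined result.
    """
    nodes = set().union(*task_results)
    refined = {}
    for i in sorted(nodes):
        common = set.intersection(*(r.get(i, set()) for r in task_results))
        if common:
            refined[i] = common
    return refined


# Hypothetical results of two tasks on the same preliminary subgraph.
RUNS = [{5: {2, 4}, 8: {4, 6}, 9: {1}}, {5: {2}, 8: {4, 6}}]
print(refine(RUNS))  # {5: {2}, 8: {4, 6}}
```

Intersection can only shrink matching sets, so adding secondary source points never loosens the result. The cost is one extra exploration per task.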

The above embodiments illustrate rather than limit the invention, and any modifications and variations of the invention within the spirit and scope of the appended claims fall within its protection.

Claims

1. A graph pattern matching method based on multi-source point parallel exploration is characterized by comprising the following steps:

(a) denoting the pattern graph to be matched as $Q = (V_q, E_q, L_q)$, where $V_q$ is the set of nodes of the pattern graph, $E_q$ is the set of edges connecting the nodes, and $L_q$ is the set of labels of the pattern graph nodes; selecting a pattern graph node $s \in V_q$ as the exploration source point, performing a depth traversal centered on the source point $s$, and marking the levels of the pattern graph nodes with their depth values; decomposing the pattern graph into several layers, each containing the nodes of the same level and the edges connecting them;

(b) computing central pattern sets from the structure and node semantics of the pattern graph to be matched, and computing edge pattern sets from the dependencies between adjacent layers; each node of $V_q$ has its own central pattern set, which includes the node's label and the ids of its parent and child nodes; for a node at level $d > 0$, the edge pattern set is a subset of the central pattern set and contains only the ids of the parent and child nodes at level $d - 1$; the central and edge pattern sets contain all structural constraints of the layer-by-layer graph exploration; defining the depth of the layers as $D$, the layers are explored at least $D$ times and at most $2D$ times; the first $D$ iterations form the expansion stage, in which new matchable nodes are added and mismatched nodes are removed; after $D$ iterations the convergence stage follows, in which no new nodes are added and only mismatched nodes are removed;

(c) obtaining the data graph to be matched and selecting a starting point in it from which to explore neighbor nodes; each node maintains only its own matching set, which contains all nodes of the pattern graph $V_q$ that the data graph node can match; each data graph node also maintains its own state set for computing the constraints of the edge pattern set or the central pattern set; during the first $D$ iterations (expansion stage), exploring from layer $d$ to layer $d + 1$ comprises: traversing the data graph nodes with non-empty matching sets, selecting the data graph nodes whose matching sets contain layer-$d$ pattern graph nodes, and adding the neighbors of the selected data graph nodes to the exploration queue; each node in the exploration queue collecting the matching sets of its neighbors and merging the non-empty ones into a state set; for each node in the exploration queue, traversing the nodes of the pattern graph $V_q$ at level $d + 1$ and judging whether the data graph node and the pattern graph node have the same label and whether the data graph node's state set is a superset of the pattern graph node's edge pattern set; if so, adding the current pattern graph node to the data graph node's matching set; meanwhile, throughout the matching process, every data graph node with a non-empty matching set also collecting its neighbors' matching sets into a state set, traversing the pattern graph nodes in its matching set that were not updated, and judging whether the current state set is a superset of the pattern graph node's central pattern set; if not, removing that pattern graph node from the matching set; repeating the exploration step, matching only one additional adjacent layer each time, until all layers have been explored; after the exploration step ends, the nodes with non-empty matching sets continuing to collect neighbor states and verify whether their matching sets contain mismatched nodes, until no matching set changes any more;

(d) acquiring the nodes of the data graph whose matching sets are non-empty, and extracting a subgraph as a new data graph; the subgraph is a subset of the data graph and contains the nodes whose matching sets are non-empty after the initial exploration, together with the edges connecting them; one data graph matching task yields a plurality of subgraphs, each uniquely identified by the id of its starting point in the data graph in step (c); for a data graph $G=(V,E,L)$, where $V$ is the node set, $E$ the edge set and $L$ the label set of the data graph, the subgraph $G'$ has node set $V'\subseteq V$ and edge set $E'\subseteq E$; a plurality of secondary source points are selected in the pattern graph, and the exploration of step (c) is repeated on the subgraph from each of them; a secondary source point is defined as any node of the pattern graph other than the source point; finally, the intersection of the multi-source-point exploration results is computed, and nodes with empty matching sets are removed to obtain the refined matching result;
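The subgraph extraction and the multi-source refinement of step (d) can be illustrated with two small helpers. This is a sketch assuming matching sets are stored as a dict from data node to a set of pattern nodes; `extract_subgraph` and `refine` are hypothetical names, and keeping only edges whose two endpoints both survive (an induced subgraph) is an assumption about which edges are retained.

```python
from typing import Dict, Hashable, List, Set, Tuple

Node = Hashable
Match = Dict[Node, Set[Node]]


def extract_subgraph(edges: Set[Tuple[Node, Node]], match: Match):
    """Keep nodes with non-empty matching sets and the edges between them."""
    kept = {v for v, m in match.items() if m}
    sub_edges = {(a, b) for (a, b) in edges if a in kept and b in kept}
    return kept, sub_edges


def refine(results: List[Match]) -> Match:
    """Intersect matching sets from several source points; drop empty results."""
    # A node missing from any exploration result counts as unmatched there.
    common = set.intersection(*(set(r) for r in results))
    refined: Match = {}
    for v in common:
        m = set.intersection(*(r[v] for r in results))
        if m:
            refined[v] = m
    return refined
```

Each secondary source point sees the pattern graph from a different viewpoint, so a pattern node survives only if every exploration agrees on it.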

the data graph can model the interactions between real-world entities, and a domain expert designs a pattern graph from business experience to query the entities that conform to the interaction pattern; specifically, trade relationships between enterprises can be modeled, and the data graph can be defined as $G=(V,E,L)$, where each node $v\in V$ is the unique identifier of an enterprise; if enterprise A purchases some service or commodity from enterprise B, an edge pointing from A to B is created in $E$; $L$ represents the node attributes, marking the types of products the enterprise can provide; the pattern graph $P=(V_P,E_P,L_P)$ is the query pattern designed by the business expert, where $V_P$ is the node set of the pattern graph, $E_P$ is the edge set of the pattern graph, representing the associations being queried, and $L_P$ is the same label set as the label set $L$ of the data graph; the business expert designs the pattern graph from historical experience without knowing the structure of the data graph in advance; with the designed pattern graph, starting from enterprises of a specified type, other enterprises with which specific interaction patterns exist can be queried.
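A trade data graph of this kind could be assembled from purchase records as sketched below. The `LabeledDigraph` class and `build_trade_graph` function are hypothetical, and the sketch simplifies each enterprise to a single product-type label.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class LabeledDigraph:
    """G = (V, E, L): node ids, directed edges, and one label per node."""
    nodes: Set[str] = field(default_factory=set)
    edges: Set[Tuple[str, str]] = field(default_factory=set)
    label: Dict[str, str] = field(default_factory=dict)


def build_trade_graph(products: Dict[str, str],
                      purchases: List[Tuple[str, str]]) -> LabeledDigraph:
    """products: enterprise id -> product type; purchases: (buyer, seller) pairs."""
    g = LabeledDigraph()
    for company, product_type in products.items():
        g.nodes.add(company)
        g.label[company] = product_type
    for buyer, seller in purchases:
        # Enterprise A buys from enterprise B  =>  directed edge A -> B.
        g.edges.add((buyer, seller))
    return g
```

A pattern graph can reuse the same structure, since it draws its labels from the same label set.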

2. The graph pattern matching method based on multi-source point parallel exploration according to claim 1, wherein the pattern graph $P$ and the data graph $G$ are both directed graphs with node labels; a node label is a symbol distinguishing nodes with different roles; each node of the pattern graph also carries an additional explicit label indicating whether a data graph node matching it must exclude all other pattern graph nodes; specifically, a data graph node $v$ that matches a pattern graph node $u$ carrying an explicit label cannot match other pattern graph nodes at the same time, i.e. either the matching set $M(v)$ contains $u$ and $|M(v)|=1$, or $M(v)$ does not contain $u$; by designating explicit nodes, the matching pattern can be specified more precisely.

3. The graph pattern matching method based on multi-source point parallel exploration according to claim 1, wherein in step (a) the exploration source point can be selected in several ways, including:

(a1) randomly selecting a node in the pattern graph;

(a2) selecting the node of the pattern graph with the smallest eccentricity, and among such nodes the one with the largest degree;

(a3) when the data graph is a dynamic graph, selecting the pattern graph node whose label is the same as that of the incremental (newly added) nodes of the data graph;

(a4) selecting manually according to specific business logic.

4. The graph pattern matching method based on multi-source point parallel exploration according to claim 3, wherein in (a2) the eccentricity of a node is its maximum undirected distance to the other nodes of the pattern graph; the undirected distance is the shortest-path distance computed after every directed edge of the graph is replaced by an undirected edge.

5. The graph pattern matching method based on multi-source point parallel exploration according to claim 1, wherein in step (c) the graph is explored layer by layer, from lower to higher layers in ascending order of depth; the nodes of a newly explored layer do not depend on one another, so the order in which they are visited does not matter, and the data graph nodes update their own matching sets independently and in parallel.
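The layering that step (c) relies on can be sketched as a breadth-first search over the pattern graph. This assumes, consistent with the undirected distance of claim 4, that a node's depth is its undirected distance from the source point; `layers_by_depth` is a hypothetical name.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Node = Hashable


def layers_by_depth(edges: Iterable[Tuple[Node, Node]], source: Node) -> List[List[Node]]:
    """Group pattern nodes by undirected BFS depth from the source point."""
    adj: Dict[Node, Set[Node]] = {}
    for a, b in edges:
        # Edge direction is ignored when computing depth.
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    depth = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in depth:
                depth[y] = depth[x] + 1
                queue.append(y)
    layers: List[List[Node]] = [[] for _ in range(max(depth.values()) + 1)]
    for u, d in depth.items():
        layers[d].append(u)
    # Order inside a layer does not affect matching; sort only for readability.
    return [sorted(layer, key=str) for layer in layers]
```

Layers are then consumed in list order, while the nodes inside one layer may be processed in any order or in parallel.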

6. The graph pattern matching method based on multi-source point parallel exploration according to claim 1, wherein in step (d) the secondary source points are selected in descending order of the eccentricity of the pattern graph nodes; among nodes with equal eccentricity, the node with the largest degree is preferred; the node degree is computed by converting all directed edges into undirected edges and counting the edges incident to the node.
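The eccentricity of claim 4 and the two orderings built on it, mode (a2) of claim 3 and the secondary source order of this claim, can be sketched together. The sketch assumes a connected pattern graph, counts degree as the number of distinct undirected neighbors (reciprocal edges collapse into one), and adds the node name as a final tiebreak for determinism; all function names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Node = Hashable
Adj = Dict[Node, Set[Node]]


def undirected(edges: Iterable[Tuple[Node, Node]]) -> Adj:
    """Replace every directed edge (a, b) with an undirected edge."""
    adj: Adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def eccentricity(adj: Adj, u: Node) -> int:
    """Maximum undirected BFS distance from u (assumes a connected pattern graph)."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return max(dist.values())


def pick_source(adj: Adj) -> Node:
    """Claim 3 (a2): smallest eccentricity, ties broken by largest degree."""
    return min(adj, key=lambda u: (eccentricity(adj, u), -len(adj[u])))


def secondary_sources(adj: Adj, source: Node) -> List[Node]:
    """Claim 6: descending eccentricity, then descending degree, then name."""
    others = [u for u in adj if u != source]
    return sorted(others, key=lambda u: (-eccentricity(adj, u), -len(adj[u]), str(u)))
```

Choosing a central node as the source keeps the number of layers small, while peripheral secondary sources view the pattern from the opposite side and so add the most new constraints.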

7. The graph pattern matching method based on multi-source point parallel exploration according to claim 1, wherein step (c) can be computed either on a distributed graph computing framework or in single-machine mode; in step (d), the selection and layering of the secondary source points can be carried out concurrently with step (c), and the repeated exploration should be performed in parallel as single-machine multitasking, with each task processing one subgraph.
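Single-machine multitasking with one task per subgraph can be sketched with the standard `concurrent.futures` module. The `task` callable stands in for the repeated exploration and intersection of step (d); `run_per_subgraph` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, TypeVar

G = TypeVar("G")  # subgraph representation (left abstract here)
R = TypeVar("R")  # per-subgraph result, e.g. refined matching sets


def run_per_subgraph(subgraphs: Dict[Hashable, G], task: Callable[[G], R],
                     workers: int = 4) -> Dict[Hashable, R]:
    """Submit one independent task per subgraph; results keyed by start-point id."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {sid: pool.submit(task, g) for sid, g in subgraphs.items()}
        return {sid: fut.result() for sid, fut in futures.items()}
```

Because the subgraphs are independent, no coordination between tasks is needed. For CPU-bound work in CPython, `ProcessPoolExecutor` avoids the global interpreter lock, but then the task and the subgraphs must be picklable.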

8. The graph pattern matching method based on multi-source point parallel exploration according to claim 1, wherein, in a pattern matching task that starts from globally undetermined points, the data graph nodes whose labels are the same as that of the pattern graph source point can be taken as starting points; in a fixed-point pattern query task of a graph database, the starting point of the graph traversal query in the data graph can be taken as the exploration starting point.
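The two ways of choosing starting points can be sketched as follows. `starting_points` is a hypothetical helper; `fixed_start` stands for the traversal start of a fixed-point query.

```python
from typing import Dict, Hashable, List, Optional

Node = Hashable


def starting_points(data_label: Dict[Node, str], source_label: str,
                    fixed_start: Optional[Node] = None) -> List[Node]:
    """Fixed-point query: the given traversal start; global query: all label matches."""
    if fixed_start is not None:
        return [fixed_start]
    # Global query: every data node whose label equals the source point's label.
    return [v for v, label in data_label.items() if label == source_label]
```

In the global mode each returned node seeds its own exploration, which is why one matching task can yield several subgraphs identified by their starting point ids.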