CN117539928A - Method and device for sub-graph matching through calculation engine - Google Patents

Method and device for sub-graph matching through calculation engine

Info

Publication number
CN117539928A
CN117539928A (application CN202311551230.2A)
Authority
CN
China
Prior art keywords
sub
graph
target
engine
configuration information
Prior art date
Legal status
Pending
Application number
CN202311551230.2A
Other languages
Chinese (zh)
Inventor
游东海
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202311551230.2A priority Critical patent/CN117539928A/en
Publication of CN117539928A publication Critical patent/CN117539928A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471Distributed queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5017Task decomposition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The embodiments of this specification provide a method and device for sub-graph matching through a computing engine, wherein the method comprises the following steps: decomposing a sub-graph matching task into a plurality of sequential supersteps according to the bulk synchronous parallel (BSP) computing mode, and generating configuration information corresponding to the plurality of supersteps; determining a target execution module corresponding to a target computing engine from a plurality of execution code modules respectively corresponding to a plurality of preset computing engines; and sending the target execution module and the configuration information to the target computing engine, wherein the target computing engine is used for executing the target execution module according to the configuration information to obtain a sub-graph matching result corresponding to the sub-graph matching task.

Description

Method and device for sub-graph matching through calculation engine
Technical Field
One or more embodiments of the present disclosure relate to the field of graph computation, and in particular, to a method and apparatus for sub-graph matching by a computing engine.
Background
A knowledge graph (Knowledge Graph) describes entities in the objective world and the relationships between them in the form of a structured semantic network, and provides a way of organizing and managing massive information about the objective world. At present, knowledge graphs are widely applied in many fields such as natural language processing, machine learning, data mining, and recommendation systems. Sub-graph matching (Subgraph Matching) is a common task in knowledge-graph applications; its main purpose is to find, within a large graph, sub-graphs isomorphic to a given small graph, and thereby explore important sub-graphs in the knowledge graph. In actual production scenarios, different knowledge-graph application vendors often want to perform sub-graph matching tasks for knowledge graphs using the computing power of the different computing engines they have deployed, but existing schemes for sub-graph matching by a computing engine often cannot perform the sub-graph matching task on different general-purpose computing engines.
Disclosure of Invention
One or more embodiments of the present specification describe a sub-graph matching method and apparatus such that sub-graph matching tasks may be performed by a variety of different computing engines.
According to a first aspect, there is provided a method of sub-graph matching by a computing engine, comprising:
decomposing a sub-graph matching task into a plurality of sequential supersteps according to the bulk synchronous parallel (BSP) computing mode, and generating configuration information corresponding to the plurality of supersteps; determining a target execution module corresponding to a target computing engine from a plurality of execution code modules respectively corresponding to a plurality of preset computing engines;
and sending the target execution module and the configuration information to the target computing engine, wherein the target computing engine is used for executing the target execution module according to the configuration information to obtain a sub-graph matching result corresponding to the sub-graph matching task.
In one possible implementation, the programming language on which the plurality of execution code modules are based includes one of the Python language, the Scala language, the Java language, and the R language.
In a possible implementation, the configuration information includes each of the plurality of supersteps and the sub-operations included in each of the plurality of supersteps.
In one possible implementation, executing the target execution module according to the configuration information includes: sequentially executing the plurality of supersteps, wherein executing a first superstep included in the plurality of supersteps includes executing the sub-operations included in the first superstep.
In a possible implementation, executing the sub-operations included in the first superstep includes:
calling, according to the sub-operations included in the first superstep, the application interfaces corresponding to those sub-operations on the target computing engine.
In a possible implementation, the sub-operations included in the first superstep are a Match sub-operation and a Join sub-operation, wherein the Match sub-operation is used for matching graph objects meeting a predetermined condition, the Join sub-operation is used for connecting the matching result of the Match sub-operation with the initial sub-graph of the first superstep, and the application interface corresponding to these sub-operations on the target computing engine is a flatMap interface.
In one possible implementation, the target computing engine includes a plurality of computing nodes including a first node;
the target computing engine executing the target execution module according to the configuration information includes:
the first node executing the part of each superstep corresponding to the first node;
the method further includes:
before the first node executes the part of each superstep corresponding to the first node, saving the graph data associated with the first node in each superstep locally on the first node.
In one possible implementation, the sub-operations included in the superstep include performing graph feature computation.
In one possible implementation, the graph feature calculation includes determining importance features of graph nodes based on the PageRank algorithm.
In one possible embodiment, the sub-graph matching task is used for sub-graph matching for a knowledge graph.
According to a second aspect, there is provided a sub-graph matching apparatus, the apparatus comprising:
a configuration unit, configured to decompose a sub-graph matching task into a plurality of sequential supersteps according to the bulk synchronous parallel (BSP) computing mode, and generate configuration information corresponding to the plurality of supersteps; and to determine a target execution module corresponding to a target computing engine from a plurality of execution code modules respectively corresponding to a plurality of preset computing engines;
and an execution unit, configured to send the target execution module and the configuration information to the target computing engine, for the target computing engine to execute the target execution module according to the configuration information, so as to obtain a sub-graph matching result corresponding to the sub-graph matching task.
According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fourth aspect, there is provided a computing device comprising a memory having executable code stored therein and a processor which, when executing the executable code, implements the method of the first aspect. With the methods, apparatus, computing devices, and storage media of the above aspects, sub-graph matching tasks can be performed by a variety of different computing engines.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 shows a schematic diagram of a scheme for sub-graph matching by a compute engine;
FIG. 2 is a schematic diagram of a method for sub-graph matching by a compute engine provided by an embodiment of the present description;
FIG. 3 illustrates a flow chart of a method for sub-graph matching by a compute engine provided by an embodiment of the present description;
FIG. 4 is a schematic diagram of sub-graph matching superstep partitioning provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of invoking corresponding interfaces according to superstep sub-operations, provided in an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of pre-storing data in a local node according to an embodiment of the present disclosure;
fig. 7 shows a block diagram of an apparatus for sub-graph matching by a computing engine according to an embodiment of the present specification.
Detailed Description
The following describes the scheme provided in the present specification with reference to the drawings.
As described above, a knowledge graph (Knowledge Graph) describes the entities and the relationships between entities in the objective world in the form of a structured semantic network, and provides a way to organize and manage massive information about the objective world. At present, knowledge graphs are widely applied in many fields such as natural language processing, machine learning, data mining, and recommendation systems. The sub-graph matching (Subgraph Matching) task, a common task in graph computing applications, is primarily aimed at finding sub-graphs within a large graph that are isomorphic to a given small graph, in order to explore important sub-graphs in the large graph. Knowledge-graph applications are essentially graph computation applications, and therefore sub-graph matching tasks are also common tasks in knowledge-graph applications.
A Compute Engine generally refers to a program that processes and computes data, particularly large-scale data. In a production scenario, different application vendors often perform various computing tasks, including graph computing tasks such as sub-graph matching, through the computing capabilities provided by a computing engine. However, existing schemes for sub-graph matching by a computing engine are typically constructed based on the graph computation mode and graph computation interface of a dedicated graph compute engine (i.e., the portion of a particular computing engine that is used for graph computation) included in that particular computing engine. Because the graph computation modes and graph computation interfaces of the dedicated graph compute engines included in different computing engines often differ, such sub-graph matching schemes can generally only be run by a particular computing engine. FIG. 1 shows a schematic diagram of a scheme for sub-graph matching by a compute engine. As shown in FIG. 1, for computing engine A, a sub-graph matching scheme 1 for computing engine A may be established according to the graph computation mode and graph computation interface of the dedicated graph compute engine a included in computing engine A; likewise, for computing engine B, a sub-graph matching scheme 2 for computing engine B may be established according to the graph computation mode and graph computation interface of the dedicated graph compute engine b included in computing engine B. However, in actual production scenarios, the computing engines deployed by different graph application vendors (e.g., knowledge-graph application vendors) are often different, and these vendors often wish to perform sub-graph matching tasks (e.g., sub-graph matching tasks for knowledge graphs) with the computing capabilities of the different computing engines they have deployed.
Therefore, existing schemes for sub-graph matching by a computing engine have the following problems. On the one hand, a single sub-graph matching scheme can only target a single computing engine and cannot adapt to different computing engines, that is, sub-graph matching cannot be performed through different computing engines. On the other hand, if different sub-graph matching schemes are established separately for the dedicated graph compute engines of different computing engines, substantial manpower and computing resource costs must be consumed.
In order to solve the above technical problems, an embodiment of the present disclosure provides a method for sub-graph matching by a computing engine. The core idea is as follows: the sub-graph matching task is decomposed through the bulk synchronous parallel (BSP, Bulk Synchronous Parallel) computing mode; specifically, the sub-graph matching task can be decomposed into a plurality of sequential supersteps, and configuration information corresponding to the plurality of supersteps is generated. A target execution module corresponding to the target computing engine is determined from a plurality of execution code modules respectively corresponding to preset computing engines. Then, the target execution module and the configuration information are sent to the target computing engine, and the target computing engine can execute the target execution module according to the configuration information to obtain the computation result of the sub-graph matching task, namely, the sub-graph matching result. FIG. 2 shows a schematic diagram of a method of sub-graph matching by a compute engine according to an embodiment of the present description. As shown in FIG. 2, the sub-graph matching task may be broken down into multiple supersteps, and then the configuration information corresponding to the multiple supersteps, as well as the execution code modules for the different compute engines, are sent to different compute engines (e.g., compute engine A, compute engine B, ..., compute engine X) for execution on those engines to obtain sub-graph matching results.
The method has the following advantages. On the one hand, the BSP computing mode is a basic computing mode for computing engines that is typically supported by existing computing engines of various types. Therefore, decomposing the sub-graph matching task through the BSP computing mode enables various computing engines to execute the plurality of supersteps obtained after decomposition, avoiding the situation in which the sub-graph matching task can only be executed by a specific computing engine because it depends on the graph computation mode or interface of the graph compute engine included in that specific engine. On the other hand, a target execution module corresponding to the computing engine that is to execute the sub-graph matching task (i.e., the target computing engine) is determined from a plurality of execution code modules corresponding to a plurality of computing engines, and the configuration information corresponding to the plurality of supersteps is sent to the target computing engine, so that different types of target computing engines obtain different execution code modules but the same superstep configuration information. Different types of target computing engines can then execute their different execution code modules according to the same superstep configuration information to obtain the computation result of the sub-graph matching task. In summary, this method provides a sub-graph matching scheme adapted to multiple computing engines, that is, a scheme for sub-graph matching by different computing engines. Compared with establishing a separate dedicated sub-graph matching scheme for the dedicated graph compute engine of each computing engine, this greatly reduces manpower and computing resource costs.
A method for sub-graph matching by a computing engine according to an embodiment of the present disclosure is described in detail below. Fig. 3 shows a flowchart of a method for sub-graph matching by a computing engine according to an embodiment of the present disclosure. As shown in fig. 3, the method at least comprises the following steps:
step S301, decomposing a sub-graph matching task into a plurality of sequential supersteps according to the bulk synchronous parallel (BSP) computing mode, and generating configuration information corresponding to the plurality of supersteps; and determining a target execution module corresponding to a target computing engine from a plurality of execution code modules respectively corresponding to a plurality of preset computing engines;
step S303, sending the target execution module and the configuration information to the target computing engine, where the target computing engine executes the target execution module according to the configuration information to obtain a sub-graph matching result corresponding to the sub-graph matching task.
First, in step S301, the sub-graph matching task may be decomposed into a plurality of sequential supersteps according to the bulk synchronous parallel (BSP) computing mode, and configuration information corresponding to the plurality of supersteps may be generated; and a target execution module corresponding to the target computing engine may be determined from a plurality of execution code modules respectively corresponding to the preset computing engines.
The BSP (Bulk Synchronous Parallel) computing mode is a parallel computing mode commonly used for large-scale data processing and distributed computing by multiple computing nodes. Its basic idea is to divide a computing task into several parts, called supersteps (Supersteps). In each superstep, each computing node receives data for computation and performs local computation based on that data (Local Computation). After the local computation is finished, each computing node sends its computation result to the other nodes, waits for the computation results sent by the other nodes, and then aggregates them; this process of waiting for and aggregating computation results is also called barrier synchronization (Barrier Synchronization). Since the BSP computing mode is a basic computing mode that can effectively utilize distributed computing resources to process large-scale data, various computing engines typically support it. Decomposing the sub-graph matching task through the BSP computing mode therefore enables different computing engines to execute the plurality of supersteps obtained after decomposition.
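As a minimal illustration of this pattern (purely a sketch with invented names, not the patent's implementation), the superstep loop with local computation and barrier synchronization can be expressed as:

```python
# Sketch of the BSP pattern: each superstep applies a local computation to
# every node's partition, then a barrier gathers all local results and
# redistributes them before the next superstep begins.

def run_bsp(partitions, supersteps):
    """Run a list of superstep functions over per-node data partitions."""
    n = len(partitions)
    state = partitions
    for step in supersteps:
        # Local Computation: each node processes only its own partition.
        local = [step(part) for part in state]
        # Barrier Synchronization: gather every node's results...
        combined = [x for res in local for x in res]
        # ...then redistribute them (round-robin here) for the next superstep.
        state = [combined[i::n] for i in range(n)]
    return sorted(x for part in state for x in part)

# Two "nodes"; superstep 1 keeps even numbers, superstep 2 doubles them.
result = run_bsp(
    partitions=[[1, 2, 3], [4, 5, 6]],
    supersteps=[lambda p: [x for x in p if x % 2 == 0],
                lambda p: [2 * x for x in p]],
)
# result == [4, 8, 12]
```

A real engine would run the per-partition computations on separate machines and exchange results over the network; the structure of the loop is the same.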
In this step, the sub-graph matching task may be decomposed into a plurality of sequential supersteps according to the bulk synchronous parallel (BSP) computing mode, and configuration information corresponding to the plurality of supersteps may be generated. In different embodiments, the sub-graph matching task may be used to perform sub-graph matching for different specific types of graph data. For example, in one embodiment, the sub-graph matching task may be used to perform sub-graph matching against a knowledge graph. In different embodiments, depending on the sub-graph matching task, the specific way of decomposing the task into a plurality of sequential supersteps may differ, and the resulting supersteps may be used for different specific operations or computations, which is not limited by this specification. FIG. 4 shows a schematic diagram of sub-graph matching superstep partitioning provided in an embodiment of the present disclosure. In the example shown in FIG. 4, the sub-graph matching task is to find, in the original data graph, sub-graphs containing nodes of types A, B, and C, where the A-type node has an outgoing edge (denoted E_A_B) pointing to the B-type node and an incoming edge (denoted E_C_A) from the C-type node, and the B-type node has an outgoing edge (denoted E_B_C) pointing to the C-type node. In different embodiments, this sub-graph structure may represent business relationships between different specific business objects. For example, in one example, it may represent circular funds-transfer relationships between different users: an A-type user transfers money to a B-type user, the B-type user transfers money to a C-type user, and the C-type user in turn transfers money back to the A-type user. As shown in FIG. 4, this sub-graph matching task may be translated into, for example, 5 supersteps: superstep 1 to superstep 5.
Superstep 1 may include obtaining A-type nodes from the original data graph. Superstep 2 may include retrieving, from the original data graph, B-type nodes and C-type nodes having the E_A_B and E_C_A relationships with the A-type nodes from superstep 1. Superstep 3 may include obtaining, from the B-type and C-type nodes obtained in superstep 2, the nodes satisfying the E_B_C relationship. Superstep 4 may include, for example, screening the C-type nodes according to their attributes to obtain the C-type nodes that satisfy the screening condition. In one example, superstep 5 may further return specific node attribute data, such as the name of the A-type node and the identification of the B-type node, from the A-, B-, and C-type node data satisfying the E_A_B, E_C_A, and E_B_C relationships acquired in the preceding supersteps.
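The configuration information for these five supersteps might, for instance, be serialized as a structure like the following. This is a hypothetical sketch: the field names and layout are invented for illustration and are not specified by the patent.

```python
# Hypothetical configuration for the five supersteps of the A->B->C->A
# matching task described above; every key name here is illustrative only.
SUBGRAPH_MATCH_CONFIG = {
    "task": "match A -E_A_B-> B -E_B_C-> C -E_C_A-> A",
    "supersteps": [
        {"id": 1, "sub_ops": [{"op": "Match", "pattern": "node(type=A)"}]},
        {"id": 2, "sub_ops": [{"op": "Match", "pattern": "A-[E_A_B]->B and C-[E_C_A]->A"},
                              {"op": "Join", "with": "superstep 1"}]},
        {"id": 3, "sub_ops": [{"op": "Match", "pattern": "B-[E_B_C]->C"},
                              {"op": "Join", "with": "superstep 2"}]},
        {"id": 4, "sub_ops": [{"op": "Match", "pattern": "node(type=C, attr filter)"},
                              {"op": "Join", "with": "superstep 3"}]},
        {"id": 5, "sub_ops": [{"op": "Return", "fields": ["A.name", "B.id"]}]},
    ],
}
```

Because the structure is plain data, the same configuration can be handed to any of the preset computing engines, while the engine-specific behavior lives in the execution code module.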
After obtaining the plurality of supersteps from the sub-graph matching task, configuration information corresponding to the plurality of supersteps can be generated. In one embodiment, the configuration information may include each of the plurality of supersteps and the refined operations (or sub-operations) included in each superstep. As noted above, in different embodiments the specific manner of decomposing the sub-graph matching task into multiple supersteps may vary with the task; accordingly, the resulting supersteps may differ, and the sub-operations included in each superstep may differ. In one embodiment, the supersteps obtained after partitioning may include a first superstep, where the first superstep includes a Match sub-operation for matching graph objects meeting a predetermined condition and a Join sub-operation for connecting the matching result of the Match sub-operation with the initial sub-graph of the first superstep. In various embodiments, the graph objects may be, for example, nodes or edges in a graph, or their feature (or attribute) values. In different embodiments, the predetermined condition may also be a different specific condition. Specifically, as shown in FIG. 4, superstep 3 may include a Match sub-operation to match B-type nodes and C-type nodes having the E_A_B (A -> B) and E_C_A (A <- C) relationships with the A-type nodes obtained in superstep 1, and the E_A_B and E_C_A relationships between them. Superstep 3 may further include a Join sub-operation for connecting the above matching result with the A-type nodes obtained in superstep 1. As shown in FIG. 4, superstep 4 and superstep 5 also comprise Match and Join sub-operations similar to those of superstep 3, which are not described in detail here.
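To make the Match/Join pairing concrete, here is an illustrative sketch over a toy edge list (not the patent's code; the type-based match condition and path representation are simplifications chosen for the example):

```python
# Match selects edges whose endpoints satisfy a type condition; Join
# extends the partial sub-graph built by earlier supersteps with the
# matched edges.

def match(edges, node_types, src_type, dst_type):
    """Match sub-operation: edges whose endpoints have the required types."""
    return [(s, d) for (s, d) in edges
            if node_types[s] == src_type and node_types[d] == dst_type]

def join(partial_paths, matched_edges):
    """Join sub-operation: connect matched edges onto each partial path."""
    return [path + [d] for path in partial_paths
            for (s, d) in matched_edges if path[-1] == s]

node_types = {"a1": "A", "b1": "B", "b2": "B", "c1": "C"}
edges = [("a1", "b1"), ("a1", "b2"), ("b1", "c1")]

# Superstep 1: start a partial path at every A-type node.
paths = [[n] for n, t in node_types.items() if t == "A"]
# Superstep 2: Match A->B edges, then Join them onto the partial paths.
paths = join(paths, match(edges, node_types, "A", "B"))
# Superstep 3: Match B->C edges and Join again.
paths = join(paths, match(edges, node_types, "B", "C"))
# paths == [["a1", "b1", "c1"]]
```

Note how the Join discards the partial path through "b2", which has no outgoing E_B_C edge: only paths that can be extended survive to the next superstep.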
In another embodiment, the sub-operations included in a superstep may further include performing graph feature calculations. Graph feature calculation means computing features of nodes or edges in a graph based on a particular algorithm, and in different embodiments it may be used to compute different specific graph features. In a specific embodiment, the graph feature calculation may include determining importance features of graph nodes based on the PageRank algorithm.
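As a sketch of such a graph feature calculation, a textbook power-iteration PageRank (not the patent's implementation) can compute the importance feature of each node:

```python
# Power-iteration PageRank over an adjacency list; for simplicity this
# sketch assumes every node has at least one outgoing edge.

def pagerank(out_links, damping=0.85, iterations=50):
    """Return an importance score per node; the scores sum to 1."""
    n = len(out_links)
    rank = {v: 1.0 / n for v in out_links}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in out_links}
        for v, targets in out_links.items():
            share = rank[v] / len(targets)  # rank mass sent along each edge
            for t in targets:
                new[t] += damping * share
        rank = new
    return rank

# The A->B->C->A cycle from the sub-graph example: by symmetry each node
# converges to an importance of 1/3.
ranks = pagerank({"A": ["B"], "B": ["C"], "C": ["A"]})
```

Each PageRank iteration fits the BSP mode naturally: distributing rank mass along edges is the local computation, and collecting the incoming contributions is the barrier synchronization.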
In this step, the target execution module corresponding to the target computing engine may be determined from a plurality of execution code modules respectively corresponding to a plurality of preset computing engines. As described above, in different embodiments the multiple computing engines may be of different specific types, and the determined target computing engine may accordingly be a different type of computing engine. Furthermore, in different embodiments, the programming language on which the plurality of execution code modules are based may be a different specific programming language; in one embodiment, it may include, for example, one of the Python language, the Scala language, the Java language, and the R language. In addition to schemes that use a dedicated graph compute engine to perform sub-graph matching tasks, some existing sub-graph matching schemes perform sub-graph matching through a graph query language (Graph Query Language, GQL). However, graph query languages typically cannot perform complex graph feature computations, such as the PageRank algorithm. Execution code modules written in a general-purpose programming language such as Python or Java have stronger computing power and can perform more complex graph feature computations, such as computing importance features of graph nodes based on the PageRank algorithm.
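The module-selection part of step S301 can be sketched as a simple lookup; all engine and module names below are invented for illustration:

```python
# Hypothetical preset table mapping each supported computing engine to its
# execution code module (names are placeholders, not real modules).
EXECUTION_MODULES = {
    "engine_a": "exec_module_engine_a.py",
    "engine_b": "exec_module_engine_b.py",
    "engine_x": "exec_module_engine_x.py",
}

def build_payload(target_engine, superstep_config):
    """Pair the engine-specific module with the shared superstep config;
    step S303 then sends this payload to the target engine."""
    module = EXECUTION_MODULES.get(target_engine)
    if module is None:
        raise ValueError(f"no preset execution module for {target_engine!r}")
    return {"module": module, "config": superstep_config}

payload = build_payload("engine_a", {"supersteps": [1, 2, 3]})
```

The key design point mirrored here is the asymmetry: the execution module varies per engine, while the superstep configuration is shared by all engines.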
Then, in step S303, the target execution module and the configuration information are sent to a target computing engine, where the target computing engine executes the target execution module according to the configuration information, to obtain a sub-graph matching result corresponding to the sub-graph matching task.
In this step, the target execution module and the configuration information obtained in step S301 may be sent to the target computing engine. After receiving the target execution module and the configuration information, the target computing engine may execute the target execution module according to the received configuration information.
In the embodiment where the configuration information includes each superstep and the sub-operations in each superstep, the plurality of supersteps may be executed sequentially, where executing a first superstep of the plurality of supersteps may include executing the sub-operations included in that superstep. Specifically, in one embodiment, the sub-operations included in the first superstep may be executed by calling, according to those sub-operations, the application interfaces corresponding to them on the target computing engine. In various embodiments, different corresponding interfaces on the target computing engine may be invoked depending on the sub-operations included in the first superstep. In a specific embodiment, the sub-operations included in the first superstep may be, for example, a Match sub-operation and a Join sub-operation, and the application interface corresponding to the Match and Join sub-operations on the target computing engine is a flatMap interface. FIG. 5 illustrates a schematic diagram, provided in an embodiment of the present disclosure, of calling the corresponding interfaces according to the superstep sub-operations. In the example shown in FIG. 5, the flatMap interface on the target computing engine (e.g., computing engine A) may be invoked according to the Match and Join sub-operations included in superstep 3. In another embodiment, when the interface is called, the caller may use the code that implements that interface on the respective computing engine to make the call.
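The sub-operation-to-interface dispatch can be sketched as a per-engine lookup table; the table contents below are hypothetical (real engines differ in interface names and parameters):

```python
# Each engine's execution code module maps a sub-operation name to the
# application interface that engine exposes for it, then invokes it.
INTERFACE_TABLE = {
    "engine_a": {"Match": "flatMap", "Join": "flatMap"},
    "engine_b": {"Match": "mapPartitions", "Join": "cogroup"},
}

def invoke(engine, sub_op, data, interfaces):
    """Call the interface registered for sub_op on the given engine."""
    name = INTERFACE_TABLE[engine][sub_op]
    return interfaces[name](data)

# A toy stand-in for engine_a's flatMap: flatten one level of nesting.
impls = {"flatMap": lambda xs: [y for x in xs for y in x]}
flat = invoke("engine_a", "Match", [[1, 2], [3]], impls)
# flat == [1, 2, 3]
```

Keeping the mapping in a table rather than in the control flow is what lets the same superstep configuration drive interfaces with different names on different engines.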
Thus, in different embodiments, even though the names or parameters of the interfaces corresponding to the same sub-operations (e.g., the Match and Join sub-operations) differ across computing engines, writing an execution code module may itself amount only to adapting to the different interface names or parameters, while implementing the execution logic inside the interfaces with highly reused implementation code. That is, the execution code modules for different computing engines are highly code-reusable, which further reduces the manpower and computing resource costs consumed in building execution code modules for different types of computing engines.
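This reuse point can be sketched as follows: one engine-agnostic core function carries the superstep logic, and each per-engine execution code module is a thin wrapper adapting only the interface name and parameter order. All class, method, and interface names here are hypothetical:

```python
from itertools import chain

def superstep_core(partial, edges):
    """Engine-agnostic Match + Join: extend one partial match."""
    frontier = partial[-1]
    return [partial + (nbr,) for nbr in edges.get(frontier, [])]

class EngineAModule:
    # engine A exposes, say, flat_map(fn, data)
    def run(self, api, data, edges):
        return api.flat_map(lambda p: superstep_core(p, edges), data)

class EngineBModule:
    # engine B exposes the same semantics as, say, flat_map_values(data, fn)
    def run(self, api, data, edges):
        return api.flat_map_values(data, lambda p: superstep_core(p, edges))

class StubEngineA:
    """Stand-in for engine A's interface, for demonstration only."""
    def flat_map(self, fn, data):
        return list(chain.from_iterable(fn(e) for e in data))

edges = {"A": ["B", "C"]}
result = EngineAModule().run(StubEngineA(), [("A",)], edges)
```

Only the two-line wrappers differ between engines; `superstep_core` is shared verbatim.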
As previously described, a computing engine may generally include a plurality of computing nodes, which may perform computation in parallel, with each node computing the portion of each superstep corresponding to that node. For example, for one superstep including, say, Match and Join sub-operations, each node may perform a local computation (executing Match and Join on its portion of the data), and then synchronize its local results to the other nodes. Thus, in one embodiment, the target computing engine includes a plurality of computing nodes, including a first node, and the first node may execute the portion of each superstep corresponding to the first node. Further, in a specific embodiment, before the first node executes the portion of each superstep corresponding to it, the graph data associated with the first node in each superstep may be saved locally on the first node. Fig. 6 shows a schematic diagram of pre-storing data on a local node according to an embodiment of the present specification. In the example shown in fig. 6, the supersteps obtained after decomposing the sub-graph matching task include, for example, superstep i, and the computing engine executing the sub-graph matching task includes, for example, node j among its plurality of nodes. Before node j executes the portion of superstep i corresponding to it (for convenience of description, simply: before node j executes superstep i), the data associated with superstep i may be read, for example, from the graph data server into node j's local storage, so that node j can access the data locally while executing superstep i.
Compared with the conventional graph data reading mode, in which data is read from the graph data server during the execution of superstep i, reading the data locally can greatly improve data-access efficiency during the execution of superstep i and avoid the influence of network conditions during execution, thereby improving the execution efficiency of the superstep.
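The prefetch-then-execute pattern on node j can be pictured with a small sketch. `fetch_from_graph_server`, `ComputeNode`, and the data layout are all hypothetical stand-ins for the remote graph data server and a compute node:

```python
# Illustrative sketch: node j reads the graph data associated with
# superstep i from the (stubbed) graph data server once, before the
# superstep starts, so execution itself only touches the local cache.

REMOTE_GRAPH = {"superstep_1": {"A": ["B"], "B": ["C"]}}

def fetch_from_graph_server(superstep_id):
    """Stand-in for a remote read from the graph data server."""
    return REMOTE_GRAPH[superstep_id]

class ComputeNode:
    def __init__(self):
        self.local_cache = {}

    def prefetch(self, superstep_id):
        # one network read, performed before the superstep executes
        self.local_cache[superstep_id] = fetch_from_graph_server(superstep_id)

    def run_superstep(self, superstep_id):
        # during execution every read hits local storage, not the network
        data = self.local_cache[superstep_id]
        return len(data)  # placeholder for the node's local computation

node_j = ComputeNode()
node_j.prefetch("superstep_1")
```

The gain is that `run_superstep` is insulated from network latency and transient network failures.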
In different embodiments, the data corresponding to different nodes may also be pre-partitioned. In one example, the partitioning may be performed according to hash values of the graph objects, so as to ensure load balancing of the computing portions executed on different nodes, thereby further improving the overall execution efficiency of the supersteps.
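Hash-based pre-partitioning can be sketched as below. The choice of MD5 and the node count are illustrative assumptions; any uniform hash of the graph object identifiers gives roughly equal shares per node:

```python
import hashlib

# Illustrative sketch: assign each graph object to a compute node by the
# hash of its identifier, so the per-node workload of a superstep is
# roughly balanced.

def assign_node(graph_object_id: str, num_nodes: int) -> int:
    digest = hashlib.md5(graph_object_id.encode()).hexdigest()
    return int(digest, 16) % num_nodes

vertices = [f"v{i}" for i in range(1000)]
buckets = [0] * 4                       # 4 compute nodes
for v in vertices:
    buckets[assign_node(v, 4)] += 1
# with a uniform hash, each bucket receives close to 250 vertices
```

A uniform split means no single node becomes the straggler that delays the barrier at the end of a superstep.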
On the other hand, corresponding to the above method, the embodiment of the present specification also discloses a device for sub-graph matching by the computing engine. Fig. 7 shows a block diagram of an apparatus for sub-graph matching by a computing engine according to an embodiment of the present specification. As shown in fig. 7, the apparatus 700 includes:
the configuration unit 701, configured to decompose the sub-graph matching task into a plurality of supersteps in sequence according to the bulk synchronous parallel (BSP) computation mode, and to generate configuration information corresponding to the plurality of supersteps; and to determine a target execution module corresponding to a target computing engine from a plurality of execution code modules respectively corresponding to a plurality of preset computing engines;
the execution unit 702, configured to send the target execution module and the configuration information to the target computing engine, for the target computing engine to execute the target execution module according to the configuration information to obtain a sub-graph matching result corresponding to the sub-graph matching task.
Yet another aspect of the embodiments provides a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform any of the methods described above.
In yet another aspect, embodiments of the present disclosure provide a computing device including a memory having executable code stored therein and a processor that, when executing the executable code, performs any of the methods described above.
It should be understood that the terms "first," "second," etc. herein are used merely for convenience of description and do not otherwise limit the concepts they modify.
Although one or more embodiments of the present specification provide the method operation steps described in the embodiments or flowcharts, more or fewer operation steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only order of execution. When implemented in an actual device or end product, the steps may be executed sequentially or in parallel according to the methods shown in the embodiments or figures (e.g., in a parallel-processor or multi-threaded environment, or even in a distributed data processing environment). The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises a described element is not excluded.
For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, when one or more of the present description is implemented, the functions of each module may be implemented in the same piece or pieces of software and/or hardware, or a module that implements the same function may be implemented by a plurality of sub-modules or a combination of sub-units, or the like. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
One skilled in the relevant art will recognize that one or more of the embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
One or more embodiments of the present specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the present specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment mainly describes its differences from the other embodiments. In particular, the system embodiments are described relatively simply since they are substantially similar to the method embodiments; for relevant parts, refer to the description of the method embodiments. In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "example," "specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present specification. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, those skilled in the art may combine the different embodiments or examples described in this specification, and the features thereof, provided they do not contradict each other.
The foregoing is merely an example of one or more embodiments of the present specification and is not intended to limit the one or more embodiments of the present specification. Various modifications and alterations to one or more embodiments of this description will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of the present specification, should be included in the scope of the claims.

Claims (13)

1. A method of sub-graph matching by a compute engine, comprising:
decomposing a sub-graph matching task into a plurality of supersteps in sequence according to a bulk synchronous parallel (BSP) computation mode, and generating configuration information corresponding to the plurality of supersteps; determining a target execution module corresponding to a target computing engine from a plurality of execution code modules respectively corresponding to a plurality of preset computing engines;
and sending the target execution module and the configuration information to the target computing engine, for the target computing engine to execute the target execution module according to the configuration information to obtain a sub-graph matching result corresponding to the sub-graph matching task.
2. The method of claim 1, wherein the programming languages on which the plurality of execution code modules are based comprise one of the Python language, the Scala language, the Java language, and the R language.
3. The method of claim 1, wherein the configuration information includes each of the plurality of supersteps and the sub-operations included in each superstep.
4. The method of claim 3, wherein executing the target execution module according to the configuration information comprises:
sequentially executing the plurality of supersteps, wherein executing a first superstep included in the plurality of supersteps includes executing the sub-operations included in the first superstep.
5. The method of claim 4, wherein executing the sub-operations included in the first superstep comprises:
calling, according to the sub-operations included in the first superstep, the application interfaces corresponding to the sub-operations on the target computing engine.
6. The method of claim 5, wherein the sub-operations included in the first superstep comprise a Match sub-operation and a Join sub-operation, the Match sub-operation being used for matching graph objects meeting a predetermined condition, the Join sub-operation being used for joining the matching result of the Match sub-operation with the initial sub-graph of the first superstep, and the application interface corresponding to the sub-operations on the target computing engine being the map-and-flatten flatMap interface.
7. The method of claim 1, wherein the target computing engine comprises a plurality of computing nodes, including a first node;
the target computing engine executing the target execution module according to the configuration information comprises:
the first node executing the portion of each superstep corresponding to the first node;
the method further comprising:
before the first node executes the portion of each superstep corresponding to the first node, saving the graph data associated with the first node in each superstep locally on the first node.
8. The method of claim 3, wherein the sub-operations included in a superstep comprise performing a graph feature calculation.
9. The method of claim 8, wherein the graph feature calculation comprises determining importance features of graph nodes based on the PageRank algorithm.
10. The method of claim 1, wherein the sub-graph matching task is used for sub-graph matching for a knowledge graph.
11. An apparatus for sub-graph matching by a compute engine, the apparatus comprising:
the configuration unit, configured to decompose the sub-graph matching task into a plurality of supersteps in sequence according to the bulk synchronous parallel (BSP) computation mode, and to generate configuration information corresponding to the plurality of supersteps; and to determine a target execution module corresponding to a target computing engine from a plurality of execution code modules respectively corresponding to a plurality of preset computing engines;
and the execution unit, configured to send the target execution module and the configuration information to the target computing engine, for the target computing engine to execute the target execution module according to the configuration information to obtain a sub-graph matching result corresponding to the sub-graph matching task.
12. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-10.
13. A computing device comprising a memory having executable code stored therein and a processor, which when executing the executable code, implements the method of any of claims 1-10.
CN202311551230.2A 2023-11-17 2023-11-17 Method and device for sub-graph matching through calculation engine Pending CN117539928A (en)

Publications (1)

Publication Number Publication Date
CN117539928A 2024-02-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination