WO2019062156A1 - Storage procedure executing method and device, and storage medium

Info

Publication number: WO2019062156A1
Application number: PCT/CN2018/087384
Authority: WO
Inventors: 李旭良; 单卫华; 董阳
Original assignee: 华为技术有限公司
Priority date: 2017-09-27
Filing date: 2018-05-17
Publication date: 2019-04-04
Also published as: CN107729421B; CN107729421A

Abstract

The present application discloses a method and a device for executing a stored procedure, and a storage medium, belonging to the technical field of big data. The method is applied to a distributed database and comprises: a target data storage node receiving first data information and first step indication information sent by a first data storage node; performing stored-procedure processing on the first data on the basis of the first data information, the first step indication information and stored topology information of the target stored procedure, so as to obtain second data; determining a second data storage node and second step indication information on the basis of the second data, stored partition information and the first step indication information; and sending second data information and the second step indication information to the second data storage node. Thus, each data storage node may process data directly and send the data processing result to the next data storage node without returning it to the storage node manager, which reduces the amount of data transmitted and improves the execution efficiency and running performance of the stored procedure.

Description

Method, device and storage medium for executing stored procedure

Technical field

The present application relates to the field of big data technologies, and in particular, to a method, an apparatus, and a storage medium for executing a stored procedure.

Background

In recent years, with the rapid growth of data volume, distributed database technology has developed rapidly. A distributed database is a logically centralized database formed by connecting geographically dispersed data storage nodes through a network. Massive data can be partitioned and stored across the data storage nodes, which solves the problem of database scalability. A distributed database can complete a given service by calling and executing a stored procedure. A stored procedure is a callable object stored in the database; it is essentially a set of SQL (Structured Query Language) statements that implements a specific function. That is, a stored procedure includes multiple operation steps and can be called by specifying the stored procedure name and input data. Moreover, in a distributed database architecture, if the data required to execute a called stored procedure is stored in a plurality of data storage nodes, the stored procedure must be executed through interaction among those data storage nodes.

In the related art, a distributed database usually executes a stored procedure in a master-slave control manner. That is, when a data storage node receives a call request for a stored procedure, if the data storage node has a stored procedure management function, the data storage node can act as a storage node manager (Master) that uniformly manages and schedules the multiple data storage nodes that execute the stored procedure. First, the storage node manager determines, according to first data and stored partition information, a first data storage node for processing the first data, where the first data is the input data carried by the call request of the stored procedure, the partition information is used to indicate storage locations of data, and the first data storage node is the data storage node that stores the first data. Then, the storage node manager sends the first data and the first operation step of the stored procedure to the first data storage node, to instruct the first data storage node to perform the first operation step based on the first data and obtain second data. Thereafter, the first data storage node needs to pass the second data back to the storage node manager, so that the storage node manager determines, based on the stored partition information, a second data storage node for processing the second data, and sends the second data and the second operation step of the stored procedure to the second data storage node, to instruct it to perform the second operation step based on the second data and obtain third data. The second data storage node then passes the third data back to the storage node manager, and the storage node manager repeats the step of invoking the next data storage node until the last data storage node for executing the stored procedure is invoked, so that the last data storage node performs the last operation step of the stored procedure to obtain the final data and returns the final data to the storage node manager as output data.

In the above master-slave control mode, each data storage node that executes the stored procedure needs to receive data from the storage node manager and also needs to return its processing result to the storage node manager. A large amount of data is therefore transferred between the storage node manager and the data storage nodes, which affects the execution efficiency and running performance of the stored procedure.

Summary of the invention

In order to solve the problem that the large amount of data transmitted between the storage node manager and each data storage node affects the execution efficiency and running performance of the stored procedure, the present application provides a method and a device for executing a stored procedure, and a storage medium. The technical solution is as follows:

In a first aspect, a method for executing a stored procedure is provided, which is applied to a distributed database. The distributed database includes a plurality of data storage nodes for executing a target stored procedure, and a first data storage node, a target data storage node, and a second data storage node among the plurality of data storage nodes are three data storage nodes that execute the target stored procedure in sequence. The method includes:

The target data storage node receives first data information and first step indication information sent by the first data storage node, where the first data information is first data or indication information of the first data, the first data is data required when the target stored procedure is executed, and the first step indication information is used to indicate the position, among the plurality of operation steps included in the target stored procedure, of the first operation step that the target data storage node needs to perform;

The target data storage node performs stored-procedure processing on the first data to obtain second data, based on the first data information, the first step indication information, and stored topology information of the target stored procedure, where the topology information of the target stored procedure is used to indicate the plurality of operation steps included in the target stored procedure and the execution order of those steps;

The target data storage node determines the second data storage node and second step indication information based on the second data, partition information, and the first step indication information, and sends second data information and the second step indication information to the second data storage node, where the partition information is used to indicate storage locations of data, the second step indication information is used to indicate the position, among the plurality of operation steps included in the target stored procedure, of the second operation step that the second data storage node needs to perform, and the second data information is the second data or indication information of the second data.

The target data storage node is any one of the plurality of data storage nodes for executing the target stored procedure; the first data storage node is the data storage node immediately preceding the target data storage node in the execution order of the target stored procedure, and the second data storage node is the data storage node immediately following it.

In the embodiment of the present invention, each data storage node can process the received data directly, determine the next data storage node, and send the data processing result directly to that node without passing the result back to the storage node manager, thereby greatly reducing the amount of data transmitted and improving the execution efficiency and running performance of the stored procedure.
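The per-node behaviour of the first aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the `Message` type, the two arithmetic placeholder steps in `TOPOLOGY`, and the modulo rule in `owner_of` are hypothetical stand-ins for the data and step indication information, the topology information, and the partition information respectively.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Message:
    data: int   # data information (here the data itself)
    step: int   # step indication information: index of the step to perform


# Hypothetical topology information: the operation steps in execution order.
TOPOLOGY: List[Callable[[int], int]] = [
    lambda x: x + 1,   # placeholder for the first operation step
    lambda x: x * 2,   # placeholder for the second operation step
]


def owner_of(value: int) -> str:
    """Hypothetical partition information: which node stores `value`."""
    return f"node-{value % 3}"


def handle(msg: Message) -> Tuple[str, Message]:
    """Run the indicated step locally, then choose the next node and step."""
    second_data = TOPOLOGY[msg.step](msg.data)   # perform the indicated step
    next_node = owner_of(second_data)            # consult partition information
    return next_node, Message(second_data, msg.step + 1)


node, forwarded = handle(Message(data=1, step=0))
```

The result is forwarded to `node` directly; no manager appears in the flow. The output step that follows the last operation step is deliberately not modelled in this sketch.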

In a specific implementation, when the first data storage node is an input node, the first data is input data acquired by the input node based on a call request for the target stored procedure, and the input node is the data storage node in the distributed database that receives the call request for the target stored procedure;

When the first data storage node is an intermediate node, the first data is a data processing result obtained by the intermediate node by processing data based on received data information, where the intermediate node is a data storage node used to execute any one of the plurality of operation steps included in the target stored procedure.

In the embodiment of the present invention, the first data may be initial input data sent by the input node, that is, data input by the user when the target stored procedure is called, or intermediate data sent by an intermediate node, that is, data generated while executing any one of the operation steps included in the target stored procedure. Accordingly, the target data storage node may be the data storage node following the input node or the data storage node following any intermediate node.

In a specific implementation, the target data storage node performing stored-procedure processing on the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure, to obtain the second data, includes:

Determining, by the target data storage node, the first data based on the first data information;

Determining, by the target data storage node, the first operation step that the target data storage node needs to perform, based on the first step indication information and the topology information of the target stored procedure;

The target data storage node performs the first operation step based on the first data to obtain the second data.

The topology information of the target stored procedure is used to indicate the execution logic inside the target stored procedure, specifically the plurality of operation steps included in the target stored procedure and the execution order of those steps.

In the embodiment of the present invention, since the target data storage node stores the topology information of the target stored procedure, while the target stored procedure is being executed the target data storage node can determine the operation step it needs to perform directly from the step indication information sent by the previous data storage node and the topology information it has stored, and can send step indication information to the next data storage node to indicate the operation step that node needs to perform.

In this way, without scheduling and management by a storage node manager, the multiple data storage nodes can execute the target stored procedure in sequence and pass the generated intermediate data along in sequence, thereby greatly reducing the amount of data transmitted and improving the execution efficiency and running performance of the stored procedure.

In a specific implementation, the target data storage node determines the first data based on the first data information, including:

When the first data information is the indication information of the first data, the target data storage node acquires the first data from the stored data based on the indication information of the first data.

The indication information of the first data may be, for example, index information of the first data. In the embodiment of the present invention, when the data amount of the first data is large, the first data storage node may convert the first data into indication information of the first data and send that indication information to the target data storage node, thereby reducing the amount of data transmitted and improving the execution efficiency of the stored procedure.
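As a hedged sketch of how a receiving node might resolve data information that is either the data itself or indication information, consider the following. The `DataInfo` type, its field names, and the dictionary used as local storage are assumptions made for illustration; the application does not prescribe a format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DataInfo:
    value: Optional[str] = None   # the data itself, when it is sent directly
    index: Optional[str] = None   # indication information: a key into local storage


def resolve(info: DataInfo, local_store: Dict[str, str]) -> str:
    """Return the first data, reading it from local storage if only an index was sent."""
    if info.value is not None:
        return info.value
    if info.index is None:
        raise ValueError("data information carries neither data nor an index")
    return local_store[info.index]


direct = resolve(DataInfo(value="Z2"), {})
looked_up = resolve(DataInfo(index="row-7"), {"row-7": "Z2"})
```

Sending only the index pays off when the receiving node already stores the data, which is the case when the next node is chosen from the partition information.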

In another embodiment, the method further includes:

The target data storage node receives an identifier of the target stored procedure sent by the first data storage node, and obtains the topology information of the target stored procedure based on the identifier; and/or,

The target data storage node sends an identifier of the target stored procedure to the second data storage node.

In the embodiment of the present invention, when each data storage node stores topology information of multiple stored procedures, each data storage node for executing the target stored procedure may also pass along an identifier of the target stored procedure to indicate to the next data storage node which stored procedure is currently being executed. The next data storage node can then obtain the topology information of the target stored procedure from the stored topology information of the multiple stored procedures according to the received identifier and use it as the basis for determining which operation step to perform, thereby improving the accuracy of executing the stored procedure.
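The identifier-based lookup can be illustrated with a small sketch. The registry `TOPOLOGIES`, the procedure names `"S"` and `"T"`, and the header fields are hypothetical; they only show that a node holding several topologies needs the identifier to pick the right one and should forward it unchanged.

```python
from typing import Dict, List

# Hypothetical per-node store: procedure identifier -> topology information.
TOPOLOGIES: Dict[str, List[str]] = {
    "S": ["S1", "S2", "S3"],
    "T": ["T1", "T2"],
}


def topology_for(procedure_id: str) -> List[str]:
    """Select the topology of the procedure named by the received identifier."""
    if procedure_id not in TOPOLOGIES:
        raise LookupError(f"no topology stored for procedure {procedure_id!r}")
    return TOPOLOGIES[procedure_id]


def forward_header(procedure_id: str, next_step: int) -> Dict[str, object]:
    """Fields passed to the next node, including the procedure identifier."""
    return {"procedure_id": procedure_id, "step": next_step}
```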

In a specific implementation, when the first step indication information indicates that the first operation step is an operation step other than the last one among the plurality of operation steps included in the target stored procedure, the second step indication information is used to indicate the operation step that follows the first operation step;

When the first step indication information indicates that the first operation step is the last one of the plurality of operation steps included in the target stored procedure, the second step indication information is used to indicate an output step, where the output step indicates that the received data is to be output as the output data of the target stored procedure.
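The two cases above amount to a small rule, sketched here under the assumption that step indication information is a zero-based index and that the output step is represented by the hypothetical marker `OUTPUT`.

```python
from typing import Union

OUTPUT = "output"   # hypothetical marker for the output step


def next_step_indication(first_step: int, num_steps: int) -> Union[int, str]:
    """Derive the second step indication from the first one and the step count."""
    if not 0 <= first_step < num_steps:
        raise ValueError("step indication is outside the topology")
    if first_step == num_steps - 1:
        return OUTPUT             # last operation step: the next node only outputs
    return first_step + 1         # otherwise: the following operation step
```

For a procedure with three operation steps, `next_step_indication(0, 3)` yields `1` and `next_step_indication(2, 3)` yields `OUTPUT`.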

In a specific implementation, the target data storage node determines the second data storage node and the second step indication information, based on the second data, the partition information, and the first step indication information, including:

Determining, by the target data storage node, a data storage node pre-stored with the second data from the distributed database based on the second data and partition information;

The target data storage node determines a data storage node in which the second data is stored in advance as the second data storage node.

In the embodiment of the present invention, since the target data storage node stores the partition information, while the target stored procedure is being executed the target data storage node can directly determine, from the second data it has produced and the stored partition information, the data storage node in which the second data is pre-stored, and take that node as the second data storage node for processing the second data.

In this way, the intermediate data does not need to be sent to the storage node manager for the manager to schedule the next data storage node. The target data storage node itself determines the next data storage node according to the stored partition information and sends the intermediate data to that node for processing, which greatly reduces the amount of data transmitted and improves the execution efficiency and running performance of the stored procedure.
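The application does not fix a format for partition information. As one possible illustration, assume range partitioning on a string key: the boundaries in `BOUNDS` and the node names in `NODES` below are invented for the sketch, and the lookup uses the standard-library `bisect` module.

```python
import bisect
from typing import List

# Hypothetical partition information: keys below "H" live on node A,
# keys from "H" up to (but not including) "Q" on node B, the rest on node C.
BOUNDS: List[str] = ["H", "Q"]
NODES: List[str] = ["A", "B", "C"]


def range_node_for(second_data: str) -> str:
    """Return the node that pre-stores `second_data` under range partitioning."""
    return NODES[bisect.bisect_right(BOUNDS, second_data)]
```

Because every node holds the same partition information, any node can make this decision locally instead of asking a manager.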

In a specific implementation, determining the second data storage node and the second step indication information based on the second data, the stored partition information, and the first step indication information includes:

Determining the second step indication information based on the first step indication information and the topology information of the target stored procedure.

Specifically, determining the second step indication information based on the first step indication information and the topology information of the target stored procedure includes:

The target data storage node determines, based on the first step indication information and the topology information of the target stored procedure, the position, among the plurality of operation steps included in the target stored procedure, of the first operation step that the target data storage node needs to perform;

The target data storage node determines the second step indication information based on the position of the first operation step among the plurality of operation steps included in the target stored procedure.

In another embodiment, the method further includes:

When the target data storage node receives an uploaded target stored procedure, it performs topology compilation on the target stored procedure to obtain topology information of the target stored procedure;

The target data storage node sends the topology information of the target stored procedure to the other data storage nodes in the distributed database.

In the embodiment of the present invention, when any data storage node in the distributed database receives an uploaded target stored procedure, it may perform topology compilation on the target stored procedure to obtain its topology information, and then send that topology information to the other data storage nodes in the distributed database.

In this way, every data storage node in the distributed database pre-stores the topology information of the target stored procedure. When any data storage node needs to execute the target stored procedure, it can determine the operation step it needs to perform from the stored topology information, which ensures that the data storage nodes can execute the procedure in sequence without scheduling by a storage node manager.

Specifically, the target data storage node performing topology compilation on the target stored procedure to obtain topology information of the target stored procedure includes:

The target data storage node decomposes the target stored procedure to obtain a plurality of operation steps, and determines the topology information of the target stored procedure based on the plurality of operation steps and their execution order; or,

The target data storage node decomposes the target stored procedure to obtain a plurality of operation steps and, following the execution order of the plurality of operation steps, adds an input step before the plurality of operation steps and an output step after them to obtain the topology information of the target stored procedure, where the input step is used to input the input data of the target stored procedure and the output step is used to output the output data of the target stored procedure.
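The second compilation variant can be sketched as follows. Splitting the procedure body on semicolons is a deliberately naive stand-in for real SQL parsing, and the `INPUT` and `OUTPUT` markers are hypothetical names for the added input and output steps.

```python
from typing import List

INPUT = "INPUT"     # hypothetical marker for the added input step
OUTPUT = "OUTPUT"   # hypothetical marker for the added output step


def compile_topology(procedure_body: str) -> List[str]:
    """Decompose a procedure into ordered steps and wrap them with input/output steps."""
    # Naive decomposition: one operation step per semicolon-terminated statement.
    steps = [stmt.strip() for stmt in procedure_body.split(";") if stmt.strip()]
    return [INPUT, *steps, OUTPUT]


topology = compile_topology("SELECT father FROM t WHERE name = @name; SELECT age FROM t;")
```

The resulting list encodes both the operation steps and their execution order, which is what the text calls topology information; the list position of a step can then serve as its step indication.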

In a second aspect, an apparatus for executing a stored procedure is provided, the apparatus having the function of implementing the behavior of the method for executing a stored procedure in the first aspect described above. The apparatus includes at least one module for implementing the method for executing a stored procedure provided by the first aspect.

In a third aspect, an apparatus for executing a stored procedure is provided, the structure of the apparatus comprising a processor and a memory. The memory is used to store a program that supports the apparatus in performing the method for executing a stored procedure provided by the first aspect, and to store data involved in implementing that method. The processor is configured to execute the program stored in the memory. The apparatus may further include a communication bus for establishing a connection between the processor and the memory.

In a fourth aspect, there is provided a computer readable storage medium having stored therein instructions that, when run on a computer, cause the computer to perform the method of executing the stored procedure described in the first aspect above.

In a fifth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of executing the stored procedure described in the first aspect above.

The technical effects obtained by the second aspect, the third aspect, the fourth aspect, and the fifth aspect are similar to those obtained by the corresponding technical means in the first aspect, and are not described herein again.

The beneficial effects brought by the technical solutions provided by the present application are:

In this application, any one of the plurality of data storage nodes in the distributed database for executing the target stored procedure may receive the first data information sent by the first data storage node, process the first data directly based on the stored topology information of the target stored procedure to obtain second data, determine the second data storage node for processing the second data based on the second data and the stored partition information, and finally send the second data information to the second data storage node, so that the second data storage node processes the second data based on its stored topology information of the target stored procedure. That is, each data storage node can process the data directly, determine the second data storage node, and send the data processing result directly to the second data storage node without returning the result to the storage node manager, which greatly reduces the amount of data transmitted and improves the execution efficiency and running performance of the stored procedure.

Brief description of the drawings

FIG. 1A is a system architecture diagram of a distributed database 100;

FIG. 1B is a structural diagram of an execution system of a stored procedure according to an embodiment of the present invention;

FIG. 1C is a schematic flowchart of an execution process of a stored procedure provided by the related art;

FIG. 1D is a schematic diagram showing the logical structure of a data storage node 10 according to an embodiment of the present invention;

FIG. 1E is a schematic structural diagram of hardware of a data storage node 10 according to an embodiment of the present invention;

FIG. 1F is a flowchart of a method for executing a stored procedure according to an embodiment of the present invention;

FIG. 1G is a schematic diagram of topology information of a stored procedure according to an embodiment of the present invention;

FIG. 1H is a schematic diagram of topology information of another stored procedure according to an embodiment of the present invention;

FIG. 2A is a structural diagram of an execution system of a stored procedure according to an embodiment of the present invention;

FIG. 2B is a flowchart of another method for executing a stored procedure according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of an execution process of another stored procedure according to an embodiment of the present invention;

FIG. 4A is a schematic structural diagram of an apparatus for executing a stored procedure according to an embodiment of the present invention;

FIG. 4B is a schematic structural diagram of a processing module 402 according to an embodiment of the present invention;

FIG. 4C is a schematic structural diagram of a determining module 403 according to an embodiment of the present invention.

Detailed description

In order to make the objects, technical solutions and advantages of the present application more clear, the embodiments of the present application will be further described in detail below with reference to the accompanying drawings.

Before the detailed description of the method for executing the stored procedure provided by the embodiment of the present invention, the application scenario of the embodiment of the present invention is first introduced.

The embodiment of the present invention is applied to scenarios in which a service is processed by using a distributed database; the service may be a query service, a comparison service, or the like. Taking the query service as an example, the distributed database can be used to query the age of a person's great-grandfather or the financial status of a person, based on the data stored in the data storage nodes of the distributed database. When a distributed database processes a service, it usually needs to complete the service by calling and executing a stored procedure.

After the application scenario of the embodiment of the present invention is introduced, in order to facilitate the understanding of the execution method of the stored procedure provided by the embodiment of the present invention, the system architecture of the embodiment of the present invention is introduced.

FIG. 1A is a system architecture diagram of a distributed database 100. As shown in FIG. 1A, the distributed database 100 includes a plurality of physically dispersed data storage nodes 10, which may be connected by a network.

Each of the data storage nodes 10 has its own local database for storing data, and after being connected to each other through a network, a global logically centralized and physically distributed large database, that is, a distributed database, can be formed. Specifically, each data storage node 10 may be a node or a server capable of storing data.

The basic idea of a distributed database is to distribute the data of an originally centralized database across multiple data storage nodes connected through a network, so as to obtain larger storage capacity and higher concurrency. Specifically, data partitioning (database and table sharding) can be used to store fragments of massive data in the storage nodes of the distributed database. In brief, database and table sharding means either storing the data of a large table across the storage nodes according to an established partitioning strategy, or dividing the large table into business sub-tables with smaller data volumes and storing each sub-table across the storage nodes according to the established partitioning strategy.
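The text leaves the partitioning strategy open. As one hedged example, a hash-based strategy could place each row by a stable hash of its key; the node names in `NODES` are invented, and `zlib.crc32` is used only because, unlike Python's built-in `hash` for strings, it gives the same value in every process.

```python
import zlib
from typing import Dict, List

NODES: List[str] = ["node-0", "node-1", "node-2"]   # hypothetical storage nodes


def hash_node_for(key: str) -> str:
    """Pick a node from a stable hash of the row key."""
    return NODES[zlib.crc32(key.encode("utf-8")) % len(NODES)]


def partition(keys: List[str]) -> Dict[str, List[str]]:
    """Distribute row keys across the nodes according to the hash strategy."""
    placement: Dict[str, List[str]] = {node: [] for node in NODES}
    for key in keys:
        placement[hash_node_for(key)].append(key)
    return placement


placement = partition(["Z1", "Z2", "Z3"])
```

Any node that knows `NODES` and the hash rule can recompute where a key lives, which is the property the later embodiments rely on when each node consults partition information locally.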

The above-mentioned distributed database 100 can be used to process a service. In the embodiment of the present invention, the service is completed by calling and executing a stored procedure. FIG. 1B is a structural diagram of an execution system of a stored procedure according to an embodiment of the present invention. As shown in FIG. 1B, the system architecture includes a client 200 and a distributed database 100, and the client 200 and the distributed database 100 can be connected through a network.

In an actual implementation, the client 200 may send a call request for a stored procedure to the distributed database 100. A data storage node 10 in the distributed database 100 then receives the call request, calls the stored procedure according to the call request, and interacts with other data storage nodes 10 to execute the stored procedure.

The data storage node 10 that receives the call request of the stored procedure may be specified by the client 200, or may be specified by the distributed database 100 according to the preset service processing logic, which is not limited by the embodiment of the present invention.

It should be noted that FIG. 1B only takes as an example the case in which the client 200 is an entity outside the distributed database 100. In an actual application, the client 200 may also be any data storage node 10 in the distributed database 100. That is, when any data storage node 10 obtains a call request for a stored procedure triggered by a user or by a preset condition, it can call the stored procedure according to the call request and then interact with other data storage nodes 10 to execute the stored procedure.

It should also be noted that FIG. 1A and FIG. 1B only take a distributed database 100 that includes three data storage nodes 10 as an example. Those skilled in the art will understand that the number of data storage nodes 10 shown in FIG. 1A and FIG. 1B does not limit the distributed database 100. In an actual application, the distributed database 100 may include more or fewer data storage nodes 10 than illustrated, which is not limited by the embodiment of the present invention.

In order to facilitate understanding of the method for executing a stored procedure provided by the embodiment of the present invention, the execution flow of a stored procedure in the related art is briefly introduced first. FIG. 1C is a schematic diagram of an execution flow of a stored procedure provided by the related art. As shown in FIG. 1C, the distributed database 100 includes at least a data storage node A, a data storage node B, a data storage node C, and a data storage node M, and the data storage nodes may be connected to each other through a network.

It is assumed that the distributed database 100 is to execute a stored procedure S whose semantics is "query the age of the great-grandfather of @name". The stored procedure S includes three operation steps: operation step S1, query the name of the father of @name; operation step S2, query the name of the grandfather of @name according to the name of the father of @name; and operation step S3, query the age of the great-grandfather of @name according to the name of the grandfather of @name. Here @name is the input data of the stored procedure S and can be any name.

In addition, it is assumed that each data storage node in the distributed database shown in FIG. 1C stores a different data list, and each data list stores the name of a person, the name of that person's father, and the age of the father. That is, the data lists are partitioned by the name of the person and stored in different storage nodes. For example, as shown in FIG. 1C, the data storage node A stores data list 1, the data storage node B stores data list 2, and the data storage node C stores data list 3. Moreover, the data storage node M stores partition information that indicates the storage locations of data, that is, it indicates the storage node in which each person's name is stored.

If the stored procedure S is executed in the master-slave control manner provided by the related art, as shown in FIG. 1C, after data storage node M receives the call request for the stored procedure S, data storage node M can serve as the storage node manager (Master). Assume that the input data carried by the call request is Z3, that is, @name is Z3. The execution flow of the stored procedure S may then include the following steps 1) to 7):

1) The data storage node M calls the stored procedure S according to the call request of the stored procedure S, and determines the operation steps included in the stored procedure S and their execution order. Then, the data storage node M determines, based on Z3 and the stored partition information, that the data storage node A stores Z3, and transmits Z3 and the operation step S1 to the data storage node A.

2) The data storage node A performs the operation step S1 based on Z3, that is, it queries the stored data list 1 and finds that the name of the father of Z3 is Z2, and then passes Z2 back to the data storage node M.

3) The data storage node M determines, based on Z2 and the stored partition information, that the data storage node B stores Z2, and transmits Z2 and the operation step S2 to the data storage node B.

4) The data storage node B performs the operation step S2 based on Z2, that is, it queries the stored data list 2 and finds that the name of the father of Z2 is Z1, and then passes Z1 back to the data storage node M.

5) The data storage node M determines, based on Z1 and the stored partition information, that the data storage node C stores Z1, and transmits Z1 and the operation step S3 to the data storage node C.

6) The data storage node C performs the operation step S3 based on Z1, that is, it queries the stored data list 3 and finds that the age of the father of Z1 is 85, and then passes 85 back to the data storage node M.

7) After the data storage node M receives 85, 85 can be output as the output data of the stored procedure S.

As shown in FIG. 1C, each of the data storage node A, the data storage node B, and the data storage node C needs to receive data from the data storage node M and return its processing result to the data storage node M. That is, a large amount of data is transmitted between the data storage nodes executing the stored procedure and the storage node manager. In addition, because the storage node manager must collect the data before scheduling each operation step of the stored procedure, each data storage node that executes the stored procedure has to wait.

That is to say, the master-slave control mode involves data aggregation, waiting by the data storage nodes, and a large amount of data transmission while the stored procedure is executed, which greatly affects the execution efficiency and running performance of the stored procedure.
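The master-slave flow of steps 1)-7) can be sketched as follows. This is only an illustrative model, not the related art's actual implementation: the dictionaries `DATA_LISTS` and `PARTITION_INFO`, the step labels, the name `Z0`, and every age except 85 are assumptions introduced for the example.

```python
# Hypothetical data lists: person -> (father's name, father's age).
# Only Z3, Z2, Z1 and the age 85 come from the example; the rest is assumed.
DATA_LISTS = {
    "A": {"Z3": ("Z2", 35)},
    "B": {"Z2": ("Z1", 60)},
    "C": {"Z1": ("Z0", 85)},
}

# Partition information held by node M: person name -> storing node.
PARTITION_INFO = {"Z3": "A", "Z2": "B", "Z1": "C"}

# Operation steps S1, S2, S3 in execution order (labels are assumed).
STEPS = ["father_name", "father_name", "father_age"]


def run_step(node, step, key):
    """Execute one operation step on the data list of the given node."""
    father, age = DATA_LISTS[node][key]
    return father if step == "father_name" else age


def run_master_slave(name):
    """Node M schedules every step and receives every intermediate result."""
    messages = 0
    value = name
    for step in STEPS:
        node = PARTITION_INFO[value]
        messages += 1                        # M sends data and step to the node
        value = run_step(node, step, value)
        messages += 1                        # the node returns its result to M
    return value, messages


result, messages = run_master_slave("Z3")
print(result, messages)
```

In this model every operation step costs two transmissions through node M, which is the overhead the paragraph above describes.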

After the execution flow of the stored procedure provided by the related art is briefly introduced, the execution flow of the stored procedure provided by the embodiment of the present invention is briefly introduced.

In the embodiment of the present invention, with reference to the distributed database 100 shown in FIG. 1A or FIG. 1B, it is assumed that a plurality of data storage nodes included in the distributed database 100 are used to execute a target stored procedure, and that a first data storage node, a target data storage node, and a second data storage node among the plurality of data storage nodes are three data storage nodes that execute the target stored procedure in sequence. The target data storage node may be any one of the plurality of data storage nodes used to execute the target stored procedure, the first data storage node is the data storage node preceding the target data storage node in the execution order of the target stored procedure, and the second data storage node is the data storage node following the target data storage node in the execution order of the target stored procedure. The target data storage node can then be used to perform the following steps 1)-3):

1) The target data storage node receives first data information and first step indication information sent by the first data storage node, where the first data information is first data or indication information of the first data, the first data is data required when the target stored procedure is executed, and the first step indication information is used to indicate the position, among the plurality of operation steps included in the target stored procedure, of the first operation step that the target data storage node needs to perform;

2) The target data storage node performs stored procedure processing on the first data based on the first data information, the first step indication information, and stored topology information of the target stored procedure to obtain second data, where the topology information of the target stored procedure is used to indicate the plurality of operation steps included in the target stored procedure and the execution order of the plurality of operation steps;

3) The target data storage node determines the second data storage node and second step indication information based on the second data, stored partition information, and the first step indication information, and sends second data information and the second step indication information to the second data storage node, where the partition information is used to indicate a storage location of data, the second step indication information is used to indicate the position, among the plurality of operation steps included in the target stored procedure, of the second operation step that the second data storage node needs to perform, and the second data information is the second data or indication information of the second data.

That is, each data storage node executing the stored procedure can process the data itself, determine the next data storage node, and send the data processing result directly to that node without passing the result back to the storage node manager, thereby greatly reducing the amount of data transmission and improving the execution efficiency and running performance of the stored procedure.

In a specific embodiment, a stream processing module can be configured in each data storage node 10 in the distributed database 100, and partition information can be stored at each data storage node 10, so that each data storage node 10 implements the method for executing a stored procedure provided by the embodiment of the present invention through the configured stream processing module and the stored partition information.
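Steps 1)-3) can be illustrated with a small in-process model in which each node forwards its result directly to the next node. This is a hedged sketch: the class `DataStorageNode`, the direct method call standing in for a network send, the sample ages other than 85, and the name `Z0` are all assumptions made for the example, and how the final result reaches the caller is simplified to a return value.

```python
# Topology information: operation steps in execution order (labels assumed).
TOPOLOGY = ["father_name", "father_name", "father_age"]

# Partition information, stored on every node: person name -> storing node.
PARTITION_INFO = {"Z3": "A", "Z2": "B", "Z1": "C"}


class DataStorageNode:
    def __init__(self, name, data_list, cluster):
        self.name = name
        self.data_list = data_list      # person -> (father's name, father's age)
        self.cluster = cluster          # node name -> node, stands in for the network
        cluster[name] = self

    def receive(self, data, step_indication, trace):
        """Steps 1)-3): receive, process, then forward to the next node."""
        trace.append((self.name, step_indication))
        father, age = self.data_list[data]
        step = TOPOLOGY[step_indication - 1]       # look up the step in the topology
        result = father if step == "father_name" else age
        if step_indication == len(TOPOLOGY):       # last operation step: stop here
            return result
        next_node = self.cluster[PARTITION_INFO[result]]
        return next_node.receive(result, step_indication + 1, trace)


cluster = {}
DataStorageNode("A", {"Z3": ("Z2", 35)}, cluster)
DataStorageNode("B", {"Z2": ("Z1", 60)}, cluster)
DataStorageNode("C", {"Z1": ("Z0", 85)}, cluster)

trace = []
output = cluster[PARTITION_INFO["Z3"]].receive("Z3", 1, trace)
print(output, trace)
```

Each intermediate result travels one hop to the node that needs it, rather than two hops through a manager.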

FIG. 1D is a schematic diagram showing the logical structure of a data storage node 10 according to an embodiment of the present invention. As shown in FIG. 1D, the data storage node 10 includes a stream processing module 11 and partition information 12. The stream processing module 11 is for executing a stored procedure, and the partition information 12 is for indicating a storage location of the data.

The partition information 12 may be in the form of a list or a partition policy.

The stream processing module 11 includes a topology manager 11a and a path planning module 11b.

The topology manager 11a is configured to store topology information of at least one stored procedure, and the topology information of each stored procedure is used to indicate the plurality of operation steps included in that stored procedure and the execution order of the plurality of operation steps. Further, the topology manager 11a may be configured to perform topology compilation on an uploaded target stored procedure to obtain topology information of the target stored procedure, and to instruct the data storage node 10 to send the topology information of the target stored procedure to the other data storage nodes 10.

The path planning module 11b is configured to perform partition scheduling and operation processing of the stored procedure. Partition scheduling refers to determining, based on the data and the stored partition information, the next data storage node for processing the data. Operation processing refers to determining the operation step that the data storage node 10 needs to perform and performing that operation step based on the data. Specifically, the operation step that needs to be performed may be determined based on the step indication information and the stored topology information of the stored procedure.
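Partition scheduling can be sketched for the two forms of partition information mentioned above, a list and a partition policy. The text does not define the policy, so the CRC32-modulo rule below is purely an assumed example, as are the function names.

```python
import zlib


def list_partition(table):
    """Partition information as a list: an explicit key -> node lookup table."""
    def locate(key):
        return table[key]
    return locate


def policy_partition(node_names):
    """Partition information as a policy (assumed here: CRC32 of the key modulo node count)."""
    def locate(key):
        index = zlib.crc32(key.encode("utf-8")) % len(node_names)
        return node_names[index]
    return locate


# Partition scheduling: pick the next node for a piece of data.
locate_by_list = list_partition({"Z3": "A", "Z2": "B", "Z1": "C"})
locate_by_policy = policy_partition(["A", "B", "C"])

next_node = locate_by_list("Z2")
policy_node = locate_by_policy("Z2")
print(next_node, policy_node)
```

Either form only has to answer one question for the path planning module: given this data, which node holds it.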

Before describing the execution method of the stored procedure provided by the embodiment of the present invention, the structure of the data storage node according to the embodiment of the present invention is described in detail.

FIG. 1E is a schematic diagram showing the hardware structure of a data storage node 10 according to an embodiment of the present invention. Referring to FIG. 1E, the data storage node 10 includes a processor 13, a communication bus 14, a memory 15, and at least one communication interface 16. It will be understood by those skilled in the art that the structure of the data storage node 10 shown in FIG. 1E does not constitute a limitation on the data storage node 10. In practical applications, the data storage node 10 may include more or fewer components than illustrated, combine some components, or use a different arrangement of components, which is not limited by the embodiment of the present invention.

The processor 13 can be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the present application.

Communication bus 14 may include a path for communicating information between the components described above.

The memory 15 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.

The memory 15 can exist independently and be coupled to the processor 13 via the communication bus 14. The memory 15 can also be integrated with the processor 13. In the embodiment of the present invention, the memory 15 may be used to store data, such as partition information, topology information of a stored procedure, or information sent by a first data storage node, and the memory 15 may also be used to store one or more running programs and/or modules for executing the method for executing a stored procedure provided by the embodiments of the present invention.

The communication interface 16 uses any transceiver-like device to communicate with other devices or communication networks, such as Ethernet, a Radio Access Network (RAN), or a Wireless Local Area Network (WLAN).

In a specific implementation, as an embodiment, the processor 13 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 1E.

In a specific implementation, as an embodiment, the data storage node 10 may further include an output device 17 and an input device 18.

The output device 17 communicates with the processor 13 and can display information in various ways. For example, the output device 17 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like.

The input device 18 communicates with the processor 13 and can receive user input in a variety of ways. For example, the input device 18 can be a keyboard, a touch screen device, a sensing device, or the like.

The data storage node 10 described above may be a terminal or other node having a data storage function. In a specific implementation, the data storage node 10 can be a mobile phone, a portable computer, a network server, a personal digital assistant (PDA), a tablet computer, a user equipment (UE), a communication device, or an embedded device. The embodiment of the invention does not limit the type of data storage node 10.

The memory 15 is used to store program code for executing the solution of the present application, and is controlled by the processor 13 for execution. The processor 13 is operative to execute program code stored in the memory 15. For example, the data storage node 10 shown in FIG. 1E can implement the methods described in the following embodiments of FIGS. 1F and 2B through the processor 13 and program code in the memory 15.

The method for executing a stored procedure provided by the embodiment of the present invention will be described in detail below with reference to FIG. 1A or FIG. 1B. FIG. 1F is a flowchart of a method for executing a stored procedure according to an embodiment of the present invention. The method is applied to the distributed database of FIG. 1A or FIG. 1B. The distributed database includes a plurality of data storage nodes for executing a target stored procedure, and a first data storage node, a target data storage node, and a second data storage node among the plurality of data storage nodes are three data storage nodes that execute the target stored procedure in sequence. Referring to FIG. 1F, the method includes the following steps:

Step 101: The first data storage node sends the first data information and the first step indication information to the target data storage node.

The target data storage node may be any one of the plurality of data storage nodes used to execute the target stored procedure, the first data storage node refers to the data storage node preceding the target data storage node in the execution order of the target stored procedure, and the second data storage node refers to the data storage node following the target data storage node in the execution order of the target stored procedure.

The target stored procedure can be any stored procedure called by the distributed database. It should be noted that the stored procedure in the embodiment of the present invention is not a process of storing data, but a callable object stored in a database. It is similar to a callable function and, in practical applications, can be called by the stored procedure name and input data. A stored procedure is essentially a set of SQL statements capable of implementing a specific function; that is, the plurality of operation steps included in the stored procedure correspond to a set of SQL statements, and each operation step is one SQL statement in that set. Specifically, each operation step may be used for query processing or for other processing, which is not limited by the embodiment of the present invention.

The first data information is the first data or the indication information of the first data, and the first data is the data required when the target data storage node executes the first operation step included in the target stored procedure. The indication information of the first data may be used to indicate the first data or a storage location of the first data, and the first data may be acquired according to the indication information of the first data.

The first step indication information is used to indicate the position, among the plurality of operation steps included in the target stored procedure, of the first operation step that the target data storage node needs to perform, that is, to indicate which of the plurality of operation steps the first operation step is, where the plurality of operation steps are arranged in execution order.

For example, the first step indication information may be a numerical value. For example, when the first step information is used to indicate the first operation step of the plurality of operation steps included in the target storage process, the first step indication information may be a value of 1. When the first step information is used to indicate the second one of the plurality of operation steps included in the target storage process, the first step indication information may be a value of 2 or the like.

The first data storage node may be an input node or an intermediate node. The input node refers to the data storage node in the distributed database that receives the call request of the target stored procedure, and the intermediate node refers to a data storage node used to execute any one of the plurality of operation steps included in the target stored procedure, specifically the data storage node that performs the operation step preceding the first operation step performed by the target data storage node.

In the embodiment of the present invention, the manner of obtaining the first data and the first step indication information, and their meaning, differ depending on the first data storage node. Specifically, there are the following two cases:

1) When the first data storage node is an input node, the first data is the input data acquired by the input node based on the call request of the target stored procedure, the first step indication information is determined by the input node based on the call request of the target stored procedure, and the first step indication information is used to indicate the first of the plurality of operation steps included in the target stored procedure.

The calling request of the target stored procedure is used to invoke the target stored procedure, and may carry the identifier and input data of the target stored procedure. The identifier of the target stored procedure is used to uniquely identify the target stored procedure, and may be a name or a number of the target stored procedure. The input data refers to an input parameter used to invoke the target stored procedure, and specifically may be data input for the target stored procedure when the user initiates a service.

In an actual application, the call request of the target stored procedure may be triggered by the user through a client. The client may be an entity outside the distributed database or any data storage node in the distributed database, which is not limited by the embodiment of the present invention.

When any data storage node in the distributed database receives the call request of the target stored procedure, that data storage node can serve as the input node. On receiving the call request, the input node may invoke the target stored procedure according to the identifier of the target stored procedure and the input data carried in the call request, and may perform an input step, where the input step refers to acquiring the input data in the course of calling the target stored procedure. After obtaining the input data, the input node may use the input data as the first data, determine, according to the first data and the stored partition information, the next data storage node for processing the first data, that is, the target data storage node, and then send the first data information and the first step indication information to the target data storage node.

2) When the first data storage node is an intermediate node, the first data is the data processing result obtained by the intermediate node performing stored procedure processing on data based on the data information it received, where the intermediate node is a data storage node used to perform any one of the plurality of operation steps included in the target stored procedure, specifically the data storage node that performs the operation step preceding the first operation step performed by the target data storage node.

The first step indication information is determined by the first data storage node based on the step indication information it received, and the first step indication information is used to indicate the operation step following the operation step performed by the first data storage node.

Performing stored procedure processing on data refers to performing, based on the data, the operation step that currently needs to be performed, for example, performing query processing according to the data to query other data related to the data.

That is, when the first data storage node is an intermediate node, the first data storage node processes data in the same manner as the target data storage node: it obtains the first data information and the first step indication information in the way described for the target data storage node, and then sends the first data information and the first step indication information to the target data storage node.
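The two cases above differ only in how the outgoing message is built. The following sketch assumes a simple message layout; the classes `CallRequest` and `StepMessage`, their field names, and the procedure identifier `"S"` used as a dictionary-free example value are hypothetical and are not specified by the text.

```python
from dataclasses import dataclass


@dataclass
class CallRequest:
    procedure_id: str      # identifier of the target stored procedure
    input_data: str        # input parameter supplied by the caller


@dataclass
class StepMessage:
    procedure_id: str
    data_info: str         # the data itself, or indication information of it
    step_indication: int   # position of the step the receiver must perform


def build_from_input_node(request, partition_info):
    """Case 1: the input node uses the input data as first data and indicates step 1."""
    first_data = request.input_data
    target_node = partition_info[first_data]
    return target_node, StepMessage(request.procedure_id, first_data, 1)


def build_from_intermediate_node(received, result, partition_info):
    """Case 2: an intermediate node forwards its result and indicates the next step."""
    target_node = partition_info[result]
    message = StepMessage(received.procedure_id, result, received.step_indication + 1)
    return target_node, message


partition_info = {"Z3": "A", "Z2": "B", "Z1": "C"}
node1, msg1 = build_from_input_node(CallRequest("S", "Z3"), partition_info)
node2, msg2 = build_from_intermediate_node(msg1, "Z2", partition_info)
print(node1, msg1, node2, msg2)
```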

Step 102: The target data storage node receives the first data information and the first step indication information sent by the first data storage node.

Step 103: The target data storage node performs stored procedure processing on the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure, to obtain second data.

Performing stored procedure processing on the first data means performing the first operation step based on the first data. For example, when the first operation step is a query step, query processing may be performed according to the first data, and the second data is the other data related to the first data that the query returns.

The topology information of the target stored procedure is used to indicate the execution logic inside the target stored procedure, specifically the plurality of operation steps included in the target stored procedure and the execution order of the plurality of operation steps. That is, the topology information of the target stored procedure may include the plurality of operation steps included in the target stored procedure, arranged in execution order.

In an embodiment, the topology information of the target stored procedure may include a plurality of topology nodes arranged in execution order, the plurality of topology nodes being in one-to-one correspondence with the plurality of operation steps, that is, each topology node is used to indicate one of the plurality of operation steps. The plurality of topology nodes may also be connected by arrows, and the arrows are used to indicate the execution order of the plurality of topology nodes. For example, if the target stored procedure includes three operation steps, referring to FIG. 1G, the topology information of the target stored procedure may include three topology nodes, namely, a topology node 1, a topology node 2, and a topology node 3, and the three topology nodes are connected by arrows as shown in FIG. 1G. The topology node 1 is used to indicate the first of the three operation steps included in the target stored procedure, the topology node 2 is used to indicate the second of the three operation steps, and the topology node 3 is used to indicate the third of the three operation steps.

Further, the topology information of the target stored procedure may also be used to indicate an input step and an output step, where the input step is executed before the plurality of operation steps and the output step is executed after the plurality of operation steps. That is, the topology information of the target stored procedure may include a plurality of execution steps arranged in execution order, namely the input step, the plurality of operation steps included in the target stored procedure, and the output step.

The input step is used to input the input data of the target stored procedure, specifically to acquire the input data in the course of calling the target stored procedure. The output step is used to output the output data of the target stored procedure, specifically to output, as the output data, the data processing result of the last of the plurality of data storage nodes executing the target stored procedure.

In an embodiment, the topology information of the target stored procedure may include a plurality of topology nodes arranged in execution order, the plurality of topology nodes corresponding to the plurality of execution steps. The first of the plurality of topology nodes is used to indicate the input step, the last topology node is used to indicate the output step, and the topology nodes between the first topology node and the last topology node are used to indicate the plurality of operation steps included in the target stored procedure. Moreover, the plurality of topology nodes may also be connected by arrows, and the arrows are used to indicate the execution order of the plurality of topology nodes.

For example, if the target stored procedure includes three operation steps, referring to FIG. 1H, the topology information of the target stored procedure may include five topology nodes, namely, a topology node 1, a topology node 2, a topology node 3, a topology node 4, and a topology node 5, and the five topology nodes are connected by arrows as shown in FIG. 1H. The topology node 1 is used to indicate the input step, the topology node 2, the topology node 3, and the topology node 4 are respectively used to indicate the first, second, and third of the three operation steps included in the target stored procedure, and the topology node 5 is used to indicate the output step.

The target data storage node stores the topology information of the target stored procedure. The topology information may be obtained by the target data storage node performing topology compilation on the target stored procedure when it receives the uploaded target stored procedure, or it may be obtained by another data storage node in the distributed database that, after receiving the uploaded target stored procedure, performs topology compilation on it and then sends the resulting topology information to the target data storage node.

That is, when any data storage node in the distributed database receives the uploaded target stored procedure, it may perform topology compilation on the target stored procedure to obtain the topology information of the target stored procedure, and then send the topology information of the target stored procedure to the other data storage nodes in the distributed database.

For example, when the target data storage node receives the uploaded target stored procedure, it may perform topology compilation on the target stored procedure to obtain the topology information of the target stored procedure, and then send the topology information of the target stored procedure to the data storage nodes in the distributed database other than the target data storage node.

Specifically, any data storage node may perform topology compilation on the stored procedure through the configured topology manager, store the resulting topology information of the stored procedure through the topology manager, and then send the topology information to the other data storage nodes through the topology manager.

Specifically, the target data storage node may perform topology compilation on the target stored procedure to obtain the topology information of the target stored procedure in either of the following two implementation manners:

The first implementation manner is: performing decomposition processing on the target storage process to obtain a plurality of operation steps, and determining topology information of the target storage process based on the multiple operation steps and the execution order of the multiple operation steps.

Specifically, the target stored procedure may be decomposed in units of a single SQL statement to obtain a plurality of SQL statements, and the plurality of SQL statements are the plurality of operation steps.

The second implementation manner is: performing decomposition processing on the target stored procedure to obtain a plurality of operation steps, and then, according to the execution order of the plurality of operation steps, adding an input step before the plurality of operation steps and an output step after the plurality of operation steps, to obtain the topology information of the target stored procedure.

The input step is for inputting input data of the target stored procedure, and the outputting step is for outputting output data of the target stored procedure.
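Both implementation manners of topology compilation can be sketched with one function. The sketch assumes the stored procedure body is plain text whose statements are separated by semicolons; a real compiler would need a SQL parser, since semicolons can appear inside string literals. The sample SQL, the table name `people`, and the `(kind, statement)` tuple layout are hypothetical.

```python
def compile_topology(procedure_body, with_io_steps=False):
    """Decompose a stored procedure into ordered steps, one per SQL statement.

    with_io_steps=False corresponds to the first implementation manner;
    with_io_steps=True adds an input step first and an output step last.
    """
    statements = [s.strip() for s in procedure_body.split(";") if s.strip()]
    steps = [("operation", s) for s in statements]
    if with_io_steps:
        steps = [("input", None)] + steps + [("output", None)]
    return steps


# Hypothetical body of stored procedure S (three SQL statements).
BODY = """
SELECT father INTO @f FROM people WHERE name = @name;
SELECT father INTO @g FROM people WHERE name = @f;
SELECT father_age FROM people WHERE name = @g;
"""

topology_plain = compile_topology(BODY)
topology_with_io = compile_topology(BODY, with_io_steps=True)
print(len(topology_plain), len(topology_with_io))
```

The two results correspond to the three-node topology of FIG. 1G and the five-node topology of FIG. 1H respectively.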

Further, when any data storage node in the distributed database detects an update operation or a delete operation on the target stored procedure, it may also update or delete the stored topology information of the target stored procedure, and may instruct the other data storage nodes to update or delete their stored topology information of the target stored procedure.

Specifically, any data storage node may respond to a stored procedure upload, update, or delete operation through the configured topology manager. In practical applications, the user can perform upload, update, and delete operations on the stored procedure in the management system of any data storage node.

In an embodiment, when any data storage node in the distributed database receives an update instruction for the target stored procedure, it may perform topology compilation on the updated target stored procedure to obtain the topology information of the updated target stored procedure, and then replace the stored topology information of the target stored procedure with the topology information of the updated target stored procedure. Moreover, the data storage node may further send the topology information of the updated target stored procedure and the update instruction to the other data storage nodes, so that the other data storage nodes update their stored topology information of the target stored procedure, that is, replace the stored topology information of the target stored procedure with the topology information of the updated target stored procedure.

In another embodiment, when any data storage node in the distributed database receives a delete instruction for the target stored procedure, it may delete the stored topology information of the target stored procedure, and then send a deletion instruction for the topology information of the target stored procedure to the other data storage nodes, so that the other data storage nodes delete their stored topology information of the target stored procedure.

In the embodiment of the present invention, performing stored procedure processing on the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure to obtain the second data may include the following steps 1)-3):
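The upload, update, and delete handling can be modelled as a topology manager that applies an operation locally and then repeats it on its peers. This is a minimal sketch under assumptions: the class and method names are hypothetical, peers are plain object references rather than network endpoints, and failure handling is omitted.

```python
class TopologyManager:
    """Stores topology information per stored procedure and mirrors changes to peers."""

    def __init__(self):
        self.topologies = {}   # procedure identifier -> topology information
        self.peers = []        # topology managers of the other data storage nodes

    def apply(self, operation, procedure_id, topology=None):
        """Apply one operation to the local store only."""
        if operation in ("upload", "update"):
            self.topologies[procedure_id] = topology   # store or replace
        elif operation == "delete":
            self.topologies.pop(procedure_id, None)
        else:
            raise ValueError(f"unknown operation: {operation}")

    def handle(self, operation, procedure_id, topology=None):
        """Apply the operation locally, then instruct every peer to do the same."""
        self.apply(operation, procedure_id, topology)
        for peer in self.peers:
            peer.apply(operation, procedure_id, topology)


node_a, node_b = TopologyManager(), TopologyManager()
node_a.peers.append(node_b)
node_b.peers.append(node_a)

node_a.handle("upload", "S", ["S1", "S2", "S3"])
after_upload = dict(node_b.topologies)
node_b.handle("update", "S", ["S1", "S2"])
after_update = dict(node_a.topologies)
node_a.handle("delete", "S")
after_delete = dict(node_b.topologies)
print(after_upload, after_update, after_delete)
```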

1) determining the first data based on the first data information.

Specifically, determining, according to the first data information, the first data may include the following three implementation manners:

The first implementation manner: when the first data information is the first data, the first data information may be directly determined as the first data.

The second implementation manner is: when the first data information is the indication information of the first data, and the indication information of the first intermediate information is used to indicate the first data, the first The indication information of the intermediate information is converted into the first data.

For example, when the indication information of the first data is a hash value of the first data, the hash value of the first data may be converted into the first data according to a preset hash algorithm.

The third implementation manner: when the first data information is indication information of the first data, and the indication information of the first data is used to indicate the storage location of the first data, the first data may be obtained from the data stored by the target data storage node based on the indication information of the first data.
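As an illustration of the first and third implementation manners, the following is a minimal sketch, assuming the data information is encoded as a `(kind, payload)` pair. The function name `resolve_data`, the tags `"data"` and `"location"`, and the dictionary used as local storage are all hypothetical. The second implementation manner is omitted because the application does not specify the conversion.

```python
def resolve_data(data_info, local_store):
    """Return the data described by data_info.

    data_info is a hypothetical (kind, payload) pair:
      ("data", value)    -> the payload is the data itself
      ("location", key)  -> the payload is a key into this node's local storage
    """
    kind, payload = data_info
    if kind == "data":
        # First manner: the data information is the data.
        return payload
    if kind == "location":
        # Third manner: look the data up in what this node already stores.
        return local_store[payload]
    raise ValueError(f"unknown data information kind: {kind!r}")


local_store = {"row-7": "Z3"}
direct = resolve_data(("data", "Z3"), local_store)
looked_up = resolve_data(("location", "row-7"), local_store)
```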

2) determining, according to the first step indication information and the topology information of the target storage process, a first operation step that the target data storage node needs to perform.

The first step indication information is used to indicate the position of the first operation step that the target data storage node needs to perform among the plurality of operation steps included in the target stored procedure, that is, to indicate which of the plurality of operation steps is the first operation step. The topology information of the target stored procedure is used to indicate the plurality of operation steps included in the target stored procedure and the execution order of those steps. Therefore, the first operation step that the target data storage node needs to perform can be determined based on the first step indication information and the topology information of the target stored procedure.

For example, assume that the target stored procedure includes three operation steps, namely operation step 1, operation step 2, and operation step 3, arranged in execution order. When the first step indication information indicates that the first operation step to be performed by the target data storage node is the second of the plurality of operation steps included in the target stored procedure, the target data storage node may determine, based on the topology information of the target stored procedure, that the second of those operation steps is operation step 2, and then determine operation step 2 as the first operation step that the target data storage node needs to perform.

3) Performing the first operation step based on the first data to obtain the second data.

Specifically, the SQL statement corresponding to the first operation step may be executed based on the first data to obtain the second data.

For example, if the first data is the person name Z3, and the semantics of the SQL statement corresponding to the first operation step is to query the name of the father of a given person, the target data storage node can query the stored data (such as a list) for the name of Z3's father and obtain Z2; Z2 is then the second data.
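Steps 2) and 3) can be sketched as follows, assuming the topology information is an ordered list of step names and the step indication information is a 1-based position in that list. The function `run_operation_step`, the step names, and the in-memory table standing in for an SQL query are hypothetical.

```python
def run_operation_step(first_data, step_indication, topology, step_impls, rows):
    """Select the operation step by its 1-based position and execute it."""
    step_name = topology[step_indication - 1]
    return step_impls[step_name](rows, first_data)


# Hypothetical topology: three operation steps in execution order.
topology = ["step1", "step2", "step3"]
# Hypothetical implementation of step 2: query the father's name of a person.
step_impls = {"step2": lambda rows, name: rows[name]["father"]}
# Stand-in for the data list stored on this node.
rows = {"Z3": {"father": "Z2"}}

second_data = run_operation_step("Z3", 2, topology, step_impls, rows)
```

Here the indication value 2 selects `"step2"`, and executing it on Z3 yields Z2 as the second data.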

Further, the target data storage node may store topology information of a plurality of stored procedures. To enable the target data storage node to determine, after receiving the first step indication information, which stored procedure's topology information should be used to determine the first operation step, the first data storage node, while sending the first data information and the first step indication information to the target data storage node, also sends the identifier of the called target stored procedure to the target data storage node. Correspondingly, the target data storage node also needs to send the identifier of the target stored procedure to the second data storage node. That is, the identifier of the target stored procedure needs to be passed between the data storage nodes.

Further, in order to determine which stored procedure is to be executed, the target data storage node may further receive an identifier of the target stored procedure sent by the first data storage node, and acquire the topology information of the target stored procedure based on the identifier of the target stored procedure; and/or send the identifier of the target stored procedure to the second data storage node.

The identifier of the target stored procedure is used to uniquely identify the target stored procedure. That is, when each data storage node in the distributed database stores topology information of multiple stored procedures, each of the data storage nodes for executing the target stored procedure may also transmit the identifier of the target stored procedure, to indicate to the next data storage node which stored procedure is currently being executed.

Specifically, the target data storage node may acquire topology information of the target storage process from the stored topology information of the plurality of storage processes based on the identifier of the target storage process.

In a specific embodiment, before processing the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure, the target data storage node may receive the identifier of the target stored procedure sent by the first data storage node, and then determine, based on that identifier, the topology information of the target stored procedure from the stored topology information of at least one stored procedure. After processing the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure, the target data storage node sends the identifier of the target stored procedure to the second data storage node, so that the second data storage node acquires the topology information of the target stored procedure based on the received identifier.

Further, the target data storage node may send the identifier of the target stored procedure to the second data storage node while sending the second data information and the second step indication information to the second data storage node.

Step 104: The target data storage node determines the second data storage node and the second step indication information based on the second data, the stored partition information, and the first step indication information.

The partition information is used to indicate a storage location of the data, and the second step indication information is used to indicate the position of the second operation step that the second data storage node needs to perform among the multiple operation steps included in the target stored procedure, that is, to indicate which of the plurality of operation steps is the second operation step.

Specifically, determining the second data storage node and the second step indication information based on the second data, the stored partition information, and the first step indication information includes the following steps 1)-2):

1) determining the second step indication information based on the first step indication information and the topology information of the target stored procedure.

Specifically, the position of the first operation step that the target data storage node needs to perform among the multiple operation steps included in the target stored procedure may be determined based on the first step indication information and the topology information of the target stored procedure; the second step indication information is then determined based on the position that follows the position of the first operation step among the plurality of operation steps included in the target stored procedure.

When the first step indication information indicates that the first operation step is an operation step other than the last of the plurality of operation steps included in the target stored procedure, the second step indication information is used to indicate the next operation step after the first operation step. When the first step indication information indicates that the first operation step is the last of the plurality of operation steps included in the target stored procedure, the second step indication information is used to indicate an output step, where the output step instructs that the received data be output as the output data of the target stored procedure.
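The rule above can be sketched as a small function, assuming the step indication information is a 1-based position and the output step is represented by a sentinel value. The name `next_step_indication` and the constant `OUTPUT` are hypothetical.

```python
OUTPUT = "out"  # hypothetical marker for the output step


def next_step_indication(current_indication, topology):
    """Return the indication for the step that follows the current one."""
    if not 1 <= current_indication <= len(topology):
        raise ValueError("step indication is outside the topology")
    if current_indication < len(topology):
        # Not the last operation step: indicate the next operation step.
        return current_indication + 1
    # Last operation step: indicate the output step.
    return OUTPUT


topology = ["S1", "S2", "S3"]
after_first = next_step_indication(1, topology)
after_last = next_step_indication(3, topology)
```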

2) determining, based on the second data and the stored partition information, the second data storage node for processing the second data.

Specifically, the data storage node in which the second data is pre-stored may be determined from the distributed database based on the second data and the stored partition information; that data storage node is then determined as the second data storage node.

Further, when the target data storage node determines that the second step indication information indicates the next operation step after the first operation step, it may perform the step of determining, based on the second data and the stored partition information, the second data storage node for processing the second data. When it determines that the second step indication information indicates the output step, the target data storage node may instead determine the output node as the second data storage node, so that the second data storage node outputs the second data as the output data of the target stored procedure.

To make it easier for the target data storage node to determine the output node, the input node may send the identifier of the output node together with the data information to the intermediate node for executing the target stored procedure, and each intermediate node, while sending data to the second data storage node, may also send the identifier of the output node to the second data storage node. In this way, when any intermediate node determines that the second data storage node is to perform the output step, it can determine the output node based on the identifier of the output node and determine the output node as the second data storage node.

The output node may be set by the distributed database, or may be set by a user. The output node and the input node may be the same data storage node, or may be different data storage nodes, which is not limited in this embodiment of the present invention. For example, the data storage node that receives the call request of the target stored procedure, that is, the input node, may be set as the output node by default.
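The choice of the second data storage node can be sketched as follows, assuming the partition information is a mapping from a data value to the identifier of the node that stores it, and that the identifier of the output node travels with each message. The function `choose_next_node` and the `"out"` marker are hypothetical.

```python
def choose_next_node(second_data, second_indication, partition_info, output_node_id):
    """Pick the node that receives the second data."""
    if second_indication == "out":
        # Output step: the output node becomes the second data storage node.
        return output_node_id
    # Otherwise route to the node that already stores the second data.
    return partition_info[second_data]


partition_info = {"Z3": "A", "Z2": "B", "Z1": "C"}
to_intermediate = choose_next_node("Z2", 2, partition_info, output_node_id="M")
to_output = choose_next_node(85, "out", partition_info, output_node_id="M")
```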

Step 105: The target data storage node sends the second data information and the second step indication information to the second data storage node.

The second data information is the second data or indication information of the second data. The indication information of the second data may be used to indicate the second data or a storage location of the second data, and the second data can be acquired according to the indication information of the second data.

In an actual application, in order to increase the transmission rate, when the data amount of the second data is large, the second data may be converted into indication information of the second data with a smaller data amount and that indication information sent; when the data amount of the second data is small, the second data may be sent directly.
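A minimal sketch of this size-based choice follows. The threshold value, the function `make_data_info`, and the `(kind, payload)` encoding are assumptions for illustration; the application does not specify how "large" is decided. The sketch also assumes that a location key is only usable when the receiving node can resolve it.

```python
def make_data_info(data, location_key=None, threshold=1024):
    """Build the data information to send for the given bytes payload.

    location_key is a hypothetical reference the receiver can resolve to the
    same data; pass None when no such reference exists.
    """
    if len(data) > threshold and location_key is not None:
        # Large payload: send the smaller indication information instead.
        return ("location", location_key)
    # Small payload, or no usable reference: send the data itself.
    return ("data", data)


small = make_data_info(b"x" * 10, location_key="k")
large = make_data_info(b"x" * 2000, location_key="k")
```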

In the embodiment of the present invention, when the second data storage node is also an intermediate node for executing the target stored procedure, the execution logic of the second data storage node is the same as that of the target data storage node. That is, it may directly process the second data based on the second data information to obtain third data, then determine the next data storage node capable of processing the third data, and transmit the third data information to that next data storage node.

Specifically, the second data storage node may receive the second data information and the second step indication information sent by the target data storage node; perform stored procedure processing on the second data based on the second data information, the second step indication information, and the stored topology information of the target stored procedure to obtain third data; determine, based on the third data, the stored partition information, and the second step indication information, a third data storage node for processing the third data and third step indication information; and send the third data information and the third step indication information to the third data storage node. The third data storage node is the next data storage node after the second data storage node in the execution order of the target stored procedure, the third step indication information is used to indicate the position of a third operation step that the third data storage node needs to perform among the plurality of operation steps included in the target stored procedure, and the third data information is the third data or indication information of the third data.

The second data storage node may perform stored procedure processing on the second data based on the second data information, the second step indication information, and the stored topology information of the target stored procedure to obtain the third data, following the same method by which the target data storage node performs stored procedure processing on the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure to obtain the second data. For the specific implementation process, refer to the related description of step 103; details are not repeated here.

The second data storage node may determine the third data storage node for processing the third data and the third step indication information based on the third data, the stored partition information, and the second step indication information, following the same method by which the target data storage node determines the second data storage node and the second step indication information based on the second data, the stored partition information, and the first step indication information. For the specific implementation process, refer to the related description of step 104; details are not repeated here.

That is, any data storage node for executing the target stored procedure may, following the processing logic of the target data storage node, directly process the data to obtain a data processing result, then determine the next data storage node for processing that result, and send the data processing result information to the next data storage node for processing, without transmitting the result back to the data storage node manager, thereby avoiding round-trip transmission of data and reducing data transmission consumption.

Further, when the second data storage node is an output node, the second step indication information is used to indicate an output operation, and the second data storage node may determine the second data based on the second data information, and then the second The data is output as output data of the target stored procedure. For example, the second data may be sent as output data of the target stored procedure to a client that initiates a call request of the target stored procedure for feedback to the user through the client.

In the embodiment of the present invention, any one of the plurality of data storage nodes for executing the target stored procedure in the distributed database may receive the first data information sent by the previous data storage node, directly process the first data based on the stored topology information of the target stored procedure to obtain the second data, determine the next data storage node for processing the second data based on the second data and the stored partition information, and finally send the second data information to the next data storage node, so that the next data storage node processes the second data based on the stored topology information of the target stored procedure. That is, each data storage node can directly process the data by itself, then determine the next data storage node, and send the data processing result directly to the next data storage node without returning it to the data storage node manager, which greatly reduces the amount of data transmission and improves the execution efficiency and running performance of the stored procedure.

Next, with reference to the system architecture diagram shown in FIG. 2A, the method for executing a stored procedure provided by the embodiment of the present invention is described in detail, taking as an example the case in which the data storage nodes for executing the target stored procedure include two data storage nodes, namely a fourth data storage node and a fifth data storage node. The execution logic of the fourth data storage node and the fifth data storage node is the same as the execution logic of the target data storage node in the embodiment shown in FIG. 1F above.

FIG. 2B is a flowchart of another method for executing a stored procedure according to an embodiment of the present invention. The method is applied to the system architecture shown in FIG. 2A, and the method includes the following steps:

Step 201: The input node receives the call request of the target stored procedure, invokes the target stored procedure according to the call request of the target stored procedure, and acquires the input data during the process of calling the target stored procedure.

The input data is input data carried by the call request of the target stored procedure.

That is, the input node may perform an input step based on a call request of the target stored procedure, and the input step refers to acquiring input data during the process of calling the target stored procedure.

Step 202: The input node determines a fourth data storage node for processing the input data based on the input data and the stored partition information.

The fourth data storage node refers to the next data storage node after the input node in the execution order of the target stored procedure.

Step 203: The input node sends the first data information and the first step indication information to the fourth data storage node.

The first data information is indication information of the first data or the first data, and the first data is input data.

The first step indication information is determined by the input node based on the call request of the target stored procedure. That is, the input node may determine, according to the received call request of the target stored procedure, that the next data storage node is to execute the first of the plurality of operation steps included in the target stored procedure, and determine the first step indication information based on that first operation step. In other words, the first step indication information is used to indicate the first of the plurality of operation steps included in the target stored procedure.

Step 204: The fourth data storage node performs a storage process on the first data to obtain the second data, based on the first data information, the first step indication information, and the stored topology information of the target storage process.

The fourth data storage node may perform, according to the first data, a first one of the plurality of operation steps included in the target storage process to obtain the second data.

Step 205: The fourth data storage node determines, according to the second data, the stored partition information, and the first step indication information, a fifth data storage node for processing the second data, and determines the second step indication information.

The fifth data storage node refers to the next data storage node of the fourth data storage node arranged in the order of execution of the target storage process. The second step indication information is used to indicate a second one of the plurality of operation steps included in the target storage process.

Step 206: The fourth data storage node sends the second data information and the second step indication information to the fifth data storage node.

Step 207: The fifth data storage node performs a storage process on the second data based on the second data information, the second step indication information, and the stored topology information of the target storage process to obtain the third data.

The fifth data storage node may perform a second one of the plurality of operation steps included in the target storage process based on the second data to obtain the third data.

Step 208: The fifth data storage node determines, according to the second step indication information, that the next data storage node for processing the third data is an output node, and determines the third step indication information based on the output step.

When the second step indication information is used to indicate the last of the plurality of operation steps included in the target stored procedure, it is determined that the next data storage node for processing the third data is the output node, and the output node is used to perform the output step.

Step 209: The fifth data storage node sends the third data information and the third step indication information to the output node.

The third data information is indication information of the third data or the third data, and the third step indication information is used to indicate an output step.

Step 210: The output node outputs the third data as output data of the target storage process based on the third data information and the third step indication information.

It should be noted that the embodiment of the present invention is described only by taking as an example the case in which the data storage nodes that execute the target stored procedure include two data storage nodes. In actual applications, the data storage nodes that execute the target stored procedure may include more data storage nodes, and each of them can operate according to the execution logic of the target data storage node shown in FIG. 1F; details are not described herein again.

In the embodiment of the present invention, each data storage node for executing a stored procedure may directly process the data by itself, then determine the next data storage node, and directly send the data processing result to the next data storage node, and The data processing result is no longer transmitted back to the data storage node manager, thereby greatly reducing the data transmission amount and improving the execution efficiency and running performance of the storage process.

FIG. 3 is a schematic diagram of an execution flow of another storage process according to an embodiment of the present invention. As shown in FIG. 3, the distributed database 100 includes at least a data storage node A, a data storage node B, a data storage node C, and a data storage node M, and each data storage node can be connected to each other through a network.

In addition, it is assumed that each data storage node in the distributed database shown in FIG. 3 stores a different data list, and each data list is used to store the name of a person, the name of the corresponding father, and the age of the father. That is, the data can be partitioned according to the name of the person into different data lists, which are stored in different data storage nodes. For example, as shown in FIG. 3, the data storage node A stores the data list 1, the data storage node B stores the data list 2, and the data storage node C stores the data list 3. Moreover, the data storage node M stores partition information for indicating the storage location of the data, that is, the partition information can indicate the data storage node on which each person's name is stored.

In addition, it is assumed that each data storage node in the distributed database shown in FIG. 3 stores topology information of the stored procedure S, and the topology information of the stored procedure S is as shown in FIG. 1H, wherein the topology node 1 is used to indicate an input step, the topology node 2, the topology node 3, and the topology node 4 are used to indicate the operation step S1, the operation step S2, and the operation step S3, respectively, and the topology node 5 is used to indicate the output step.

Suppose the stored procedure S is executed according to the method for executing a stored procedure provided by the embodiment of the present invention. As shown in FIG. 3, if the data storage node M receives the call request of the stored procedure S, and the input data carried by the call request of the stored procedure S is Z3, that is, @name is Z3, the execution process of the stored procedure S may include the following steps 1)-5):

1) The data storage node M calls the stored procedure S according to the call request of the stored procedure S and performs the input step, that is, acquires the input data Z3 in the process of calling the stored procedure S. Then, based on Z3 and the stored partition information, the data storage node A storing Z3 is determined, and Z3 and the first step indication information are transmitted to the data storage node A.

The first step indication information is used to indicate the first operation step of the multiple operation steps included in the storage process S, for example, may be a value of 1.

2) The data storage node A determines, based on Z3, the first step indication information, and the stored topology information of the stored procedure S, that the data storage node A is to perform the operation step S1, and performs the operation step S1 based on Z3, that is, queries the stored data list 1 and finds that the name of Z3's father is Z2. Then, the data storage node A determines the data storage node B storing Z2 based on Z2 and the stored partition information, determines the second step indication information based on the first step indication information, and transmits Z2 and the second step indication information to the data storage node B.

The second step indication information is used to indicate a second one of the plurality of operation steps included in the storage process S, for example, may be a value of 2.

3) The data storage node B determines, based on Z2, the second step indication information, and the stored topology information of the stored procedure S, that the data storage node B is to perform the operation step S2, and performs the operation step S2 based on Z2, that is, queries the stored data list 2 and finds that the name of Z2's father is Z1. Then, the data storage node B determines the data storage node C storing Z1 based on Z1 and the stored partition information, determines the third step indication information based on the second step indication information, and transmits Z1 and the third step indication information to the data storage node C.

The third step indication information is used to indicate a third operation step of the plurality of operation steps included in the storage process S, for example, may be a value of 3.

4) The data storage node C determines, based on Z1, the third step indication information, and the stored topology information of the stored procedure S, that the data storage node C is to perform the operation step S3, and performs the operation step S3 based on Z1, that is, queries the stored data list 3 and finds that the age of Z1's father is 85. Then, the data storage node C determines, based on the third step indication information, that the next data storage node is the output node, that is, the data storage node M, determines the fourth step indication information based on the output step, and sends 85 and the fourth step indication information to the data storage node M.

The fourth step indication information is used to indicate an output step, for example, may be a string out.

5) After receiving the 85 and the fourth step indication information, the data storage node M outputs 85 as the output data of the stored procedure S based on the fourth step indication information.
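Steps 1)-5) can be traced with a small single-process simulation. This is only a sketch under stated assumptions: the topology information is modelled as an ordered list, the partition information as a dictionary, each data list holds only the fields mentioned in the text, and the names `run`, `TOPOLOGY_S`, `PARTITION`, `DATA_LISTS`, and `STEPS` are hypothetical. No real network transfer or SQL execution takes place.

```python
OUTPUT = "out"  # step indication for the output step

# Topology information of stored procedure S: operation steps in order.
TOPOLOGY_S = ["S1", "S2", "S3"]

# Partition information: which node stores the row for each person name.
PARTITION = {"Z3": "A", "Z2": "B", "Z1": "C"}

# Data lists 1-3, restricted to the values given in the example.
DATA_LISTS = {
    "A": {"Z3": {"father": "Z2"}},
    "B": {"Z2": {"father": "Z1"}},
    "C": {"Z1": {"father_age": 85}},
}

# Stand-ins for the SQL statements of the three operation steps.
STEPS = {
    "S1": lambda rows, name: rows[name]["father"],
    "S2": lambda rows, name: rows[name]["father"],
    "S3": lambda rows, name: rows[name]["father_age"],
}


def run(input_name, input_node="M"):
    """Execute S by forwarding data node to node; return (output, hops)."""
    hops = []
    data, indication = input_name, 1
    node = PARTITION[data]
    hops.append((input_node, node, data, indication))
    while indication != OUTPUT:
        step = TOPOLOGY_S[indication - 1]
        data = STEPS[step](DATA_LISTS[node], data)
        if indication < len(TOPOLOGY_S):
            indication += 1
            nxt = PARTITION[data]
        else:
            # Last operation step: send the result to the output node,
            # which defaults to the input node here.
            indication = OUTPUT
            nxt = input_node
        hops.append((node, nxt, data, indication))
        node = nxt
    return data, hops


result, hops = run("Z3")
```

Each entry in `hops` is a `(sender, receiver, data, step indication)` tuple, so the list records the path M, A, B, C, M without any intermediate result returning to M.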

As can be seen from FIG. 3, compared with FIG. 1C, each of the data storage node A, the data storage node B, and the data storage node C can directly process the data by itself, then determine the next data storage node, and send the data processing result directly to the next data storage node instead of transmitting it back to the data storage node M. Each round-trip data interaction is thereby reduced to a single data transmission, which greatly reduces the amount of data transmission, reduces data transmission consumption, and improves the execution efficiency and running performance of the stored procedure.
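The saving can be made concrete with a simple message count. This sketch assumes that, in the manager-coordinated scheme, each operation step costs one request from the manager and one reply back to it, and that in the direct scheme the output node is the input node; both functions are hypothetical illustrations rather than measurements.

```python
def messages_via_manager(num_steps):
    # One request to a node and one reply to the manager per operation step.
    return 2 * num_steps


def messages_direct(num_steps):
    # Input node -> first node, (num_steps - 1) forwards between nodes,
    # and last node -> output node.
    return num_steps + 1


via_manager = messages_via_manager(3)
direct = messages_direct(3)
```

For the three operation steps of the stored procedure S, this counts six transmissions with a coordinating manager against four with direct forwarding.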

FIG. 4A is a schematic structural diagram of an apparatus for executing a stored procedure according to an embodiment of the present invention, where the apparatus is applied to a target data storage node in a distributed database. The distributed database includes a plurality of data storage nodes for executing a target stored procedure, and the first data storage node, the target data storage node, and the second data storage node among the plurality of data storage nodes are three data storage nodes that execute the target stored procedure in sequence. Referring to FIG. 4A, the apparatus includes:

The receiving module 401 is configured to perform the operations performed by step 102 in the foregoing embodiment of FIG. 1F;

The processing module 402 is configured to perform the operations performed by step 103 in the foregoing embodiment of FIG. 1F;

a determining module 403, configured to perform the operations performed by step 104 in the foregoing embodiment of FIG. 1F;

The sending module 404 is configured to perform the operations performed by step 105 in the foregoing embodiment of FIG. 1F.

Optionally, when the first data storage node is an input node, the first data is input data acquired by the input node based on a call request of the target stored procedure, and the input node is the data storage node in the distributed database that receives the call request of the target stored procedure;

When the first data storage node is an intermediate node, the first data is a data processing result obtained by the intermediate node processing data based on the received data information, where the intermediate node is a data storage node for executing any of the operation steps of the target stored procedure.

Optionally, referring to FIG. 4B, the processing module 402 includes:

The first determining unit 4021 is configured to determine the first data based on the first data information;

The second determining unit 4022 is configured to determine, according to the first step indication information and the topology information of the target storage process, a first operation step that the target data storage node needs to perform;

The executing unit 4023 is configured to perform the first operation step based on the first data to obtain the second data.

Optionally, the first determining unit 4021 is configured to:

When the first data information is the indication information of the first data, the first data is obtained from the data stored by the target data storage node based on the indication information of the first data.

Optionally, the receiving module 401 is further configured to receive an identifier of the target stored procedure sent by the first data storage node, and the processing module 402 is further configured to acquire the topology information of the target stored procedure based on the identifier of the target stored procedure; and/or

The sending module 404 is further configured to send the identifier of the target stored procedure to the second data storage node.

Optionally, when the first step indication information is used to indicate that the first operation step is an operation step other than the last of the plurality of operation steps included in the target stored procedure, the next position is used to indicate the next operation step after the first operation step;

When the first step indication information is used to indicate that the first operation step is the last one of the plurality of operation steps included in the target storage process, the next position is used to indicate an output step, the output step is used to indicate The received data is output as output data of the target stored procedure.
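The rule for choosing the next position can be sketched as follows. This is a simplified illustration that assumes a strictly linear execution order and zero-based step positions; the sentinel `OUTPUT_STEP` and the function name `next_step_indication` are hypothetical and do not come from the application.

```python
from typing import Union

# Hypothetical sentinel standing in for the "output step".
OUTPUT_STEP = "OUTPUT"


def next_step_indication(first_step_position: int, num_steps: int) -> Union[int, str]:
    """Return the indication to forward to the next data storage node.

    Assumes the operation steps run in a simple linear order, so the step
    after position i is position i + 1.
    """
    if not 0 <= first_step_position < num_steps:
        raise ValueError("step position is outside the stored procedure")
    if first_step_position == num_steps - 1:
        # The last operation step was just executed: the receiver only outputs.
        return OUTPUT_STEP
    return first_step_position + 1
```

A topology with branching or parallel steps would need a lookup in the topology information instead of the `+ 1` used here.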

Optionally, referring to FIG. 4C, the determining module 403 includes:

The third determining unit 4031 is configured to determine, according to the second data and the stored partition information, the data storage node in the distributed database in which the second data is stored in advance;

The fourth determining unit 4032 is configured to determine the data storage node in which the second data is stored in advance as the second data storage node.
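One way to picture this lookup is a range-partitioned table. The sketch below is only an assumption for illustration: the application does not specify the format of the partition information, so the list of (exclusive upper bound, node identifier) pairs, the integer partition key derived from the second data, and the name `find_second_node` are all hypothetical.

```python
import bisect
from typing import List, Tuple

# Assumption: partition information is a list of (exclusive upper bound, node id)
# pairs sorted by upper bound; each node owns the keys below its bound and at or
# above the previous bound.
PartitionInfo = List[Tuple[int, str]]


def find_second_node(partition_key: int, partition_info: PartitionInfo) -> str:
    """Return the node whose key range contains the partition key of the second data."""
    upper_bounds = [bound for bound, _ in partition_info]
    index = bisect.bisect_right(upper_bounds, partition_key)
    if index == len(partition_info):
        raise LookupError("no partition covers key %d" % partition_key)
    return partition_info[index][1]


# Illustrative layout with three nodes.
EXAMPLE_PARTITIONS: PartitionInfo = [(100, "node-a"), (200, "node-b"), (300, "node-c")]
```

With `EXAMPLE_PARTITIONS`, key 99 maps to `node-a` and key 100 maps to `node-b`, because the upper bounds are exclusive. A hash-based scheme would work equally well for the purpose of the illustration.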

It should be noted that, when the execution device of the stored procedure provided by the foregoing embodiment executes a stored procedure, the division into the functional modules described above is used only as an example. In an actual application, the foregoing functions may be assigned to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. In addition, the execution device of the stored procedure provided by the foregoing embodiment is based on the same concept as the embodiments of the method for executing the stored procedure; the specific implementation process is described in detail in the method embodiments, and details are not described herein again.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present invention are generated in whole or in part. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, or microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).

In another embodiment, there is also provided an apparatus for executing a stored procedure, comprising a memory, a processor, and a computer program stored on the memory and operable on the processor, wherein the processor is configured to perform the method for executing the stored procedure described in any of the above embodiments of FIG. 1F or FIG. 2B.

In another embodiment, there is also provided a computer-readable storage medium having stored therein instructions that, when executed on a computer, cause the computer to perform the method for executing the stored procedure described in any of the above embodiments of FIG. 1F or FIG. 2B.

In another embodiment, there is also provided a computer program product comprising instructions which, when executed on a computer, cause the computer to perform the method for executing the stored procedure described in any of the above embodiments of FIG. 1F or FIG. 2B.

A person skilled in the art may understand that all or part of the steps of the above embodiments may be completed by hardware, or by a program instructing the related hardware, and the program may be stored in a computer-readable storage medium. The storage medium mentioned may be a read-only memory, a magnetic disk, an optical disk, or the like.

The above descriptions are merely embodiments of the present application and are not intended to limit the application; any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the present application shall be included in the scope of protection of the present application.

Claims

A method for executing a stored procedure, characterized in that the method is applied to a distributed database; the distributed database includes a plurality of data storage nodes for executing a target stored procedure, and a first data storage node, a target data storage node, and a second data storage node among the plurality of data storage nodes are three data storage nodes that execute the target stored procedure in sequence; the method includes:

Receiving, by the target data storage node, first data information and first step indication information sent by the first data storage node, where the first data information is first data or indication information of the first data, the first data is data required when the target stored procedure is executed, and the first step indication information is used to indicate the position, among a plurality of operation steps included in the target stored procedure, of a first operation step that the target data storage node needs to perform;

Performing, by the target data storage node, stored procedure processing on the first data based on the first data information, the first step indication information, and stored topology information of the target stored procedure, to obtain second data, where the topology information of the target stored procedure is used to indicate the plurality of operation steps included in the target stored procedure and an execution order of the plurality of operation steps;

Determining, by the target data storage node, the second data storage node and second step indication information based on the second data, partition information, and the first step indication information, and sending second data information and the second step indication information to the second data storage node, where the partition information is used to indicate storage locations of data, the second step indication information is used to indicate the position, among the plurality of operation steps included in the target stored procedure, of a second operation step that the second data storage node needs to perform, and the second data information is the second data or indication information of the second data.
The method of claim 1, wherein:

When the first data storage node is an input node, the first data is input data acquired by the input node based on a call request for the target stored procedure, and the input node is the data storage node in the distributed database that receives the call request for the target stored procedure;

When the first data storage node is an intermediate node, the first data is a data processing result obtained by the intermediate node performing stored procedure processing on data based on received data information, where the intermediate node is a data storage node used to execute any one of the plurality of operation steps included in the target stored procedure.
The method according to claim 1 or 2, wherein the performing, by the target data storage node, stored procedure processing on the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure, to obtain the second data, includes:

Determining, by the target data storage node, the first data based on the first data information;

Determining, by the target data storage node, the first operation step that the target data storage node needs to perform, based on the first step indication information and the topology information of the target stored procedure;

Performing, by the target data storage node, the first operation step based on the first data to obtain the second data.
The method of claim 3, wherein the determining, by the target data storage node, the first data based on the first data information comprises:

When the first data information is the indication information of the first data, the target data storage node acquires the first data from the stored data based on the indication information of the first data.
The method of any of claims 1-4, wherein the method further comprises:

Receiving, by the target data storage node, an identifier of the target stored procedure sent by the first data storage node, and acquiring the topology information of the target stored procedure based on the identifier of the target stored procedure; and/or

Sending, by the target data storage node, the identifier of the target stored procedure to the second data storage node.
The method according to any one of claims 1-5, wherein:

When the first step indication information indicates that the first operation step is an operation step other than the last operation step among the plurality of operation steps included in the target stored procedure, the second step indication information is used to indicate the operation step that follows the first operation step;

When the first step indication information indicates that the first operation step is the last of the plurality of operation steps included in the target stored procedure, the second step indication information is used to indicate an output step, and the output step is used to indicate that the received data is to be output as output data of the target stored procedure.
The method according to any one of claims 1 to 6, wherein the determining, by the target data storage node, the second data storage node and the second step indication information based on the second data, the partition information, and the first step indication information includes:

Determining, by the target data storage node, a data storage node pre-stored with the second data from the distributed database based on the second data and partition information;

The target data storage node determines a data storage node in which the second data is stored in advance as the second data storage node.
An execution device for a stored procedure, characterized by being applied to a target data storage node in a distributed database; the distributed database includes a plurality of data storage nodes for executing a target stored procedure, and a first data storage node, the target data storage node, and a second data storage node among the plurality of data storage nodes are three data storage nodes that execute the target stored procedure in sequence; the device includes:

a receiving module, configured to receive first data information and first step indication information sent by the first data storage node, where the first data information is first data or indication information of the first data, the first data is data required when the target stored procedure is executed, and the first step indication information is used to indicate the position, among a plurality of operation steps included in the target stored procedure, of a first operation step that the target data storage node needs to perform;

a processing module, configured to perform stored procedure processing on the first data based on the first data information, the first step indication information, and stored topology information of the target stored procedure, to obtain second data, where the topology information of the target stored procedure is used to indicate the plurality of operation steps included in the target stored procedure and an execution order of the plurality of operation steps;

a determining module, configured to determine, according to the second data, the stored partition information, and the first step indication information, the second data storage node and second step indication information, where the partition information is used to indicate storage locations of data, and the second step indication information is used to indicate the position, among the plurality of operation steps included in the target stored procedure, of a second operation step that the second data storage node needs to perform;

And a sending module, configured to send second data information and the second step indication information to the second data storage node, where the second data information is indication information of the second data or the second data.
The device of claim 8 wherein:

When the first data storage node is an input node, the first data is input data acquired by the input node based on a call request for the target stored procedure, and the input node is the data storage node in the distributed database that receives the call request for the target stored procedure;

When the first data storage node is an intermediate node, the first data is a data processing result obtained by the intermediate node processing data based on received data information, where the intermediate node is a data storage node used to execute any one of the plurality of operation steps included in the target stored procedure.
The device according to claim 8 or 9, wherein the processing module comprises:

a first determining unit, configured to determine the first data based on the first data information;

a second determining unit, configured to determine, according to the first step indication information and the topology information of the target stored procedure, the first operation step that the target data storage node needs to perform;

And an execution unit, configured to perform the first operation step based on the first data to obtain the second data.
The apparatus according to any one of claims 8 to 10, wherein the first determining unit is configured to:

When the first data information is the indication information of the first data, acquire the first data from the data stored by the target data storage node based on the indication information of the first data.
A device according to any of claims 8-11, wherein

The receiving module is further configured to receive an identifier of the target stored procedure sent by the first data storage node;

The processing module is further configured to acquire the topology information of the target stored procedure based on the identifier of the target stored procedure; and/or

The sending module is further configured to send the identifier of the target stored procedure to the second data storage node.
A device according to any of claims 8-12, wherein

When the first step indication information indicates that the first operation step is an operation step other than the last operation step among the plurality of operation steps included in the target stored procedure, the next position is used to indicate the operation step that follows the first operation step;

When the first step indication information indicates that the first operation step is the last of the plurality of operation steps included in the target stored procedure, the next position is used to indicate an output step, and the output step is used to indicate that the received data is to be output as output data of the target stored procedure.
The device of any of claims 8-13, wherein the determining module comprises:

a third determining unit, configured to determine, according to the second data and the stored partition information, a data storage node that stores the second data in advance from the distributed database;

And a fourth determining unit, configured to determine, as the second data storage node, a data storage node that stores the second data in advance.
An execution device for a stored procedure, comprising a memory, a processor, and a computer program stored on the memory and operable on the processor, wherein the processor is configured to perform the steps of the method of any one of claims 1-7.
A computer readable storage medium having stored therein instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1-7.