WO2019062156A1 - Storage procedure executing method and device, and storage medium - Google Patents

Storage procedure executing method and device, and storage medium

Info

Publication number
WO2019062156A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
target
storage node
data storage
information
Prior art date
Application number
PCT/CN2018/087384
Other languages
French (fr)
Chinese (zh)
Inventor
李旭良
单卫华
董阳
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2019062156A1 publication Critical patent/WO2019062156A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • the present application relates to the field of big data technologies, and in particular, to a method, an apparatus, and a storage medium for executing a stored procedure.
  • Distributed database refers to a network of geographically dispersed data storage nodes connected to form a logically centralized database.
  • the massive data partitions can be stored in each data storage node, which solves the problem of database scalability.
  • a distributed database can complete the corresponding business by calling and executing a stored procedure.
  • A stored procedure is a callable object stored in the database; it is essentially a set of SQL (Structured Query Language) statements that implements a specific function.
  • A stored procedure includes multiple operation steps and can be called by specifying the stored procedure name and the input data.
  • For a stored procedure to be called, if the data required for its execution is stored on a plurality of data storage nodes, the stored procedure must be executed through interaction among those data storage nodes.
  • A distributed database usually executes a stored procedure in a master-slave control manner. That is, when a data storage node receives a call request of a stored procedure and has a stored procedure management function, that data storage node can act as a storage node manager (Master) that uniformly manages and schedules the multiple data storage nodes that execute the stored procedure.
  • The storage node manager may determine, based on the first data and the stored partition information, a first data storage node for processing the first data. The first data is the input data carried by the call request of the stored procedure, the partition information indicates the storage location of data, and the first data storage node is the data storage node that stores the first data.
  • The storage node manager sends the first data and the first operation step of the stored procedure to the first data storage node, instructing it to perform the first operation step based on the first data to obtain second data.
  • The first data storage node must pass the second data back to the storage node manager. The storage node manager then determines, based on the stored partition information, the second data storage node for processing the second data, and sends the second data and the second operation step of the stored procedure to the second data storage node, instructing it to perform the second operation step based on the second data to obtain third data.
  • The second data storage node in turn passes the third data back to the storage node manager, and the storage node manager repeats the step of calling the next data storage node until it has called the last data storage node that executes the stored procedure. The last data storage node performs the last operation step of the stored procedure to obtain the last data and returns it to the storage node manager as the output data.
  • Each data storage node that executes the stored procedure must receive data from the storage node manager and must also return its processing result to the storage node manager. A large amount of data is therefore transferred between the storage node manager and the data storage nodes, which degrades the execution efficiency and running performance of the stored procedure.
  • The present application provides a method and a device for executing a stored procedure, and a storage medium. The technical solution is as follows:
  • a method for executing a stored procedure is provided, which is applied to a distributed database;
  • The distributed database includes a plurality of data storage nodes for executing a target stored procedure. Among them, a first data storage node, a target data storage node, and a second data storage node are three data storage nodes that execute the target stored procedure in sequence. The method includes:
  • The target data storage node receives first data information and first step indication information sent by the first data storage node. The first data information is the first data or indication information of the first data, where the first data is data required when the target stored procedure is executed. The first step indication information indicates the position, among the plurality of operation steps included in the target stored procedure, of the first operation step that the target data storage node needs to perform.
  • The target data storage node processes the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure to obtain second data.
  • The topology information of the target stored procedure indicates the plurality of operation steps included in the target stored procedure and the execution order of those operation steps.
  • The target data storage node determines the second data storage node and second step indication information based on the second data, the stored partition information, and the first step indication information, and sends second data information and the second step indication information to the second data storage node.
  • The partition information indicates the storage location of data. The second step indication information indicates the position, among the plurality of operation steps included in the target stored procedure, of the second operation step that the second data storage node needs to perform. The second data information is the second data or indication information of the second data.
  • The target data storage node is any one of the plurality of data storage nodes for executing the target stored procedure. The first data storage node is the data storage node immediately before the target data storage node in the execution order of the target stored procedure, and the second data storage node is the data storage node immediately after it.
  • Each data storage node can process the received data itself, determine the next data storage node, and send the data processing result directly to that node, without passing the result back to a storage node manager. This greatly reduces the amount of data transmitted and improves the execution efficiency and running performance of the stored procedure.
  • When the first data storage node is an input node, the first data is input data acquired by the input node based on a call request of the target stored procedure, where the input node is the data storage node in the distributed database that receives the call request of the target stored procedure.
  • When the first data storage node is an intermediate node, the first data is a data processing result obtained by the intermediate node by processing data based on the data information it received, where the intermediate node is a data storage node used to execute any one of the plurality of operation steps included in the target stored procedure.
  • That is, the first data may be the initial input data sent by the input node, namely the data input by the user when the target stored procedure is invoked, or intermediate data sent by an intermediate node, namely data produced while the target stored procedure is being executed.
  • the target data storage node may be the next data storage node of the input node or the next data storage node of any intermediate node.
  • Processing the first data by the target data storage node, based on the first data information, the first step indication information, and the stored topology information of the target stored procedure, to obtain the second data includes:
  • The target data storage node performs the first operation step based on the first data to obtain the second data.
  • the topology information of the target storage process is used to indicate execution logic inside the target storage process, and is specifically used to indicate multiple operation steps included in the target storage process and an execution sequence of the multiple operation steps.
  • Because the target data storage node stores the topology information of the target stored procedure, during execution of the target stored procedure it can determine the operation step it needs to perform directly from the step indication information sent by the previous data storage node and the topology information it stores, and it can send step indication information to the next data storage node to indicate the operation step that the next data storage node needs to perform.
  • In this way, multiple data storage nodes can execute the target stored procedure in sequence and pass the generated intermediate data on from node to node, which greatly reduces the amount of data transmitted and improves the execution efficiency and running performance of the stored procedure.
  • the target data storage node determines the first data based on the first data information, including:
  • the target data storage node acquires the first data from the stored data based on the indication information of the first data.
  • the first data information may be index information of the first data or the like.
  • The first data storage node may convert the first data into indication information of the first data and send that indication information to the target data storage node, thereby reducing the amount of data transmitted and improving the execution efficiency of the stored procedure.
  • the method further includes:
  • the target data storage node sends an identifier of the target stored procedure to the second data storage node.
  • Each data storage node for executing the target stored procedure may also pass on an identifier of the target stored procedure, to tell the next data storage node which stored procedure is currently being executed. The next data storage node can then use the identifier to select the topology information of the target stored procedure from the stored topology information of multiple stored procedures, as the basis for determining which operation step to perform, which improves the accuracy of stored procedure execution.
  • When the first step indication information indicates that the first operation step is an operation step other than the last of the plurality of operation steps included in the target stored procedure, the second step indication information indicates the operation step that follows the first operation step.
  • When the first step indication information indicates that the first operation step is the last operation step, the second step indication information indicates an output step.
  • The output step indicates that the received data is to be output as the output data of the target stored procedure.
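  • The rule for deriving the second step indication information can be sketched as follows. This is a minimal illustration only: the zero-based position encoding, the `OUTPUT_STEP` marker, and the function name are assumptions made for the example and are not specified by the application.

```python
# Hypothetical marker for the output step; the application does not define an encoding.
OUTPUT_STEP = "OUTPUT"


def next_step_indication(topology, first_step_position):
    """Derive the second step indication information.

    topology: ordered list of operation steps of the target stored procedure.
    first_step_position: zero-based position of the step just executed.
    """
    if first_step_position < len(topology) - 1:
        # Not the last operation step: point at the step that follows it.
        return first_step_position + 1
    # Last operation step: the next node only has to output the data.
    return OUTPUT_STEP


topology = ["S1", "S2", "S3"]
after_first = next_step_indication(topology, 0)
after_last = next_step_indication(topology, 2)
```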
  • the target data storage node determines the second data storage node and the second step indication information, based on the second data, the partition information, and the first step indication information, including:
  • the target data storage node determines a data storage node in which the second data is stored in advance as the second data storage node.
  • the target data storage node since the target data storage node stores the partition information, in the process in which the target storage process is executed, the target data storage node may directly determine the pre-storage according to the processed second data and the stored partition information. a data storage node having the second data, and determining a data storage node in which the second data is stored in advance as a second data storage node for processing the second data.
  • the processed intermediate data does not need to be sent to the storage node manager, and the storage node manager schedules the next data storage node, and the target data storage node itself can determine the next data storage node according to the stored partition information, and the intermediate data is It is sent to the next data storage node for processing, which greatly reduces the amount of data transmission and improves the execution efficiency and running performance of the stored procedure.
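  • As an illustration of how a node might pick the next data storage node locally, the sketch below treats the partition information as a simple key-to-node list. The dictionary layout, node names, and function name are assumptions for the example; the application only states that partition information indicates where data is stored.

```python
# Hypothetical partition information replicated on every data storage node:
# it maps a data key to the node that stores that key.
PARTITION_INFO = {"Z3": "node_a", "Z2": "node_b", "Z1": "node_c"}


def choose_next_node(second_data, partition_info):
    """Return the node that already stores second_data, using local partition info."""
    try:
        return partition_info[second_data]
    except KeyError:
        # No node is known to hold this key; the caller must handle it.
        raise LookupError(f"no data storage node holds {second_data!r}") from None


next_node = choose_next_node("Z2", PARTITION_INFO)
```

  • Because the lookup uses only locally stored information, no round trip to a storage node manager is needed to schedule the next step.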
  • Determining the second data storage node and the second step indication information based on the second data, the stored partition information, and the first step indication information includes:
  • The target data storage node determines the second step indication information based on the position of the first operation step among the plurality of operation steps included in the target stored procedure.
  • the method further includes:
  • When the target data storage node receives the uploaded target stored procedure, it performs topology compilation on the target stored procedure to obtain the topology information of the target stored procedure;
  • The target data storage node sends the topology information of the target stored procedure to the other data storage nodes in the distributed database.
  • That is, when any data storage node in the distributed database receives the uploaded target stored procedure, it may perform topology compilation on the target stored procedure to obtain its topology information, and then send that topology information to the other data storage nodes in the distributed database.
  • As a result, every data storage node in the distributed database stores the topology information of the target stored procedure in advance. When a data storage node needs to execute the target stored procedure, it determines the operation step it needs to perform from the stored topology information, which ensures that the data storage nodes execute in sequence without scheduling by a storage node manager.
  • the target data storage node performs topology compilation on the target storage process to obtain topology information of the target storage process, including:
  • the target data storage node performs decomposition processing on the target storage process to obtain a plurality of operation steps, and determines topology information of the target storage process based on the execution order of the plurality of operation steps and the plurality of operation steps; or ,
  • An execution apparatus of a stored procedure is provided, which has the function of implementing the behavior of the stored procedure execution method in the first aspect described above.
  • The execution apparatus of the stored procedure includes at least one module for implementing the execution method of the stored procedure provided by the above first aspect.
  • An apparatus for executing a stored procedure is provided, comprising a processor and a memory. The memory is used to store a program that supports the apparatus in performing the stored procedure execution method provided by the first aspect, and to store data involved in implementing that method.
  • The processor is configured to execute the program stored in the memory.
  • The apparatus may further include a communication bus for establishing a connection between the processor and the memory.
  • a computer readable storage medium having stored therein instructions that, when run on a computer, cause the computer to perform the method of executing the stored procedure described in the first aspect above.
  • a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of executing the stored procedure described in the first aspect above.
  • Any one of the plurality of data storage nodes in the distributed database for executing the target stored procedure may receive the first data information sent by the first data storage node, process the first data directly based on the stored topology information of the target stored procedure to obtain second data, determine the second data storage node for processing the second data based on the second data and the stored partition information, and finally send the second data information to the second data storage node, so that the second data storage node processes the second data based on its own stored topology information of the target stored procedure.
  • That is, each data storage node can process the data itself, determine the second data storage node, and send the data processing result directly to the second data storage node, without returning the result to a storage node manager. This greatly reduces the amount of data transmitted and improves the execution efficiency and running performance of the stored procedure.
  • FIG. 1A is a system architecture diagram of a distributed database 100.
  • FIG. 1B is a structural diagram of an execution system of a stored procedure according to an embodiment of the present invention.
  • FIG. 1C is a schematic flowchart of an execution process of a stored procedure provided by the related art.
  • FIG. 1D is a schematic diagram showing the logical structure of a data storage node 10 according to an embodiment of the present invention.
  • FIG. 1E is a schematic structural diagram of hardware of a data storage node 10 according to an embodiment of the present invention.
  • FIG. 1F is a flowchart of a method for executing a stored procedure according to an embodiment of the present invention.
  • FIG. 1G is a schematic diagram of topology information of a stored procedure according to an embodiment of the present invention.
  • FIG. 1H is a schematic diagram of topology information of another stored procedure according to an embodiment of the present invention.
  • FIG. 2A is a structural diagram of an execution system of a stored procedure according to an embodiment of the present invention.
  • FIG. 2B is a flowchart of another method for executing a stored procedure according to an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of an execution process of another stored procedure according to an embodiment of the present invention.
  • FIG. 4A is a schematic structural diagram of an apparatus for executing a stored procedure according to an embodiment of the present invention.
  • FIG. 4B is a schematic structural diagram of a processing module 402 according to an embodiment of the present invention.
  • FIG. 4C is a schematic structural diagram of a determining module 403 according to an embodiment of the present invention.
  • the embodiment of the present invention is applied to a scenario in which a related service is processed by using a distributed database
  • the service may be a query service, a comparison service, or the like.
  • the distributed database can be used to query the age of a person's great-grandfather and query the financial status of a person based on the data stored in each data storage node in the distributed database.
  • it is usually necessary to complete the business by calling and executing a stored procedure.
  • FIG. 1A is a system architecture diagram of a distributed database 100. As shown in FIG. 1A, the distributed database 100 includes a plurality of physically dispersed data storage nodes 10, which may be connected by a network.
  • Each of the data storage nodes 10 has its own local database for storing data, and after being connected to each other through a network, a global logically centralized and physically distributed large database, that is, a distributed database, can be formed.
  • each data storage node 10 may be a node or a server capable of storing data.
  • the basic idea of a distributed database is to distribute the data in the original centralized database to multiple data storage nodes connected through the network to obtain larger storage capacity and higher concurrency.
  • the data partitioning technology can be used to store massive data fragments into storage nodes of the distributed database.
  • the sub-database sub-table technology refers to storing the data of the large table into each storage node according to the established partitioning strategy, or dividing the large table into the business sub-tables with smaller data amounts, and each sub-table Store to each storage node according to the established partitioning strategy.
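  • A partitioning strategy of this kind can be sketched as follows. The example assumes a hash-based strategy keyed on one column; the application does not name a specific strategy, and the function name, the CRC32 hash, and the sample rows are illustrative assumptions.

```python
import zlib


def partition_rows(rows, key, node_count):
    """Distribute table rows over node_count storage nodes by hashing one column."""
    shards = [[] for _ in range(node_count)]
    for row in rows:
        # CRC32 is stable across runs, so a given key always maps to the same node.
        index = zlib.crc32(str(row[key]).encode("utf-8")) % node_count
        shards[index].append(row)
    return shards


# Hypothetical rows of a "large table" keyed by person name.
rows = [{"name": "Z1"}, {"name": "Z2"}, {"name": "Z3"}]
shards = partition_rows(rows, "name", 3)
```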
  • FIG. 1B is a structural diagram of an execution system of a storage process according to an embodiment of the present invention.
  • The system architecture includes a client 200 and a distributed database 100.
  • The client 200 and the distributed database 100 can be connected through a network.
  • The client 200 may send a call request of a stored procedure to the distributed database 100. A data storage node 10 in the distributed database 100 then receives the call request, invokes the stored procedure according to the call request, and interacts with other data storage nodes 10 to execute the stored procedure.
  • the data storage node 10 that receives the call request of the stored procedure may be specified by the client 200, or may be specified by the distributed database 100 according to the preset service processing logic, which is not limited by the embodiment of the present invention.
  • FIG. 1B is only an example in which the client 200 is an entity other than the distributed database 100.
  • The client 200 may also be any data storage node 10 in the distributed database 100. That is, when any data storage node 10 obtains a call request of a stored procedure triggered by a user or by a preset condition, it can invoke the stored procedure according to the call request and then interact with other data storage nodes 10 to execute the stored procedure.
  • FIG. 1A and FIG. 1B take a distributed database 100 that includes three data storage nodes 10 only as an example. Those skilled in the art will understand that the number of data storage nodes 10 shown in FIG. 1A and FIG. 1B does not limit the distributed database 100. In actual applications, the distributed database 100 may include more or fewer data storage nodes 10 than illustrated, which is not limited by the embodiment of the present invention.
  • FIG. 1C is a schematic diagram of an execution flow of a storage process provided by the related art.
  • the distributed database 100 includes at least a data storage node A, a data storage node B, a data storage node C, and a data storage node M, and each data storage node may be connected to each other through a network.
  • Assume that the distributed database 100 is to execute a stored procedure S whose semantics are "query the age of the great-grandfather of @name". The stored procedure S includes three operation steps: operation step S1, query the name of the father of @name; operation step S2, query the name of the grandfather of @name according to the name of the father of @name; operation step S3, query the age of the great-grandfather of @name according to the name of the grandfather of @name.
  • @name is the data to be input of the stored procedure S, which can be any name.
  • each data storage node in the distributed database shown in FIG. 1C stores a different data list, and each data list is used to store the name of the person, the name of the corresponding father, and the age of the father. That is, different data lists can be partitioned by data according to the name of the person, and thus stored in different storage nodes. For example, as shown in FIG. 1C, the data storage node A stores the data list 1, the data storage node B stores the data list 2, and the data storage node C stores the data list 3. Moreover, the data storage node M stores partition information for indicating a storage location of the data, that is, a storage node that can indicate a different person name.
  • The data storage node M can serve as the storage node manager (Master).
  • Assuming that the input data carried by the call request of the stored procedure S is Z3, that is, @name is Z3, the execution flow of the stored procedure S may include the following steps 1)-7):
  • 1) The data storage node M invokes the stored procedure S according to the call request and determines the operation steps included in the stored procedure S and their execution order. The data storage node M then determines, based on Z3 and the stored partition information, that the data storage node A stores Z3, and sends Z3 and operation step S1 to the data storage node A.
  • 2) The data storage node A performs operation step S1 based on Z3, that is, it queries the stored data list 1 and finds that the name of Z3's father is Z2, and then passes Z2 back to the data storage node M.
  • 3) The data storage node M determines, based on Z2 and the stored partition information, that the data storage node B stores Z2, and sends Z2 and operation step S2 to the data storage node B.
  • 4) The data storage node B performs operation step S2 based on Z2, that is, it queries the stored data list 2 and finds that the name of Z2's father is Z1, and then passes Z1 back to the data storage node M.
  • 5) The data storage node M determines, based on Z1 and the stored partition information, that the data storage node C stores Z1, and sends Z1 and operation step S3 to the data storage node C.
  • 6) The data storage node C performs operation step S3 based on Z1, that is, it queries the stored data list 3 and finds that the age of Z1's father is 85, and then passes 85 back to the data storage node M.
  • Each of the data storage node A, the data storage node B, and the data storage node C must receive data from the data storage node M and must return its processing result to the data storage node M. That is, a large amount of data is transmitted between the data storage nodes executing the stored procedure and the storage node manager. Moreover, because the storage node manager must collect the data before scheduling each operation step of the stored procedure, each data storage node that executes the stored procedure has to wait.
  • The master-slave control mode therefore involves data aggregation, waiting by the data storage nodes, and a large amount of data transmission during execution of the stored procedure, which greatly degrades the execution efficiency and running performance of the stored procedure.
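  • The cost of the master-slave flow can be made concrete with a small simulation of the Z3 example. This is a sketch under stated assumptions: the in-memory dictionaries stand in for the data lists, the name "Z0" and the ages 35 and 60 are invented placeholders (only the result 85 comes from the example), and each message between M and a node is counted as one transfer.

```python
# Data lists held by nodes A, B, and C: name -> father's name and father's age.
# "Z0", 35, and 60 are placeholders; only the age 85 appears in the example.
DATA = {
    "A": {"Z3": {"father": "Z2", "father_age": 35}},
    "B": {"Z2": {"father": "Z1", "father_age": 60}},
    "C": {"Z1": {"father": "Z0", "father_age": 85}},
}
# Partition information, stored only on the manager node M in this scheme.
PARTITION = {"Z3": "A", "Z2": "B", "Z1": "C"}
# Operation steps S1, S2, S3 expressed as the field each step reads.
STEPS = ["father", "father", "father_age"]


def run_master_slave(name):
    """Execute the stored procedure with M scheduling every step."""
    transfers = 0
    value = name
    for field in STEPS:
        node = PARTITION[value]           # M looks up the node storing the data
        transfers += 1                    # M -> node: data plus operation step
        value = DATA[node][value][field]  # node executes the operation step
        transfers += 1                    # node -> M: processing result
    return value, transfers


result, transfers = run_master_slave("Z3")
```

  • With three operation steps, every step costs one message to the node and one back to M, so the simulation counts six transfers.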
  • In the embodiment of the present invention, the plurality of data storage nodes included in the distributed database 100 are used to execute a target stored procedure. Among them, a first data storage node, a target data storage node, and a second data storage node are three data storage nodes that execute the target stored procedure in sequence, and the target data storage node may be any one of the plurality of data storage nodes for executing the target stored procedure.
  • The first data storage node is the data storage node immediately before the target data storage node in the execution order of the target stored procedure, and the second data storage node is the data storage node immediately after the target data storage node in that order.
  • The target data storage node receives the first data information and the first step indication information sent by the first data storage node. The first data information is the first data or indication information of the first data, where the first data is data required when the target stored procedure is executed. The first step indication information indicates the position, among the plurality of operation steps included in the target stored procedure, of the first operation step that the target data storage node needs to perform;
  • The target data storage node processes the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure to obtain second data. The topology information of the target stored procedure indicates the plurality of operation steps included in the target stored procedure and the execution order of those operation steps;
  • The target data storage node determines the second data storage node and the second step indication information based on the second data, the stored partition information, and the first step indication information, and sends the second data information and the second step indication information to the second data storage node. The partition information indicates the storage location of data. The second step indication information indicates the position, among the plurality of operation steps included in the target stored procedure, of the second operation step that the second data storage node needs to perform. The second data information is the second data or indication information of the second data.
  • Each data storage node executing the stored procedure can process the data itself, determine the second data storage node, and send the data processing result directly to the second data storage node, without passing the data processing result back to the data storage node manager. This greatly reduces the amount of data transmitted and improves the execution efficiency and running performance of the stored procedure.
  • A stream processing module can be configured in each data storage node 10 in the distributed database 100, and partition information can be stored at each data storage node 10, so that each data storage node 10 implements the stored procedure execution method provided by the embodiments of the present invention through its configured stream processing module and its stored partition information.
  • FIG. 1D is a schematic diagram showing the logical structure of a data storage node 10 according to an embodiment of the present invention.
  • the data storage node 10 includes a stream processing module 11 and partition information 12.
  • the stream processing module 11 is for executing a stored procedure
  • the partition information 12 is for indicating a storage location of the data.
  • the partition information 12 may be in the form of a list or a partition policy.
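  • As an illustration only, the two forms of partition information mentioned above (a list or a partition policy) could be sketched in Python as follows. The node names, the keys, and the checksum-based policy are assumptions made for this sketch and are not taken from the embodiments.

```python
import zlib

# Hypothetical node names used only for this sketch.
NODES = ["node-A", "node-B", "node-C"]

# List form: an explicit mapping from a data key to the node that stores it.
PARTITION_LIST = {"Z1": "node-A", "Z2": "node-B", "Z3": "node-C"}


def locate_by_list(key):
    """Look up the storage location of `key` in the explicit list."""
    return PARTITION_LIST[key]


def locate_by_policy(key, nodes=NODES):
    """Policy form (assumed): a stable checksum of the key modulo the node count."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]
```

  • Either function answers the same question, namely which data storage node holds a given piece of data; the list form trades memory for flexibility, while the policy form needs no per-key state.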
  • the stream processing module 11 includes a topology manager 11a and a path planning module 11b.
  • The topology manager 11a is configured to store topology information of at least one stored procedure, where the topology information of each stored procedure is used to indicate the plurality of operation steps included in that stored procedure and the execution order of those operation steps. Further, the topology manager 11a may be configured to topologically compile an uploaded target stored procedure to obtain the topology information of the target stored procedure, and to instruct the data storage node 10 to send the topology information of the target stored procedure to the other data storage nodes 10.
  • the path planning module 11b is configured to perform partition scheduling and arithmetic operations of the stored procedure.
  • Partition scheduling refers to determining, based on the data and the stored partition information, the next data storage node for processing the data.
  • The arithmetic operation refers to determining the operation step that the data storage node 10 needs to perform and performing that operation step based on the data. Specifically, the operation step that needs to be performed may be determined based on the step indication information and the stored topology information of the stored procedure.
  • FIG. 1E is a schematic diagram showing the hardware structure of a data storage node 10 according to an embodiment of the present invention.
  • The data storage node 10 includes a processor 13, a communication bus 14, a memory 15, and at least one communication interface 16. It will be understood by those skilled in the art that the structure of the data storage node 10 shown in FIG. 1E does not constitute a limitation on the data storage node 10. In practical applications, the data storage node 10 may include more or fewer components than illustrated, may combine some components, or may use a different component arrangement, which is not limited in the embodiments of the present invention.
  • The processor 13 can be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the present application.
  • Communication bus 14 may include a path for communicating information between the components described above.
  • The memory 15 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 15 can exist independently and is coupled to the processor 13 via a communication bus 14.
  • the memory 15 can also be integrated with the processor 13.
  • The memory 15 may be used to store data, for example partition information, topology information of stored procedures, or information sent by the first data storage node. The memory 15 may also be used to store one or more programs and/or modules that carry out the stored procedure execution method provided by the embodiments of the present invention.
  • The communication interface 16 uses any transceiver-like device to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
  • processor 13 may include one or more CPUs, such as CPU0 and CPU1 shown in Figure 1C.
  • The data storage node 10 may further include an output device 17 and an input device 18.
  • The output device 17 communicates with the processor 13 and can display information in a variety of ways.
  • The output device 17 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like.
  • The input device 18 communicates with the processor 13 and can receive user input in a variety of ways.
  • input device 18 can be a keyboard, a touch screen device, or a sensing device, and the like.
  • the data storage node 10 described above may be a terminal or other node having a data storage function.
  • the data storage node 10 can be a mobile phone, a portable computer, a network server, a personal digital assistant (PDA), a tablet computer, a user equipment (UE), a communication device, or an embedded device.
  • the embodiment of the invention does not limit the type of data storage node 10.
  • the memory 15 is used to store program code for executing the solution of the present application, and is controlled by the processor 13 for execution.
  • the processor 13 is operative to execute program code stored in the memory 15.
  • the data storage node 10 shown in FIG. 1E can implement the methods described in the following embodiments of FIGS. 1F and 2B through the processor 13 and program code in the memory 15.
  • FIG. 1F is a flowchart of a method for executing a stored procedure according to an embodiment of the present invention.
  • The method is applied to the distributed database of FIG. 1A or FIG. 1B. The distributed database includes a plurality of data storage nodes for executing the target stored procedure, and the first data storage node, the target data storage node, and the second data storage node among the plurality of data storage nodes are three data storage nodes that execute the target stored procedure in sequence.
  • the method includes the following steps:
  • Step 101 The first data storage node sends the first data information and the first step indication information to the target data storage node.
  • The target data storage node may be any one of the plurality of data storage nodes used to execute the target stored procedure. The first data storage node refers to the data storage node immediately preceding the target data storage node in the execution order of the target stored procedure, and the second data storage node refers to the data storage node immediately following the target data storage node in that order.
  • the target stored procedure can be any stored procedure called by the distributed database.
  • The stored procedure in the embodiments of the present invention is not a process of storing data, but a callable object stored in a database. It is similar to a callable function and, in practical applications, may be called by its name together with input data.
  • A stored procedure is essentially a set of SQL statements that implements a specific function. That is, the stored procedure includes a plurality of operation steps corresponding to a set of SQL statements, and each operation step corresponds to one SQL statement in that set. Specifically, each operation step may be used to perform query processing or other processing, which is not limited in the embodiments of the present invention.
  • The first data information is the first data or indication information of the first data, and the first data is the data required when the target data storage node executes the first operation step included in the target stored procedure.
  • The indication information of the first data may be used to indicate the first data or a storage location of the first data, and the first data may be acquired according to the indication information of the first data.
  • The first step indication information is used to indicate the position, among the plurality of operation steps included in the target stored procedure, of the first operation step that the target data storage node needs to perform, that is, to indicate which of the plurality of operation steps is the first operation step, where the plurality of operation steps are arranged in execution order.
  • The first step indication information may be a numerical value.
  • For example, when the first step indication information is used to indicate the first of the plurality of operation steps included in the target stored procedure, it may be the value 1.
  • When the first step indication information is used to indicate the second of the plurality of operation steps included in the target stored procedure, it may be the value 2, and so on.
  • The first data storage node may be an input node or an intermediate node.
  • The input node refers to the data storage node in the distributed database that receives the call request of the target stored procedure.
  • The intermediate node refers to a data storage node used to perform any one of the plurality of operation steps included in the target stored procedure; specifically, it is the data storage node that performs the operation step preceding the first operation step performed by the target data storage node.
  • Depending on whether the first data storage node is an input node or an intermediate node, the first data and the first step indication information are obtained differently and have different meanings, specifically in the following two cases:
  • In the first case, when the first data storage node is an input node, the first data is the input data acquired by the input node based on the call request of the target stored procedure, and the first step indication information is determined by the input node based on the call request of the target stored procedure and is used to indicate the first of the plurality of operation steps included in the target stored procedure.
  • the calling request of the target stored procedure is used to invoke the target stored procedure, and may carry the identifier and input data of the target stored procedure.
  • the identifier of the target stored procedure is used to uniquely identify the target stored procedure, and may be a name or a number of the target stored procedure.
  • the input data refers to an input parameter used to invoke the target stored procedure, and specifically may be data input for the target stored procedure when the user initiates a service.
  • The call request of the target stored procedure may be triggered by the user through a client. The client may be an entity outside the distributed database or may be any data storage node in the distributed database, which is not limited in the embodiments of the present invention.
  • Any data storage node in the distributed database that receives the call request can serve as the input node.
  • When receiving the call request of the target stored procedure, the input node may invoke the target stored procedure according to the identifier and the input data carried in the call request, and may perform an input step, where the input step refers to acquiring the input data in the course of invoking the target stored procedure.
  • The input node may use the input data as the first data and determine, based on the first data and the stored partition information, the next data storage node for processing the first data, that is, the target data storage node. The input node then sends the first data information and the first step indication information to the target data storage node.
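  • The handling of a call request at the input node can be sketched as follows. This is a minimal illustration under stated assumptions: the `CallRequest` and `StepMessage` types, the procedure name `find_grandfather`, and the contents of `PARTITION_INFO` are hypothetical and do not appear in the embodiments.

```python
from dataclasses import dataclass


@dataclass
class CallRequest:
    procedure_id: str  # identifier (name or number) of the target stored procedure
    input_data: str    # input parameter supplied when the service is initiated


@dataclass
class StepMessage:
    data_info: str        # here the first data itself; it could also be an indication of it
    step_indication: int  # 1-based position of the operation step to perform


# Hypothetical partition information in list form: data key -> node that stores it.
PARTITION_INFO = {"Z3": "node-C", "Z2": "node-B"}


def handle_call_request(request):
    """Input step: use the input data as the first data, choose the target node
    from the partition information, and indicate the first operation step."""
    first_data = request.input_data
    target_node = PARTITION_INFO[first_data]
    message = StepMessage(data_info=first_data, step_indication=1)
    return target_node, message


target, msg = handle_call_request(CallRequest("find_grandfather", "Z3"))
```

  • In this sketch the step indication is always 1 at the input node, matching the description that the input node indicates the first of the operation steps.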
  • In the second case, when the first data storage node is an intermediate node, the first data is the data processing result obtained by the intermediate node performing stored procedure processing on data based on the data information it received, where the intermediate node is a data storage node used to perform any one of the plurality of operation steps included in the target stored procedure, specifically the data storage node that performs the operation step preceding the first operation step performed by the target data storage node.
  • The first step indication information is determined by the first data storage node based on the step indication information it received, and is used to indicate the operation step following the operation step performed by the first data storage node.
  • Performing stored procedure processing on data refers to performing, based on the data, the operation step that currently needs to be performed, for example, performing query processing according to the data to query other data related to the data.
  • The first data storage node processes data in the same manner as the target data storage node. It may obtain the first data information and the first step indication information in the manner described for the target data storage node, and then send the first data information and the first step indication information to the target data storage node.
  • Step 102 The target data storage node receives the first data information and the first step indication information sent by the first data storage node.
  • Step 103 The target data storage node performs stored procedure processing on the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure, to obtain second data.
  • Performing stored procedure processing on the first data means performing the first operation step based on the first data.
  • For example, query processing may be performed according to the first data, in which case the second data is the other data related to the first data obtained by the query.
  • The topology information of the target stored procedure is used to indicate the execution logic inside the target stored procedure, specifically the plurality of operation steps included in the target stored procedure and the execution order of the plurality of operation steps. That is, the topology information of the target stored procedure may include the plurality of operation steps included in the target stored procedure, arranged in execution order.
  • The topology information of the target stored procedure may include a plurality of topology nodes arranged in execution order and in one-to-one correspondence with the plurality of operation steps, that is, each topology node is used to indicate one of the plurality of operation steps. The plurality of topology nodes may also be connected by arrows, which indicate the execution order of the topology nodes.
  • For example, when the target stored procedure includes three operation steps, the topology information of the target stored procedure may include three topology nodes, namely topology node 1, topology node 2, and topology node 3, connected by arrows as shown in FIG. 1G. Topology node 1 is used to indicate the first of the three operation steps included in the target stored procedure, topology node 2 is used to indicate the second of the three operation steps, and topology node 3 is used to indicate the third of the three operation steps.
  • The topology information of the target stored procedure may also be used to indicate an input step and an output step, where the input step is executed before the plurality of operation steps and the output step is executed after the plurality of operation steps. That is, the topology information of the target stored procedure may include a plurality of execution steps arranged in execution order, namely the input step, the plurality of operation steps included in the target stored procedure, and the output step.
  • the input step is used to input input data of the target storage process, specifically for acquiring input data during the process of calling the target storage process; and the output step is for outputting the output data of the target storage process, specifically for The data processing result of the last one of the plurality of data storage nodes executing the target stored procedure is output as output data.
  • In this case, the topology information of the target stored procedure may include a plurality of topology nodes arranged in execution order and corresponding to the plurality of execution steps. The first topology node is used to indicate the input step, the last topology node is used to indicate the output step, and the topology nodes between the first and the last topology node are used to indicate the plurality of operation steps included in the target stored procedure.
  • the plurality of topology nodes may also be connected by an arrow, and the arrow connection is used to indicate the execution order of the multiple topology nodes.
  • For example, when the target stored procedure includes three operation steps, the topology information of the target stored procedure may include five topology nodes, namely topology node 1, topology node 2, topology node 3, topology node 4, and topology node 5, connected by arrows as shown in FIG. 1H. Topology node 1 is used to indicate the input step; topology node 2, topology node 3, and topology node 4 are used to indicate, respectively, the first, second, and third of the three operation steps included in the target stored procedure; and topology node 5 is used to indicate the output step.
  • The target data storage node stores the topology information of the target stored procedure. This topology information may be obtained by the target data storage node topologically compiling the target stored procedure when it receives the uploaded target stored procedure, or may be obtained by another data storage node that, upon receiving the uploaded target stored procedure, topologically compiles it and sends the resulting topology information to the target data storage node.
  • That is, when any data storage node in the distributed database receives the uploaded target stored procedure, it may topologically compile the target stored procedure to obtain the topology information of the target stored procedure, and then send the topology information to the data storage nodes in the distributed database other than itself.
  • For example, when the target data storage node receives the uploaded target stored procedure, it may topologically compile the target stored procedure to obtain the topology information of the target stored procedure, and then send the topology information to the data storage nodes in the distributed database other than the target data storage node.
  • Any data storage node may topologically compile the stored procedure through its configured topology manager, store the resulting topology information through the topology manager, and then send the topology information to the other data storage nodes through the topology manager.
  • The target data storage node topologically compiling the target stored procedure to obtain the topology information of the target stored procedure may include the following two implementations:
  • the first implementation manner is: performing decomposition processing on the target storage process to obtain a plurality of operation steps, and determining topology information of the target storage process based on the multiple operation steps and the execution order of the multiple operation steps.
  • the target stored procedure may be decomposed in units of a single SQL statement to obtain a plurality of SQL statements, and the plurality of SQL statements are the plurality of operation steps.
  • The second implementation is: decomposing the target stored procedure to obtain a plurality of operation steps, and then, according to the execution order of the plurality of operation steps, adding an input step before the plurality of operation steps and an output step after the plurality of operation steps to obtain the topology information of the target stored procedure.
  • The input step is used to input the input data of the target stored procedure, and the output step is used to output the output data of the target stored procedure.
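  • The two compilation implementations above can be sketched in Python as follows. This is a simplified illustration, not the compiler of the embodiments: it assumes that statements are separated by semicolons and that no semicolon appears inside a string literal or comment, and the example procedure body and the `family` table are hypothetical.

```python
def compile_topology(procedure_sql, with_io_steps=False):
    """Decompose a stored procedure body into single SQL statements, in order.

    with_io_steps=False corresponds to the first implementation;
    with_io_steps=True adds an input step before and an output step after
    the operation steps, as in the second implementation.
    """
    # Naive split on ';' (assumption: no semicolons inside literals or comments).
    statements = [s.strip() for s in procedure_sql.split(";") if s.strip()]
    steps = [("operation", s) for s in statements]
    if with_io_steps:
        steps = [("input", None)] + steps + [("output", None)]
    return steps


# Hypothetical two-statement procedure body.
body = """
SELECT father FROM family WHERE name = :x;
SELECT father FROM family WHERE name = :x;
"""
first = compile_topology(body)
second = compile_topology(body, with_io_steps=True)
```

  • The position of each entry in the returned list plays the role of the arrows between topology nodes, since the list order is the execution order.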
  • The stored topology information of the target stored procedure may also be updated or deleted, and the other data storage nodes may be instructed to update or delete their stored topology information of the target stored procedure.
  • any data storage node may respond to a stored procedure upload, update, or delete operation through the configured topology manager.
  • the user can perform upload, update, and delete operations on the stored procedure in the management system of any data storage node.
  • When a data storage node receives an updated target stored procedure, it may topologically compile the updated target stored procedure to obtain the topology information of the updated target stored procedure, and then replace the stored topology information of the target stored procedure with the topology information of the updated target stored procedure.
  • The data storage node may further send the topology information of the updated target stored procedure and an update instruction to the other data storage nodes, so that the other data storage nodes update their stored topology information of the target stored procedure, that is, replace the stored topology information of the target stored procedure with the topology information of the updated target stored procedure.
  • When a data storage node receives a delete operation for the target stored procedure, it may delete the stored topology information of the target stored procedure and then send a deletion instruction for the topology information of the target stored procedure to the other data storage nodes, so that the other data storage nodes delete their stored topology information of the target stored procedure.
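  • The update and delete handling described above can be sketched as follows. The `TopologyManager` class here is a hypothetical in-process stand-in: the `peers` list replaces real network communication between data storage nodes, and the instruction names `update` and `delete` are assumptions of this sketch.

```python
class TopologyManager:
    """Holds topology information keyed by stored procedure identifier."""

    def __init__(self):
        self.topologies = {}
        self.peers = []  # managers of the other nodes (stand-in for the network)

    def apply(self, instruction, procedure_id, topology=None):
        """Apply an update or delete instruction to the local store only."""
        if instruction == "update":
            self.topologies[procedure_id] = topology
        elif instruction == "delete":
            self.topologies.pop(procedure_id, None)
        else:
            raise ValueError(f"unknown instruction: {instruction}")

    def apply_and_broadcast(self, instruction, procedure_id, topology=None):
        """Apply locally, then instruct every other node to do the same."""
        self.apply(instruction, procedure_id, topology)
        for peer in self.peers:
            peer.apply(instruction, procedure_id, topology)


a, b = TopologyManager(), TopologyManager()
a.peers.append(b)
a.apply_and_broadcast("update", "proc1", ["step 1", "step 2"])
updated_on_b = b.topologies.get("proc1")
a.apply_and_broadcast("delete", "proc1")
```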
  • Performing stored procedure processing on the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure to obtain the second data may include the following steps 1) to 3):
  • Determining the first data based on the first data information may include the following three implementations:
  • The first implementation is: when the first data information is the first data, the first data information may be directly determined as the first data.
  • The second implementation is: when the first data information is the indication information of the first data, and the indication information of the first data is used to indicate the first data, the indication information of the first data may be converted into the first data.
  • For example, a hash value of the first data may be converted into the first data according to a preset hash algorithm.
  • The third implementation is: when the first data information is the indication information of the first data, and the indication information of the first data is used to indicate the storage location of the first data, the first data may be obtained, based on the indication information of the first data, from the data stored by the target data storage node.
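  • The three ways of determining the first data can be sketched as follows. The tagged-tuple representation of the data information and the key `row-7` are hypothetical. The source gives a hash value as an example of an indication, but general-purpose hash functions are not invertible, so this sketch substitutes a reversible base64 encoding purely as a placeholder for whatever preset conversion is used.

```python
import base64


def resolve_first_data(data_info, local_store):
    """Return the first data given data information of the form (kind, value)."""
    kind, value = data_info
    if kind == "data":      # implementation 1: the information is the data itself
        return value
    if kind == "encoded":   # implementation 2: an indication converted back to the data
        return base64.b64decode(value).decode("utf-8")
    if kind == "location":  # implementation 3: a storage location on this node
        return local_store[value]
    raise ValueError(f"unknown data information kind: {kind}")


store = {"row-7": "Z3"}  # hypothetical data stored on the target node
encoded = base64.b64encode(b"Z3").decode("ascii")
```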
  • The first step indication information is used to indicate the position, among the plurality of operation steps included in the target stored procedure, of the first operation step that the target data storage node needs to perform, that is, to indicate which of the plurality of operation steps is the first operation step, and the topology information of the target stored procedure is used to indicate the plurality of operation steps and their execution order. Therefore, the first operation step that the target data storage node needs to perform can be determined based on the first step indication information and the topology information of the target stored procedure.
  • For example, suppose the target stored procedure includes three operation steps, namely operation step 1, operation step 2, and operation step 3 arranged in execution order. When the first step indication information indicates that the first operation step is the second of the plurality of operation steps, the target data storage node may determine from the topology information of the target stored procedure that the second operation step is operation step 2, and then determine operation step 2 as the first operation step that it needs to perform.
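  • The example above amounts to a position lookup in an ordered list. A minimal sketch, assuming the topology information is held as a Python list and the step indication is a 1-based integer (both representations are assumptions of this sketch):

```python
def determine_operation_step(step_indication, topology):
    """Return the operation step at the 1-based position given by the indication."""
    if not 1 <= step_indication <= len(topology):
        raise IndexError(f"step indication {step_indication} is out of range")
    return topology[step_indication - 1]


# Operation steps arranged in execution order, as in the example above.
topology = ["operation step 1", "operation step 2", "operation step 3"]
chosen = determine_operation_step(2, topology)
```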
  • the SQL statement corresponding to the first operation step may be executed based on the first data to obtain the second data.
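  • For example, if the first data is Z3 and the first operation step is to query the name of the father, the target data storage node can query the name of Z3's father from its stored data (such as a list) and obtain Z2; Z2 is then the second data.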
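  • The query in this example can be reproduced with an in-memory SQLite database from the Python standard library. The table name `family`, its columns, and the extra row for Z2 are assumptions made for this sketch; the embodiments do not specify a schema or a database engine.

```python
import sqlite3

# Local data of the target data storage node (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE family (name TEXT PRIMARY KEY, father TEXT)")
conn.executemany("INSERT INTO family VALUES (?, ?)", [("Z3", "Z2"), ("Z2", "Z1")])


def run_operation_step(sql, first_data):
    """Execute the SQL statement of one operation step with the first data as its parameter."""
    row = conn.execute(sql, (first_data,)).fetchone()
    return row[0] if row else None


step_sql = "SELECT father FROM family WHERE name = ?"
second_data = run_operation_step(step_sql, "Z3")
```

  • Here `second_data` holds the father's name found for Z3, which would then be forwarded as the second data.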
  • The target data storage node may store the topology information of a plurality of stored procedures. To enable the target data storage node, after receiving the first step indication information, to determine the first operation step based on the topology information of the correct stored procedure, the first data storage node may, while sending the first data information and the first step indication information to the target data storage node, also send the identifier of the called target stored procedure. Correspondingly, the target data storage node also needs to send the identifier of the target stored procedure to the second data storage node. That is, the identifier of the target stored procedure needs to be passed between the data storage nodes.
  • The target data storage node may further receive the identifier of the target stored procedure sent by the first data storage node and acquire the topology information of the target stored procedure based on the identifier; and/or send the identifier of the target stored procedure to the second data storage node.
  • The identifier of the target stored procedure is used to uniquely identify the target stored procedure. That is, when each data storage node in the distributed database stores the topology information of a plurality of stored procedures, the data storage nodes used to execute the target stored procedure may also pass the identifier of the target stored procedure between them, to indicate to the next data storage node which stored procedure is currently being executed.
  • the target data storage node may acquire topology information of the target storage process from the stored topology information of the plurality of storage processes based on the identifier of the target storage process.
  • Before performing stored procedure processing on the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure, the target data storage node may receive the identifier of the target stored procedure sent by the first data storage node, and then determine the topology information of the target stored procedure from the stored topology information of the at least one stored procedure based on that identifier.
  • The target data storage node may also send the identifier of the target stored procedure to the second data storage node, so that the second data storage node acquires the topology information of the target stored procedure based on the received identifier.
  • The target data storage node may send the identifier of the target stored procedure to the second data storage node while sending the second data information and the second step indication information to the second data storage node.
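  • Passing the identifier alongside the data information and the step indication information can be sketched as follows. The dictionary-based message format, the procedure names, and the SQL text are hypothetical and serve only to illustrate how a node that stores several topologies selects the right one.

```python
# Topology information of several stored procedures, keyed by identifier.
STORED_TOPOLOGIES = {
    "find_father": ["SELECT father FROM family WHERE name = ?"],
    "find_grandfather": [
        "SELECT father FROM family WHERE name = ?",
        "SELECT father FROM family WHERE name = ?",
    ],
}


def receive(message):
    """Select the topology by identifier, then pick the step to perform."""
    topology = STORED_TOPOLOGIES[message["procedure_id"]]
    return topology[message["step_indication"] - 1]


def forward(message, second_data_info):
    """Build the outgoing message; the identifier travels with the data
    information and the next step indication."""
    return {
        "procedure_id": message["procedure_id"],
        "data_info": second_data_info,
        "step_indication": message["step_indication"] + 1,
    }


incoming = {"procedure_id": "find_grandfather", "data_info": "Z3", "step_indication": 1}
outgoing = forward(incoming, "Z2")
```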
  • Step 104 The target data storage node determines the second data storage node and the second step indication information based on the second data, the stored partition information, and the first step indication information.
  • The partition information is used to indicate the storage location of data, and the second step indication information is used to indicate the position, among the plurality of operation steps included in the target stored procedure, of the second operation step that the second data storage node needs to perform, that is, to indicate which of the plurality of operation steps is the second operation step.
  • The position of the first operation step that the target data storage node needs to perform, among the plurality of operation steps included in the target stored procedure, may be determined based on the first step indication information and the topology information of the target stored procedure; the second step indication information is then determined based on the position following that position among the plurality of operation steps.
  • When the first step indication information indicates that the first operation step is not the last of the plurality of operation steps included in the target stored procedure, the second step indication information is used to indicate the operation step following the first operation step; when the first step indication information indicates that the first operation step is the last of the plurality of operation steps, the second step indication information is used to indicate an output step, which instructs that the received data be output as the output data of the target stored procedure.
  • the data storage node on which the second data is pre-stored may be determined from the distributed database based on the second data and the stored partition information, and that data storage node is then determined as the second data storage node.
  • the target data storage node may perform the step of determining the second data storage node according to the second data and the stored partition information when it determines that the second step indication information indicates the next operation step after the first operation step.
  • when the second step indication information indicates the output step, the target data storage node may instead determine the output node as the second data storage node, so that the second data storage node outputs the second data as the output data of the target stored procedure.
  • the input node may also send the identifier of the output node while sending data information to an intermediate node used to execute the target stored procedure, and each intermediate node may likewise send the identifier of the output node to the second data storage node while sending data to it, so that when any intermediate node determines that the second data storage node is to perform the output step, it determines the output node based on that identifier and takes the output node as the second data storage node.
  • the output node may be set by a distributed database, or may be set by a user.
  • the output node and the input node may be the same data storage node, or may be different data storage nodes, which is not limited in this embodiment of the present invention.
  • the data storage node that receives the call request of the target stored procedure, ie, the input node may be set as the output node by default.
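  • The routing decision of step 104 can be sketched as follows. This is only an illustration under stated assumptions: step positions are 1-based integers, the output step is marked by the string "out", and the partition information is a plain mapping from a data key to a node name. The function name, parameter names, and sample values are hypothetical and do not come from the application.

```python
OUTPUT_STEP = "out"  # assumed marker for the output step


def determine_next(second_data, partition_info, first_step, num_steps, output_node):
    """Return (second data storage node, second step indication information)."""
    if first_step < num_steps:
        # More operation steps remain: route to the node that stores the second data.
        return partition_info[second_data], first_step + 1
    # The first operation step was the last one: hand the result to the output node.
    return output_node, OUTPUT_STEP


partition_info = {"Z3": "A", "Z2": "B", "Z1": "C"}  # hypothetical sample mapping
print(determine_next("Z2", partition_info, 1, 3, "M"))
print(determine_next(85, partition_info, 3, 3, "M"))
```

In this sketch the partition information is consulted only while operation steps remain; once the last operation step has run, the preconfigured output node is used instead.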
  • Step 105 The target data storage node sends the second data information and the second step indication information to the second data storage node.
  • the second data information is the second data or indication information of the second data; the indication information of the second data may be used to indicate the second data or a storage location of the second data, so that the second data can be acquired according to the indication information.
  • to increase the transmission rate, when the data amount of the second data is large, the second data may be converted into indication information of the second data, which has a smaller data amount, before being sent; when the data amount of the second data is small, the second data may be sent directly.
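  • A minimal sketch of this choice between sending the data itself and sending indication information is given below. The size threshold, the dictionary layout, and the function names are assumptions made for illustration; the application does not specify how "large" is decided or how the indication information is encoded.

```python
def make_data_info(data, storage_location, max_inline_bytes=1024):
    """Send small data directly; replace large data with its storage location."""
    encoded = str(data).encode("utf-8")
    if len(encoded) <= max_inline_bytes:
        return {"kind": "data", "value": data}
    return {"kind": "indication", "location": storage_location}


def resolve_data_info(data_info, local_store):
    """Receiver side: recover the data from either form of data information."""
    if data_info["kind"] == "data":
        return data_info["value"]
    # Indication information: fetch the data from the receiver's own storage.
    return local_store[data_info["location"]]


large_value = "x" * 5000
small = make_data_info("Z2", "list2/Z2")
large = make_data_info(large_value, "blob/1")
print(small["kind"], large["kind"])
```

The receiver can only resolve indication information if the indicated location is readable on its side, which matches the case where the next node is chosen because it already stores the data.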
  • the execution logic of the second data storage node is the same as that of the target data storage node; that is, it may directly process the second data based on the second data information to obtain third data, then determine a next data storage node capable of processing the third data, and send the third data information to that next data storage node.
  • the second data storage node may receive the second data information and the second step indication information sent by the target data storage node; perform stored procedure processing on the second data based on the second data information, the second step indication information, and the stored topology information of the target stored procedure, to obtain third data; determine, based on the third data, the stored partition information, and the second step indication information, a third data storage node for processing the third data and the third step indication information; and send the third data information and the third step indication information to the third data storage node.
  • the third data storage node is the data storage node that follows the second data storage node in the execution order of the target stored procedure; the third step indication information is used to indicate the position of the third operation step, which the third data storage node needs to perform, among the multiple operation steps included in the target stored procedure; and the third data information is the third data or indication information of the third data.
  • the second data storage node may perform stored procedure processing on the second data, based on the second data information, the second step indication information, and the stored topology information of the target stored procedure, to obtain the third data, in the same way that the target data storage node performs stored procedure processing on the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure. For the specific implementation process, refer to the description of step 103; details are not repeated here.
  • similarly, the second data storage node may determine the third data storage node for processing the third data and the third step indication information, based on the third data, the stored partition information, and the second step indication information, in the same way that the target data storage node determines the second data storage node and the second step indication information based on the second data, the stored partition information, and the first step indication information. For the specific implementation process, refer to the description of step 104; details are not repeated here.
  • any data storage node used to execute the target stored procedure may, following the processing logic of the target data storage node, directly process the data to obtain a data processing result, then determine the next data storage node for processing that result and send the data processing result information to it; the result is processed by the next data storage node without being transmitted back to the storage node manager, which avoids round-trip transmission of data and reduces data transmission overhead.
  • when the second step indication information is used to indicate an output step, the second data storage node may determine the second data based on the second data information and then output the second data as the output data of the target stored procedure.
  • the second data may be sent as output data of the target stored procedure to a client that initiates a call request of the target stored procedure for feedback to the user through the client.
  • any one of the plurality of data storage nodes used to execute the target stored procedure in the distributed database may receive the first data information sent by the previous data storage node, directly process the first data based on the stored topology information of the target stored procedure to obtain the second data, determine the next data storage node for processing the second data based on the second data and the stored partition information, and finally send the second data information to the next data storage node, so that the next data storage node processes the second data based on the stored topology information of the target stored procedure. That is, each data storage node can process the data by itself, determine the next data storage node, and send the data processing result directly to it without returning the result to the data storage node manager, which greatly reduces the amount of data transmitted and improves the execution efficiency and running performance of the stored procedure.
  • in the following, the execution method of the stored procedure provided by the embodiment of the present invention is described in detail by taking as an example the case where the data storage nodes used to execute the target stored procedure include two data storage nodes, namely a fourth data storage node and a fifth data storage node; the execution logic of the fourth data storage node and the fifth data storage node is the same as that of the target data storage node in the embodiment shown in FIG. 1F above.
  • FIG. 2B is a flowchart of another method for executing a stored procedure according to an embodiment of the present invention. The method is applied to the system architecture shown in FIG. 2A, and the method includes the following steps:
  • Step 201 The input node receives the call request of the target stored procedure, invokes the target stored procedure according to the call request of the target stored procedure, and acquires the input data during the process of calling the target stored procedure.
  • the input data is input data carried by the call request of the target stored procedure.
  • the input node may perform an input step based on a call request of the target stored procedure, and the input step refers to acquiring input data during the process of calling the target stored procedure.
  • Step 202 The input node determines a fourth data storage node for processing the input data based on the input data and the stored partition information.
  • the fourth data storage node refers to the data storage node that follows the input node in the execution order of the target stored procedure.
  • Step 203 The input node sends the first data information and the first step indication information to the fourth data storage node.
  • the first data information is indication information of the first data or the first data, and the first data is input data.
  • the first step indication information is determined by the input node based on the call request of the target stored procedure; that is, the input node may determine, according to the received call request, that the next data storage node is to execute the first of the multiple operation steps included in the target stored procedure, and determine the first step indication information based on that first operation step. In other words, the first step indication information is used to indicate the first of the multiple operation steps included in the target stored procedure.
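  • Steps 201 to 203 on the input node can be sketched as below. The shape of the call request, the message fields, and the use of the integer 1 for the first step indication information are assumptions for illustration only.

```python
def start_call(call_request, partition_info):
    """Input node: pick the first processing node and build the first message."""
    # Step 201: acquire the input data carried by the call request.
    input_data = call_request["input_data"]
    # Step 202: the partition information tells which node stores that data.
    fourth_node = partition_info[input_data]
    # Step 203: first data information plus first step indication information.
    message = {
        "procedure_id": call_request["procedure_id"],
        "data_info": input_data,
        "step_indication": 1,  # assumed encoding of "first operation step"
    }
    return fourth_node, message


node, message = start_call(
    {"procedure_id": "S", "input_data": "Z3"},  # hypothetical request
    {"Z3": "A"},                                # hypothetical partition info
)
print(node, message)
```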
  • Step 204 The fourth data storage node performs a storage process on the first data to obtain the second data, based on the first data information, the first step indication information, and the stored topology information of the target storage process.
  • the fourth data storage node may perform, according to the first data, a first one of the plurality of operation steps included in the target storage process to obtain the second data.
  • Step 205 The fourth data storage node determines, according to the second data, the stored partition information, and the first step indication information, a fifth data storage node for processing the second data, and determines the second step indication information.
  • the fifth data storage node refers to the next data storage node of the fourth data storage node arranged in the order of execution of the target storage process.
  • the second step indication information is used to indicate a second one of the plurality of operation steps included in the target storage process.
  • Step 206 The fourth data storage node sends the second data information and the second step indication information to the fifth data storage node.
  • Step 207 The fifth data storage node performs a storage process on the second data based on the second data information, the second step indication information, and the stored topology information of the target storage process to obtain the third data.
  • the fifth data storage node may perform a second one of the plurality of operation steps included in the target storage process based on the second data to obtain the third data.
  • Step 208 The fifth data storage node determines, according to the second step indication information, that the next data storage node for processing the third data is an output node, and determines the third step indication information based on the output step.
  • when the second step indication information indicates the last of the multiple operation steps included in the target stored procedure, the fifth data storage node determines that the next data storage node for processing the third data is the output node, and the third step indication information is used to indicate the output step.
  • Step 209 The fifth data storage node sends the third data information and the third step indication information to the output node.
  • the third data information is indication information of the third data or the third data, and the third step indication information is used to indicate an output step.
  • Step 210 The output node outputs the third data as output data of the target storage process based on the third data information and the third step indication information.
  • the above description takes as an example the case where the data storage nodes that execute the target stored procedure include two data storage nodes; in actual applications, they may include more data storage nodes, each of which can operate according to the execution logic of the target data storage node shown in FIG. 1F, and details are not described herein again.
  • each data storage node for executing a stored procedure may directly process the data by itself, then determine the next data storage node, and directly send the data processing result to the next data storage node, and The data processing result is no longer transmitted back to the data storage node manager, thereby greatly reducing the data transmission amount and improving the execution efficiency and running performance of the storage process.
  • FIG. 3 is a schematic diagram of an execution flow of another storage process according to an embodiment of the present invention.
  • the distributed database 100 includes at least a data storage node A, a data storage node B, a data storage node C, and a data storage node M, and each data storage node can be connected to each other through a network.
  • the distributed database 100 is to execute the stored procedure S, whose semantics is "query the age of the great-grandfather of @name". The stored procedure S includes three operation steps: operation step S1, query the name of the father of @name; operation step S2, query the name of the grandfather of @name according to the name of the father of @name; operation step S3, query the age of the great-grandfather of @name according to the name of the grandfather of @name.
  • @name is the data to be input of the stored procedure S, which can be any name.
  • each data storage node in the distributed database shown in FIG. 3 stores a different data list, and each data list is used to store the name of a person, the name of that person's father, and the age of the father. That is, the data is partitioned by the person's name into different data lists, which are stored on different data storage nodes. For example, as shown in FIG. 3, the data storage node A stores the data list 1, the data storage node B stores the data list 2, and the data storage node C stores the data list 3. Moreover, the data storage node M stores partition information that indicates the storage location of the data, that is, which data storage node holds the record for each person's name.
  • each data storage node in the distributed database shown in FIG. 3 stores topology information of the stored procedure S. The topology information of the stored procedure S is as shown in FIG. 1H, in which the topology node 1 is used to indicate the input step; the topology node 2, the topology node 3, and the topology node 4 are used to indicate the operation step S1, the operation step S2, and the operation step S3, respectively; and the topology node 5 is used to indicate the output step.
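  • One possible in-memory form of this topology information, and the lookup from step indication information to a topology node, is sketched below. The list-of-dictionaries layout and the function name are assumptions; FIG. 1H is not reproduced here, so only the node numbering and step names stated above are used.

```python
TOPOLOGY_S = [
    {"topology_node": 1, "kind": "input"},
    {"topology_node": 2, "kind": "operation", "name": "S1"},
    {"topology_node": 3, "kind": "operation", "name": "S2"},
    {"topology_node": 4, "kind": "operation", "name": "S3"},
    {"topology_node": 5, "kind": "output"},
]


def resolve_step(topology, step_indication):
    """Map step indication information (1, 2, 3 or "out") to a topology node."""
    if step_indication == "out":
        return next(n for n in topology if n["kind"] == "output")
    operations = [n for n in topology if n["kind"] == "operation"]
    return operations[step_indication - 1]  # step positions are 1-based


print(resolve_step(TOPOLOGY_S, 2)["name"])
print(resolve_step(TOPOLOGY_S, "out")["topology_node"])
```

Because every data storage node holds the same topology information, a small step indication value is enough for the receiver to know which operation step to run.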
  • when the stored procedure S is executed using the execution method of the stored procedure provided by the embodiment of the present invention, as shown in FIG. 3, if the data storage node M receives the call request of the stored procedure S and the input data carried by the call request is Z3, that is, @name is Z3, the execution process of the stored procedure S may include the following steps 1)-5):
  • the data storage node M calls the stored procedure S according to the call request of the stored procedure S and performs the input step, that is, acquires Z3 in the course of calling the stored procedure S. Then, based on Z3 and the stored partition information, it determines the data storage node A storing Z3, and sends Z3 and the first step indication information to the data storage node A.
  • the first step indication information is used to indicate the first operation step of the multiple operation steps included in the storage process S, for example, may be a value of 1.
  • the data storage node A determines, based on Z3, the first step indication information, and the stored topology information of the stored procedure S, that it is to perform the operation step S1, and performs the operation step S1 based on Z3; that is, it queries the stored data list 1 and finds that the name of the father of Z3 is Z2.
  • the data storage node A determines the data storage node B storing Z2 based on Z2 and the stored partition information, determines the second step indication information based on the first step indication information, and sends Z2 and the second step indication information to the data storage node B.
  • the second step indication information is used to indicate a second one of the plurality of operation steps included in the storage process S, for example, may be a value of 2.
  • the data storage node B determines, based on Z2, the second step indication information, and the stored topology information of the stored procedure S, that it is to perform the operation step S2, and performs the operation step S2 based on Z2; that is, it queries the stored data list 2 and finds that the name of the father of Z2 is Z1.
  • the data storage node B determines the data storage node C storing Z1 based on Z1 and the stored partition information, determines the third step indication information based on the second step indication information, and sends Z1 and the third step indication information to the data storage node C.
  • the third step indication information is used to indicate a third operation step of the plurality of operation steps included in the storage process S, for example, may be a value of 3.
  • the data storage node C determines, based on Z1, the third step indication information, and the stored topology information of the stored procedure S, that it is to perform the operation step S3, and performs the operation step S3 based on Z1; that is, it queries the stored data list 3 and finds that the age of the father of Z1 is 85.
  • the data storage node C determines, based on the third step indication information, that the next data storage node is the output node, that is, the data storage node M, determines the fourth step indication information based on the output step, and sends 85 and the fourth step indication information to the data storage node M.
  • the fourth step indication information is used to indicate the output step; for example, it may be the string "out".
  • after receiving 85 and the fourth step indication information, the data storage node M outputs 85 as the output data of the stored procedure S based on the fourth step indication information.
  • in this process, each of the data storage node A, the data storage node B, and the data storage node C can process the data by itself, determine the next data storage node, and send the data processing result directly to it, without transmitting the result back to the data storage node M. Each round-trip data interaction is thus reduced to a single data transmission, which greatly reduces the amount of data transmitted, lowers the data transmission overhead, and improves the execution efficiency and running performance of the stored procedure.
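  • The FIG. 3 walk-through can be replayed with a small single-process simulation. This is a sketch under stated assumptions rather than the implementation described in the application: nodes are dictionary keys instead of networked machines, and the only values taken from the example are Z3, Z2, Z1, and the age 85. The remaining ages and the name "Z0" are placeholders, and all identifiers are hypothetical.

```python
# person -> (father's name, father's age); placeholders are marked.
DATA_LISTS = {
    "A": {"Z3": ("Z2", 40)},  # data list 1 (age 40 is a placeholder)
    "B": {"Z2": ("Z1", 62)},  # data list 2 (age 62 is a placeholder)
    "C": {"Z1": ("Z0", 85)},  # data list 3 (name "Z0" is a placeholder)
}
PARTITION_INFO = {"Z3": "A", "Z2": "B", "Z1": "C"}
OUTPUT_NODE = "M"


def father_name(rows, name):
    return rows[name][0]


def father_age(rows, name):
    return rows[name][1]


STEPS = [father_name, father_name, father_age]  # operation steps S1, S2, S3


def run_stored_procedure_s(name):
    """Return (output data, list of transmissions) for the chained execution."""
    node, data, step = PARTITION_INFO[name], name, 1
    hops = [(OUTPUT_NODE, node, data, step)]  # input node M -> first node
    while step != "out":
        result = STEPS[step - 1](DATA_LISTS[node], data)
        if step < len(STEPS):
            next_node, next_step = PARTITION_INFO[result], step + 1
        else:
            next_node, next_step = OUTPUT_NODE, "out"
        hops.append((node, next_node, result, next_step))
        node, data, step = next_node, result, next_step
    return data, hops


output, hops = run_stored_procedure_s("Z3")
print(output)
for hop in hops:
    print(hop)
```

In this toy run the three operation steps need four transmissions (M to A, A to B, B to C, C to M), whereas a scheme that returns every intermediate result to a manager would need one send and one return per operation step.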
  • FIG. 4A is a schematic structural diagram of an apparatus for executing a stored procedure according to an embodiment of the present invention. The apparatus is applied to a target data storage node in a distributed database; the distributed database includes a plurality of data storage nodes for executing the target stored procedure, and the first data storage node, the target data storage node, and the second data storage node among them are three data storage nodes that execute the target stored procedure in sequence.
  • the apparatus includes:
  • the receiving module 401 is configured to perform the operations performed by step 102 in the foregoing embodiment of FIG. 1F;
  • the processing module 402 is configured to perform the operations performed by step 103 in the foregoing embodiment of FIG. 1F;
  • a determining module 403 configured to perform the operations performed by step 104 in the foregoing embodiment of FIG. 1F;
  • the sending module 404 is configured to perform the operations performed by step 105 in the foregoing embodiment of FIG. 1F.
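  • The division into modules 401 to 404 could be mirrored in code roughly as follows. The class name, the method names, and the outbox list that stands in for the network are all assumptions; the application defines the modules only by the steps they perform.

```python
class StoredProcedureExecutor:
    """Hypothetical counterpart of modules 401-404 on one data storage node."""

    def __init__(self, local_rows, operations, partition_info, output_node):
        self.local_rows = local_rows
        self.operations = operations      # operation steps from the topology
        self.partition_info = partition_info
        self.output_node = output_node
        self.outbox = []                  # stands in for the network

    def receive(self, first_data, first_step):
        # Receiving module 401: accept the message, then drive the other modules.
        second_data = self.process(first_data, first_step)
        node, second_step = self.determine(second_data, first_step)
        self.send(node, second_data, second_step)

    def process(self, first_data, first_step):
        # Processing module 402: run the indicated operation step on local data.
        return self.operations[first_step - 1](self.local_rows, first_data)

    def determine(self, second_data, first_step):
        # Determining module 403: choose the next node and the next step.
        if first_step < len(self.operations):
            return self.partition_info[second_data], first_step + 1
        return self.output_node, "out"

    def send(self, node, second_data, second_step):
        # Sending module 404: record the outgoing message.
        self.outbox.append((node, second_data, second_step))


def lookup(rows, key):
    return rows[key]


node_a = StoredProcedureExecutor({"Z3": "Z2"}, [lookup] * 3, {"Z2": "B"}, "M")
node_a.receive("Z3", 1)
print(node_a.outbox)
```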
  • when the first data storage node is an input node, the first data is input data acquired by the input node based on a call request of the target stored procedure, the input node being the data storage node in the distributed database that receives the call request of the target stored procedure;
  • when the first data storage node is an intermediate node, the first data is a data processing result obtained by the intermediate node processing data based on received data information, the intermediate node being a data storage node used to execute the target stored procedure.
  • the processing module 402 includes:
  • the first determining unit 4021 is configured to determine the first data based on the first data information
  • the second determining unit 4022 is configured to determine, according to the first step indication information and the topology information of the target storage process, a first operation step that the target data storage node needs to perform;
  • the executing unit 4023 is configured to perform the first operation step based on the first data to obtain the second data.
  • the first determining unit 4021 is configured to:
  • when the first data information is the indication information of the first data, obtain the first data from the data stored by the target data storage node based on the indication information of the first data.
  • the receiving module 401 is further configured to receive an identifier of the target stored procedure sent by the first data storage node, and the processing module 402 is further configured to acquire the topology information of the target stored procedure based on the identifier of the target stored procedure; and/or
  • the sending module 404 is further configured to send the identifier of the target stored procedure to the second data storage node.
  • the next position is used to indicate the next operation step after the first operation step;
  • the output step is used to instruct that the received data be output as the output data of the target stored procedure.
  • the determining module 403 includes:
  • a third determining unit 4031 configured to determine, according to the second data and the stored partition information, a data storage node that stores the second data in advance from the distributed database;
  • the fourth determining unit 4032 is configured to determine, as the second data storage node, a data storage node that stores the second data in advance.
  • any one of the plurality of data storage nodes used to execute the target stored procedure in the distributed database may receive the first data information sent by the previous data storage node, directly process the first data based on the stored topology information of the target stored procedure to obtain the second data, determine the next data storage node for processing the second data based on the second data and the stored partition information, and finally send the second data information to the next data storage node, so that the next data storage node processes the second data based on the stored topology information of the target stored procedure. That is, each data storage node can process the data by itself, determine the next data storage node, and send the data processing result directly to it without returning the result to the data storage node manager, which greatly reduces the amount of data transmitted and improves the execution efficiency and running performance of the stored procedure.
  • when the execution device of the stored procedure provided by the foregoing embodiment executes a stored procedure, the division into the functional modules described above is used only as an example. In actual applications, the foregoing functions may be allocated to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above.
  • the execution device of the stored procedure provided by the foregoing embodiment is based on the same concept as the embodiment of the method for executing the stored procedure; its specific implementation process is described in detail in the method embodiment, and details are not described herein again.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions can be stored in a computer readable storage medium or transferred from one computer readable storage medium to another; for example, the computer instructions can be transferred from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
  • the computer readable storage medium can be any available media that can be accessed by a computer or a data storage device such as a server, data center, or the like that includes one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)), and the like.
  • an apparatus for executing a stored procedure, comprising a memory, a processor, and a computer program stored on the memory and operable on the processor, wherein the processor is configured to perform the execution method of the stored procedure described in any of the above-described embodiments of FIG. 1F or FIG. 2B.
  • a computer readable storage medium having stored therein instructions that, when executed on a computer, cause the computer to perform the execution method of the stored procedure described in any of the above-described embodiments of FIG. 1F or FIG. 2B.
  • a computer program product comprising instructions which, when executed on a computer, cause the computer to perform the method of performing the stored procedure described in any of the above-described embodiments of FIG. 1F or FIG. 2B.
  • a person skilled in the art may understand that all or part of the steps of implementing the above embodiments may be completed by hardware, or may be instructed by a program to execute related hardware, and the program may be stored in a computer readable storage medium.
  • the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses a storage procedure executing method and device, and a storage medium, belonging to the technical field of big data. Said method is used for a distributed database, and comprises: a target data storage node receiving first data information and first step indication information which are sent by a first data storage node; performing storage procedure processing on the first data on the basis of the first data information, the first step instruction information and stored topology information of the target storage process, so as to obtain second data; determining a second data storage node and second step instruction information on the basis of the second data, the stored partition information and the first step instruction information, and sending the second data information and the second step instruction information to a second data storage node. Thus, each data storage node may directly process data and send a data processing result to the second data storage node, without the need of returning same to the data storage node manager, reducing the amount of data transmission, improving the execution efficiency and operation performance of the storage procedure.

Description

Method, device and storage medium for executing a stored procedure
Technical field
The present application relates to the field of big data technologies, and in particular, to a method, an apparatus, and a storage medium for executing a stored procedure.
Background
In recent years, with the rapid growth of data volume, distributed database technology has developed rapidly. A distributed database connects multiple physically dispersed data storage nodes through a network to form a logically centralized database; massive data can be partitioned and stored across the data storage nodes, which solves the scalability problem of the database. A distributed database can complete a given service by calling and executing a stored procedure. A stored procedure is a callable object stored in the database; in essence it is a set of SQL (Structured Query Language) statements that implements a specific function, that is, a stored procedure includes multiple operation steps, and it can be called using the stored procedure name and input data. Moreover, in a distributed database architecture, if the data required to execute a called stored procedure is stored on multiple data storage nodes, the stored procedure must be executed through interaction among those data storage nodes.
In the related art, a distributed database usually executes a stored procedure in a master-slave control mode. That is, when a data storage node receives a call request for a stored procedure and that node has a stored procedure management function, the node can act as a storage node manager (Master) that uniformly manages and schedules the multiple data storage nodes executing the stored procedure. First, the storage node manager determines, based on first data and stored partition information, the first data storage node for processing the first data, where the first data is the input data carried in the call request, the partition information indicates where data is stored, and the first data storage node is the data storage node that stores the first data. The storage node manager then sends the first data and the first operation step of the stored procedure to the first data storage node, instructing it to perform the first operation step based on the first data to obtain second data. The first data storage node must then return the second data to the storage node manager, so that the storage node manager can again use the stored partition information to determine the second data storage node for processing the second data, and send the second data and the second operation step of the stored procedure to the second data storage node, instructing it to perform the second operation step based on the second data to obtain third data. The second data storage node then returns the third data to the storage node manager, and the storage node manager repeats the step of calling the next data storage node until the last data storage node executing the stored procedure has been called. The last data storage node performs the last operation step of the stored procedure, obtains the final data, and returns it to the storage node manager as the output data.
In this master-slave control mode, every data storage node that executes the stored procedure must receive data from the storage node manager and must return its processing result to the storage node manager. A large amount of data is therefore transferred between the storage node manager and the data storage nodes, which degrades the execution efficiency and runtime performance of the stored procedure.
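As a non-limiting illustration, the master-slave flow described above can be sketched as a small in-process simulation that counts transfers between the storage node manager and the data storage nodes. The function name `run_master_slave`, the node identifiers, and the toy tables are assumptions made for this sketch; each operation step is modeled as a simple lookup in the local table of the node that stores the current data.

```python
from typing import Dict, Tuple


def run_master_slave(
    tables: Dict[str, Dict[str, str]],   # hypothetical: node id -> local data
    partition: Dict[str, str],           # partition information: data -> node id
    num_steps: int,
    first_data: str,
) -> Tuple[str, int]:
    """Simulate the manager-driven flow and count manager<->node transfers."""
    transfers = 0
    data = first_data
    for _ in range(num_steps):
        node_id = partition[data]      # manager consults the partition information
        transfers += 1                 # manager -> node: data plus operation step
        data = tables[node_id][data]   # node performs the step on its local data
        transfers += 1                 # node -> manager: result is returned
    return data, transfers


# Toy setup (assumed): three nodes, three lookup steps.
tables = {"A": {"x": "y"}, "B": {"y": "z"}, "C": {"z": "done"}}
partition = {"x": "A", "y": "B", "z": "C"}
result = run_master_slave(tables, partition, 3, "x")
```

In this toy model every operation step costs two transfers through the manager, which is the overhead the paragraph above describes.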
Summary of the invention
To solve the problem in the related art that the large amount of data transferred between the storage node manager and the data storage nodes degrades the execution efficiency and runtime performance of a stored procedure, the present application provides a method, an apparatus, and a storage medium for executing a stored procedure. The technical solutions are as follows:
According to a first aspect, a method for executing a stored procedure is provided, applied to a distributed database. The distributed database includes multiple data storage nodes for executing a target stored procedure, and a first data storage node, a target data storage node, and a second data storage node among them are three data storage nodes that execute the target stored procedure in sequence. The method includes:
the target data storage node receives first data information and first step indication information sent by the first data storage node, where the first data information is first data or indication information of the first data, the first data is data required when the target stored procedure is executed, and the first step indication information indicates the position, among the multiple operation steps included in the target stored procedure, of the first operation step that the target data storage node needs to perform;
the target data storage node performs stored procedure processing on the first data based on the first data information, the first step indication information, and stored topology information of the target stored procedure, to obtain second data, where the topology information of the target stored procedure indicates the multiple operation steps included in the target stored procedure and the execution order of those operation steps; and
the target data storage node determines the second data storage node and second step indication information based on the second data, partition information, and the first step indication information, and sends second data information and the second step indication information to the second data storage node, where the partition information indicates where data is stored, the second step indication information indicates the position, among the multiple operation steps included in the target stored procedure, of the second operation step that the second data storage node needs to perform, and the second data information is the second data or indication information of the second data.
Here, the target data storage node is any one of the multiple data storage nodes for executing the target stored procedure, the first data storage node is the data storage node immediately preceding the target data storage node in the execution order of the target stored procedure, and the second data storage node is the data storage node immediately following the target data storage node in that order.
In the embodiments of the present invention, each data storage node can process the received data directly, determine the next data storage node, and send the data processing result directly to the next data storage node, without returning the result to the storage node manager. This greatly reduces the amount of data transferred and improves the execution efficiency and runtime performance of the stored procedure.
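As a non-limiting illustration, the three actions of the target data storage node (receive, process, determine and forward) can be sketched as follows. The names `handle`, `run`, `TOPOLOGY`, `LOCAL_DATA`, and `PARTITION` are assumptions made for this sketch, each operation step is modeled as a lookup in the node's local table, and handling of the output step after the last operation step is left out.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical operation step: takes the node's local table and the received data.
Step = Callable[[Dict[str, str], str], str]


def lookup(table: Dict[str, str], key: str) -> str:
    return table[key]


TOPOLOGY: List[Step] = [lookup, lookup, lookup]   # steps in execution order
LOCAL_DATA = {"n1": {"a": "b"}, "n2": {"b": "c"}, "n3": {"c": "d"}}
PARTITION = {"a": "n1", "b": "n2", "c": "n3"}     # data -> node that stores it


def handle(node_id: str, first_data: str, first_pos: int
           ) -> Tuple[Optional[str], str, Optional[int]]:
    """One node's work: run its step, then pick the next node and step position."""
    step = TOPOLOGY[first_pos]                       # step named by the indication
    second_data = step(LOCAL_DATA[node_id], first_data)
    second_pos = first_pos + 1
    if second_pos >= len(TOPOLOGY):
        return None, second_data, None               # last step done (output omitted)
    return PARTITION[second_data], second_data, second_pos


def run(first_data: str) -> Tuple[str, int]:
    """Deliver messages node to node and count the deliveries."""
    node: Optional[str] = PARTITION[first_data]
    data, pos, hops = first_data, 0, 0
    while node is not None:
        hops += 1                                    # one node-to-node delivery
        node, data, pos = handle(node, data, pos)
    return data, hops


outcome = run("a")
```

In this toy setup each operation step costs a single delivery, because no result is sent back to a manager between steps.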
In a specific implementation, when the first data storage node is an input node, the first data is the input data obtained by the input node based on a call request for the target stored procedure, where the input node is the data storage node in the distributed database that receives the call request for the target stored procedure;
when the first data storage node is an intermediate node, the first data is the data processing result obtained by the intermediate node by processing data based on received data information, where an intermediate node is a data storage node for performing any one of the multiple operation steps included in the target stored procedure.
In the embodiments of the present invention, the first data may be initial input data sent by the input node, that is, the data entered by the user when calling the target stored procedure, or it may be intermediate data sent by an intermediate node, that is, data produced while performing any operation step of the target stored procedure. Accordingly, the target data storage node may be the data storage node following the input node or the data storage node following any intermediate node.
In a specific implementation, that the target data storage node performs stored procedure processing on the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure, to obtain the second data, includes:
the target data storage node determines the first data based on the first data information;
the target data storage node determines, based on the first step indication information and the topology information of the target stored procedure, the first operation step that the target data storage node needs to perform; and
the target data storage node performs the first operation step based on the first data to obtain the second data.
Here, the topology information of the target stored procedure indicates the internal execution logic of the target stored procedure, specifically the multiple operation steps it includes and the execution order of those operation steps.
In the embodiments of the present invention, because the target data storage node stores the topology information of the target stored procedure, while the target stored procedure is being executed the target data storage node can determine the operation step it needs to perform directly from the step indication information sent by the previous data storage node and its own stored topology information. It can also send step indication information to the next data storage node to indicate the operation step that the next data storage node needs to perform.
In this way, the multiple data storage nodes can execute the target stored procedure in sequence and pass the resulting intermediate data along in sequence without scheduling and management by a storage node manager. This greatly reduces the amount of data transferred and improves the execution efficiency and runtime performance of the stored procedure.
In a specific implementation, that the target data storage node determines the first data based on the first data information includes:
when the first data information is the indication information of the first data, the target data storage node obtains the first data from its stored data based on the indication information of the first data.
The first data information may be, for example, index information of the first data. In the embodiments of the present invention, when the first data is large, the first data storage node may convert the first data into indication information of the first data and send that indication information to the target data storage node, which reduces the amount of data transferred and improves the execution efficiency of the stored procedure.
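As a non-limiting illustration, the distinction between receiving the first data itself and receiving only its indication information can be sketched as follows. The `DataRef` type, the `resolve` function, and the dictionary used as the node's stored data are assumptions made for this sketch; the source only states that the indication information may be index information.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class DataRef:
    """Hypothetical indication information: an index into the receiver's stored data."""
    key: str


# First data information is either the data itself or a reference to it.
DataInfo = Union[str, DataRef]


def resolve(data_info: DataInfo, stored_data: Dict[str, str]) -> str:
    """Return the first data, fetching it locally when only an index was sent."""
    if isinstance(data_info, DataRef):
        return stored_data[data_info.key]   # look up the data on the receiving node
    return data_info                        # the data was sent inline


inline = resolve("payload", {})
by_index = resolve(DataRef("k1"), {"k1": "large payload held locally"})
```

Sending a `DataRef` only helps when the receiving node already stores the referenced data, which matches the case in which the next node is chosen because it stores that data.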
In another embodiment, the method further includes:
the target data storage node receives an identifier of the target stored procedure sent by the first data storage node,
and obtains the topology information of the target stored procedure based on the identifier of the target stored procedure; and/or
the target data storage node sends the identifier of the target stored procedure to the second data storage node.
In the embodiments of the present invention, when each data storage node stores the topology information of multiple stored procedures, the data storage nodes executing the target stored procedure may also pass the identifier of the target stored procedure to one another, to indicate to the next data storage node which stored procedure is currently being executed. The next data storage node then uses the received identifier to select the topology information of the target stored procedure from the stored topology information of the multiple stored procedures, as the basis for determining which operation step to perform, which improves the accuracy of executing the stored procedure.
In a specific implementation, when the first step indication information indicates that the first operation step is any operation step other than the last one among the multiple operation steps included in the target stored procedure, the second step indication information indicates the operation step that follows the first operation step;
when the first step indication information indicates that the first operation step is the last one of the multiple operation steps included in the target stored procedure, the second step indication information indicates an output step, where the output step indicates that the received data is to be output as the output data of the target stored procedure.
In a specific implementation, that the target data storage node determines the second data storage node and the second step indication information based on the second data, the partition information, and the first step indication information includes:
the target data storage node determines, from the distributed database and based on the second data and the partition information, the data storage node on which the second data is already stored; and
the target data storage node determines the data storage node on which the second data is already stored as the second data storage node.
In the embodiments of the present invention, because the target data storage node stores the partition information, while the target stored procedure is being executed the target data storage node can determine, directly from the second data it has produced and the stored partition information, the data storage node on which the second data is already stored, and determine that node as the second data storage node for processing the second data.
In this way, the intermediate data does not need to be sent to a storage node manager for the manager to schedule the next data storage node. The target data storage node itself determines the next data storage node from the stored partition information and sends the intermediate data to it for processing, which greatly reduces the amount of data transferred and improves the execution efficiency and runtime performance of the stored procedure.
In a specific implementation, determining the next data storage node and the second step indication information based on the second intermediate data, the stored partition information, and the first step indication information includes:
determining the second step indication information based on the first step indication information and the topology information of the target stored procedure.
Specifically, determining the second step indication information based on the first step indication information and the topology information of the target stored procedure includes:
the target data storage node determines, based on the first step indication information and the topology information of the target stored procedure, the position of the first operation step that the target data storage node needs to perform among the multiple operation steps included in the target stored procedure; and
the target data storage node determines the second step indication information based on the position that follows the position of the first operation step among the multiple operation steps included in the target stored procedure.
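As a non-limiting illustration, deriving the second step indication information from the first can be sketched as follows. Representing a step indication as a zero-based position in a list, and the output step as the string `"output"`, are assumptions made for this sketch.

```python
from typing import List, Union

OUTPUT_STEP = "output"   # hypothetical marker for the output step


def next_step_indication(topology: List[str], first_pos: int) -> Union[int, str]:
    """Return the position after first_pos, or the output marker after the last step."""
    if not 0 <= first_pos < len(topology):
        raise ValueError(f"step position {first_pos} is outside the topology")
    if first_pos == len(topology) - 1:
        return OUTPUT_STEP           # the last operation step is followed by output
    return first_pos + 1             # otherwise point at the next operation step


topology = ["S1", "S2", "S3"]        # operation steps in execution order
after_first = next_step_indication(topology, 0)
after_last = next_step_indication(topology, 2)
```

Because every node holds the same topology information, a bare position is enough for the receiving node to identify its operation step.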
In another embodiment, the method further includes:
when the target data storage node receives the uploaded target stored procedure, the target data storage node performs topology compilation on the target stored procedure to obtain the topology information of the target stored procedure; and
the target data storage node sends the topology information of the target stored procedure to the data storage nodes in the distributed database other than the target data storage node.
In the embodiments of the present invention, when any data storage node in the distributed database receives the uploaded target stored procedure, it can perform topology compilation on the target stored procedure to obtain its topology information, and then send that topology information to the other data storage nodes in the distributed database.
This ensures that every data storage node in the distributed database stores the topology information of the target stored procedure in advance. When any data storage node needs to execute the target stored procedure, it can determine the operation step to perform from the stored topology information, which ensures that the data storage nodes execute in sequence without scheduling by a storage node manager.
Specifically, that the target data storage node performs topology compilation on the target stored procedure to obtain the topology information of the target stored procedure includes:
the target data storage node decomposes the target stored procedure into multiple operation steps, and determines the topology information of the target stored procedure based on the multiple operation steps and their execution order; or
the target data storage node decomposes the target stored procedure into multiple operation steps and, following the execution order of the multiple operation steps, adds an input step before them and an output step after them to obtain the topology information of the target stored procedure, where the input step is used to input the input data of the target stored procedure and the output step is used to output the output data of the target stored procedure.
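As a non-limiting illustration, the two variants of topology compilation can be sketched as follows. The source does not specify how a stored procedure is decomposed; splitting the body on semicolons is a simplifying assumption for this sketch (it would mishandle semicolons inside string literals), and the names `compile_topology`, `INPUT_STEP`, and `OUTPUT_STEP` are hypothetical.

```python
from typing import List

INPUT_STEP = "INPUT"     # hypothetical marker for the input step
OUTPUT_STEP = "OUTPUT"   # hypothetical marker for the output step


def compile_topology(procedure_body: str, add_io_steps: bool = False) -> List[str]:
    """Decompose a procedure body into ordered steps, optionally wrapped by I/O steps."""
    # Assumption: one statement per operation step, separated by semicolons.
    steps = [part.strip() for part in procedure_body.split(";") if part.strip()]
    if add_io_steps:
        # Second variant: input step first, output step last.
        return [INPUT_STEP, *steps, OUTPUT_STEP]
    return steps


body = "step one; step two; step three;"   # placeholder statements
plain = compile_topology(body)
wrapped = compile_topology(body, add_io_steps=True)
```

The list order stands in for the execution order, so the resulting list can be sent unchanged to the other data storage nodes.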
According to a second aspect, an apparatus for executing a stored procedure is provided, which has the function of implementing the behavior of the method for executing a stored procedure in the first aspect. The apparatus includes at least one module, and the at least one module is configured to implement the method for executing a stored procedure provided in the first aspect.
According to a third aspect, an apparatus for executing a stored procedure is provided, whose structure includes a processor and a memory. The memory is configured to store a program that supports the apparatus in performing the method for executing a stored procedure provided in the first aspect, and to store the data involved in implementing that method. The processor is configured to execute the program stored in the memory. The apparatus may further include a communication bus for establishing a connection between the processor and the memory.
According to a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the method for executing a stored procedure described in the first aspect.
According to a fifth aspect, a computer program product containing instructions is provided. When run on a computer, it causes the computer to perform the method for executing a stored procedure described in the first aspect.
The technical effects obtained by the second, third, fourth, and fifth aspects are similar to those obtained by the corresponding technical means in the first aspect, and are not described again here.
The beneficial effects of the technical solutions provided by the present application are as follows:
In the present application, any one of the multiple data storage nodes in the distributed database for executing the target stored procedure can receive the first data information sent by the first data storage node, process the first data directly based on the stored topology information of the target stored procedure to obtain the second data, determine the next data storage node for processing the second data based on the second data and the stored partition information, and finally send the second data information to that second data storage node, so that the next data storage node processes the second data based on its stored topology information of the target stored procedure. That is, each data storage node can process data directly, determine the second data storage node, and send the data processing result directly to the second data storage node, without returning the result to the storage node manager. This greatly reduces the amount of data transferred and improves the execution efficiency and runtime performance of the stored procedure.
Brief description of the drawings
FIG. 1A is a system architecture diagram of a distributed database 100;
FIG. 1B is an architecture diagram of a system for executing a stored procedure according to an embodiment of the present invention;
FIG. 1C is a schematic flowchart of executing a stored procedure in the related art;
FIG. 1D is a schematic diagram of the logical structure of a data storage node 10 according to an embodiment of the present invention;
FIG. 1E is a schematic diagram of the hardware structure of a data storage node 10 according to an embodiment of the present invention;
FIG. 1F is a flowchart of a method for executing a stored procedure according to an embodiment of the present invention;
FIG. 1G is a schematic diagram of topology information of a stored procedure according to an embodiment of the present invention;
FIG. 1H is a schematic diagram of topology information of another stored procedure according to an embodiment of the present invention;
FIG. 2A is an architecture diagram of a system for executing a stored procedure according to an embodiment of the present invention;
FIG. 2B is a flowchart of another method for executing a stored procedure according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of executing another stored procedure according to an embodiment of the present invention;
FIG. 4A is a schematic structural diagram of an apparatus for executing a stored procedure according to an embodiment of the present invention;
FIG. 4B is a schematic structural diagram of a processing module 402 according to an embodiment of the present invention;
FIG. 4C is a schematic structural diagram of a determining module 403 according to an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present application clearer, the implementations of the present application are described in further detail below with reference to the accompanying drawings.
Before the method for executing a stored procedure provided by the embodiments of the present invention is described in detail, the application scenario of the embodiments is introduced first.
The embodiments of the present invention apply to scenarios in which a distributed database is used to process services, such as query services and comparison services. Taking a query service as an example, based on the data stored on the data storage nodes, the distributed database can be used to look up information such as the age of a person's great-grandfather or a person's financial situation. When a distributed database processes a service, it usually completes the service by calling and executing a stored procedure.
With the application scenario introduced, and to make the method for executing a stored procedure provided by the embodiments of the present invention easier to understand, the system architecture of the embodiments is described next.
FIG. 1A is a system architecture diagram of a distributed database 100. As shown in FIG. 1A, the distributed database 100 includes multiple physically dispersed data storage nodes 10, which can be connected over a network.
Each data storage node 10 has its own local database for storing data. Once connected to one another over the network, the nodes form a global, logically centralized, and physically distributed large database, that is, a distributed database. Specifically, each data storage node 10 may be a terminal, a server, or another node capable of storing data.
The basic idea of a distributed database is to spread the data of an originally centralized database across multiple data storage nodes connected over a network, in order to obtain larger storage capacity and higher concurrency. Specifically, database and table sharding can be used to store massive amounts of data in shards on the storage nodes of the distributed database. In general terms, database and table sharding means either storing the data of a large table in shards on the storage nodes according to a predetermined partitioning policy, or splitting a large table into service sub-tables with smaller data volumes and storing each sub-table on the storage nodes according to a predetermined partitioning policy.
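As a non-limiting illustration, storing the rows of a large table in shards according to a partitioning policy can be sketched as follows. The source does not name a specific policy; hashing a key column with CRC32 and taking the remainder is an assumption made for this sketch, as are the names `node_for` and `shard`.

```python
import zlib
from typing import Dict, List


def node_for(key: str, num_nodes: int) -> int:
    """Assumed partitioning policy: stable hash of the key modulo the node count."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def shard(rows: List[Dict[str, str]], key_column: str,
          num_nodes: int) -> List[List[Dict[str, str]]]:
    """Distribute rows of one large table across num_nodes storage nodes."""
    shards: List[List[Dict[str, str]]] = [[] for _ in range(num_nodes)]
    for row in rows:
        shards[node_for(row[key_column], num_nodes)].append(row)
    return shards


rows = [{"name": n} for n in ["a", "b", "c", "d"]]   # toy table
shards = shard(rows, "name", 3)
```

A deterministic policy such as this one also serves as partition information: any node can recompute which node stores a given key without asking a central manager.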
The distributed database 100 described above can be used to process services. The embodiments of the present invention take as an example a service that is completed by calling and executing a stored procedure. FIG. 1B is an architecture diagram of a system for executing a stored procedure according to an embodiment of the present invention. As shown in FIG. 1B, the system architecture includes a client 200 and the distributed database 100, which can be connected over a network.
In an actual implementation, the client 200 can send a call request for a stored procedure to the distributed database 100. A data storage node 10 in the distributed database 100 receives the call request, calls the stored procedure according to the request, and then executes the stored procedure through interaction with other data storage nodes 10.
The data storage node 10 that receives the call request may be specified by the client 200, or may be specified by the distributed database 100 according to preset service processing logic, which is not limited in the embodiments of the present invention.
It should be noted that FIG. 1B merely uses the case in which the client 200 is an entity outside the distributed database 100 as an example. In practice, the client 200 may also be any data storage node 10 in the distributed database 100. That is, when any data storage node 10 obtains a call request for a stored procedure, triggered by a user or by a preset condition, it can call the stored procedure according to the request and then execute it through interaction with other data storage nodes 10.
It should also be noted that FIG. 1A and FIG. 1B merely use a distributed database 100 with three data storage nodes 10 as an example. A person skilled in the art will understand that the number of data storage nodes 10 shown in FIG. 1A and FIG. 1B does not limit the distributed database 100. In practice, the distributed database 100 may include more or fewer data storage nodes 10 than shown, which is not limited in the embodiments of the present invention.
To make the inventive point of the stored procedure execution method provided by the embodiments of the present invention easier to understand, the execution flow of a stored procedure in the related art is first briefly introduced. FIG. 1C is a schematic diagram of an execution flow of a stored procedure provided by the related art. As shown in FIG. 1C, the distributed database 100 includes at least a data storage node A, a data storage node B, a data storage node C, and a data storage node M, and the data storage nodes may be connected to one another through a network.
Assume that the distributed database 100 is to execute a stored procedure S whose semantics are "if @name has a great-grandfather, query the age of that great-grandfather". The stored procedure S then includes three operation steps: operation step S1, query the name of the father of @name; operation step S2, query the name of the grandfather of @name according to the father of @name; and operation step S3, query the age of the great-grandfather of @name according to the name of the grandfather of @name. Here, @name is the input data of the stored procedure S and may be any person's name.
In addition, assume that each data storage node in the distributed database shown in FIG. 1C stores a different data list, and that each data list stores a person's name, the name of that person's father, and the father's age. That is, the data is partitioned by person name into different data lists, which are stored on different storage nodes. For example, as shown in FIG. 1C, the data storage node A stores data list 1, the data storage node B stores data list 2, and the data storage node C stores data list 3. Moreover, the data storage node M stores partition information, which indicates the storage location of data, that is, which storage node holds each person's name.
If the stored procedure S is executed in the master-slave control manner provided by the related art, then, as shown in FIG. 1C, once the data storage node M receives the call request for the stored procedure S, the data storage node M acts as the storage node manager (Master). Assuming that the input data carried by the call request is Z3, that is, @name is Z3, the execution flow of the stored procedure S may include the following steps 1) to 7):
1) The data storage node M invokes the stored procedure S according to the call request and determines the operation steps that the stored procedure S includes and their execution order. The data storage node M then determines, based on Z3 and the stored partition information, that the data storage node A stores Z3, and sends Z3 and operation step S1 to the data storage node A.
2) The data storage node A executes operation step S1 based on Z3, that is, it queries the stored data list 1 and finds that the name of Z3's father is Z2, and then returns Z2 to the data storage node M.
3) The data storage node M determines, based on Z2 and the stored partition information, that the data storage node B stores Z2, and sends Z2 and operation step S2 to the data storage node B.
4) The data storage node B executes operation step S2 based on Z2, that is, it queries the stored data list 2 and finds that the name of Z2's father is Z1, and then returns Z1 to the data storage node M.
5) The data storage node M determines, based on Z1 and the stored partition information, that the data storage node C stores Z1, and sends Z1 and operation step S3 to the data storage node C.
6) The data storage node C executes operation step S3 based on Z1, that is, it queries the stored data list 3 and finds that the age of Z1's father is 85, and then returns 85 to the data storage node M.
7) After receiving 85, the data storage node M outputs 85 as the output data of the stored procedure S.
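The master-slave flow in steps 1) to 7) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the related art's implementation: the dictionaries `DATA_LISTS` and `PARTITION`, the function name `run_master_slave`, the placeholder name `Z0` for Z1's father, and the `None` ages are all hypothetical; only the names Z3, Z2, Z1 and the age 85 come from the example above. The counter makes visible that each operation step costs one message from node M to a worker node and one message back.

```python
# Hypothetical contents of data lists 1-3: person -> (father's name, father's age).
# "Z0" and the None ages are placeholders; the text only gives Z3, Z2, Z1 and 85.
DATA_LISTS = {
    "A": {"Z3": ("Z2", None)},
    "B": {"Z2": ("Z1", None)},
    "C": {"Z1": ("Z0", 85)},
}

# Partition information held by node M: person name -> node storing that name.
PARTITION = {"Z3": "A", "Z2": "B", "Z1": "C"}


def run_master_slave(name):
    """Node M drives steps S1-S3 and receives every intermediate result."""
    messages = 0
    value = name
    for step in (1, 2, 3):
        node = PARTITION[value]          # M looks up the node that stores `value`
        messages += 1                    # M -> worker: value plus operation step
        father, father_age = DATA_LISTS[node][value]
        value = father_age if step == 3 else father
        messages += 1                    # worker -> M: result of the step
    return value, messages


result, message_count = run_master_slave("Z3")
print(result, message_count)
```

In this sketch the three operation steps require six messages, all of which pass through node M.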
As can be seen from FIG. 1C, each of the data storage nodes A, B, and C needs to receive data from the data storage node M and to return its processing result to the data storage node M. In other words, a large amount of data is transferred between the data storage nodes that execute the stored procedure and the storage node manager. Moreover, because the storage node manager must collect the data before it schedules each operation step of the stored procedure, the data storage nodes that execute the stored procedure also have to wait.
That is, in the master-slave control manner, executing a stored procedure involves data aggregation, waiting by the data storage nodes, and a large amount of data transmission, which greatly reduces the execution efficiency and running performance of the stored procedure.
Having briefly introduced the execution flow of a stored procedure in the related art, the following briefly introduces the execution flow of a stored procedure provided by the embodiments of the present invention.
In the embodiments of the present invention, with reference to the distributed database 100 shown in FIG. 1A or FIG. 1B, assume that a plurality of data storage nodes included in the distributed database 100 are used to execute a target stored procedure, and that a first data storage node, a target data storage node, and a second data storage node among the plurality of data storage nodes are three data storage nodes that execute the target stored procedure in sequence. The target data storage node may be any one of the plurality of data storage nodes used to execute the target stored procedure. The first data storage node is the data storage node that precedes the target data storage node in the execution order of the target stored procedure, and the second data storage node is the data storage node that follows the target data storage node in that order. The target data storage node may then perform the following steps 1) to 3):
1) The target data storage node receives first data information and first step indication information sent by the first data storage node. The first data information is first data or indication information of the first data, and the first data is data required when the target stored procedure is executed. The first step indication information indicates the position, among the plurality of operation steps included in the target stored procedure, of the first operation step that the target data storage node needs to execute.
2) The target data storage node processes the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure, to obtain second data. The topology information of the target stored procedure indicates the plurality of operation steps included in the target stored procedure and the execution order of those operation steps.
3) The target data storage node determines the second data storage node and second step indication information based on the second data, the stored partition information, and the first step indication information, and sends second data information and the second step indication information to the second data storage node. The partition information indicates the storage location of data. The second step indication information indicates the position, among the plurality of operation steps included in the target stored procedure, of the second operation step that the second data storage node needs to execute. The second data information is the second data or indication information of the second data.
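Steps 1) to 3) can be sketched in Python by reusing the great-grandfather example from the related-art discussion. This is only an illustrative sketch under stated assumptions: `TOPOLOGY`, `PARTITION`, `ROWS`, `handle`, the placeholder name `Z0`, and the `None` ages are hypothetical, and a direct function call stands in for sending a message between nodes. How the final result is delivered as output is not modeled here.

```python
# Topology information: the operation steps of the stored procedure, in execution order.
TOPOLOGY = ["father_name", "father_name", "father_age"]

# Partition information, assumed to be stored on every node: name -> node.
PARTITION = {"Z3": "A", "Z2": "B", "Z1": "C"}

# Hypothetical rows per node: person -> (father's name, father's age).
ROWS = {
    "A": {"Z3": ("Z2", None)},
    "B": {"Z2": ("Z1", None)},
    "C": {"Z1": ("Z0", 85)},
}


def handle(node, data, step, trace):
    """Steps 1)-3) as performed by one data storage node."""
    trace.append((node, step))               # 1) receive data and step indication
    operation = TOPOLOGY[step - 1]           # 2) find the step via the topology ...
    father, father_age = ROWS[node][data]    #    ... and run it on the local data
    result = father if operation == "father_name" else father_age
    if step == len(TOPOLOGY):                # last operation step: nothing to forward
        return result
    next_node = PARTITION[result]            # 3) choose the next node from the result
    return handle(next_node, result, step + 1, trace)   # forward with step + 1


trace = []
answer = handle(PARTITION["Z3"], "Z3", 1, trace)
print(answer, trace)
```

Each node in this sketch forwards its result directly to the next node, so the three operation steps involve only node-to-node hops and no return trip to a manager.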
That is, each data storage node that executes the stored procedure can process the data itself, determine the second data storage node, and send the data processing result directly to the second data storage node, without returning the result to a storage node manager. This greatly reduces the amount of data transmitted and improves the execution efficiency and running performance of the stored procedure.
In a specific embodiment, a stream processing module may be configured in each data storage node 10 in the distributed database 100, and partition information may be stored on each data storage node 10, so that each data storage node 10 implements the stored procedure execution method provided by the embodiments of the present invention by means of the configured stream processing module and the stored partition information.
FIG. 1D is a schematic diagram of the logical structure of a data storage node 10 according to an embodiment of the present invention. As shown in FIG. 1D, the data storage node 10 includes a stream processing module 11 and partition information 12. The stream processing module 11 is used to execute stored procedures, and the partition information 12 indicates the storage location of data.
The partition information 12 may take the form of a list, a partition policy, or the like.
The stream processing module 11 includes a topology manager 11a and a path planning module 11b.
The topology manager 11a is used to store the topology information of at least one stored procedure. The topology information of each stored procedure indicates the plurality of operation steps included in that stored procedure and the execution order of those operation steps. Further, the topology manager 11a may also be used to perform topology compilation on an uploaded target stored procedure to obtain the topology information of the target stored procedure, and may instruct the data storage node 10 to send the topology information of the target stored procedure to the other data storage nodes 10.
The path planning module 11b is used to perform the partition scheduling and the arithmetic operations of the stored procedure. Partition scheduling means determining, based on the data and the stored partition information, the next storage node that is to process the data. An arithmetic operation means determining the operation step that the data storage node 10 needs to execute and executing that operation step on the data. Specifically, the operation step to be executed may be determined based on the step indication information and the stored topology information of the stored procedure.
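The logical structure described above might be modeled as in the following Python sketch. The class and method names (`TopologyManager`, `PathPlanner`, `register`, `step_at`, `next_node`) and the byte-sum policy are assumptions introduced for illustration only; the sketch covers step lookup and partition scheduling and leaves out topology compilation and the execution of the operation step itself. It also shows the two forms of partition information mentioned above: a list (here a lookup table) and a partition policy (here a function).

```python
class TopologyManager:
    """Stores, per stored procedure, its operation steps in execution order."""

    def __init__(self):
        self._topologies = {}

    def register(self, procedure_id, steps):
        self._topologies[procedure_id] = list(steps)

    def step_at(self, procedure_id, position):
        # Step indication values are treated as 1-based positions.
        return self._topologies[procedure_id][position - 1]


class PathPlanner:
    """Partition scheduling: map a piece of data to the node that stores it."""

    def __init__(self, partition_info):
        # partition_info may be a lookup table (dict) or a policy (callable).
        self._partition_info = partition_info

    def next_node(self, data):
        if callable(self._partition_info):
            return self._partition_info(data)
        return self._partition_info[data]


manager = TopologyManager()
manager.register("S", ["S1", "S2", "S3"])

table_planner = PathPlanner({"Z3": "A", "Z2": "B", "Z1": "C"})
# Hypothetical policy: spread keys over three nodes by the sum of their bytes.
policy_planner = PathPlanner(lambda key: ["A", "B", "C"][sum(key.encode()) % 3])

print(manager.step_at("S", 2), table_planner.next_node("Z2"), policy_planner.next_node("Z3"))
```

A table gives explicit control over placement, while a policy avoids storing one entry per key; either can sit behind the same `next_node` call.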
Before the stored procedure execution method provided by the embodiments of the present invention is described, the structure of the data storage node involved in the embodiments of the present invention is first described in detail.
FIG. 1E is a schematic diagram of the hardware structure of a data storage node 10 according to an embodiment of the present invention. Referring to FIG. 1E, the data storage node 10 includes a processor 13, a communication bus 14, a memory 15, and at least one communication interface 16. A person skilled in the art will understand that the structure of the data storage node 10 shown in FIG. 1E does not limit the data storage node 10. In practical applications, the data storage node 10 may include more or fewer components than illustrated, may combine certain components, or may use a different arrangement of components, which is not limited in the embodiments of the present invention.
The processor 13 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of the programs of the solution of the present application.
The communication bus 14 may include a path for transferring information between the above components.
The memory 15 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions. It may also be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The memory 15 may exist independently and be connected to the processor 13 through the communication bus 14. The memory 15 may also be integrated with the processor 13. In the embodiments of the present invention, the memory 15 may be used to store data, for example partition information, topology information of stored procedures, or information sent by the first data storage node. The memory 15 may also be used to store one or more programs and/or modules for performing the stored procedure execution method provided by the embodiments of the present invention.
The communication interface 16 uses any transceiver-type apparatus to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
In a specific implementation, as an embodiment, the processor 13 may include one or more CPUs, for example CPU0 and CPU1 shown in FIG. 1C.
In a specific implementation, as an embodiment, the data storage node 10 may further include an output device 17 and an input device 18.
The output device 17 communicates with the processor 13 and can display information in a variety of ways. For example, the output device 17 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector.
The input device 18 communicates with the processor 13 and can receive user input in a variety of ways. For example, the input device 18 may be a keyboard, a touch screen device, or a sensing device.
The data storage node 10 described above may be a terminal or another node having a data storage function. In a specific implementation, the data storage node 10 may be a mobile phone, a portable computer, a network server, a personal digital assistant (PDA), a tablet computer, a wireless user equipment (UE), a communication device, an embedded device, or the like. The embodiments of the present invention do not limit the type of the data storage node 10.
The memory 15 is used to store the program code for executing the solution of the present application, and execution is controlled by the processor 13. The processor 13 is used to execute the program code stored in the memory 15. For example, the data storage node 10 shown in FIG. 1E may implement the methods described in the embodiments of FIG. 1F and FIG. 2B below by means of the processor 13 and the program code in the memory 15.
Next, the stored procedure execution method provided by the embodiments of the present invention is described in detail with reference to FIG. 1A or FIG. 1B. FIG. 1F is a flowchart of a stored procedure execution method according to an embodiment of the present invention. The method is applied to the distributed database shown in FIG. 1A or FIG. 1B. A plurality of data storage nodes included in the distributed database are used to execute a target stored procedure, and a first data storage node, a target data storage node, and a second data storage node among the plurality of data storage nodes are three data storage nodes that execute the target stored procedure in sequence. Referring to FIG. 1F, the method includes the following steps:
Step 101: The first data storage node sends first data information and first step indication information to the target data storage node.
The target data storage node may be any one of the plurality of data storage nodes used to execute the target stored procedure. The first data storage node is the data storage node that precedes the target data storage node in the execution order of the target stored procedure, and the second data storage node is the data storage node that follows the target data storage node in that order.
The target stored procedure may be any stored procedure invoked by the distributed database. It should be noted that a stored procedure in the embodiments of the present invention is not a process of storing data, but a callable object stored in the database, similar to a callable function. In practical applications it can be invoked by the stored procedure name and input data. A stored procedure is essentially a set of SQL statements that implements a specific function. That is, the stored procedure includes a plurality of operation steps, which together form its SQL statement set, and each operation step is one SQL statement in that set. Specifically, each operation step may be used for query processing, and may of course also be used for other processing, which is not limited in the embodiments of the present invention.
The first data information is first data or indication information of the first data, and the first data is the data that the target data storage node needs in order to execute the first operation step. The indication information of the first data may indicate the first data or the storage location of the first data, and the first data can be obtained according to the indication information of the first data.
The first step indication information indicates the position, among the plurality of operation steps included in the target stored procedure, of the first operation step that the target data storage node needs to execute. That is, it indicates which operation step, counted in order, the first operation step is among the plurality of operation steps included in the target stored procedure, where the plurality of operation steps are arranged in execution order.
For example, the first step indication information may be a numerical value. When the first step indication information indicates the first of the plurality of operation steps included in the target stored procedure, the first step indication information may be the value 1. When it indicates the second of the plurality of operation steps, the first step indication information may be the value 2, and so on.
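One possible shape for what a node sends in step 101 is sketched below in Python. The class name `StepMessage`, its field names, the `resolve` helper, and the key `"buf-7"` are hypothetical and chosen only for illustration; the source specifies only that the data information is either the data or indication information of the data, and that the step indication information may be a numerical value.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class StepMessage:
    step: int                        # step indication: 1 = first operation step, 2 = second, ...
    data: Optional[str] = None       # the data itself, when it is sent inline
    data_ref: Optional[str] = None   # or indication information that locates the data

    def resolve(self, fetch: Callable[[str], str]) -> str:
        """Return the data, fetching it via `fetch` when only a reference was sent."""
        if self.data is not None:
            return self.data
        if self.data_ref is None:
            raise ValueError("message carries neither data nor a reference")
        return fetch(self.data_ref)


# Inline data for the first operation step.
inline = StepMessage(step=1, data="Z3")

# A reference for the second operation step, resolved against a hypothetical store.
store = {"buf-7": "Z2"}
by_reference = StepMessage(step=2, data_ref="buf-7")

print(inline.resolve(store.__getitem__), by_reference.resolve(store.__getitem__))
```

Sending a reference instead of the data may be useful when the data is large, at the cost of an extra lookup on the receiving node.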
The first data storage node may be an input node or an intermediate node. The input node is the data storage node in the distributed database that receives the call request for the target stored procedure. An intermediate node is a data storage node used to execute any one of the plurality of operation steps included in the target stored procedure; here it is specifically the data storage node that executes the operation step preceding the first operation step executed by the target data storage node.
In the embodiments of the present invention, the way in which the first data and the first step indication information are obtained, and their meaning, differ according to the type of the first data storage node. There are two cases:
1) When the first data storage node is the input node, the first data is the input data that the input node obtains based on the call request for the target stored procedure. The first step indication information is determined by the input node based on the call request for the target stored procedure, and indicates the first of the plurality of operation steps included in the target stored procedure.
The call request for the target stored procedure is used to invoke the target stored procedure and may carry the identifier of the target stored procedure and the input data. The identifier uniquely identifies the target stored procedure and may be, for example, the name or the number of the target stored procedure. The input data is the input parameter used to invoke the target stored procedure, and may specifically be the data that a user enters for the target stored procedure when initiating a service.
In practical applications, the call request for the target stored procedure may be triggered by a user through a client. The client may be an entity outside the distributed database or may be any data storage node in the distributed database, which is not limited in the embodiments of the present invention.
When any data storage node in the distributed database receives the call request for the target stored procedure, that data storage node acts as the input node. On receiving the call request, the input node may invoke the target stored procedure according to the identifier of the target stored procedure and the input data carried in the call request, and may execute an input step, which means obtaining the input data in the course of invoking the target stored procedure. After obtaining the input data, the input node takes the input data as the first data, determines, based on the first data and the stored partition information, the data storage node that is to process the first data, namely the target data storage node, and then sends the first data information and the first step indication information to the target data storage node.
2) When the first data storage node is an intermediate node, the first data is the data processing result that the intermediate node obtains by performing stored procedure processing on data based on the data information that it received. The intermediate node is a data storage node used to execute any one of the plurality of operation steps included in the target stored procedure, specifically the data storage node that executes the operation step preceding the first operation step executed by the target data storage node.
The first step indication information is determined by the first data storage node based on the step indication information that it received, and indicates the operation step that follows the operation step executed by the first data storage node.
Performing stored procedure processing on data means executing, based on the data, the operation step that currently needs to be executed, for example performing query processing according to the data in order to query other data related to it.
That is, when the first data storage node is an intermediate node, the first data storage node processes data in the same way as the target data storage node. It may perform the processing in the manner of the target data storage node to obtain the first data information and the first step indication information, and then send the first data information and the first step indication information to the target data storage node.
Step 102: The target data storage node receives the first data information and the first step indication information sent by the first data storage node.
Step 103: The target data storage node performs stored procedure processing on the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure, to obtain second data.
Performing stored procedure processing on the first data means executing the first operation step based on the first data. For example, when the first operation step is a query step, query processing may be performed according to the first data, and the second data is the other data related to the first data that the query returns.
The topology information of the target stored procedure indicates the execution logic inside the target stored procedure. Specifically, it indicates the plurality of operation steps included in the target stored procedure and the execution order of those operation steps. That is, the topology information of the target stored procedure may include the plurality of operation steps included in the target stored procedure, arranged in execution order.
In one embodiment, the topology information of the target stored procedure may include a plurality of topology nodes arranged in execution order. The topology nodes correspond one-to-one to the operation steps, that is, each topology node indicates one of the plurality of operation steps. The topology nodes may also be connected by arrowed lines, which indicate the execution order of the topology nodes. For example, assuming that the target stored procedure includes three operation steps, then, referring to FIG. 1G, the topology information of the target stored procedure may include three topology nodes, namely topology node 1, topology node 2, and topology node 3, connected by arrowed lines as shown in FIG. 1G. Topology node 1 indicates the first of the three operation steps included in the target stored procedure, topology node 2 indicates the second of the three operation steps, and topology node 3 indicates the third of the three operation steps.
Further, the topology information of the target stored procedure may also indicate an input step and an output step, where the input step is executed before the plurality of operation steps and the output step is executed after the plurality of operation steps. That is, the topology information of the target stored procedure may include a plurality of execution steps, which comprise, in execution order, the input step, the plurality of operation steps included in the target stored procedure, and the output step.
The input step indicates that the input data of the target stored procedure is to be input, and is specifically used to obtain the input data in the course of invoking the target stored procedure. The output step is used to output the output data of the target stored procedure, and is specifically used to output, as the output data, the data processing result of the last of the plurality of data storage nodes that execute the target stored procedure.
In one embodiment, the topology information of the target stored procedure may include a plurality of topology nodes arranged in execution order, and the topology nodes correspond one-to-one to the plurality of execution steps. The first of the topology nodes indicates the input step, the last topology node indicates the output step, and the topology nodes between the first and the last indicate the plurality of operation steps included in the target stored procedure. The topology nodes may also be connected by arrowed lines, which indicate the execution order of the topology nodes.
例如,假设目标存储过程包括3个运算步骤,则参见图1H,该目标存储过程的拓扑信息可以包括5个拓扑节点,分别为拓扑节点1、拓扑节点2、拓扑节点3、拓扑节点4和拓扑节点5,且这5个拓扑节点通过如图1H所示箭头连线相连。其中,拓扑节点1用于指示输入步骤,拓扑节点2、拓扑节点3、拓扑节点4分别用于指示该目标存储包括的3个运算步骤中的第一个运算步骤、第二个运算步骤和第三个运步骤,拓扑节点5用于指示输出步骤。For example, if the target storage process includes three operation steps, referring to FIG. 1H, the topology information of the target storage process may include five topology nodes, namely, a topology node 1, a topology node 2, a topology node 3, a topology node 4, and a topology. Node 5, and the five topological nodes are connected by an arrow as shown in FIG. 1H. The topology node 1 is used to indicate an input step, and the topology node 2, the topology node 3, and the topology node 4 are respectively used to indicate the first operation step, the second operation step, and the third of the three operation steps included in the target storage. In the three steps, the topology node 5 is used to indicate the output step.
The target data storage node stores the topology information of the target stored procedure. This topology information may be obtained by the target data storage node itself, which performs topology compilation on the target stored procedure when it receives the uploaded target stored procedure. Alternatively, another data storage node in the distributed database may, upon receiving the uploaded target stored procedure, perform topology compilation to obtain the topology information and then send it to the target data storage node.
That is, when any data storage node in the distributed database receives the uploaded target stored procedure, it may perform topology compilation on the target stored procedure to obtain its topology information, and then send that topology information to the other data storage nodes in the distributed database.
For example, when the target data storage node receives the uploaded target stored procedure, it may perform topology compilation on the target stored procedure to obtain its topology information, and then send the topology information to the data storage nodes in the distributed database other than the target data storage node.
Specifically, any data storage node may use its configured topology manager to perform topology compilation on a stored procedure, to store the resulting topology information, and to send that topology information to the other data storage nodes.
Specifically, the target data storage node may perform topology compilation on the target stored procedure to obtain its topology information in either of the following two implementations:
First implementation: decompose the target stored procedure into multiple operation steps, and determine the topology information of the target stored procedure based on those operation steps and their execution order.
Specifically, the target stored procedure may be decomposed in units of single SQL statements to obtain multiple SQL statements, which are the multiple operation steps.
Second implementation: decompose the target stored procedure into multiple operation steps; then, following the execution order of those steps, add an input step before them and an output step after them to obtain the topology information of the target stored procedure.
The input step is used to input the input data of the target stored procedure, and the output step is used to output its output data.
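The two compilation variants above can be sketched as follows. This is only an illustration under stated assumptions: the function name `compile_topology`, the dictionary layout of a step, the sample table `person`, and the naive split on `;` (which ignores semicolons inside string literals) are all hypothetical and are not taken from the application.

```python
def compile_topology(procedure_body, with_io_steps=True):
    """Return the steps of a stored procedure in execution order.

    Assumption: one operation step per SQL statement, split naively on ';'.
    """
    statements = [s.strip() for s in procedure_body.split(";") if s.strip()]
    steps = [{"kind": "operation", "sql": s} for s in statements]
    if with_io_steps:
        # Second implementation: wrap the operation steps with input/output steps.
        steps = [{"kind": "input"}] + steps + [{"kind": "output"}]
    # The list order stands in for the arrows between topology nodes.
    return steps


# Hypothetical procedure body with three SQL statements.
body = """
SELECT father FROM person WHERE name = :x;
SELECT father FROM person WHERE name = :x;
SELECT father FROM person WHERE name = :x
"""
topology = compile_topology(body)
operation_only = compile_topology(body, with_io_steps=False)
```

With three statements, `operation_only` corresponds to the three topology nodes of FIG. 1G, and `topology` to the five topology nodes of FIG. 1H.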
Further, when any data storage node in the distributed database detects an update operation or a delete operation on the target stored procedure, it may update or delete its stored topology information of the target stored procedure, and may instruct the other data storage nodes to update or delete their stored topology information of the target stored procedure.
Specifically, any data storage node may use its configured topology manager to respond to upload, update, or delete operations on stored procedures. In practice, a user may perform upload, update, and delete operations on stored procedures in the management system of any data storage node.
In one embodiment, when any data storage node in the distributed database receives an update instruction for the target stored procedure, it may perform topology compilation on the updated target stored procedure to obtain updated topology information, and then replace the stored topology information with the updated topology information. The data storage node may also send the updated topology information and the update instruction to the other data storage nodes, so that they update their stored topology information of the target stored procedure, that is, replace it with the updated topology information.
In another embodiment, when any data storage node in the distributed database receives a delete instruction for the target stored procedure, it may delete its stored topology information of the target stored procedure, and then send the other data storage nodes an instruction to delete that topology information, so that they delete their stored copies.
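A minimal sketch of how a topology manager might keep every node's copy in step is given below. The class name `TopologyManager`, its methods, and the procedure identifier `find_ancestor` are assumptions made for illustration; topology compilation is left out, so an already compiled topology is passed in directly.

```python
class TopologyManager:
    """Hypothetical per-node manager that stores and propagates topologies."""

    def __init__(self, name):
        self.name = name
        self.peers = []        # managers on the other data storage nodes
        self.topologies = {}   # procedure identifier -> topology information

    def apply(self, op, proc_id, topology=None):
        """Apply an operation to the local copy only."""
        if op == "delete":
            self.topologies.pop(proc_id, None)
        else:
            # "upload" stores the topology, "update" replaces the stored one.
            self.topologies[proc_id] = topology

    def handle(self, op, proc_id, topology=None):
        """Apply locally, then instruct every other node to do the same."""
        self.apply(op, proc_id, topology)
        for peer in self.peers:
            peer.apply(op, proc_id, topology)


a, b, c = (TopologyManager(n) for n in "abc")
for manager in (a, b, c):
    manager.peers = [p for p in (a, b, c) if p is not manager]

a.handle("upload", "find_ancestor", ["step1", "step2"])
b.handle("update", "find_ancestor", ["step1", "step2", "step3"])
after_update = dict(c.topologies)
c.handle("delete", "find_ancestor")
```

Any node can receive the operation, which mirrors the statement that a user may upload, update, or delete a stored procedure on any data storage node.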
In this embodiment of the present invention, processing the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure to obtain the second data may include the following steps 1) to 3):
1) Determine the first data based on the first data information.
Specifically, the first data may be determined based on the first data information in any of the following three implementations:
First implementation: when the first data information is the first data itself, the first data information may be directly determined to be the first data.
Second implementation: when the first data information is indication information of the first data, and that indication information indicates the first data, the indication information may be converted into the first data based on a preset conversion rule.
For example, when the indication information of the first data is a hash value of the first data, the hash value may be converted into the first data according to a preset hash algorithm.
Third implementation: when the first data information is indication information of the first data, and that indication information indicates the storage location of the first data, the first data may be obtained, based on the indication information, from the data stored by the target data storage node.
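The three ways of recovering the first data can be sketched as below. The message layout (`kind` and `value`), the function names, and the storage key `row:17` are hypothetical. Because a hash value cannot in general be inverted, this sketch assumes that the "preset conversion rule" is a lookup table from hash values to values the node already knows; the application does not say how the conversion is carried out.

```python
import hashlib


def digest(value):
    """Hash a string value (SHA-256 is an arbitrary choice for this sketch)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def resolve_data(info, local_store, hash_index):
    """Recover the first data from the first data information."""
    kind = info["kind"]
    if kind == "data":
        # First implementation: the information is the data itself.
        return info["value"]
    if kind == "hash":
        # Second implementation: the indicator denotes the data (assumed lookup).
        return hash_index[info["value"]]
    if kind == "location":
        # Third implementation: the indicator denotes where the data is stored.
        return local_store[info["value"]]
    raise ValueError("unknown data information kind: " + str(kind))


local_store = {"row:17": "Z3"}                              # made-up local data
hash_index = {digest(v): v for v in local_store.values()}  # assumed conversion rule

direct = resolve_data({"kind": "data", "value": "Z3"}, local_store, hash_index)
by_hash = resolve_data({"kind": "hash", "value": digest("Z3")}, local_store, hash_index)
by_location = resolve_data({"kind": "location", "value": "row:17"}, local_store, hash_index)
```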
2) Determine, based on the first step indication information and the topology information of the target stored procedure, the first operation step that the target data storage node needs to execute.
The first step indication information indicates the position, among the multiple operation steps of the target stored procedure, of the first operation step that the target data storage node needs to execute, that is, which of the operation steps the first operation step is. The topology information of the target stored procedure indicates the multiple operation steps and their execution order. Therefore, the first operation step that the target data storage node needs to execute can be determined from the first step indication information and the topology information.
For example, assume the target stored procedure includes three operation steps, namely operation step 1, operation step 2, and operation step 3 in execution order. When the first step indication information indicates that the first operation step to be executed by the target data storage node is the second of the operation steps, the target data storage node may determine, based on the topology information, that the second operation step is operation step 2, and then determine operation step 2 to be the first operation step that it needs to execute.
3) Execute the first operation step based on the first data to obtain the second data.
Specifically, the SQL statement corresponding to the first operation step may be executed based on the first data to obtain the second data.
For example, assume the first data is the personal name Z3, and the SQL statement corresponding to the first operation step queries the name of a given person's father. The target data storage node may then query its stored data (for example, a list) for the name of Z3's father and obtain Z2; Z2 is the second data.
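Steps 2) and 3) can be illustrated with an in-memory SQLite database standing in for the data held by the target data storage node. The table `person`, its columns, and the helper `run_step` are assumptions for this sketch; only the Z3 and Z2 names come from the example above.

```python
import sqlite3

# Made-up local data: each row records a person and that person's father.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT PRIMARY KEY, father TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [("Z3", "Z2"), ("Z2", "Z1")])

# Topology information: one SQL statement per operation step, in execution order.
TOPOLOGY = [
    "SELECT father FROM person WHERE name = ?",
    "SELECT father FROM person WHERE name = ?",
    "SELECT father FROM person WHERE name = ?",
]


def run_step(conn, topology, step_index, first_data):
    """Pick the step named by the 1-based step indication and execute it."""
    sql = topology[step_index - 1]
    row = conn.execute(sql, (first_data,)).fetchone()
    return row[0] if row else None


second_data = run_step(conn, TOPOLOGY, 2, "Z3")
```

Here the step indication is the integer 2, so the second SQL statement is selected and run with Z3 as its parameter, which yields Z2 as the second data.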
Further, the target data storage node may store the topology information of multiple stored procedures. So that the target data storage node, after receiving the first step indication information, can determine which stored procedure's topology information to use to determine the first operation step, the first data storage node also sends the identifier of the called target stored procedure to the target data storage node when it sends the first data information and the first step indication information. Correspondingly, the target data storage node also needs to send the identifier of the target stored procedure to the second data storage node. That is, the identifier of the target stored procedure needs to be passed between the data storage nodes.
Further, to determine which stored procedure is to be executed, the target data storage node may receive the identifier of the target stored procedure sent by the first data storage node and acquire the topology information of the target stored procedure based on that identifier; and/or send the identifier of the target stored procedure to the second data storage node.
The identifier of the target stored procedure uniquely identifies the target stored procedure. That is, when each data storage node in the distributed database stores the topology information of multiple stored procedures, the data storage nodes that execute the target stored procedure may also pass the identifier of the target stored procedure among themselves, to indicate to the next data storage node which stored procedure is currently being executed.
Specifically, the target data storage node may, based on the identifier of the target stored procedure, obtain the topology information of the target stored procedure from the stored topology information of multiple stored procedures.
In a specific embodiment, before processing the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure, the target data storage node may receive the identifier of the target stored procedure sent by the first data storage node, and then, based on that identifier, determine the topology information of the target stored procedure from the stored topology information of at least one stored procedure. After processing the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure, the target data storage node sends the identifier of the target stored procedure to the second data storage node, so that the second data storage node acquires the topology information of the target stored procedure based on the received identifier.
Further, the target data storage node may send the identifier of the target stored procedure to the second data storage node at the same time as it sends the second data information and the second step indication information.
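One way to picture the identifier travelling with each message is sketched below. The `StepMessage` fields, the procedure identifiers, and the helper names are hypothetical, and the step indication is modelled as a 1-based integer.

```python
from dataclasses import dataclass


@dataclass
class StepMessage:
    """Hypothetical message passed from one data storage node to the next."""
    procedure_id: str   # identifier of the target stored procedure
    step_index: int     # step indication information (1-based position)
    data_info: str      # the data itself or indication information of it


# A node may hold the topology information of several stored procedures.
STORED_TOPOLOGIES = {
    "find_father": ["lookup_father"],
    "find_grandfather": ["lookup_father", "lookup_father"],
}


def select_topology(msg, topologies):
    """Use the identifier in the message to pick the right topology."""
    return topologies[msg.procedure_id]


def forward(msg, new_data_info):
    """Build the message for the next node, carrying the same identifier."""
    return StepMessage(msg.procedure_id, msg.step_index + 1, new_data_info)


incoming = StepMessage("find_grandfather", 1, "Z3")
chosen = select_topology(incoming, STORED_TOPOLOGIES)
outgoing = forward(incoming, "Z2")
```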
Step 104: The target data storage node determines the second data storage node and the second step indication information based on the second data, the stored partition information, and the first step indication information.
The partition information indicates where data is stored. The second step indication information indicates the position, among the multiple operation steps of the target stored procedure, of the second operation step that the second data storage node needs to execute, that is, which of the operation steps the second operation step is.
Specifically, determining the second data storage node and the second step indication information based on the second data, the stored partition information, and the first step indication information includes the following steps 1) and 2):
1) Determine the second step indication information based on the first step indication information and the topology information of the target stored procedure.
Specifically, the position, among the multiple operation steps of the target stored procedure, of the first operation step that the target data storage node needs to execute may be determined based on the first step indication information and the topology information of the target stored procedure; the second step indication information is then determined based on the position that follows the position of the first operation step among the multiple operation steps.
When the first step indication information indicates that the first operation step is any operation step other than the last of the multiple operation steps of the target stored procedure, the second step indication information indicates the operation step that follows the first operation step. When the first step indication information indicates that the first operation step is the last of the multiple operation steps, the second step indication information indicates the output step, which indicates that the received data is to be output as the output data of the target stored procedure.
2) Determine, based on the second data and the stored partition information, the second data storage node for processing the second data.
Specifically, the data storage node in which the second data is stored in advance may be determined from the distributed database based on the second data and the stored partition information, and that data storage node is then determined to be the second data storage node.
Further, the target data storage node may perform the step of determining, based on the second data and the stored partition information, the second data storage node for processing the second data when it determines that the second step indication information indicates the operation step following the first operation step. When it determines that the second step indication information indicates the output step, the target data storage node may instead determine the output node to be the second data storage node, so that the second data storage node outputs the second data as the output data of the target stored procedure.
To make it easier for the target data storage node to determine the output node, the input node may send the identifier of the output node to the intermediate node that executes the target stored procedure at the same time as it sends the data information, and each intermediate node may likewise send the identifier of the output node to the second data storage node when it sends data. In this way, when any intermediate node determines that the second data storage node is to execute the output step, it can determine the output node based on the identifier of the output node and determine the output node to be the second data storage node.
The output node may be set by the distributed database or by a user. The output node and the input node may be the same data storage node or different data storage nodes; this is not limited in this embodiment of the present invention. For example, the data storage node that receives the call request for the target stored procedure, that is, the input node, may be set as the output node by default.
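Step 104 can be sketched as two small functions. Modelling the partition information as a dictionary from a data value to a node name is an assumption made for brevity (a real system would more likely partition by hash or range), and the node names and the `OUTPUT` marker are hypothetical.

```python
OUTPUT = "output"  # hypothetical marker for the output step


def next_step(step_index, num_steps):
    """Second step indication: the following step, or the output step after the last."""
    return step_index + 1 if step_index < num_steps else OUTPUT


def next_node(second_data, step, partition_info, output_node):
    """Pick the node that stores the second data, or the output node."""
    if step == OUTPUT:
        return output_node
    return partition_info[second_data]


# Assumed partition information: which node stores the row for each name.
partition_info = {"Z3": "node-A", "Z2": "node-B", "Z1": "node-C"}

middle_step = next_step(1, 3)
final_step = next_step(3, 3)
middle_target = next_node("Z2", middle_step, partition_info, "node-A")
final_target = next_node("Z1", final_step, partition_info, "node-A")
```

In this sketch the identifier of the output node (`node-A`, the input node by default) would have been carried along with each message, as described above.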
Step 105: The target data storage node sends the second data information and the second step indication information to the second data storage node.
The second data information is the second data or indication information of the second data. The indication information of the second data may indicate the second data itself or the storage location of the second data, and the second data can be obtained from it.
In practice, to increase the transmission rate, when the second data is large it may be converted into indication information of the second data, which is smaller, and sent in that form; when the second data is small, it may be sent directly.
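The size-based choice can be sketched as follows. The threshold of 1024 bytes, the function name `make_data_info`, and the storage key are arbitrary assumptions; the application does not state what counts as a large amount of data.

```python
SIZE_THRESHOLD_BYTES = 1024  # hypothetical cut-off between "small" and "large"


def make_data_info(data, storage_key):
    """Send small data as-is; send large data as a reference to where it is stored."""
    if len(data) > SIZE_THRESHOLD_BYTES:
        return {"kind": "location", "value": storage_key}
    return {"kind": "data", "value": data}


small_info = make_data_info(b"Z2", "row:18")
large_info = make_data_info(b"x" * 4096, "row:19")
```

The receiving node would then recover the data from either form, for example by reading the referenced storage location when the indicator denotes a storage location.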
In this embodiment of the present invention, when the second data storage node is also an intermediate node for executing the target stored procedure, its execution logic is the same as that of the target data storage node. That is, it may process the second data directly based on the second data information to obtain third data, then determine the next data storage node capable of processing the third data, and send third data information to that next data storage node.
Specifically, the second data storage node may receive the second data information and the second step indication information sent by the target data storage node; process the second data, based on the second data information, the second step indication information, and the stored topology information of the target stored procedure, to obtain third data; determine, based on the third data, the stored partition information, and the second step indication information, a third data storage node for processing the third data and third step indication information; and send third data information and the third step indication information to the third data storage node. The third data storage node is the data storage node that follows the second data storage node in the execution order of the target stored procedure. The third step indication information indicates the position, among the multiple operation steps of the target stored procedure, of the third operation step that the third data storage node needs to execute. The third data information is the third data or indication information of the third data.
The second data storage node may obtain the third data by processing the second data, based on the second data information, the second step indication information, and the stored topology information of the target stored procedure, in the same way that the target data storage node obtains the second data by processing the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure. For the specific implementation, refer to the description of step 103; details are not repeated here.
Likewise, the second data storage node may determine the third data storage node for processing the third data and the third step indication information, based on the third data, the stored partition information, and the second step indication information, in the same way that the target data storage node determines the second data storage node and the second step indication information based on the second data, the stored partition information, and the first step indication information. For the specific implementation, refer to the description of step 104; details are not repeated here.
That is, any data storage node used to execute the target stored procedure may follow the processing logic of the target data storage node: it processes the data directly to obtain a data processing result, determines the next data storage node for processing that result, and sends the data processing result information to the next data storage node for processing, without passing the result back to the storage node manager. This avoids round trips of data and reduces the cost of data transmission.
Further, when the second data storage node is the output node, the second step indication information indicates the output operation. The second data storage node may then determine the second data based on the second data information and output the second data as the output data of the target stored procedure. For example, it may send the second data, as the output data of the target stored procedure, to the client that initiated the call request for the target stored procedure, so that the client can return it to the user.
In this embodiment of the present invention, any one of the multiple data storage nodes in the distributed database that execute the target stored procedure may receive the first data information sent by the previous data storage node, process the first data directly based on the stored topology information of the target stored procedure to obtain the second data, determine the next data storage node for processing the second data based on the second data and the stored partition information, and finally send the second data information to the next data storage node, so that the next data storage node processes the second data based on the stored topology information of the target stored procedure. That is, each data storage node processes the data itself, determines the next data storage node, and sends the data processing result directly to that node, without passing the result back to the data storage node manager. This greatly reduces the amount of data transmitted and improves the execution efficiency and runtime performance of the stored procedure.
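The node-to-node hand-off summarized above can be simulated in a single process as follows. Everything here is a simplified assumption: the `Node` class, the three nodes A, B, and C, the two-step "grandfather" procedure in which every step looks up a father, and the direct method call that stands in for a network message.

```python
from types import SimpleNamespace


class Node:
    """Hypothetical data storage node holding some name -> father rows."""

    def __init__(self, name, rows, cluster):
        self.name, self.rows, self.cluster = name, rows, cluster

    def receive(self, data, step, trace):
        trace.append(self.name)
        if step == "output":
            return data                      # output node: return the result
        result = self.rows[data]             # run this step on local data
        nxt = step + 1 if step < self.cluster.num_steps else "output"
        target = (self.cluster.output_node if nxt == "output"
                  else self.cluster.partition[result])
        # Hand the result straight to the next node; no manager round trip.
        return self.cluster.nodes[target].receive(result, nxt, trace)


cluster = SimpleNamespace(
    nodes={},
    partition={"Z3": "A", "Z2": "B", "Z1": "C"},  # assumed partition information
    num_steps=2,                                   # two operation steps
    output_node="A",                               # input node doubles as output node
)
local_rows = {"A": {"Z3": "Z2"}, "B": {"Z2": "Z1"}, "C": {"Z1": "Z0"}}
for name, rows in local_rows.items():
    cluster.nodes[name] = Node(name, rows, cluster)

trace = []
result = cluster.nodes[cluster.partition["Z3"]].receive("Z3", 1, trace)
```

The `trace` list records which node handled each hop: node A runs the first step, node B runs the second, and the result returns to A only because A is the output node.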
Next, with reference to the system architecture shown in FIG. 2A, the method for executing a stored procedure provided in the embodiments of the present invention is described in detail, taking as an example the case in which the data storage nodes that execute the target stored procedure comprise two nodes, a fourth data storage node and a fifth data storage node. The execution logic of the fourth and fifth data storage nodes is the same as that of the target data storage node in the embodiment shown in FIG. 1F.
FIG. 2B is a flowchart of another method for executing a stored procedure according to an embodiment of the present invention. The method is applied to the system architecture shown in FIG. 2A and includes the following steps:
Step 201: The input node receives a call request for the target stored procedure, calls the target stored procedure according to the request, and obtains input data while calling the target stored procedure.
The input data is the input data carried in the call request for the target stored procedure.
That is, the input node may perform an input step based on the call request for the target stored procedure, where the input step refers to obtaining the input data while calling the target stored procedure.
Step 202: The input node determines, based on the input data and the stored partition information, the fourth data storage node for processing the input data.
The fourth data storage node is the data storage node that immediately follows the input node in the execution order of the target stored procedure.
Step 203: The input node sends first data information and first step indication information to the fourth data storage node.
The first data information is the first data or indication information of the first data, and the first data is the input data.
The first step indication information is determined by the input node based on the call request for the target stored procedure. That is, based on the received call request, the input node determines that the next data storage node is to execute the first of the operation steps included in the target stored procedure, and determines the first step indication information from that first operation step. The first step indication information therefore indicates the first of the operation steps included in the target stored procedure.
Step 204: The fourth data storage node performs stored-procedure processing on the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure, to obtain second data.
Specifically, the fourth data storage node may execute, on the first data, the first of the operation steps included in the target stored procedure to obtain the second data.
Step 205: The fourth data storage node determines, based on the second data, the stored partition information, and the first step indication information, the fifth data storage node for processing the second data, and determines second step indication information.
The fifth data storage node is the data storage node that immediately follows the fourth data storage node in the execution order of the target stored procedure. The second step indication information indicates the second of the operation steps included in the target stored procedure.
Step 206: The fourth data storage node sends second data information and the second step indication information to the fifth data storage node.
Step 207: The fifth data storage node performs stored-procedure processing on the second data based on the second data information, the second step indication information, and the stored topology information of the target stored procedure, to obtain third data.
Specifically, the fifth data storage node may execute, on the second data, the second of the operation steps included in the target stored procedure to obtain the third data.
Step 208: The fifth data storage node determines, based on the second step indication information, that the next data storage node after the processing of the second data is the output node, and determines third step indication information based on the output step.
When the second step indication information indicates the last of the operation steps included in the target stored procedure, the fifth data storage node can determine that the next data storage node is the output node, which is the node that performs the output step.
Step 209: The fifth data storage node sends third data information and the third step indication information to the output node.
The third data information is the third data or indication information of the third data, and the third step indication information indicates the output step.
Step 210: The output node outputs, based on the third data information and the third step indication information, the third data as the output data of the target stored procedure.
It should be noted that this embodiment is described using only the example in which two data storage nodes execute the target stored procedure. In practice, more data storage nodes may execute the target stored procedure, and each of them may follow the execution logic of the target data storage node shown in FIG. 1F. Details are not repeated here.
In this embodiment of the present invention, each data storage node that executes the stored procedure processes the data itself, determines the next data storage node, and sends the processing result directly to it instead of returning the result to the data storage node manager. This greatly reduces the amount of data transmitted and improves the execution efficiency and runtime performance of the stored procedure.
FIG. 3 is a schematic diagram of the execution flow of another stored procedure according to an embodiment of the present invention. As shown in FIG. 3, the distributed database 100 includes at least data storage node A, data storage node B, data storage node C, and data storage node M, and the data storage nodes can be connected to one another over a network.
Assume that the distributed database 100 is to execute a stored procedure S whose semantics are "if @name has a great-grandfather, query the great-grandfather's age". The stored procedure S then includes three operation steps: operation step S1, query the name of @name's father; operation step S2, query the name of @name's grandfather based on @name's father; and operation step S3, query the age of @name's great-grandfather based on the name of @name's grandfather. Here @name is the input to the stored procedure S and can be any person's name.
In addition, assume that each data storage node in the distributed database shown in FIG. 3 stores a different data list, and each data list stores person names together with the name and the age of each person's father. That is, the data lists are partitioned by person name and therefore stored on different storage nodes. For example, as shown in FIG. 3, data storage node A stores data list 1, data storage node B stores data list 2, and data storage node C stores data list 3. Data storage node M stores partition information, which indicates where data is stored, that is, which data storage node holds the data for each person name.
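The data layout described above can be pictured with plain dictionaries. The following is only a minimal sketch under stated assumptions: the names Z3, Z2, and Z1, the node labels A, B, and C, and the age 85 come from the FIG. 3 example flow, while the name "Z0", the ages 35 and 60, the dictionary shapes, and the helper `locate` are hypothetical and chosen purely for illustration.

```python
# One data list per data storage node, keyed by person name.
# Each row stores the father's name and the father's age.
# "Z0", 35 and 60 are made-up placeholder values, not taken from the source.
DATA_LISTS = {
    "A": {"Z3": {"father": "Z2", "father_age": 35}},  # data list 1
    "B": {"Z2": {"father": "Z1", "father_age": 60}},  # data list 2
    "C": {"Z1": {"father": "Z0", "father_age": 85}},  # data list 3
}

# Partition information: person name -> node that stores that person's row.
PARTITION_INFO = {
    name: node for node, rows in DATA_LISTS.items() for name in rows
}


def locate(name):
    """Return the label of the node holding the row for `name`, or None."""
    return PARTITION_INFO.get(name)
```

With this layout, looking up a name in the partition information is enough to decide which node should receive the next piece of data.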
In addition, assume that each data storage node in the distributed database shown in FIG. 3 stores the topology information of the stored procedure S, which is shown in FIG. 1H. Topology node 1 indicates the input step; topology nodes 2, 3, and 4 indicate operation steps S1, S2, and S3, respectively; and topology node 5 indicates the output step.
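One possible in-memory form of this topology information is an ordered list of topology nodes. The sketch below is an assumption about representation only: FIG. 1H is not reproduced here, and the `TopologyNode` class, its field names, and the helper `compute_steps` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopologyNode:
    kind: str  # "input", "compute", or "output"
    step: str  # operation step label such as "S1"; empty for input/output


# Topology of stored procedure S; the list order is the execution order.
TOPOLOGY_S = [
    TopologyNode("input", ""),      # topology node 1
    TopologyNode("compute", "S1"),  # topology node 2
    TopologyNode("compute", "S2"),  # topology node 3
    TopologyNode("compute", "S3"),  # topology node 4
    TopologyNode("output", ""),     # topology node 5
]


def compute_steps(topology):
    """Return the operation step labels in execution order."""
    return [node.step for node in topology if node.kind == "compute"]
```

Because every data storage node keeps its own copy of such a structure, a node that is told only the position of its step can work out what to run and what comes next.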
If the stored procedure S is executed using the method provided in the embodiments of the present invention, then, as shown in FIG. 3, assuming that the input data carried in the call request received by data storage node M is Z3 (that is, @name is Z3), the execution flow of the stored procedure S may include the following steps 1) to 5):
1) Data storage node M calls the stored procedure S according to the call request and performs the input step, that is, it obtains the input data Z3 while calling the stored procedure S. Then, based on Z3 and the stored partition information, it determines that data storage node A stores Z3, and sends Z3 and first step indication information to data storage node A.
The first step indication information indicates the first of the operation steps included in the stored procedure S; for example, it may be the value 1.
2) Based on Z3, the first step indication information, and the stored topology information of the stored procedure S, data storage node A determines that it is to execute operation step S1, and executes S1 on Z3: it looks up Z3 in its stored data list 1 and finds that the name of Z3's father is Z2. Data storage node A then determines, based on Z2 and the stored partition information, that data storage node B stores Z2, determines second step indication information based on the first step indication information, and sends Z2 and the second step indication information to data storage node B.
The second step indication information indicates the second of the operation steps included in the stored procedure S; for example, it may be the value 2.
3) Based on Z2, the second step indication information, and the stored topology information of the stored procedure S, data storage node B determines that it is to execute operation step S2, and executes S2 on Z2: it looks up Z2 in its stored data list 2 and finds that the name of Z2's father is Z1. Data storage node B then determines, based on Z1 and the stored partition information, that data storage node C stores Z1, determines third step indication information based on the second step indication information, and sends Z1 and the third step indication information to data storage node C.
The third step indication information indicates the third of the operation steps included in the stored procedure S; for example, it may be the value 3.
4) Based on Z1, the third step indication information, and the stored topology information of the stored procedure S, data storage node C determines that it is to execute operation step S3, and executes S3 on Z1: it looks up Z1 in its stored data list 3 and finds that the age of Z1's father is 85. Data storage node C then determines, based on the third step indication information, that the next data storage node is the output node, namely data storage node M, determines fourth step indication information based on the output step, and sends 85 and the fourth step indication information to data storage node M.
The fourth step indication information indicates the output step; for example, it may be the string "out".
5) After receiving 85 and the fourth step indication information, data storage node M outputs 85 as the output data of the stored procedure S based on the fourth step indication information.
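Steps 1) to 5) can be traced with a small single-process simulation. This is only an illustrative sketch, not the patented implementation: the dictionaries stand in for data lists 1 to 3 and the partition information, the name "Z0" and the ages 35 and 60 are made-up placeholders, and the function `run` and the tuple format of each hop are hypothetical. The sketch also assumes every lookup succeeds, so it does not model the "if @name has a great-grandfather" check.

```python
# Data lists held by nodes A, B, C (only 85 and the names Z1..Z3 are from the example).
DATA_LISTS = {
    "A": {"Z3": {"father": "Z2", "father_age": 35}},
    "B": {"Z2": {"father": "Z1", "father_age": 60}},
    "C": {"Z1": {"father": "Z0", "father_age": 85}},
}
PARTITION_INFO = {n: node for node, rows in DATA_LISTS.items() for n in rows}


def father_name(node, name):
    return DATA_LISTS[node][name]["father"]


def father_age(node, name):
    return DATA_LISTS[node][name]["father_age"]


# Step indication -> operation step (S1, S2, S3), as every node would know
# from its own copy of the topology information.
STEPS = {1: father_name, 2: father_name, 3: father_age}
LAST_STEP = 3
OUTPUT_STEP = "out"


def run(entry_node, name):
    """Simulate the chained execution; return (output, list of hops)."""
    hops = []
    sender, data, step = entry_node, name, 1   # input step on the entry node
    target = PARTITION_INFO[data]              # node that stores the input
    while True:
        hops.append((sender, target, data, step))
        if step == OUTPUT_STEP:                # output node: emit the result
            return data, hops
        result = STEPS[step](target, data)     # node runs its own step
        if step == LAST_STEP:                  # last step: go to output node
            nxt, nxt_step = entry_node, OUTPUT_STEP
        else:                                  # otherwise: follow the data
            nxt, nxt_step = PARTITION_INFO[result], step + 1
        sender, target, data, step = target, nxt, result, nxt_step


result, hops = run("M", "Z3")
print(result)
for hop in hops:
    print(hop)
```

Each recorded hop is one direct node-to-node transfer, and none of the intermediate results pass back through node M before the final output.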
As FIG. 3 shows, compared with FIG. 1C, each of data storage nodes A, B, and C processes the data itself, determines the next data storage node, and sends the processing result directly to it instead of returning the result to data storage node M. The two transfers of a round trip at each step are thereby reduced to a single transfer, which greatly reduces the amount of data transmitted and the transmission overhead, and improves the execution efficiency and runtime performance of the stored procedure.
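The saving can be made concrete by counting transfers. The sketch below rests on an assumption: FIG. 1C is not reproduced in this section, so the manager-mediated flow is modeled simply as one transfer to a node and one transfer back for every operation step. The function names are hypothetical.

```python
def transfers_via_manager(num_steps):
    """Assumed FIG. 1C style: each step costs a send and a return."""
    return 2 * num_steps


def transfers_chained(num_steps):
    """FIG. 3 style: one hop into each step, plus the hop to the output node."""
    return num_steps + 1


for n in (3, 10):
    print(n, transfers_via_manager(n), transfers_chained(n))
```

Under this model the three-step stored procedure S needs 6 transfers through the manager but 4 when chained, and the gap widens as the number of operation steps grows.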
FIG. 4A is a schematic structural diagram of an apparatus for executing a stored procedure according to an embodiment of the present invention. The apparatus is applied to a target data storage node in a distributed database. The distributed database includes a plurality of data storage nodes for executing a target stored procedure, and a first data storage node, the target data storage node, and a second data storage node among them are three data storage nodes that execute the target stored procedure in sequence. Referring to FIG. 4A, the apparatus includes:
a receiving module 401, configured to perform the operations of step 102 in the embodiment shown in FIG. 1F;
a processing module 402, configured to perform the operations of step 103 in the embodiment shown in FIG. 1F;
a determining module 403, configured to perform the operations of step 104 in the embodiment shown in FIG. 1F; and
a sending module 404, configured to perform the operations of step 105 in the embodiment shown in FIG. 1F.
Optionally, when the first data storage node is an input node, the first data is the input data obtained by the input node based on the call request for the target stored procedure, and the input node is the data storage node in the distributed database that receives the call request for the target stored procedure;
when the first data storage node is an intermediate node, the first data is the data processing result obtained by the intermediate node by processing data based on the data information it received, and an intermediate node is a data storage node that executes any one of the operation steps included in the target stored procedure.
Optionally, referring to FIG. 4B, the processing module 402 includes:
a first determining unit 4021, configured to determine the first data based on the first data information;
a second determining unit 4022, configured to determine, based on the first step indication information and the topology information of the target stored procedure, the first operation step that the target data storage node needs to execute; and
an execution unit 4023, configured to execute the first operation step on the first data to obtain the second data.
Optionally, the first determining unit 4021 is configured to:
when the first data information is indication information of the first data, obtain the first data, based on that indication information, from the data stored on the target data storage node.
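The two forms that the first data information can take, the data itself or an indication of data already held locally, might be handled as follows. This is a hedged sketch: the classes `Inline` and `Indication`, the key "row-7", and the function `resolve_first_data` are hypothetical names, and the source does not specify what an indication looks like.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Inline:
    """Data information that carries the data itself."""
    value: Any


@dataclass(frozen=True)
class Indication:
    """Data information that only points at data stored on the receiver."""
    key: str


DataInfo = Union[Inline, Indication]


def resolve_first_data(info, local_store):
    """Return the first data, reading it from local storage when needed."""
    if isinstance(info, Indication):
        return local_store[info.key]
    return info.value


print(resolve_first_data(Inline("Z3"), {}))
print(resolve_first_data(Indication("row-7"), {"row-7": "Z2"}))
```

Sending an indication instead of the data is only useful when the receiving node already stores the data in question, which is why the lookup uses the receiver's own store.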
Optionally, the receiving module 401 is further configured to receive an identifier of the target stored procedure sent by the first data storage node, and the processing module 402 is further configured to obtain the topology information of the target stored procedure based on that identifier; and/or
the sending module 404 is further configured to send the identifier of the target stored procedure to the second data storage node.
Optionally, when the first step indication information indicates that the first operation step is any of the operation steps included in the target stored procedure other than the last one, the next position indicates the operation step that follows the first operation step;
when the first step indication information indicates that the first operation step is the last of the operation steps included in the target stored procedure, the next position indicates the output step, and the output step means outputting the received data as the output data of the target stored procedure.
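The rule above amounts to a small function. The sketch assumes that operation steps are numbered from 1 and that the output step is marked by the string "out", following the example values given for the FIG. 3 flow; the function name and the range check are hypothetical additions.

```python
OUTPUT_STEP = "out"


def next_step_indication(current_step, num_steps):
    """Return the indication to forward after executing `current_step`."""
    if not 1 <= current_step <= num_steps:
        raise ValueError("step position outside the stored procedure")
    if current_step == num_steps:
        return OUTPUT_STEP      # last operation step: hand over to output
    return current_step + 1     # otherwise: the following operation step


print([next_step_indication(s, 3) for s in (1, 2, 3)])
```

Because the rule depends only on the current position and the number of steps, each node can apply it locally using its own copy of the topology information.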
Optionally, referring to FIG. 4C, the determining module 403 includes:
a third determining unit 4031, configured to determine, from the distributed database and based on the second data and the stored partition information, the data storage node on which the second data is already stored; and
a fourth determining unit 4032, configured to determine the data storage node on which the second data is already stored as the second data storage node.
In this embodiment of the present invention, any one of the data storage nodes that execute the target stored procedure in the distributed database can receive the first data information sent by the previous data storage node, process the first data directly based on its stored topology information of the target stored procedure to obtain second data, determine the next data storage node for processing the second data based on the second data and the stored partition information, and finally send the second data information to that next node, so that the next node processes the second data based on its own stored topology information of the target stored procedure. In other words, each data storage node processes the data itself, determines the next data storage node, and sends the processing result directly to it instead of returning the result to the data storage node manager. This greatly reduces the amount of data transmitted and improves the execution efficiency and runtime performance of the stored procedure.
It should be noted that, when the apparatus for executing a stored procedure provided in the above embodiment executes a stored procedure, the division into the functional modules described above is only an example. In practice, the functions may be assigned to different functional modules as required, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above. In addition, the apparatus embodiment and the method embodiments for executing a stored procedure are based on the same concept; for the specific implementation process, see the method embodiments. Details are not repeated here.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).
In another embodiment, an apparatus for executing a stored procedure is further provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor is configured to perform the method for executing a stored procedure described in either of the embodiments of FIG. 1F and FIG. 2B.
In another embodiment, a computer-readable storage medium is further provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the method for executing a stored procedure described in either of the embodiments of FIG. 1F and FIG. 2B.
In another embodiment, a computer program product containing instructions is further provided. When run on a computer, the instructions cause the computer to perform the method for executing a stored procedure described in either of the embodiments of FIG. 1F and FIG. 2B.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be completed by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing describes embodiments provided by this application and is not intended to limit this application. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (16)

  1. A method for executing a stored procedure, characterized in that the method is applied to a distributed database; the distributed database includes a plurality of data storage nodes for executing a target stored procedure, and a first data storage node, a target data storage node, and a second data storage node among the plurality of data storage nodes are three data storage nodes that execute the target stored procedure in sequence; the method comprises:
    receiving, by the target data storage node, first data information and first step indication information sent by the first data storage node, wherein the first data information is first data or indication information of the first data, the first data is data required when the target stored procedure is executed, and the first step indication information indicates the position, among a plurality of operation steps included in the target stored procedure, of a first operation step that the target data storage node needs to execute;
    performing, by the target data storage node, stored-procedure processing on the first data based on the first data information, the first step indication information, and stored topology information of the target stored procedure, to obtain second data, wherein the topology information of the target stored procedure indicates the plurality of operation steps included in the target stored procedure and the execution order of the plurality of operation steps; and
    determining, by the target data storage node, the second data storage node and second step indication information based on the second data, partition information, and the first step indication information, and sending second data information and the second step indication information to the second data storage node, wherein the partition information indicates storage locations of data, the second step indication information indicates the position, among the plurality of operation steps included in the target stored procedure, of a second operation step that the second data storage node needs to execute, and the second data information is the second data or indication information of the second data.
  2. The method according to claim 1, characterized in that:
    when the first data storage node is an input node, the first data is input data obtained by the input node based on a call request for the target stored procedure, and the input node is the data storage node in the distributed database that receives the call request for the target stored procedure;
    when the first data storage node is an intermediate node, the first data is a data processing result obtained by the intermediate node by performing stored-procedure processing on data based on received data information, and the intermediate node is a data storage node that executes any one of the plurality of operation steps included in the target stored procedure.
  3. The method according to claim 1 or 2, characterized in that the performing, by the target data storage node, stored-procedure processing on the first data based on the first data information, the first step indication information, and the stored topology information of the target stored procedure, to obtain second data, comprises:
    determining, by the target data storage node, the first data based on the first data information;
    determining, by the target data storage node, based on the first step indication information and the topology information of the target stored procedure, the first operation step that the target data storage node needs to execute; and
    executing, by the target data storage node, the first operation step on the first data to obtain the second data.
  4. The method according to claim 3, wherein the determining, by the target data storage node, the first data based on the first data information comprises:
    when the first data information is the indication information of the first data, obtaining, by the target data storage node, the first data from stored data based on the indication information of the first data.
  5. The method according to any one of claims 1 to 4, wherein the method further comprises:
    receiving, by the target data storage node, an identifier of the target stored procedure sent by the first data storage node,
    and obtaining the topology information of the target stored procedure based on the identifier of the target stored procedure; and/or sending, by the target data storage node, the identifier of the target stored procedure to the second data storage node.
  6. The method according to any one of claims 1 to 5, wherein:
    when the first step indication information indicates that the first operation step is an operation step other than the last operation step among the plurality of operation steps included in the target stored procedure, the second step indication information indicates the operation step that follows the first operation step;
    when the first step indication information indicates that the first operation step is the last operation step among the plurality of operation steps included in the target stored procedure, the second step indication information indicates an output step, the output step indicating that received data is to be output as the output data of the target stored procedure.
  7. The method according to any one of claims 1 to 6, wherein the target data storage node determining the second data storage node and the second step indication information based on the second data, the partition information, and the first step indication information comprises:
    determining, by the target data storage node, based on the second data and the partition information, a data storage node in the distributed database in which the second data is pre-stored;
    determining, by the target data storage node, the data storage node in which the second data is pre-stored as the second data storage node.
  8. An apparatus for executing a stored procedure, applied to a target data storage node in a distributed database, wherein a plurality of data storage nodes included in the distributed database are configured to execute a target stored procedure, and a first data storage node, the target data storage node, and a second data storage node among the plurality of data storage nodes are three data storage nodes that execute the target stored procedure in sequence; the apparatus comprising:
    a receiving module, configured to receive first data information and first step indication information sent by the first data storage node, wherein the first data information is first data or indication information of the first data, the first data is data required when the target stored procedure is executed, and the first step information indicates the position, among the plurality of operation steps included in the target stored procedure, of a first operation step that the target data storage node needs to perform;
    a processing module, configured to perform stored procedure processing on the first data based on the first data information, the first step indication information, and stored topology information of the target stored procedure, to obtain second data, wherein the topology information of the target stored procedure indicates the plurality of operation steps included in the target stored procedure and the execution order of the plurality of operation steps;
    a determining module, configured to determine the second data storage node and second step indication information based on the second data, stored partition information, and the first step indication information, wherein the partition information indicates the storage location of data, and the second step indication information indicates the position, among the plurality of operation steps included in the target stored procedure, of a second operation step that the second data storage node needs to perform;
    a sending module, configured to send second data information and the second step indication information to the second data storage node, wherein the second data information is the second data or indication information of the second data.
  9. The apparatus according to claim 8, wherein:
    when the first data storage node is an input node, the first data is input data obtained by the input node based on a call request for the target stored procedure, the input node being the data storage node in the distributed database that receives the call request for the target stored procedure;
    when the first data storage node is an intermediate node, the first data is a data processing result obtained by the intermediate node processing data based on received data information, the intermediate node being a data storage node that executes any one of the plurality of operation steps included in the target stored procedure.
  10. The apparatus according to claim 8 or 9, wherein the processing module comprises:
    a first determining unit, configured to determine the first data based on the first data information;
    a second determining unit, configured to determine, based on the first step indication information and the topology information of the target stored procedure, the first operation step that the target data storage node needs to perform;
    an execution unit, configured to perform the first operation step based on the first data to obtain the second data.
  11. The apparatus according to any one of claims 8 to 10, wherein the first determining unit is configured to:
    when the first data information is the indication information of the first data, obtain the first data from the data stored in the target data storage node based on the indication information of the first data.
  12. The apparatus according to any one of claims 8 to 11, wherein:
    the receiving module is further configured to receive an identifier of the target stored procedure sent by the first data storage node;
    the processing module is further configured to obtain the topology information of the target stored procedure based on the identifier of the target stored procedure; and/or
    the sending module is further configured to send the identifier of the target stored procedure to the second data storage node.
  13. The apparatus according to any one of claims 8 to 12, wherein:
    when the first step indication information indicates that the first operation step is an operation step other than the last operation step among the plurality of operation steps included in the target stored procedure, the next position indicates the operation step that follows the first operation step;
    when the first step indication information indicates that the first operation step is the last operation step among the plurality of operation steps included in the target stored procedure, the next position indicates an output step, the output step indicating that received data is to be output as the output data of the target stored procedure.
  14. The apparatus according to any one of claims 8 to 13, wherein the determining module comprises:
    a third determining unit, configured to determine, based on the second data and the stored partition information, a data storage node in the distributed database in which the second data is pre-stored;
    a fourth determining unit, configured to determine the data storage node in which the second data is pre-stored as the second data storage node.
  15. An apparatus for executing a stored procedure, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor is configured to perform the steps of the method according to any one of claims 1 to 7.
  16. A computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 7.
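
The node-to-node relay recited in the method claims (resolve the first data, look up the operation step in the topology information, execute it, pick the next step, and route the result by partition information) can be illustrated with a minimal Python sketch. This is not the applicant's implementation. The names `Message`, `Node`, `OUTPUT_STEP`, `partition_of`, `run`, and `build_demo_cluster` are hypothetical, the topology is assumed to be a simple ordered list of functions, and the partition rule (value modulo 2) is chosen only for the demonstration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

OUTPUT_STEP = -1  # hypothetical sentinel standing in for the "output step"


@dataclass
class Message:
    procedure_id: str   # identifier of the target stored procedure
    data_info: Any      # the data itself, or a key that indicates stored data
    is_reference: bool  # True when data_info is only indication information
    step: int           # position of the step the receiving node must execute


@dataclass
class Node:
    name: str
    topologies: Dict[str, List[Callable[[Any], Any]]]  # ordered steps per procedure
    partition_of: Callable[[Any], str]                  # partition info: data -> node name
    local_store: Dict[Any, Any]

    def handle(self, msg: Message) -> Tuple[Optional[str], Any]:
        # Resolve the first data: use it directly, or fetch it from local storage.
        if msg.is_reference:
            first_data = self.local_store[msg.data_info]
        else:
            first_data = msg.data_info

        # Output step: hand the received data back as the procedure's output.
        if msg.step == OUTPUT_STEP:
            return None, first_data

        # Look up the step in the stored topology information and execute it.
        steps = self.topologies[msg.procedure_id]
        second_data = steps[msg.step](first_data)

        # Next step is the following one, or the output step after the last one.
        next_step = msg.step + 1 if msg.step + 1 < len(steps) else OUTPUT_STEP

        # Route to the node that the partition information associates with the result.
        next_node = self.partition_of(second_data)
        return next_node, Message(msg.procedure_id, second_data, False, next_step)


def run(nodes: Dict[str, Node], start: str, msg: Message) -> Any:
    """Relay the message from node to node until an output step is reached."""
    current = start
    while True:
        target, payload = nodes[current].handle(msg)
        if target is None:
            return payload
        current, msg = target, payload


def build_demo_cluster() -> Dict[str, Node]:
    steps = [lambda x: x + 1, lambda x: x * 2]  # assumed two-step procedure
    partition = lambda value: f"node{value % 2}"  # assumed partition rule
    return {n: Node(n, {"proc": steps}, partition, {}) for n in ("node0", "node1")}


result = run(build_demo_cluster(), "node1", Message("proc", 3, False, 0))
print(result)  # 8: node1 computes 3 + 1, node0 computes 4 * 2 and then outputs it
```

In this sketch no coordinating node appears between steps: each node forwards the intermediate result directly to the node chosen by the partition rule, which matches the sequential first node, target node, second node arrangement described in the claims.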
PCT/CN2018/087384 2017-09-27 2018-05-17 Storage procedure executing method and device, and storage medium WO2019062156A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710892803.6 2017-09-27
CN201710892803.6A CN107729421B (en) 2017-09-27 2017-09-27 The execution method, apparatus and storage medium of storing process

Publications (1)

Publication Number Publication Date
WO2019062156A1 (en) 2019-04-04

Family

ID=61207446

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/087384 WO2019062156A1 (en) 2017-09-27 2018-05-17 Storage procedure executing method and device, and storage medium

Country Status (2)

Country Link
CN (1) CN107729421B (en)
WO (1) WO2019062156A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729421B (en) * 2017-09-27 2019-11-15 华为技术有限公司 The execution method, apparatus and storage medium of storing process
CN111611251B (en) * 2020-04-24 2021-06-29 华智众创(北京)投资管理有限责任公司 Data processing system
CN116089823B (en) * 2023-03-29 2023-06-20 成都信息工程大学 Intelligent community visual real-time supervision method based on big data

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1399209A (en) * 2001-07-20 2003-02-26 华为技术有限公司 Parallel distributed-data base processing method and device
US6636855B2 (en) * 2001-03-09 2003-10-21 International Business Machines Corporation Method, system, and program for accessing stored procedures in a message broker
CN102523249A (en) * 2011-11-24 2012-06-27 哈尔滨工业大学 Distributed long-distance simulation system and simulation method based on Web
CN102955801A (en) * 2011-08-25 2013-03-06 中兴通讯股份有限公司 Data control method and data control system based on distributed database system
US8892599B2 (en) * 2012-10-24 2014-11-18 Marklogic Corporation Apparatus and method for securing preliminary information about database fragments for utilization in mapreduce processing
CN106250566A (en) * 2016-08-31 2016-12-21 天津南大通用数据技术股份有限公司 A kind of distributed data base and the management method of data operation thereof
CN107729421A (en) * 2017-09-27 2018-02-23 华为技术有限公司 The execution method, apparatus and storage medium of storing process

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9128990B2 (en) * 2013-03-15 2015-09-08 Microsoft Technology Licensing, Llc Executing stored procedures at parallel databases
CN105516367B (en) * 2016-02-02 2018-02-13 北京百度网讯科技有限公司 Distributed data-storage system, method and apparatus

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6636855B2 (en) * 2001-03-09 2003-10-21 International Business Machines Corporation Method, system, and program for accessing stored procedures in a message broker
CN1399209A (en) * 2001-07-20 2003-02-26 华为技术有限公司 Parallel distributed-data base processing method and device
CN102955801A (en) * 2011-08-25 2013-03-06 中兴通讯股份有限公司 Data control method and data control system based on distributed database system
CN102523249A (en) * 2011-11-24 2012-06-27 哈尔滨工业大学 Distributed long-distance simulation system and simulation method based on Web
US8892599B2 (en) * 2012-10-24 2014-11-18 Marklogic Corporation Apparatus and method for securing preliminary information about database fragments for utilization in mapreduce processing
CN106250566A (en) * 2016-08-31 2016-12-21 天津南大通用数据技术股份有限公司 A kind of distributed data base and the management method of data operation thereof
CN107729421A (en) * 2017-09-27 2018-02-23 华为技术有限公司 The execution method, apparatus and storage medium of storing process

Also Published As

Publication number Publication date
CN107729421B (en) 2019-11-15
CN107729421A (en) 2018-02-23

Similar Documents

Publication Publication Date Title
US11036754B2 (en) Database table conversion
KR102415845B1 (en) Internet of Things Resource Subscription Methods, Devices, and Systems
WO2017088358A1 (en) Distributed database processing method and device
US20170161291A1 (en) Database table conversion
CN105740048A (en) Image management method, device and system
WO2018040722A1 (en) Table data query method and device
WO2020164290A1 (en) Policy control method, apparatus, and system
WO2015062444A1 (en) System and method for creating a distributed transaction manager supporting repeatable read isolation level in a mpp database
WO2019153488A1 (en) Service configuration management method, apparatus, storage medium and server
KR100671506B1 (en) A mobile middleware and a method for processing business logic using it
US20230362251A1 (en) Method and apparatus for managing iot device, and server and storage medium thereof
CN105681477B (en) A kind of data access method and a kind of server
US11210277B2 (en) Distributing and processing streams over one or more networks for on-the-fly schema evolution
WO2019062156A1 (en) Storage procedure executing method and device, and storage medium
WO2020215752A1 (en) Graph computing method and device
WO2023040432A1 (en) Data query method, apparatus, and multi-party secure database
WO2020253344A1 (en) Authorization control method and apparatus, and storage medium
CN107302849A (en) The distribution method and device of a kind of light path
US20200334080A1 (en) Systems and methods for recomputing services
WO2023029485A1 (en) Data processing method and apparatus, computer device, and computer-readable storage medium
US20220101962A1 (en) Enabling distributed semantic mashup
CN113439418A (en) Method, system, terminal and storage medium for changing resource state
JP2011186695A (en) Transmission information control apparatus, method and program
CN116149847A (en) Distributed computing function supporting method for onboard embedded system
CN116450725A (en) Method, apparatus, electronic device, and medium for performing database operations

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18863182

Country of ref document: EP

Kind code of ref document: A1