CN113420041A

CN113420041A - Data processing method, device, equipment and medium in distributed database

Info

Publication number: CN113420041A
Application number: CN202010732669.5A
Authority: CN
Inventors: 王欢明
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2020-07-27
Filing date: 2020-07-27
Publication date: 2021-09-21

Abstract

Embodiments of the invention provide a data processing method and apparatus in a distributed database, an electronic device, and a computer storage medium. The data processing method is applied to a coordinating node and comprises the following steps: acquiring a set of Structured Query Language (SQL) statements for which a stored procedure is to be executed, wherein each SQL statement in the set contains data distribution information; determining the data distribution information in each SQL statement in the set; grouping the SQL statements in the set according to the data distribution information to obtain a plurality of groups of SQL statements; and, for any group of SQL statements, determining the storage node corresponding to the group and sending the group to that storage node to execute the stored procedure. According to the embodiments of the invention, the execution efficiency of the stored procedure is improved.

Description

Data processing method, device, equipment and medium in distributed database

Technical Field

The embodiment of the invention relates to the technical field of computers, in particular to a data processing method and device in a distributed database, electronic equipment and a computer storage medium.

Background

Stored procedures are often used in distributed databases to achieve efficient batch data processing. A stored procedure is a programmable function that is created, compiled, and stored in the database; in batch data processing, the stored procedure is called to perform the function it implements.

In a conventional distributed database, a stored procedure is usually executed at the coordinating node of the database, where an interpretation engine of the stored procedure interprets and executes each Structured Query Language (SQL) statement. For example, when SQL queries are encountered, each SQL query statement needs to be parsed, optimized, and executed by an SQL engine. If a distributed SQL query across a plurality of nodes is involved, each query statement needs to be specifically optimized in the optimization and execution stages according to the distribution of data in the cluster, and each query statement then needs to be distributed to a database storage node to perform the computation. In other words, executing the stored procedure requires data interaction between the coordinating node and the storage nodes.

In this case, since each SQL statement needs to be parsed and optimized and then distributed to the corresponding storage node, executing a batch of SQL statements incurs significant execution overhead on the coordinating node and significant network overhead between the coordinating node and the storage nodes.

Disclosure of Invention

In view of the above, embodiments of the present invention provide a data processing scheme in a distributed database to at least partially solve the above problems.

According to a first aspect of the embodiments of the present invention, a data processing method in a distributed database is provided, which is applied to a coordinating node, and the method includes:

acquiring a set of Structured Query Language (SQL) statements for which a stored procedure is to be executed, wherein each SQL statement in the set contains data distribution information;

determining the data distribution information in each SQL statement in the set;

grouping the SQL statements in the set according to the data distribution information to obtain a plurality of groups of SQL statements;

and, for any group of SQL statements, determining the storage node corresponding to the group of SQL statements, and sending the group of SQL statements to the corresponding storage node for execution.

According to a second aspect of the embodiments of the present invention, there is provided a data processing apparatus in a distributed database, which is applied in a coordinating node, the apparatus including:

an acquisition module, configured to acquire a set of Structured Query Language (SQL) statements for which a stored procedure is to be executed, wherein each SQL statement in the set contains data distribution information;

a determining module, configured to determine the data distribution information in each SQL statement in the set;

a grouping module, configured to group the SQL statements in the set according to the data distribution information to obtain a plurality of groups of SQL statements;

and a sending module, configured to determine, for any group of SQL statements, the storage node corresponding to the group of SQL statements and to send the group of SQL statements to the corresponding storage node for execution.

According to a third aspect of the embodiments of the present invention, there is provided an electronic device, including: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus; the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the data processing method according to the first aspect.

According to a fourth aspect of embodiments of the present invention, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the data processing method according to the first aspect.

According to the data processing scheme provided by the embodiments of the present invention, when SQL statements that need to execute a stored procedure are processed in batches, the coordinating node no longer needs to execute the actual SQL statements; it only needs to split the batch of SQL statements and push each part down to the appropriate storage node to execute the stored procedure. This eliminates the cost of SQL statement processing on the coordinating node and improves processing efficiency.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some of the embodiments of the present invention, and a person skilled in the art may obtain other drawings based on these drawings.

FIG. 1 is a schematic diagram of a distributed database system architecture in the related art;

FIG. 2 is a schematic flowchart of a data processing method in a distributed database according to an embodiment of the present invention;

FIG. 3A is a schematic diagram of a system architecture according to an embodiment of the present invention;

FIG. 3B is a schematic diagram of an e-commerce shopping scenario according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a data processing apparatus in a distributed database according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the embodiments of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention shall fall within the scope of the protection of the embodiments of the present invention.

A distributed database system generally includes a coordinating node and storage nodes. FIG. 1 is a schematic diagram of a distributed database system architecture in the related art. In a distributed database system, the coordinating node is mostly used for lightweight operations such as request forwarding and request response processing. Generally speaking, the coordinating node itself does not store specific application data; the application data is stored, and operations on it are executed, in the individual storage nodes.

Under the system architecture shown in FIG. 1, if a batch of SQL statements needs to execute a stored procedure, a common method is to call a pre-created stored procedure at the coordinating node to parse, optimize, and execute each SQL statement, and then to distribute each SQL statement to a database storage node to perform the computation. In this architecture, the coordinating node and the storage node still need to interact again for optimization and execution. This results in significant overhead for executing SQL statements on the coordinating node and significant network overhead between the coordinating node and the storage nodes. Based on this, the embodiments of the present invention provide a data processing scheme in a distributed database to implement efficient data processing.

The following further describes a specific implementation of the data processing scheme provided by the embodiments of the present invention with reference to the accompanying drawings. FIG. 2 is a schematic flowchart of a data processing method in a distributed database according to an embodiment of the present invention. The method includes:

S201, acquiring a set of Structured Query Language (SQL) statements for which a stored procedure is to be executed, wherein each SQL statement in the set contains data distribution information.

The specific SQL statements included in the set for which the stored procedure is to be executed may be determined according to the actual processing scenario. A set can be regarded as a whole, that is, as all the SQL statements involved in one batch. The stored procedure is as described above and is not described again here.

For example, in the e-commerce trading scenario, SQL statements corresponding to all trades included in the same trade order may be determined as a set; or, SQL statements corresponding to transaction orders submitted by the same merchant within a certain time may be determined as a set.

The SQL statements may be of multiple operation types, such as update, delete, query, and insert. In general, the SQL statements processed in the same batch, in the same set, have the same operation type.

In the embodiments of the present invention, each SQL statement includes data distribution information, which indicates the table or the node on which the SQL statement is to be executed.

For example, consider an SQL statement for updating: update data table A1 set merchant name = "B1" where commodity name = "C1". In this SQL statement, the data table name and the "commodity name" field in the where condition are specified fields, and their respective values "A1" and "C1" are the data distribution information of the SQL statement. "A1" explicitly indicates that the statement should be executed against table A1, and "C1" explicitly indicates that the operation should be performed on the data corresponding to "C1" in table A1.

S203, determining data distribution information in each SQL statement in the set.

As mentioned above, the SQL statements in the set each contain data distribution information. Based on different practical applications, the positions of the data distribution information in the SQL statements may also be different, and thus, the corresponding acquisition manners may also be different.

In one embodiment, the data distribution information of an SQL statement may be obtained from a specified field of the SQL statement. For example, the value of a specified field in the SQL statement (e.g., the data table name, or a field in the data table such as a merchant name or a commodity name) is determined as the data distribution information of the SQL statement.
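As an illustration only, the specified-field approach might be sketched as follows. The regular expression, the function name `distribution_info`, and the column names `merchant_name` and `commodity_name` are assumptions made for this sketch and do not appear in the embodiments; a real coordinating node would rely on a proper SQL parser rather than a pattern match.

```python
import re
from typing import Tuple

# Hypothetical pattern for a simple UPDATE statement with one WHERE condition.
UPDATE_RE = re.compile(
    r"update\s+(?P<table>\w+)\s+set\s+.+?\s+where\s+(?P<field>\w+)\s*=\s*'(?P<value>[^']*)'",
    re.IGNORECASE,
)


def distribution_info(sql: str) -> Tuple[str, str]:
    """Return (table name, value of the WHERE field) as the data distribution information."""
    match = UPDATE_RE.search(sql)
    if match is None:
        raise ValueError(f"unsupported statement: {sql!r}")
    return match.group("table"), match.group("value")


info = distribution_info(
    "UPDATE A1 SET merchant_name = 'B1' WHERE commodity_name = 'C1'"
)
```

Here the table name and the value of the condition field are returned together; an implementation could equally use only one of them.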

In one embodiment, when the operation types of the SQL statements in the set are all the same and at least one of the table name, the field name, and the storage node name follows a predetermined rule, an offset from the beginning of the SQL statement may be specified, and the character string at the specified offset is determined as the data distribution information of the SQL statement. The predetermined rule can be set by those skilled in the art according to actual requirements; for example, all table names may be set to 5 characters. If the 10th to the 15th characters from the beginning of an SQL statement are the data distribution information, the offset can be specified as the 10th character to the 15th character from the beginning.
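The offset-based approach can be sketched as a plain substring operation. The function name `info_at_offset`, the 1-based inclusive positions, and the sample statement are assumptions chosen for illustration.

```python
def info_at_offset(sql: str, start: int, end: int) -> str:
    """Return the characters from position start to position end (1-based, inclusive)."""
    if not (1 <= start <= end <= len(sql)):
        raise ValueError("offset lies outside the statement")
    return sql[start - 1:end]


# Assumed convention for this sketch: every table name is exactly 5 characters
# and always follows the keyword "UPDATE " at the start of the statement.
sample = "UPDATE T0001 SET qty = 1"
table_name = info_at_offset(sample, 8, 12)
```

This only works when every statement in the set follows the same layout, which is why the embodiment requires identical operation types and a predetermined naming rule.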

In one embodiment, the data distribution information of an SQL statement may also be determined according to specified identification characters in the SQL statement, where the specified identification character differs from the characters already used in the SQL specification. For example, "$" is used as the specified identification character in an SQL statement, and the character string between two specified identification characters is determined as the data distribution information of the SQL statement.
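A minimal sketch of the identification-character approach, assuming "$" as the marker and taking the text between the first pair of markers; the function name `info_between_markers` is hypothetical.

```python
def info_between_markers(sql: str, marker: str = "$") -> str:
    """Return the text between the first two occurrences of the marker."""
    first = sql.find(marker)
    second = sql.find(marker, first + 1) if first != -1 else -1
    if second == -1:
        raise ValueError("statement lacks a pair of markers")
    return sql[first + 1:second]


marked_info = info_between_markers("UPDATE $A1$ SET qty = 1")
```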

S205, grouping the SQL statements in the set according to the data distribution information to obtain a plurality of groups of SQL statements.

Specifically, in the grouping process, SQL statements containing the same data distribution information should be assigned to the same group. In practical applications, the SQL statements can be grouped according to the data distribution information in various ways. In this step, the SQL statements are grouped according to their data distribution information rather than their operation type, and the grouping serves as the basis for subsequently determining the storage node that will process each group of SQL statements.

For example, in one embodiment, the coordinating node may perform a routing computation on the data distribution information of each SQL statement and group the statements according to the result of the routing computation. One way of performing the routing computation is to hash the data distribution information of each SQL statement, take the remainder of the hash value, and assign the SQL statements with the same remainder to the same group.
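The hash-and-remainder routing can be sketched as follows. The embodiments do not name a hash function or a divisor; CRC32 and the number of storage nodes are used here purely as assumptions, and the names `route` and `group_by_route` are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def route(info: str, node_count: int) -> int:
    """Hash the data distribution information and take the remainder."""
    # CRC32 is an assumed choice; it is stable across processes, unlike Python's hash().
    return zlib.crc32(info.encode("utf-8")) % node_count


def group_by_route(
    statements: List[Tuple[str, str]], node_count: int
) -> Dict[int, List[str]]:
    """Group (distribution info, SQL text) pairs by their routing remainder."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for info, sql in statements:
        groups[route(info, node_count)].append(sql)
    return dict(groups)
```

Statements with identical data distribution information always land in the same group, and each statement is placed in exactly one group.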

For another example, in one embodiment, the coordinating node may determine, for each SQL statement, the semantics of the data distribution information of the SQL statement, and assign SQL statements whose data distribution information has the same semantics to the same group. For example, for three different SQL statements, the data distribution information of statement 1 is "article 1", that of statement 2 is "article 2", and that of statement 3 is "article 3". The coordinating node may perform semantic analysis on the three pieces of data distribution information and find that the data distribution information of statement 1 and statement 2 both carry the semantics "sports brand", while that of statement 3 carries the semantics "snack brand". Statements 1 and 2 can therefore be assigned to one group and statement 3 to another group.
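The three-statement example can be sketched with a lookup table standing in for the semantic analysis. How the semantics are actually derived is not specified in the embodiments; the `SEMANTICS` dictionary and the function name `group_by_semantics` are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical semantic lookup; a real system might consult a catalog or a classifier.
SEMANTICS = {
    "article 1": "sports brand",
    "article 2": "sports brand",
    "article 3": "snack brand",
}


def group_by_semantics(statements: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (statement name, distribution info) pairs by the semantics of the info."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, info in statements:
        groups[SEMANTICS[info]].append(name)
    return dict(groups)


semantic_groups = group_by_semantics([
    ("statement 1", "article 1"),
    ("statement 2", "article 2"),
    ("statement 3", "article 3"),
])
```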

Obviously, each SQL statement belongs to exactly one group; that is, the same SQL statement does not appear in different groups.

S207, for any group of SQL statements, determining the storage node corresponding to the group of SQL statements, and sending the group of SQL statements to the corresponding storage node for execution.

The manner in which the corresponding storage node is determined may differ depending on the manner in which the groups are determined. For example, when the groups are determined by the remainder, the remainder may also be used to determine the storage node corresponding to the SQL statements of the group. When a batch of SQL statements is processed in this embodiment, the randomness of the hash values allows the SQL statements to be distributed evenly across the storage nodes for execution, which helps balance the load of the storage nodes.

When the groups are determined by semantics, the semantics of the SQL statements may also be used to determine the corresponding storage node. In a distributed database with a plurality of distributed nodes, a certain storage node or a certain part of the storage nodes may in practice be designed to store particular types of data exclusively. For example, one part of the storage nodes may be used to store sports-related data tables, while another part of the storage nodes may be used to store game-related data tables.

In this case, the correspondence between semantics and storage nodes may be stored in advance in the coordinating node, so that the storage node, or the part of the storage nodes, to which the SQL statements of a group should be pushed down for execution can be determined directly from the pre-stored correspondence.
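A pre-stored correspondence between semantics and storage nodes can be sketched as a simple mapping held by the coordinating node. The node names and the function name `target_nodes` are hypothetical and serve only to illustrate the lookup.

```python
from typing import Dict, List

# Hypothetical correspondence stored in advance on the coordinating node.
NODES_BY_SEMANTICS: Dict[str, List[str]] = {
    "sports": ["storage-node-1", "storage-node-2"],
    "game": ["storage-node-3"],
}


def target_nodes(semantics: str) -> List[str]:
    """Return the storage nodes registered for the semantics of a group."""
    try:
        return NODES_BY_SEMANTICS[semantics]
    except KeyError:
        raise LookupError(f"no storage node registered for {semantics!r}") from None
```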

FIG. 3A is a schematic diagram of a system architecture provided by an embodiment of the present invention. In the schematic diagram, the SQL statements in the batch are divided into a plurality of groups, and the SQL statements in each group can be processed on a single machine in the corresponding storage node.

After a storage node receives the SQL statements pushed down by the coordinating node, the stored procedure it executes can run on a single machine and does not need to be executed in a distributed manner, so the SQL processing overhead on the storage node can be optimized using conventional schemes such as SQL caching and query plan caching. Of course, in order to ensure data consistency in the distributed database, distributed transactions may be performed on the storage nodes.

In one embodiment, when pushing down the SQL statements of a group to the storage node, the coordinating node may pack the SQL statements of the whole group and push them down together. That is, a subset containing the SQL statements of the group is generated, and the whole subset is sent to the storage node corresponding to the group for execution. Packed push-down reduces the number of push-down operations performed by the coordinating node and reduces system overhead.

Optionally, when the coordinating node packs and pushes down SQL statements, an upper limit on the number of statements in one packet may be predefined in order to prevent the amount of data in a packet from becoming too large. Specifically, a predetermined threshold on the number of SQL statements that a packet can contain is acquired; when the number of SQL statements in the group exceeds the threshold, a plurality of subsets are generated that together contain the SQL statements of the group, none of which exceeds the threshold, and which are pairwise disjoint. The threshold may be set by a person skilled in the art according to actual needs, and the embodiments of the present invention are not limited in this respect.

For example, if a group contains 100 SQL statements and the predetermined threshold on the number of SQL statements per packet is 20, the 100 SQL statements may be divided evenly into 5 non-overlapping packets of 20 statements each, or into 10 non-overlapping packets of 10 statements each, and then packed and pushed down, so as to avoid the impact on network transmission of pushing down too many SQL statements at one time.
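The splitting of a group into packets can be sketched with list slicing. The function name `split_into_packets` is hypothetical; the sketch fills each packet up to the threshold, which is one of several valid ways to satisfy the disjointness and size conditions.

```python
from typing import List


def split_into_packets(group: List[str], threshold: int) -> List[List[str]]:
    """Split a group into consecutive, pairwise disjoint packets of at most threshold statements."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    return [group[i:i + threshold] for i in range(0, len(group), threshold)]


# 100 placeholder statements, as in the example above.
group_of_100 = [f"stmt-{i}" for i in range(100)]
packets_of_20 = split_into_packets(group_of_100, 20)
packets_of_10 = split_into_packets(group_of_100, 10)
```

With a threshold of 20 this yields 5 packets of 20 statements, and with a threshold of 10 it yields 10 packets of 10 statements.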

In one embodiment, the set of SQL statements to be processed needs to be executed as a whole. The coordinating node then needs to determine the execution result of the stored procedure on the storage nodes for each SQL statement in the set; for example, a storage node may send notification information containing the execution result to the coordinating node during execution. The coordinating node can thus accurately know the execution status of each SQL statement on each storage node, and if any SQL statement fails to execute, the stored procedure is regarded as having failed for the whole set. In that case, the coordinating node issues a rollback instruction to each storage node executing SQL statements in the set, where the rollback instruction instructs the storage node to roll back each SQL statement of the set that it has received, that is, to return to the pre-execution state, thereby ensuring atomicity of the stored-procedure processing for the whole set.
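The all-or-nothing behavior can be sketched with an in-memory stand-in for the storage nodes. The `StorageNode` class, the `run_set` function, and the convention that a statement containing "FAIL" fails are all assumptions made so the sketch can run on its own; they do not describe a real storage node or a real transaction protocol.

```python
from typing import Dict, List


class StorageNode:
    """Hypothetical in-memory stand-in for a storage node."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.applied: List[str] = []

    def execute(self, statements: List[str]) -> bool:
        """Apply statements in order; report failure on the first failing statement."""
        for sql in statements:
            if "FAIL" in sql:  # assumed failure trigger, for illustration only
                return False
            self.applied.append(sql)
        return True

    def rollback(self) -> None:
        """Return to the pre-execution state."""
        self.applied.clear()


def run_set(plan: Dict[StorageNode, List[str]]) -> bool:
    """Push each group to its node; roll back every node if any execution fails."""
    results = [node.execute(statements) for node, statements in plan.items()]
    if all(results):
        return True
    for node in plan:
        node.rollback()
    return False
```

If one node reports a failure, the statements already applied on the other nodes are rolled back as well, so the set either takes effect everywhere or nowhere.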

To make the solution of the present application clearer, a specific embodiment is described below with reference to an application scenario. FIG. 3B is a schematic diagram of an e-commerce shopping scenario provided by an embodiment of the present invention.

When a user shops through the e-commerce platform, the user may purchase goods from a plurality of merchants through the shopping cart. For example, beef jerky and shoes are purchased from merchant A, and chestnuts are purchased from merchant B. After the order is placed via the shopping cart, a trade order containing 3 trades is formed, which may be briefly described as follows: trade 1 (merchant A, beef jerky), trade 2 (merchant A, shoes), and trade 3 (merchant B, chestnuts).

The trade order is processed by the platform server, which forms 3 SQL statements in one-to-one correspondence with the trades, each SQL statement containing the related information of its trade. For example, in SQL1 the values of the respective fields are "merchant A" and "beef jerky". The e-commerce platform forwards the three SQL statements to the coordinating node of the database, and a certain stored procedure needs to be executed for them.

In the conventional scheme, the stored procedure needs to be executed for each SQL statement in turn at the coordinating node, and data interaction with different nodes is also needed during execution.

In the embodiments of the present invention, the SQL statements are first grouped according to the data distribution information they contain, so that the three SQL statements can be pushed down for execution in groups.

For example, assuming that the values of the merchant field are used as the data distribution information and the statements are grouped according to the semantics of those values, SQL1 and SQL2 both contain "merchant A", and these two SQL statements form one group. SQL1 and SQL2 are then pushed down as a whole to the storage node that handles "merchant A" to execute the stored procedure, while SQL3 is pushed down as another group to the storage node corresponding to "merchant B" to execute the stored procedure.

Alternatively, assuming that the values of the commodity field are used as the data distribution information and the statements are grouped according to the semantics of those values, analysis shows that the commodity fields of SQL1 and SQL3 both have the semantics "snacks" and the commodity field of SQL2 has the semantics "clothing", so SQL1 and SQL3 form one group. SQL1 and SQL3 are then pushed down as a whole to the storage node that handles "snacks" to execute the stored procedure, and SQL2 is pushed down as another group to the storage node corresponding to "clothing" to execute the stored procedure.
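The two groupings of the shopping-cart example can be reproduced with a short sketch. The `CATEGORY` lookup stands in for the semantic analysis of the commodity field and, like the function name `group_trades`, is an assumption made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (statement name, merchant, commodity) for the three trades of the order.
TRADES: List[Tuple[str, str, str]] = [
    ("SQL1", "merchant A", "beef jerky"),
    ("SQL2", "merchant A", "shoes"),
    ("SQL3", "merchant B", "chestnut"),
]

# Hypothetical semantic lookup for the commodity field.
CATEGORY = {"beef jerky": "snacks", "chestnut": "snacks", "shoes": "clothing"}


def group_trades(key: Callable[[str, str], str]) -> Dict[str, List[str]]:
    """Group the statements by the key derived from (merchant, commodity)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, merchant, commodity in TRADES:
        groups[key(merchant, commodity)].append(name)
    return dict(groups)


by_merchant = group_trades(lambda merchant, commodity: merchant)
by_category = group_trades(lambda merchant, commodity: CATEGORY[commodity])
```

Grouping by merchant places SQL1 and SQL2 together, while grouping by commodity semantics places SQL1 and SQL3 together, matching the two cases described above.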

After each storage node executes the stored procedure, it may send the corresponding execution result to the coordinating node. If the coordinating node finds that the execution of any SQL statement has produced an error or failed, it sends a rollback instruction to each storage node corresponding to the set, so that each storage node that received SQL statements of the set performs the corresponding rollback. This guarantees a consistent execution state for the SQL statements of the set as a whole. For the user, this means that an order placed through the shopping cart either succeeds entirely or fails entirely, which improves the user's e-commerce shopping experience.

Correspondingly, an embodiment of the present invention further provides a data processing apparatus in a distributed database, which is applied to a coordinating node. FIG. 4 is a schematic structural diagram of the data processing apparatus in the distributed database provided by the embodiment of the present invention. The apparatus includes:

an acquiring module 401, configured to acquire a set of Structured Query Language (SQL) statements for which a stored procedure is to be executed, wherein each SQL statement in the set contains data distribution information;

a determining module 403, configured to determine the data distribution information in each SQL statement in the set;

a grouping module 405, configured to group the SQL statements in the set according to the data distribution information to obtain a plurality of groups of SQL statements;

a sending module 407, configured to determine, for any group of SQL statements, the storage node corresponding to the group of SQL statements, and to send the group of SQL statements to the corresponding storage node for execution.

Optionally, the determining module 403 determines, for any SQL statement, the value of a specified field in the SQL statement as the data distribution information of the SQL statement.

Optionally, the grouping module 405 determines, for any SQL statement, the semantics of the data distribution information of the SQL statement, and assigns SQL statements whose data distribution information has the same semantics to the same group.

Optionally, the sending module 407 determines, for any group of SQL statements, the storage node corresponding to the group of SQL statements according to the semantics of the data distribution information of the group of SQL statements.

Optionally, the sending module 407 generates a subset containing the group of SQL statements, and sends the whole subset to the storage node corresponding to the group for execution.

Optionally, the sending module 407 acquires a threshold on the number of SQL statements that the subset can contain; when the number of SQL statements in the group exceeds the threshold, it generates a plurality of subsets that together contain the group of SQL statements, none of which exceeds the threshold, and which are pairwise disjoint.

Optionally, the apparatus further includes a rollback module 409, which determines the execution result of the stored procedure on the storage nodes for each SQL statement in the set; if the execution result of the stored procedure for any SQL statement in the set is an execution failure, the rollback module issues a rollback instruction to each storage node executing SQL statements in the set, wherein the rollback instruction instructs the storage node to roll back each SQL statement of the set that it has received.

The data processing apparatus of this embodiment is configured to implement the corresponding data processing method in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again. In addition, the functional implementation of each module in the data processing apparatus of this embodiment can refer to the description of the corresponding part in the foregoing method embodiment, and is not repeated here.

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention; the specific embodiments of the present invention do not limit the specific implementation of the electronic device.

As shown in FIG. 5, the electronic device may include: a processor 502, a communication interface 504, a memory 506, and a communication bus 508. Wherein:

the processor 502, communication interface 504, and memory 506 communicate with one another via a communication bus 508.

The communication interface 504 is used for communicating with other electronic devices or servers.

The processor 502 is configured to execute the program 510, and may specifically perform relevant steps in the above-described data processing method embodiments.

In particular, program 510 may include program code that includes computer operating instructions.

The processor 502 may be a central processing unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the present invention. The electronic device comprises one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 506 is used for storing the program 510. The memory 506 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.

The program 510 may specifically be used to cause the processor 502 to perform the following operations:

determining data distribution information in each SQL statement in the set;

and, for any group of SQL statements, determining the storage node corresponding to the group of SQL statements, and sending the group of SQL statements to the corresponding storage node to execute the stored procedure.

In an alternative embodiment, the program 510 is further configured to cause the processor 502, in determining the data distribution information in each SQL statement in the set: and for any SQL statement, determining the value of a specified field in the SQL statement as the data distribution information of the SQL statement.

In an alternative embodiment, the specified field includes: a table name or a field in a conditional statement.

In an optional implementation, the program 510 is further configured to, when the processor 502 groups the SQL statements in the set according to the data distribution information to obtain multiple groups of SQL statements: determining the semantics of the data distribution information of any SQL statement in the set; and determining SQL sentences with the same semantics of the data distribution information into the same group.

In an alternative embodiment, the program 510 is further configured to enable the processor 502, when determining, for any group of SQL statements, a storage node corresponding to the group of SQL statements: and aiming at any group of SQL sentences, determining the storage nodes corresponding to the group of SQL sentences according to the semantics of the data distribution information of the group of SQL sentences.

In an alternative embodiment, the program 510 is further configured to cause the processor 502, when sending the set of SQL statements to the corresponding storage node to execute: and generating a subset containing the group of SQL statements, and sending the subset to the storage node corresponding to the group for execution.

In an alternative embodiment, the program 510 is further configured to cause the processor 502, when generating the subset containing the group of SQL statements, to: acquire a quantity threshold for the number of SQL statements contained in a subset; and, when the number of SQL statements in the group exceeds the quantity threshold, generate a plurality of subsets that together contain the group of SQL statements, each subset containing no more than the quantity threshold, wherein the intersection of any two of the subsets is empty.
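The subset generation can be sketched as simple consecutive slicing, which is one possible way (an assumption of this sketch, not a requirement of the embodiment) to obtain subsets that respect the quantity threshold and share no statement. The function name `split_into_subsets` is hypothetical.

```python
from typing import List


def split_into_subsets(group: List[str], threshold: int) -> List[List[str]]:
    """Split one group of SQL statements into disjoint subsets.

    Each subset holds at most `threshold` statements. Consecutive slices
    never overlap, so no statement appears in two subsets.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    return [group[i:i + threshold] for i in range(0, len(group), threshold)]
```

When the group does not exceed the threshold, the function returns a single subset containing the whole group, which corresponds to the case where only one subset is generated.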

In an alternative embodiment, the program 510 is further configured to cause the processor 502 to: determine the result of the storage nodes' execution of the stored procedure for each SQL statement in the set; and, if the execution result for any SQL statement in the set is an execution failure, issue a rollback instruction to each storage node that executes SQL statements in the set, wherein the rollback instruction instructs the storage node to roll back each SQL statement in the set that it has received.
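The failure handling can be sketched as follows, assuming for simplicity that the coordination node dispatches to the storage nodes one after another and that `execute` and `rollback` are caller-supplied callbacks standing in for the actual network calls. Both callbacks and the function name are hypothetical, and commit handling is omitted.

```python
from typing import Callable, Dict, List


def execute_with_rollback(
    plan: Dict[str, List[str]],
    execute: Callable[[str, List[str]], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Send each group to its storage node; roll back all nodes on failure.

    `plan` maps a storage node to the SQL statements it should execute.
    `execute` returns True on success and False on execution failure.
    """
    contacted: List[str] = []
    for node, sqls in plan.items():
        contacted.append(node)
        if not execute(node, sqls):
            # One failure invalidates the whole set: every node that has
            # received statements from the set is told to roll back.
            for received in contacted:
                rollback(received)
            return False
    return True
```

A parallel dispatch would follow the same rule: as soon as any node reports a failure, the rollback instruction goes to every node that received statements from the set.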

For the specific implementation of each step in the program 510, reference may be made to the corresponding steps and the corresponding unit descriptions in the foregoing data processing method embodiments, which are not repeated here. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices and modules described above may refer to the corresponding process descriptions in the foregoing method embodiments, which are likewise not repeated here.

Embodiments of the present invention also provide a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the data processing method shown in fig. 2.

It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present invention may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present invention.

The above-described method according to an embodiment of the present invention may be implemented in hardware or firmware, or as software or computer code storable in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the method described herein can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It will be appreciated that the computer, processor, microprocessor controller or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the data processing method described herein. Further, when a general-purpose computer accesses code for implementing the data processing method shown herein, execution of the code converts the general-purpose computer into a special-purpose computer for executing the data processing method shown herein.

Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.

The above embodiments only illustrate the embodiments of the present invention and do not limit them. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention, so all equivalent technical solutions also fall within the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention should be defined by the claims.

Claims

1. A data processing method in a distributed database, applied to a coordination node, the method comprising:

determining data distribution information in each SQL statement in the set;

2. The method of claim 1, wherein determining data distribution information in each SQL statement in the set comprises:

for any SQL statement, determining the value of a specified field in the SQL statement as the data distribution information of that SQL statement.

3. The method of claim 2, wherein the specified field comprises: a table name or a field in a conditional statement.

4. The method of claim 1, wherein grouping the SQL statements in the set according to the data distribution information to obtain a plurality of groups of SQL statements comprises:

determining the semantics of the data distribution information of each SQL statement in the set;

and assigning SQL statements whose data distribution information has the same semantics to the same group.

5. The method of claim 4, wherein determining, for any group of SQL statements, the storage node corresponding to the group of SQL statements comprises:

for any group of SQL statements, determining the storage node corresponding to the group according to the semantics of the data distribution information of the group.

6. The method of claim 1, wherein sending the group of SQL statements to the corresponding storage node for execution comprises:

generating a subset containing the group of SQL statements, and sending the subset to the storage node corresponding to the group for execution.

7. The method of claim 6, wherein generating the subset containing the group of SQL statements comprises:

acquiring a quantity threshold for the number of SQL statements contained in a subset;

and, when the number of SQL statements in the group exceeds the quantity threshold, generating a plurality of subsets that together contain the group of SQL statements, each subset containing no more than the quantity threshold, wherein the intersection of any two of the subsets is empty.

8. The method of claim 1, wherein the method further comprises:

determining the result of the storage nodes' execution of the stored procedure for each SQL statement in the set;

and, if the execution result for any SQL statement in the set is an execution failure, issuing a rollback instruction to each storage node that executes SQL statements in the set, wherein the rollback instruction instructs the storage node to roll back each SQL statement in the set that it has received.

9. A data processing apparatus in a distributed database, applied to a coordination node, the apparatus comprising:

a grouping module, configured to group the SQL statements in the set according to the data distribution information to obtain a plurality of groups of SQL statements;

10. An electronic device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;

the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the data processing method according to any one of claims 1-8.

11. A computer storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the data processing method of any one of claims 1 to 8.