CN117573730B

CN117573730B - Data processing method, apparatus, device, readable storage medium, and program product

Info

Publication number: CN117573730B
Application number: CN202410058084.8A
Authority: CN
Inventors: 谢灿扬; 郑礼雄; 潘安群; 雷海林; 伍鑫
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2024-01-16
Filing date: 2024-01-16
Publication date: 2024-04-05
Anticipated expiration: 2044-01-16
Also published as: CN117573730A

Abstract

Embodiments of the present application provide a data processing method, apparatus, device, readable storage medium, and program product. The method includes: determining a database statement queue according to a data processing request, where the data processing request is used to request processing of data stored in a data node of a distributed database system, and the database statement queue includes a first number of database statements to be executed; determining target database statements from the first number of database statements to be executed included in the database statement queue, where the number of target database statements is a second number and the second number of target database statements can be executed in parallel; determining the target data node corresponding to each target database statement; and sending the second number of target database statements to their corresponding target data nodes in parallel. The method can send a plurality of database statements to the corresponding data nodes in parallel, thereby effectively reducing data transmission delay and improving data processing efficiency.

Description

Data processing method, apparatus, device, readable storage medium, and program product

Technical Field

The present application relates to the field of computer technology, and in particular, to a data processing method, a data processing apparatus, a computer device, a computer readable storage medium, and a computer program product.

Background

A distributed database is a database system that stores data across a plurality of physical or logical nodes. Its main characteristics are scalability, high availability, fault tolerance, data consistency, and multi-region deployment. A distributed database can expand its capacity and performance by adding more nodes, and it can continue to operate normally even if a node fails. A distributed database system generally includes a coordination node and data nodes: the coordination node responds to data processing requests from a terminal device and generates database statements, and the data nodes store data and execute the database statements.

After receiving a data processing request sent by a terminal device, the coordination node generally generates a plurality of database statements to be executed and sends the first database statement to the corresponding data node. The data node executes the database statement, generates an execution result, and sends the execution result to the coordination node; only then does the coordination node send the second database statement to the corresponding data node. This one-by-one processing mode has low processing efficiency and high data transmission delay.

Disclosure of Invention

The embodiments of the present application provide a data processing method, apparatus, device, readable storage medium, and program product, which can send a plurality of database statements to the corresponding data nodes in parallel, thereby effectively reducing data transmission delay and improving data processing efficiency.

In one aspect, an embodiment of the present application provides a data processing method, where the method includes:

determining a database statement queue according to a data processing request, wherein the data processing request is used for requesting to process data stored in a data node, the database statement queue comprises a first number of database statements to be executed, and the data node is contained in a distributed database system;

determining a target database statement from the first number of database statements to be executed, which are included in the database statement queue; the number of the target database statements is a second number, the second number is smaller than or equal to the first number, and the second number of target database statements can be executed in parallel;

determining the target data node corresponding to each target database statement from a plurality of data nodes included in the distributed database system;

and sending the second number of target database statements to the corresponding target data nodes in parallel, so that the target data nodes process the stored data according to the received target database statements.

In one aspect, an embodiment of the present application provides a data processing apparatus, including:

the determining unit is used for determining a database statement queue according to a data processing request, wherein the data processing request is used for requesting to process data stored in a data node, the database statement queue comprises a first number of database statements to be executed, and the data node is contained in the distributed database system;

the processing unit is used for determining a target database statement from the first number of database statements to be executed, which are included in the database statement queue; the number of the target database statements is a second number, the second number is smaller than or equal to the first number, and the second number of target database statements can be executed in parallel;

the determining unit is further configured to determine a target data node corresponding to each target database statement from a plurality of data nodes included in the distributed database system;

and the sending unit is used for sending the second number of target database statements to the corresponding target data nodes in parallel, so that the target data nodes process the stored data according to the received target database statements.

In one aspect, the processing unit is specifically configured to, when determining the target database statement from the first number of database statements to be executed included in the database statement queue: sequentially determining a third number of database statements to be analyzed from the database statement queue according to the arrangement order of the database statements to be executed in the database statement queue, wherein the third number of database statements to be analyzed are consecutive in the arrangement order;

carrying out parallel analysis processing on the third number of database statements to be analyzed, and determining the data dependency analysis result of each database statement to be analyzed;

and determining a target database statement from the third number of database statements to be analyzed according to the data dependency analysis result.

In one aspect, the data dependency analysis result is a first data dependency analysis result or a second data dependency analysis result, where the first data dependency analysis result indicates that the database statement to be analyzed has a data dependency, and the second data dependency analysis result indicates that the database statement to be analyzed has no data dependency; the processing unit is specifically configured to, when determining a target database statement from the third number of database statements to be analyzed according to the data dependency analysis result: if the third number of database statements to be analyzed include a database statement to be analyzed whose data dependency analysis result is the first data dependency analysis result, determining, from the database statements to be analyzed whose data dependency analysis result is the first data dependency analysis result, a first database statement to be analyzed that is earliest in the arrangement order;

and determining, from the third number of database statements to be analyzed, a second database statement to be analyzed that precedes the first database statement to be analyzed in the arrangement order as a target database statement.

In one aspect, the processing unit is further configured to: if the data dependency analysis result of each database statement to be analyzed is the second data dependency analysis result, determining each database statement to be analyzed as a target database statement;

according to the arrangement order, determining a third number of new database statements to be analyzed from the database statements to be executed in the database statement queue other than first database statements to be executed, wherein the first database statements to be executed refer to the database statements to be executed that have been determined to be target database statements;

carrying out parallel analysis processing on the third number of new database statements to be analyzed, and determining the data dependency analysis result of each new database statement to be analyzed;

and determining a target database statement from the third number of new database statements to be analyzed according to the data dependency analysis result of each new database statement to be analyzed.
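The window-by-window selection described above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: `select_target_statements`, `window_size` (standing in for the third number), and the `has_dependency` callback (standing in for the data dependency analysis) are hypothetical names, and the window is analyzed with a plain loop instead of in parallel for brevity.

```python
def select_target_statements(queue, window_size, has_dependency):
    """Return the leading statements of `queue` that can be sent in parallel.

    queue:          list of database statements to be executed, in order.
    window_size:    how many consecutive statements are analyzed per round.
    has_dependency: hypothetical callback; has_dependency(position) is True
                    when the statement at that position still waits on an
                    earlier statement's execution result.
    """
    targets = []
    start = 0
    while start < len(queue):
        window = range(start, min(start + window_size, len(queue)))
        # The source analyzes the window in parallel; a loop is used here.
        flags = [has_dependency(position) for position in window]
        if any(flags):
            # Keep only the statements that precede the earliest dependent one.
            first_dependent = window[flags.index(True)]
            targets.extend(queue[start:first_dependent])
            break
        # No dependency in this window: all of it is selected, then the
        # next window of statements is analyzed.
        targets.extend(queue[start:window.stop])
        start = window.stop
    return targets
```

With a five-statement queue, a window of two, and a dependency at position 3, the sketch selects the first three statements and stops.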

In one aspect, the processing unit is specifically configured to, when performing parallel analysis processing on the third number of database statements to be analyzed and determining the data dependency analysis result of each database statement to be analyzed: determining whether a dependent database statement of a third database statement to be analyzed exists among the database statements to be executed that precede the third database statement to be analyzed in the arrangement order of the database statement queue, wherein the third database statement to be analyzed is any one of the third number of database statements to be analyzed, and execution of the third database statement to be analyzed depends on an execution result of the dependent database statement;

if the dependent database statement of the third database statement to be analyzed exists and the execution result of the dependent database statement has not been obtained, determining that the data dependency analysis result of the third database statement to be analyzed is the first data dependency analysis result.

In one aspect, the processing unit is further configured to: if a dependent database statement of the third database statement to be analyzed exists and an execution result of the dependent database statement has been obtained, determining that the data dependency analysis result of the third database statement to be analyzed is the second data dependency analysis result; or,

if the dependent database statement of the third database statement to be analyzed does not exist, determining that the data dependency analysis result of the third database statement to be analyzed is the second data dependency analysis result.
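The three cases above (dependency exists and is unresolved, dependency exists and is resolved, no dependency) can be illustrated with a short sketch. The representation is an assumption made for illustration only: statements are identified by their queue position, `depends_on` is a hypothetical mapping from a position to the positions it depends on, and `completed` is the set of positions whose execution results have been obtained.

```python
from enum import Enum


class DependencyResult(Enum):
    HAS_DEPENDENCY = 1  # the "first" data dependency analysis result
    NO_DEPENDENCY = 2   # the "second" data dependency analysis result


def analyze_dependency(position, depends_on, completed):
    """Classify the statement at `position` in the statement queue.

    depends_on: dict mapping a position to the set of positions whose
                execution results that statement needs (hypothetical).
    completed:  set of positions whose results have already been obtained.
    """
    # Only statements earlier in the queue are considered as dependencies.
    earlier = {d for d in depends_on.get(position, set()) if d < position}
    # Any dependency without an obtained result blocks parallel sending.
    if earlier - completed:
        return DependencyResult.HAS_DEPENDENCY
    return DependencyResult.NO_DEPENDENCY
```

For example, if statement 2 depends on statement 0, it is classified as `HAS_DEPENDENCY` until position 0 appears in `completed`.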

In one aspect, the determining unit is specifically configured to, when determining, from a plurality of data nodes included in the distributed database system, a target data node corresponding to each target database statement: determining node description information of each data node in the distributed database system, wherein the node description information is used for describing data stored by the data node;

performing data analysis processing on each target database statement to obtain data demand information of each target database statement;

and carrying out matching processing on the data demand information of each target database statement and the node description information of each data node, and determining the target data node corresponding to each target database statement from the plurality of data nodes according to a matching result.
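As an illustration of the matching step, the sketch below assumes that node description information is simply the set of table names a data node stores and that data demand information is the set of table names a statement references. Both representations, the node names, and the regular expression are hypothetical simplifications; a real system would use a proper statement parser and richer metadata.

```python
import re

# Hypothetical node description information: tables stored on each data node.
NODE_DESCRIPTIONS = {
    "data_node_1": {"account"},
    "data_node_2": {"orders", "order_item"},
}

# Rough pattern for table names following common SQL keywords.
TABLE_PATTERN = re.compile(r"\b(?:FROM|UPDATE|INTO|JOIN)\s+(\w+)", re.IGNORECASE)


def data_demand(statement):
    """Stand-in for statement analysis: extract the referenced table names."""
    return set(TABLE_PATTERN.findall(statement))


def match_target_node(statement, node_descriptions=NODE_DESCRIPTIONS):
    """Return the first node whose stored tables cover the statement's demand."""
    demand = data_demand(statement)
    for node, tables in node_descriptions.items():
        if demand and demand <= tables:
            return node
    return None  # no node stores all the required data
```

Here `match_target_node("UPDATE account SET balance = 0 WHERE id = 1")` resolves to `"data_node_1"` because that node's description covers the `account` table.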

In one aspect, the data processing request includes a data processing identifier and a data processing parameter, the data processing identifier being used to indicate a type of processing performed on data stored in the distributed database system; the determining unit is specifically configured to, when determining a database statement queue according to a data processing request: determining a first number of initial database statements according to a data processing identifier in the data processing request;

and carrying out combination processing on the data processing parameters in the data processing request and the first number of initial database statements to obtain the first number of database statements to be executed, and adding the first number of database statements to be executed to the database statement queue.
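One way to picture this step is a lookup of statement templates by data processing identifier, followed by filling in the request parameters. The sketch below is only an assumption about how such a combination might look: the `"transfer"` identifier, the SQL text, and the request layout are invented for illustration.

```python
from collections import deque

# Hypothetical initial database statements, keyed by data processing identifier.
INITIAL_STATEMENTS = {
    "transfer": [
        "BEGIN",
        "UPDATE account SET balance = balance - {amount} WHERE id = {src}",
        "UPDATE account SET balance = balance + {amount} WHERE id = {dst}",
        "COMMIT",
    ],
}


def build_statement_queue(request):
    """Combine the request parameters with the initial statements."""
    templates = INITIAL_STATEMENTS[request["identifier"]]
    params = request["parameters"]
    # str.format ignores unused keyword arguments, so templates without
    # placeholders (such as BEGIN) pass through unchanged.
    # Note: a production system would bind parameters instead of formatting
    # them into the SQL text, to avoid injection.
    return deque(template.format(**params) for template in templates)


queue = build_statement_queue(
    {"identifier": "transfer", "parameters": {"amount": 10, "src": 1, "dst": 2}}
)
```

The resulting queue preserves the order of the initial statements, which is what later steps rely on when they inspect the arrangement order.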

In one aspect, embodiments of the present application provide a computer device, including a processor, a communication interface, and a memory that are connected to one another, where the memory stores executable program code and the processor is configured to invoke the executable program code to implement the data processing method provided by the embodiments of the present application.

Accordingly, an embodiment of the present application further provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to implement the data processing method provided by the embodiments of the present application.

Accordingly, embodiments of the present application also provide a computer program product comprising a computer program or computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer program or the computer instructions from the computer readable storage medium, and the processor executes the computer program or the computer instructions, so that the computer device implements the data processing method provided by the embodiment of the application.

According to the data processing method provided by the embodiments of the present application, a database statement queue can be determined according to a data processing request, which makes it convenient to determine the execution order among the plurality of database statements to be executed; target database statements that can be executed in parallel can be determined from the first number of database statements to be executed, which facilitates parallel processing of the target database statements by the data nodes and improves the processing efficiency of the database statements; the target data node corresponding to each target database statement can be determined from a plurality of data nodes, so that each target data node can subsequently execute its corresponding database statement and an accurate execution result can be obtained; and the target database statements can be sent to the corresponding target data nodes in parallel, which avoids the high data transmission delay caused by sending database statements one by one. At the same time, the computing resources of the distributed database system are fully utilized, so that a plurality of target data nodes can execute their corresponding database statements simultaneously, which effectively improves data processing efficiency.

Drawings

To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and a person skilled in the art may derive other drawings from them without inventive effort.

FIG. 1 is a schematic diagram of a database statement sending manner provided in an embodiment of the present application;

FIG. 2 is a schematic diagram of a system architecture of a data processing system according to an embodiment of the present application;

FIG. 3 is a schematic flow chart of a data processing method according to an embodiment of the present application;

FIG. 4 is a schematic diagram of another database statement sending method according to an embodiment of the present application;

FIG. 5 is a flowchart of another data processing method according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of a method for determining a target database statement according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a data processing method according to an embodiment of the present disclosure;

FIG. 8 is a block diagram of a data processing apparatus according to an embodiment of the present application;

FIG. 9 is a block diagram of a computer device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.

It should be noted that the descriptions of "first," "second," and the like in the embodiments of the present application are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a technical feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature.

A distributed database is a database system that stores data across a plurality of physical or logical nodes. A distributed database system generally includes a coordination node and a plurality of data nodes: the coordination node responds to data processing requests from a terminal device and generates database statements to be executed, and the data nodes store data and execute the database statements. The data stored in different data nodes of the distributed database system is usually different. When data processing is performed, the coordination node generates a plurality of database statements to be executed and sends one database statement to be executed to the corresponding data node; the data node executes the database statement and returns an execution result to the coordination node; and the coordination node sends the next database statement to be executed to the corresponding data node only after receiving that execution result.

Please refer to FIG. 1, which is a schematic diagram of a database statement sending manner provided in an embodiment of the present application. FIG. 1 illustrates the manner in which a prior-art distributed database system sends database statements: the coordination node sends a "start" database statement to the data node, and the data node processes it and returns an execution result to the coordination node; the coordination node then sends an "update" database statement to the data node, and the data node processes it and returns an execution result to the coordination node; and so on, until the execution of all four database statements (namely, the "start" database statement, the "update" database statement, the "modify" database statement, and the "commit" database statement) is completed. In this manner, the data processing time is calculated as (statement execution time + data transmission delay) × number of statements executed, where the statement execution time is the time a data node spends executing one database statement, the data transmission delay is the sum of the transmission delay of one database statement from the coordination node to the data node and the transmission delay of its execution result from the data node to the coordination node (namely, t1 + t2 in FIG. 1), and the number of statements executed is the number of database statements to be executed. Taking FIG. 1 as an example, assuming that the statement execution time of each of the four database statements is 0.001 seconds and the data transmission delay is 0.01 seconds, the data processing time in the manner shown in FIG. 1 is (0.001 + 0.01) × 4 = 0.044 seconds. It can be seen that sending database statements one by one in sequence results in a long data transmission delay and therefore low data processing efficiency.
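The arithmetic of the FIG. 1 example can be checked with a few lines of code. The function name and parameter names below are illustrative only; the formula is the one stated above for the one-by-one sending manner.

```python
def sequential_processing_time(execution_time, transmission_delay, statement_count):
    """Data processing time when statements are sent strictly one by one.

    execution_time:     seconds a data node spends executing one statement.
    transmission_delay: round-trip delay per statement (t1 + t2 in FIG. 1).
    statement_count:    number of database statements to be executed.
    """
    return (execution_time + transmission_delay) * statement_count


# Values from the FIG. 1 example: four statements, 0.001 s execution time,
# 0.01 s transmission delay; the result is approximately 0.044 s.
total = sequential_processing_time(0.001, 0.01, 4)
```

Of the roughly 0.044 seconds, 0.04 seconds is transmission delay, which is why the delay term dominates in this sending manner.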

Based on this, the embodiments of the present application provide a data processing method, which can determine a database statement queue according to a data processing request, where the data processing request is used to request processing of data stored in a data node, the database statement queue includes a first number of database statements to be executed, and the data node is included in a distributed database system; determine a target database statement from the first number of database statements to be executed included in the database statement queue, where the number of the target database statements is a second number, the second number is smaller than or equal to the first number, and the second number of target database statements can be executed in parallel; determine the target data node corresponding to each target database statement from a plurality of data nodes included in the distributed database system; and send the second number of target database statements to the corresponding target data nodes in parallel, so that the target data nodes process the stored data according to the received target database statements. With the method provided by the embodiments of the present application, database statements that can be executed in parallel can be sent to the data nodes in parallel, which reduces data transmission delay and improves data processing efficiency.

The data processing method provided by the embodiments of the present application can be applied to the field of artificial intelligence. Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. Artificial intelligence infrastructure technologies generally include, for example, sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, pre-training model technologies, operation/interaction systems, mechatronics, and the like. The data processing method provided by the embodiments of the present application can be applied to distributed storage in the artificial intelligence field: data related to an artificial intelligence technology can be stored in the data nodes of a distributed database system. When the data stored in the data nodes needs to be accessed, a terminal device can send a data processing request to a coordination node in the distributed database system, and the coordination node can determine a database statement queue according to the data processing request, where the database statement queue includes a first number of database statements to be executed. The coordination node can determine, from the first number of database statements to be executed, target database statements that can be executed in parallel, determine the target data nodes corresponding to the target database statements, and send the target database statements to the corresponding target data nodes in parallel, which effectively reduces data transmission delay and improves data processing efficiency.

The data processing method provided by the embodiments of the present application can also be applied to the field of cloud storage. Cloud storage is a concept extended and developed from cloud computing. A distributed cloud storage system (hereinafter referred to as a storage system) is a storage system that, by means of functions such as cluster applications, grid technology, and distributed storage file systems, uses application software or application interfaces to make a large number of storage devices of various types in a network (storage devices are also referred to as storage nodes) work cooperatively, so as to provide data storage and service access functions externally. The data processing method provided by the embodiments of the present application can be applied to a distributed cloud storage system that includes a coordination node and cloud storage nodes, where the cloud storage nodes are used for storing data. The coordination node can generate a plurality of database statements to be executed and determine, from them, target database statements that can be executed in parallel; the coordination node can then send the target database statements to the corresponding cloud storage nodes, so that a plurality of cloud storage nodes execute their corresponding database statements simultaneously. This reduces data transmission delay, makes full use of the computing resources of the distributed storage system, and improves data processing efficiency.

The architecture of a data processing system provided in embodiments of the present application will be described below with reference to the accompanying drawings.

Referring to FIG. 2, which shows the system architecture of a data processing system provided in an embodiment of the present application, the data processing system includes a terminal device 201 and a distributed database system 202. The distributed database system 202 includes a coordination node 2021 and a plurality of data nodes (such as data node 1, data node 2, …, data node N shown in FIG. 2, where N is an integer greater than 2). The coordination node 2021 may perform data interaction with each data node and with the terminal device 201. Wherein:

the terminal device 201 may interact with the object, receive an instruction input by the object, and generate a data processing request. The data processing request is for requesting processing of data stored in the distributed database system (e.g., processing of stored data such as deletion or verification). The terminal device 201 may send a data processing request to the distributed database system 202, or may receive processing result information returned by the distributed database system 202. The terminal device 201 may be, but is not limited to, a handheld device (e.g., a smart phone, a tablet computer), a computing device (e.g., a personal computer (Personal Computer, PC)), a vehicle-mounted terminal, a smart voice interaction device, a wearable device, or other smart apparatus, etc. with communication functions.

The distributed database system 202 may implement distributed storage of data, where the distributed database system 202 includes a coordination node 2021 and a plurality of data nodes, where:

the coordination node 2021 may receive the data processing request sent by the terminal device 201, send processing result information corresponding to the data processing request to the terminal device 201, and determine a database statement queue including database statements to be executed. The coordination node 2021 stores node description information of each data node, which indicates the data content stored in that data node, for example, the table names of the data tables in the data node, the data attributes in the tables, and the data volume. The coordination node 2021 may send the database statements to be executed to the data nodes and receive the execution results returned by the data nodes. The coordination node 2021 may be deployed on a device different from those of the data nodes, or on the same device as one of the data nodes. The device corresponding to the coordination node may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), big data, and artificial intelligence platforms.

The data nodes store the data of the distributed database system, and the data stored in different data nodes is typically different. Each data node can receive the database statements to be executed that are sent by the coordination node, execute them to obtain an execution result, and send the execution result to the coordination node. The storage device corresponding to a data node may be a local database of the server corresponding to the coordination node 2021, or may be a cloud database (that is, a database deployed in the cloud) associated with the server corresponding to the coordination node 2021. The cloud database may be deployed on any one of a private cloud, a public cloud, a hybrid cloud, an edge cloud, and the like, and the emphasis of its functions differs with the deployment mode. For example, a database deployed in a private cloud resides on the user's own device and focuses on serving a small group of users, whereas a database deployed in a public cloud is built on a cloud platform provided by a third party, so that the data stored in it can be shared: data of any user can be stored in the database, and data in the database can be used by any user.

The working principle of the data processing system as shown in fig. 2 will be explained in detail below:

the terminal device 201 may interact with the object to generate a data processing request for requesting processing of data stored in the data nodes of the distributed database system 202. The terminal device 201 sends the data processing request to the coordination node 2021 in the distributed database system 202, and the coordination node 2021 may determine a database statement queue according to the data processing request, where the database statement queue includes a first number of database statements to be executed. The coordination node 2021 may determine a target database statement from the first number of database statements to be executed included in the database statement queue, where the number of target database statements is a second number, the second number is smaller than or equal to the first number, and the second number of target database statements can be executed in parallel. The coordination node 2021 may determine the target data node corresponding to each target database statement from the plurality of data nodes included in the distributed database system. For example, if the number of target database statements is 5 and the determined target data nodes are data node 1, data node 2, and data node N, the coordination node 2021 may send the second number of target database statements to their corresponding target data nodes in parallel, that is, send the 5 target database statements to the corresponding data nodes at the same time; the data transmission delay of sending the 5 target database statements in this way is far less than that of sending them one after another.
After receiving a target database statement, a data node can execute it to process the stored data and obtain an execution result, and then send the execution result to the coordination node 2021. Upon receiving the execution results of all the database statements to be executed in the database statement queue, the coordination node 2021 determines processing result information and sends it to the terminal device 201. With the data processing method provided by the embodiments of the present application, data transmission delay can be effectively reduced, the computing resources of the distributed database are fully utilized, and data processing efficiency is effectively improved.

It will be appreciated that the architecture diagram of the data processing system described in the embodiments of the present application is intended to describe the data processing method of the embodiments more clearly, and does not constitute a limitation on the data processing method provided in the embodiments of the present application. For example, in addition to the coordination node 2021, the data processing method provided by the embodiment of the present application may be performed by another device that is different from the coordination node 2021 and is capable of communicating with the terminal device 201 and the data nodes. Those of ordinary skill in the art will appreciate that the numbers of terminal devices 201, coordination nodes, and data nodes in fig. 2 are merely illustrative, and any number of devices and nodes may be configured according to service implementation needs. Moreover, with the evolution of the system architecture and the appearance of new service scenarios, the data processing method provided by the embodiment of the application is also applicable to similar technical problems.

It should be noted that, when the embodiments of the present application are applied, the collection and processing of relevant data (such as the data processing request) should strictly comply with the requirements of relevant laws and regulations: the informed consent or separate consent of the personal information subject should be obtained, and subsequent data use and processing should be carried out within the scope authorized by the laws and regulations and by the personal information subject.

Referring to fig. 3, fig. 3 is a flow chart of a data processing method according to an embodiment of the present application. The data processing method may be implemented by the above-described coordination node 2021, or may be implemented by another device. The flow of the data processing method provided in the embodiment of the application includes, but is not limited to:

S301, determining a database statement queue according to a data processing request, wherein the data processing request is used for requesting to process data stored in a data node, the database statement queue comprises a first number of database statements to be executed, and the data node is contained in a distributed database system.

In the embodiment of the application, the distributed database system comprises a coordination node and a plurality of data nodes, wherein the coordination node can respond to a data processing request sent by the terminal equipment, can send a database statement to the data nodes, and the data nodes can store data and can execute the received database statement. The data stored in different data nodes in the distributed database system are not always identical, and node description information of each data node is stored in the coordination node, wherein the node description information is used for indicating the content of the data stored in the data node. The terminal device may send a data processing request to the coordinator node, the data processing request being for requesting processing of data stored in the data node (the processing may include addition/deletion/modification processing of data, etc.), for example: the data processing request may request data query processing of data stored in the data node. The coordination node can determine a database statement queue according to the data processing request, and the database statement queue can comprise a first number of database statements to be executed, wherein the database statements to be executed in the database statement queue are arranged according to the logical sequence of the statements. By the method provided by the embodiment of the application, the database statement queue can be determined, so that the database statement which can be executed in parallel can be determined conveniently, the data transmission delay is reduced, and the data processing efficiency is improved.

It should be noted that, the database statement to be executed in the database statement queue may be determined based on a structured query language (Structured Query Language, SQL), which is a database language with multiple functions such as data manipulation and data definition, and the language has an interactive characteristic, so that the processing quality and efficiency of the database system can be effectively improved. The database statement to be executed may be written based on SQL. For example: the database statement to be executed may be "select score from test", and the operation corresponding to the database statement to be executed is to select the value corresponding to "score" from the data table "test".

In an embodiment, the implementation manner of determining the database statement queue according to the data processing request may be: inquiring a preset statement library in the coordination node according to the data processing request to obtain an inquiring result; determining a plurality of database sentences to be executed according to the query result; and generating a database statement queue according to the plurality of database statements to be executed. The coordination node comprises a preset statement library, wherein the preset statement library comprises a plurality of pre-written database statements; after receiving the data processing request, the coordination node can query the preset database according to the data processing request, if the database sentences associated with the data processing request exist in the preset database, the database sentences are determined to be the database sentences to be executed, and a database sentence queue is determined according to the database sentences to be executed. For example: the preset statement library comprises a plurality of database statements associated with 'inquiring all data in 5 months', the coordination node receives a data processing request, the data processing request is used for requesting the inquiring all data in 5 months, the coordination node can directly determine the plurality of database statements from the preset statement library according to the data processing request, determine the database statements as to-be-executed database statements, and determine a database statement queue according to the to-be-executed database statements. By the method provided by the embodiment of the application, the database statement queue can be rapidly determined, the overall data processing efficiency is improved, and the use experience is improved.
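The lookup described above can be illustrated with a minimal sketch. The library contents, the request key `query_all_data_in_may`, and the function name `build_queue_from_library` are hypothetical and chosen only for illustration; the embodiment does not specify how the preset statement library is keyed or stored.

```python
from collections import deque

# Hypothetical preset statement library: request key -> pre-written statements.
PRESET_STATEMENT_LIBRARY = {
    "query_all_data_in_may": [
        "select * from income where month = 5",
        "select * from outcome where month = 5",
    ],
}


def build_queue_from_library(request_key):
    """Return the statement queue preset for the request, or an empty queue."""
    statements = PRESET_STATEMENT_LIBRARY.get(request_key, [])
    # A deque keeps the statements in their logical (pre-written) order.
    return deque(statements)


queue = build_queue_from_library("query_all_data_in_may")
print(len(queue))  # 2
```

When no preset statements are associated with the request, the sketch returns an empty queue, which a caller could treat as a signal to fall back to another way of determining the queue.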

In one embodiment, the data processing request includes a data processing identifier and a data processing parameter, the data processing identifier being used to indicate a type of processing performed on data stored in the distributed database system; the implementation manner of determining the database statement queue according to the data processing request can also be as follows: determining a first number of initial database statements according to the data processing identifier in the data processing request; and carrying out combination processing on the data processing parameters in the data processing request and the first number of initial database sentences to obtain the first number of database sentences to be executed, and adding the first number of database sentences to be executed into a database sentence queue. The data processing request sent by the terminal device may carry parameters, where the data processing request may include a data processing identifier and data processing parameters, where the data processing identifier may be used to indicate a type of processing performed on data stored in the distributed database system, for example: new addition, modification, query, deletion, etc. The data processing parameters may be used to indicate specific operational objects, such as: a particular data table. The coordinating node may determine a first number of initial database statements, which may be incomplete database statements, based on the data processing identifier in the data processing request, for example: the data processing identifier in the data processing request is used to indicate that the stored data is to be queried, a first number of initial database statements may be determined, where the initial database statements may be "select X from Y", where X and Y are parameters to be determined. 
The coordination node can perform combination processing on the data processing parameters in the data processing request and the first number of initial database sentences to obtain the first number of to-be-executed database sentences, and add the first number of to-be-executed database sentences into a database sentence queue. For example: the first number of initial database sentences is determined, wherein the initial database sentences can be 'select X from Y', wherein X and Y are parameters to be determined, the data processing parameters in the data processing request are 'num' and 'income', and after the combination processing is performed, the database sentences to be executed are 'select num from income', namely the value of 'num' is selected from a data table named 'income'. The method provided by the embodiment of the application has better universality, the terminal equipment is not required to modify the data processing request, the optimization without perception can be realized, and the use experience is improved.
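As an illustration of the combination processing, the following sketch fills the placeholders of an initial statement with the data processing parameters. The mapping `INITIAL_STATEMENTS`, the placeholder names `column` and `table`, and the function `build_queue` are assumptions made for this example and are not defined by the embodiment.

```python
from collections import deque

# Hypothetical mapping from a data processing identifier to initial
# (incomplete) statements; {column} and {table} play the role of X and Y.
INITIAL_STATEMENTS = {
    "query": ["select {column} from {table}"],
}


def build_queue(identifier, parameters):
    """Combine the parameters with each initial statement and enqueue the result."""
    queue = deque()
    for template in INITIAL_STATEMENTS[identifier]:
        queue.append(template.format(**parameters))
    return queue


queue = build_queue("query", {"column": "num", "table": "income"})
print(list(queue))  # ['select num from income']
```

The sketch uses plain string formatting for brevity; a real system would be expected to validate the parameters before splicing them into a statement.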

It should be noted that, the data processing method provided in the embodiment of the present application is not only applicable to database statements, but also applicable to dynamically spliced statements. The method provided by the embodiment of the application has good universality and can be applied to different use scenes.

In one embodiment, the distributed database system of embodiments of the present application may be built based on an online transaction processing (Online Transaction Processing, OLTP) system, which is a computer system for handling large numbers of short-term transactions, and is characterized by high concurrency, high availability, high performance, and high reliability. OLTP systems support transaction management, concurrency control, database indexing, data backup and recovery, security management, etc., and may allow a large number of operational objects to perform a large number of database transactions in real time over a network (database transactions refer to processing of changes, insertions, deletions, or queries, etc., on data in a database). The method provided by the embodiment of the application can effectively reduce the data transmission delay and improve the data processing efficiency, so that the method provided by the embodiment of the application can be used for improving the execution performance of the OLTP and improving the use experience.

S302, determining a target database statement from a first number of database statements to be executed, which are included in the database statement queue; the number of target database statements is a second number, the second number is smaller than or equal to the first number, and the second number of target database statements can be executed in parallel.

In this embodiment of the present application, the database statement queue includes a first number of database statements to be executed, and some of these statements are executed independently of each other and do not affect each other. For example, the database statement to be executed "select date from income" (querying the value of "date" from the data table "income") and the database statement to be executed "select num from outcome" (querying the value of "num" from the data table "outcome") operate on different data tables and have no data dependency between them, so the two statements can be executed in parallel. As another example, the database statement to be executed "a= select num from income" (querying the value of "num" from the data table "income" and assigning the queried value to a) and the database statement to be executed "select date from outcome where date.num=a" (querying, from the data table "outcome", the value of "date" for which "num" equals a) operate on different data tables but have a data dependency between them: the value of a can be determined only after the first statement has been executed, so the second statement cannot be executed in parallel with the first.

The coordination node can determine target database sentences from a first number of to-be-executed database sentences included in the database sentence queues, and the target database sentences can be executed in parallel, namely, each target database sentence has no data dependency with other to-be-executed database sentences. The number of target database statements determined is a second number, which may be less than or equal to the first number, for example: the database statement queue comprises 10 database statements to be executed, and 10 target database statements can be determined if all the 10 database statements to be executed have no data dependency. By the method provided by the embodiment of the application, the target database statement which can be executed in parallel can be determined, so that the target database statement can be conveniently and subsequently sent to the data node in parallel, and the data processing efficiency is effectively improved.
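One possible way to picture this selection is sketched below. It assumes that each statement has already been annotated with the variable it assigns and the variables it reads; the `Statement` class, its `assigns` and `reads` fields, and the function `select_targets` are hypothetical, and the embodiment does not prescribe how data dependencies are detected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    sql: str
    assigns: Optional[str] = None   # variable receiving the result (hypothetical field)
    reads: frozenset = frozenset()  # variables the statement needs (hypothetical field)


def select_targets(queue):
    """Keep statements that read no variable assigned by a statement in the queue."""
    assigned = {s.assigns for s in queue if s.assigns is not None}
    return [s for s in queue if not (s.reads & assigned)]


queue = [
    Statement("select date from income"),
    Statement("select num from outcome"),
    Statement("select num from income", assigns="a"),
    Statement("select date from outcome where num = a", reads=frozenset({"a"})),
]
targets = select_targets(queue)
print(len(targets))  # 3: only the statement that reads "a" is held back
```

In this sketch the first number is 4 and the second number is 3; the held-back statement can be sent once the value of a is available.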

S303, determining target data nodes corresponding to the target database sentences from a plurality of data nodes included in the distributed database system.

In the embodiment of the application, the coordination node comprises node description information of each data node in the distributed database system, and the node description information can comprise related information (such as table names, attributes in the table, object numbers and the like) of a data table stored by the data node. The coordinating node may determine a target data node corresponding to each target database statement from a plurality of data nodes included in the distributed database system. For example: the distributed database system comprises three data nodes, wherein a data table A is arranged in a data node 1, a data table B is arranged in a data node 2, and a data table C is arranged in a data node 3; two target database sentences exist first, namely a target database sentence 1 and a target database sentence 2 are respectively 'select num from A' and 'select num from B', and then the target data node corresponding to the target database sentence 1 can be determined to be a data node 1, and the target data node corresponding to the target database sentence 2 is a data node 2. By the method provided by the embodiment of the application, the target data node corresponding to the target database statement can be determined, so that the subsequent target data node can execute the target database statement in parallel, and the data processing efficiency is improved.
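The routing in the example above can be sketched as follows. The dictionary `NODE_DESCRIPTIONS` stands in for the node description information, and the regular expression is a deliberately simplified assumption that only handles single-table statements of the form "select ... from <table>"; neither is specified by the embodiment.

```python
import re

# Hypothetical node description information: data node -> tables it stores.
NODE_DESCRIPTIONS = {
    "data node 1": {"A"},
    "data node 2": {"B"},
    "data node 3": {"C"},
}


def target_node(statement):
    """Return the data node that stores the table named after 'from'."""
    match = re.search(r"\bfrom\s+(\w+)", statement, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no table name found in statement")
    table = match.group(1)
    for node, tables in NODE_DESCRIPTIONS.items():
        if table in tables:
            return node
    raise LookupError("no data node stores table " + table)


print(target_node("select num from A"))  # data node 1
print(target_node("select num from B"))  # data node 2
```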

S304, the second number of target database sentences are sent to the corresponding target data nodes in parallel, so that the target data nodes process stored data according to the received target database sentences.

In this embodiment of the present application, after determining the target data node corresponding to each target database statement, the coordination node may send the second number of target database statements to the corresponding target data nodes in parallel, so that the target data nodes can process the stored data according to the received target database statements. For example, if 4 target database statements are determined, the coordination node can send the 4 target database statements to the corresponding target data nodes simultaneously, and the data transmission delay of the 4 target database statements is the same as the data transmission delay of sending one database statement to a target data node. In this case, the data processing time is calculated as the statement execution time multiplied by the number of statement executions, plus the data transmission delay. Assume that the execution time of each of the 4 target database statements is 0.001 seconds and the data transmission delay is 0.01 seconds. If the 4 target database statements are executed by the same data node, then with the data processing method provided by the embodiment of the application the total data processing time of the 4 target database statements is $0.001 \times 4 + 0.01 = 0.014$ seconds. If the 4 target database statements are executed by 4 different data nodes, the total data processing time of the 4 target database statements is $0.001 \times 1 + 0.01 = 0.011$ seconds (i.e., the 4 data nodes execute their corresponding target database statements simultaneously, so the time spent on executing the 4 target database statements is 0.001 seconds). If there is a data node that needs to execute two or more of the 4 target database statements, the total data processing time of the 4 target database statements is $0.001 \times \max + 0.01$ seconds, where max represents the number of target database statements executed by the data node that executes the most target database statements among the data nodes corresponding to the 4 target database statements. In any of the above cases, the data processing delay of the method provided in the embodiment of the present application is smaller than that of the method shown in fig. 1, which sends database statements one by one. By the method provided by the embodiment of the application, the data transmission delay can be effectively reduced, so that the data processing efficiency is effectively improved.
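The calculation above can be checked with a short sketch. Times are expressed in whole milliseconds (1 ms execution time, 10 ms transmission delay) to avoid floating-point rounding. The function names are hypothetical, and the one-by-one baseline assumes that every statement pays its own transmission delay, which is an assumption about the method of fig. 1 rather than a figure stated in this passage.

```python
def total_time_ms(statements_per_node, exec_ms=1, delay_ms=10):
    """Execution time of the busiest data node plus one transmission delay."""
    return exec_ms * max(statements_per_node) + delay_ms


def one_by_one_time_ms(count, exec_ms=1, delay_ms=10):
    """Assumed baseline: each statement pays its own transmission delay."""
    return (exec_ms + delay_ms) * count


print(total_time_ms([4]))           # 14 ms: one node runs all 4 statements
print(total_time_ms([1, 1, 1, 1]))  # 11 ms: 4 nodes run 1 statement each
print(total_time_ms([2, 1, 1]))     # 12 ms: the busiest node runs 2 statements
print(one_by_one_time_ms(4))        # 44 ms under the assumed baseline
```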

It should be noted that, in the embodiment of the present application, the method may also be referred to as a pipeline model, and in the embodiment of the present application, the coordination node may determine a database statement queue including a database statement to be executed, may analyze the database statement to be executed in the database statement queue, determine a target database statement that may be executed in parallel, and send the target database statement that may be executed in parallel to each corresponding target data node in parallel, thereby effectively saving data transmission time, enabling multiple data nodes to execute the database statement simultaneously, fully utilizing computing resources of the distributed database system, and effectively improving data processing efficiency.

Please refer to fig. 4, which is a schematic diagram of another database statement sending method provided in the embodiment of the present application. The distributed database system shown in fig. 4 includes a coordination node 401 and three data nodes, namely a data node 402, a data node 403 and a data node 404. The coordination node includes a state machine and a program interface: the state machine can determine a database statement queue and send database statements to the data nodes through the program interface, and the program interface is used to implement data interaction between the coordination node and the data nodes. Each data node includes a database statement asynchronous processing module, which can process the data stored in the data node according to the received database statement. When data processing is performed, the state machine in the coordination node can determine a database statement queue according to the data processing request, where the database statement queue includes three database statements to be executed, namely SQL1, SQL2 and SQL3. The coordination node can determine the target database statements that can be executed in parallel from the first number of database statements to be executed included in the database statement queue; here the determined target database statements are SQL1, SQL2 and SQL3. The coordination node may then determine, from the plurality of data nodes included in the distributed database system, the target data node corresponding to each target database statement, that is, the target data node corresponding to SQL1 is the data node 402, the target data node corresponding to SQL2 is the data node 403, and the target data node corresponding to SQL3 is the data node 404. 
The coordination node can send three target database sentences to the corresponding target data nodes in parallel, and the database sentence asynchronous processing module in the three target data nodes can execute the received target database sentences and process the data stored by the data nodes. Since the execution time of different database statements may be different, the time at which different data nodes return the execution results of the database statements may be different, i.e., the data nodes asynchronously return the execution results of the database statements. According to the data processing method provided by the embodiment of the application, the data transmission time delay can be effectively reduced, the overall efficiency of data processing is improved, meanwhile, a plurality of data nodes can execute corresponding database sentences at the same time, the computing resources of a distributed database system are fully utilized, the optimization of the data processing process is realized, and the use experience is improved.
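The parallel sending and asynchronous return described for fig. 4 can be sketched with Python's asyncio. The function `execute_on_node` merely simulates a data node with `asyncio.sleep`; the node names, the simulated execution times, and both function names are assumptions for illustration and do not represent the program interface of the embodiment.

```python
import asyncio


async def execute_on_node(node, sql, exec_seconds):
    """Stand-in for sending `sql` to `node` and awaiting its execution result."""
    await asyncio.sleep(exec_seconds)  # simulates transmission plus execution
    return node + " finished " + sql


async def send_in_parallel(assignments):
    """Dispatch all statements at once and wait for every execution result."""
    tasks = [execute_on_node(node, sql, t) for node, sql, t in assignments]
    # gather runs the coroutines concurrently; results keep the input order
    # even though the simulated nodes finish at different times.
    return await asyncio.gather(*tasks)


assignments = [
    ("data node 402", "SQL1", 0.03),
    ("data node 403", "SQL2", 0.01),
    ("data node 404", "SQL3", 0.02),
]
results = asyncio.run(send_in_parallel(assignments))
print(results[0])  # data node 402 finished SQL1
```

Because the three simulated executions overlap, the total wait is close to the longest single execution rather than the sum of the three.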

The method provided by the embodiment of the application can determine the database statement queue according to the data processing request, so that the method can be applied to various different scenarios, has good universality, achieves optimization that is imperceptible to the user without manual modification of the data processing process, and greatly improves the use experience. The target database statements that can be executed in parallel can be determined from the database statement queue, so that database statements can be processed in a pipelined manner, which greatly improves the efficiency of sending database statements. The target database statements can be sent to the corresponding target data nodes in parallel, so that the data transmission delay is effectively reduced and the overall efficiency of data processing is improved. Meanwhile, the data processing method provided by the embodiment of the application enables a plurality of data nodes to execute their corresponding database statements at the same time, so that the computing resources of the distributed database system are fully utilized, the data processing process is optimized, and the use experience is improved.

Referring to fig. 5, fig. 5 is a flowchart of another data processing method according to an embodiment of the present application. The data processing method may be implemented by the above-described coordination node 2021, or may be implemented by another device. The flow of the data processing method provided in the embodiment of the application includes, but is not limited to:

S501, determining a first number of initial database sentences according to data processing identifiers in data processing requests, wherein the data processing requests are used for requesting to process data stored in data nodes, and the data nodes are contained in a distributed database system.

In the embodiment of the application, the terminal device may generate a data processing request including the data processing identifier and the data processing parameter, and the terminal device may send the data processing request to the distributed database system. The distributed database system comprises a coordination node and a plurality of data nodes, wherein the coordination node can respond to a data processing request sent by the terminal equipment, can send a database statement to the data nodes, and the data nodes can store data and can execute the received database statement. The data stored in different data nodes in the distributed database system are not always identical, and node description information of each data node is stored in the coordination node, wherein the node description information is used for indicating the content of the data stored in the data node. The data processing request sent by the terminal device may be used to request processing of the data stored in the data node (the processing may include adding and deleting and modifying the data, etc.). The data processing identifier in the data processing request may be used to indicate the type of processing to be performed on the data stored in the distributed database system, for example: new addition, modification, query, deletion, etc. The coordinating node may determine a first number of initial database statements, which may be incomplete database statements, based on the data processing identifier in the data processing request. By the method provided by the embodiment of the application, the initial database statement can be determined, the subsequent determination of the database statement queue is facilitated, the data transmission delay is reduced, and the data processing efficiency is improved.

S502, carrying out combination processing on the data processing parameters in the data processing request and the first number of initial database sentences to obtain a first number of database sentences to be executed, and adding the first number of database sentences to be executed into a database sentence queue.

In the embodiment of the application, the coordination node can perform combined processing on the data processing parameters in the data processing request and the first number of initial database sentences to realize dynamic splicing of the database sentences, obtain the first number of to-be-executed database sentences, and add the first number of to-be-executed database sentences into the database sentence queue. For different application scenes and different data processing operations, the data processing identification and the data processing parameters in the data processing request can be different, and the data processing method provided by the embodiment of the application can be applied to various scenes and has good universality. The method provided by the embodiment of the application has better universality, the terminal equipment is not required to modify the data processing request, the optimization without perception can be realized, and the use experience is improved.

In an embodiment, determining the implementation manner of the database statement queue may further be: inquiring a preset statement library in the coordination node according to the data processing request to obtain an inquiring result; determining a plurality of database sentences to be executed according to the query result; and generating a database statement queue according to the plurality of database statements to be executed. The coordination node comprises a preset statement library, wherein the preset statement library comprises a plurality of pre-written database statements; after receiving the data processing request, the coordination node can query the preset database according to the data processing request, if the database sentences associated with the data processing request exist in the preset database, the database sentences are determined to be the database sentences to be executed, and a database sentence queue is determined according to the database sentences to be executed. For example: the preset statement library comprises a plurality of database statements associated with 'all data of the query object A', the coordination node receives a data processing request, the data processing request is used for requesting all data of the object A, the coordination node can directly determine the plurality of database statements from the preset statement library according to the data processing request, determine the plurality of database statements as to-be-executed database statements, and determine a database statement queue according to the to-be-executed database statements. By the method provided by the embodiment of the application, the database statement queue can be rapidly determined, the overall data processing efficiency is improved, and the use experience is improved.

It should be noted that, the database statement to be executed in the database statement queue may be determined based on a structured query language (Structured Query Language, SQL), which is a database language with multiple functions such as data manipulation and data definition, and the language has an interactive characteristic, so that the processing quality and efficiency of the database system can be effectively improved. The database statement to be executed may be written based on SQL. The distributed database system provided in embodiments of the present application may be built based on an online transaction processing (Online Transaction Processing, OLTP) system, which is a computer system for handling large numbers of short-term transactions, and is primarily characterized by high concurrency, high availability, high performance, and high reliability. OLTP systems support transaction management, concurrency control, database indexing, data backup and recovery, security management, etc., and may allow a large number of operational objects to perform a large number of database transactions in real time over a network (database transactions refer to processing of changes, insertions, deletions, or queries, etc., on data in a database). The method provided by the embodiment of the application can effectively reduce the data transmission delay and improve the data processing efficiency, so that the method provided by the embodiment of the application can be used for improving the execution performance of the OLTP and improving the use experience.

S503, determining a target database statement from a first number of database statements to be executed, which are included in the database statement queue; the number of target database statements is a second number, the second number is smaller than or equal to the first number, and the second number of target database statements can be executed in parallel.

In this embodiment of the present application, the database statement queue includes a first number of database statements to be executed, and some of these statements are executed independently of each other and do not affect each other. For example, the database statement to be executed "select no from A" (querying the value of "no" from the data table "A") and the database statement to be executed "select no from B" (querying the value of "no" from the data table "B") operate on different data tables and have no data dependency between them, so they can be executed in parallel. As another example, the first database statement to be executed is "count=execute 'select count(t) from test'" (querying the value of "count(t)" from the data table "test" and assigning the queried value to count), the second database statement to be executed is "sum=execute 'select sum(t) from test'" (querying the value of "sum(t)" from the data table "test" and assigning the queried value to sum), and the third database statement to be executed is "avg=sum/count". Among these three statements, there is no data dependency between the first and the second, whereas the third requires the execution results of both the first and the second.

The coordination node can determine target database statements from the first number of to-be-executed database statements included in the database statement queue. The target database statements can be executed in parallel, that is, no target database statement has a data dependency on any other to-be-executed database statement. The number of target database statements determined is a second number, which is less than or equal to the first number; for example, the second number equals the first number when there is no data dependency among any of the to-be-executed database statements in the database statement queue. By the method provided by the embodiments of the present application, the target database statements that can be executed in parallel can be determined, so that they can subsequently be sent to the data nodes in parallel, which effectively improves data processing efficiency.

In an embodiment, determining the target database statements from the first number of to-be-executed database statements included in the database statement queue may be implemented as follows: according to the arrangement order of the to-be-executed database statements in the database statement queue, sequentially determine a third number of to-be-analyzed database statements from the queue, where the third number of to-be-analyzed database statements are consecutive in the arrangement order; perform parallel analysis processing on the third number of to-be-analyzed database statements to determine the data dependency analysis result of each; and determine the target database statements from the third number of to-be-analyzed database statements according to the data dependency analysis results. For example, if the database statement queue includes 100 to-be-executed database statements, 10 (that is, the third number of) to-be-analyzed database statements can be determined from the queue according to the arrangement order, and these 10 statements may occupy the 1st to 10th positions in the queue.

Parallel analysis processing is then performed on the third number of to-be-analyzed database statements to determine the data dependency analysis result of each, and the target database statements, that is, the statements that can be executed in parallel, are determined from them according to those results. For example, parallel analysis of the 10 to-be-analyzed database statements yields a data dependency analysis result for each, and 5 target database statements are determined according to these results, that is, 5 of the 10 to-be-analyzed database statements can be executed in parallel. As another example, parallel analysis of the 10 to-be-analyzed database statements yields a data dependency analysis result for each, and 10 target database statements are determined according to these results, that is, all 10 to-be-analyzed database statements can be executed in parallel. By the method provided by the embodiments of the present application, the third number of database statements can be analyzed in parallel, which improves analysis efficiency and in turn data processing efficiency.

In an embodiment, the data dependency analysis result is either a first data dependency analysis result or a second data dependency analysis result. The first data dependency analysis result indicates that the to-be-analyzed database statement has a data dependency, that is, it cannot be executed in parallel; the second data dependency analysis result indicates that the to-be-analyzed database statement has no data dependency, that is, it can be executed in parallel. Determining the target database statements from the third number of to-be-analyzed database statements according to the data dependency analysis results may be implemented as follows: if the third number of to-be-analyzed database statements include statements whose data dependency analysis result is the first data dependency analysis result, determine, among those statements, the first to-be-analyzed database statement, namely the one that is foremost in the arrangement order; and determine, as the target database statements, the second to-be-analyzed database statements, namely those of the third number of to-be-analyzed database statements that precede the first to-be-analyzed database statement in the arrangement order.

Refer to fig. 6, which is a schematic diagram of a target database statement determination method provided in an embodiment of the present application. In fig. 6, the third number is 6 and the to-be-analyzed database statements are SQL1-SQL6. Two of them, SQL4 and SQL6, have the first data dependency analysis result. Of these two, SQL4 is foremost in the arrangement order and is therefore the first to-be-analyzed database statement. The statements that precede SQL4 in the arrangement order, namely SQL1-SQL3, are determined as the target database statements, and these 3 statements can be executed in parallel.
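The selection rule illustrated by fig. 6 can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function name `select_parallel_prefix` and the use of a boolean flag per statement (True standing for the first data dependency analysis result) are assumptions made for the example.

```python
from typing import List


def select_parallel_prefix(has_dependency: List[bool]) -> List[int]:
    """Return the 0-based positions that precede the first dependent statement.

    has_dependency[i] is True when statement i received the "first data
    dependency analysis result" (hypothetical encoding for this sketch).
    """
    for position, dependent in enumerate(has_dependency):
        if dependent:
            # Everything before the foremost dependent statement is a target.
            return list(range(position))
    # No dependent statement in the window: every statement is a target.
    return list(range(len(has_dependency)))


# Fig. 6: SQL4 and SQL6 have a data dependency.
window = ["SQL1", "SQL2", "SQL3", "SQL4", "SQL5", "SQL6"]
flags = [False, False, False, True, False, True]
targets = [window[i] for i in select_parallel_prefix(flags)]
print(targets)  # ['SQL1', 'SQL2', 'SQL3']
```

Note that SQL5 is not selected even though it has no dependency of its own, because it follows SQL4 in the arrangement order.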

In an embodiment, determining the target database statements from the third number of to-be-analyzed database statements according to the data dependency analysis results may alternatively be implemented as follows: if the data dependency analysis result of every to-be-analyzed database statement is the second data dependency analysis result, determine every to-be-analyzed database statement as a target database statement; according to the arrangement order, determine a third number of new to-be-analyzed database statements from the to-be-executed database statements in the database statement queue other than the first to-be-executed database statements, where the first to-be-executed database statements are the to-be-executed database statements already determined as target database statements; perform parallel analysis processing on the third number of new to-be-analyzed database statements to determine the data dependency analysis result of each; and determine target database statements from the third number of new to-be-analyzed database statements according to those results. If the data dependency analysis result of every one of the third number of to-be-analyzed database statements is the second data dependency analysis result, all of them can be executed in parallel.

The first to-be-executed database statements are the to-be-executed database statements that have been determined as target database statements. According to the arrangement order, a third number of new to-be-analyzed database statements can be determined from the remaining to-be-executed database statements in the database statement queue; parallel analysis processing can be performed on them to determine the data dependency analysis result of each; and target database statements can be determined from them according to those results. For example, the database statement queue contains 100 to-be-executed database statements. If the data dependency analysis results of the 10 to-be-analyzed database statements in the 1st to 10th positions are all the second data dependency analysis result, these 10 statements are determined as target database statements.

Next, according to the arrangement order, a third number of new to-be-analyzed database statements are determined from the to-be-executed database statements other than the 1st to 10th, that is, the 11th to 20th database statements. Parallel analysis processing is performed on the 11th to 20th statements to determine the data dependency analysis result of each. If the results of the 11th to 13th statements are the second data dependency analysis result and the result of the 14th statement is the first data dependency analysis result, the 11th to 13th statements are also target database statements, that is, the target database statements are the 1st to 13th database statements in the database statement queue.

If the data dependency analysis result of every new to-be-analyzed database statement is the second data dependency analysis result, the above steps can be repeated until the data dependency analysis result of some to-be-analyzed database statement is the first data dependency analysis result, or until the database statement queue contains no to-be-executed database statement other than the first to-be-executed database statements, that is, until all to-be-executed database statements in the queue are target database statements. By the method provided by the embodiments of the present application, the target database statements in the database statement queue can be accurately determined, which facilitates sending them in parallel and improves data processing efficiency.
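The repeated window-by-window procedure can be sketched as follows, reproducing the example with 100 statements, a window of 10, and a dependency at the 14th statement. The function `collect_targets` and the `has_dependency` callback are hypothetical names; for brevity the sketch evaluates each window sequentially, whereas the text describes the statements in a window being analyzed in parallel.

```python
from typing import Callable, List, Sequence


def collect_targets(
    queue: Sequence[str],
    window_size: int,
    has_dependency: Callable[[int], bool],
) -> List[str]:
    """Grow the set of target statements one window at a time.

    has_dependency(i) is a hypothetical analysis callback that returns True
    when the statement at 0-based index i has a data dependency.
    """
    targets: List[str] = []
    start = 0
    while start < len(queue):
        end = min(start + window_size, len(queue))
        blocked = [i for i in range(start, end) if has_dependency(i)]
        if blocked:
            # Stop at the foremost dependent statement in this window.
            targets.extend(queue[start:blocked[0]])
            break
        # The whole window is parallelizable: keep it and take the next one.
        targets.extend(queue[start:end])
        start = end
    return targets


queue = [f"SQL{i}" for i in range(1, 101)]
# Assumption for the example: only the 14th statement (index 13) is dependent.
targets = collect_targets(queue, window_size=10, has_dependency=lambda i: i == 13)
print(len(targets), targets[-1])  # 13 SQL13
```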

In an embodiment, determining the target database statements from the first number of to-be-executed database statements included in the database statement queue may alternatively be implemented as follows: according to the arrangement order of the to-be-executed database statements in the database statement queue, sequentially perform parallel analysis processing on the to-be-executed database statements in the queue and determine the data dependency analysis results of the 1st to (K+1)th to-be-executed database statements, where the results of the 1st to Kth statements are all the second data dependency analysis result, the result of the (K+1)th statement is the first data dependency analysis result, K is an integer greater than 1, and K is less than or equal to the first number; and determine the 1st to Kth to-be-executed database statements as the target database statements. In other words, the to-be-executed database statements in the queue can be analyzed directly in order; as soon as the data dependency analysis result of some statement is determined to be the first data dependency analysis result, processing stops, and the statements that precede it in the arrangement order are determined as target database statements.

For example, the database statement queue includes 100 to-be-executed database statements. The statements in the queue can be analyzed directly in order to determine the data dependency analysis result of each. If the results of the 1st to 20th statements are the second data dependency analysis result and the result of the 21st statement is the first data dependency analysis result, the 1st to 20th to-be-executed database statements can be determined as target database statements. By the method provided by the embodiments of the present application, the target database statements in the whole database statement queue can be determined directly, which improves data processing efficiency.
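The whole-queue variant can be sketched with a thread pool standing in for the parallel analysis. The `has_dependency` function is a hypothetical stand-in for the analysis, and unlike the text, which stops at the (K+1)th statement, this simplified sketch analyzes every statement and then takes the leading run without dependencies.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import takewhile
from typing import List


def has_dependency(position: int) -> bool:
    """Hypothetical analysis: only the 21st statement (1-based) is dependent."""
    return position == 21


queue: List[str] = [f"SQL{i}" for i in range(1, 101)]

# pool.map preserves the arrangement order of its inputs in its results.
with ThreadPoolExecutor(max_workers=8) as pool:
    flags = list(pool.map(has_dependency, range(1, len(queue) + 1)))

# K is the length of the leading run of statements without a dependency.
k = len(list(takewhile(lambda dependent: not dependent, flags)))
targets = queue[:k]
print(k, targets[0], targets[-1])  # 20 SQL1 SQL20
```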

In an embodiment, performing parallel analysis processing on the third number of to-be-analyzed database statements to determine the data dependency analysis result of each may be implemented as follows: determine whether a dependent database statement of a third to-be-analyzed database statement exists among the to-be-executed database statements that precede the third to-be-analyzed database statement in the arrangement order of the database statement queue, where the third to-be-analyzed database statement is any one of the third number of to-be-analyzed database statements and its execution depends on the execution result of its dependent database statement; and if a dependent database statement of the third to-be-analyzed database statement exists and its execution result has not been obtained, determine that the data dependency analysis result of the third to-be-analyzed database statement is the first data dependency analysis result.

That is, for any one of the third number of to-be-analyzed database statements, the coordination node checks the to-be-executed database statements that precede it in the database statement queue. If one of them is a statement on whose execution result it depends, and that execution result has not yet been obtained, the statement is determined to have a data dependency and cannot be executed in parallel.

For example, the third to-be-analyzed database statement is the 7th to-be-executed database statement in the database statement queue. If it is determined that the 1st to 6th to-be-executed database statements in the queue include a dependent database statement of the third to-be-analyzed database statement, namely the 4th to-be-executed database statement, and the coordination node has not obtained the execution result of the 4th statement, then the data dependency analysis result of the 7th to-be-executed database statement is the first data dependency analysis result, that is, the 7th statement cannot be sent to its data node in parallel with the 4th statement. By the method provided by the embodiments of the present application, the dependencies of database statements can be analyzed automatically, so the optimization is transparent to the user, data transmission delay is reduced as far as possible, and normal execution of the database statements is ensured.

In an embodiment, performing parallel analysis processing on the third number of to-be-analyzed database statements to determine the data dependency analysis result of each may further be implemented as follows: if a dependent database statement of the third to-be-analyzed database statement exists and its execution result has been obtained, determine that the data dependency analysis result of the third to-be-analyzed database statement is the second data dependency analysis result; or, if no dependent database statement of the third to-be-analyzed database statement exists, determine that the data dependency analysis result of the third to-be-analyzed database statement is the second data dependency analysis result. In the first case, a dependent database statement exists among the to-be-executed database statements that precede the third to-be-analyzed database statement in the queue, but its execution result is already available, so the third to-be-analyzed database statement has no outstanding data dependency and can be executed in parallel with the other target database statements. In the second case, no preceding statement is a dependent database statement, so the third to-be-analyzed database statement has no data dependency.

By the method provided by the embodiments of the present application, the data dependency analysis result of each to-be-analyzed database statement, and therefore the data dependency relationships among the database statements, can be accurately determined, which facilitates the subsequent determination of the target database statements, reduces data transmission delay, and improves data processing efficiency.
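The two analysis outcomes can be sketched as a small classification function. How a dependency is detected is not specified here, so the sketch assumes, purely for illustration, that each statement declares the variables it reads and writes; `Statement`, `Analysis`, and `analyze` are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class Analysis(Enum):
    HAS_DEPENDENCY = 1  # the "first data dependency analysis result"
    NO_DEPENDENCY = 2   # the "second data dependency analysis result"


@dataclass
class Statement:
    text: str
    writes: Set[str] = field(default_factory=set)  # variables it assigns
    reads: Set[str] = field(default_factory=set)   # variables it needs


def analyze(queue: List[Statement], index: int, available: Set[str]) -> Analysis:
    """Classify queue[index] given the variables whose results are available."""
    current = queue[index]
    for earlier in queue[:index]:
        needed = earlier.writes & current.reads
        # A dependent statement exists and its result is not yet obtained.
        if needed and not needed <= available:
            return Analysis.HAS_DEPENDENCY
    # Either no dependent statement exists, or all needed results are obtained.
    return Analysis.NO_DEPENDENCY


queue = [
    Statement("count = execute 'select count(t) from test'", writes={"count"}),
    Statement("sum = execute 'select sum(t) from test'", writes={"sum"}),
    Statement("avg = sum/count", writes={"avg"}, reads={"sum", "count"}),
]
print([analyze(queue, i, set()).name for i in range(3)])
# ['NO_DEPENDENCY', 'NO_DEPENDENCY', 'HAS_DEPENDENCY']
print(analyze(queue, 2, {"count", "sum"}).name)  # NO_DEPENDENCY
```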

It should be noted that the method provided in the embodiments of the present application uses deferred evaluation and automatically identifies whether a to-be-executed database statement has a data dependency. Specifically, the to-be-executed database statements in the database statement queue need not each wait, in order, for the statement they depend on to finish executing and then assign the returned result; instead, obtaining the execution result can be deferred until the variable is actually evaluated. For example, the database statement queue includes three to-be-executed database statements: the first is "count = execute 'select count(t) from test'" (which determines the value of count), the second is "sum = execute 'select sum(t) from test'" (which determines the value of sum), and the third is "avg = sum/count" (which determines the value of avg). Ordinarily, the coordination node sends the first statement to the corresponding data node and receives the returned execution result to determine the value of count, then sends the second statement to the corresponding data node and receives the returned execution result to determine the value of sum, and only then determines the value of avg. In some cases, if the coordination node has not obtained the execution results of the first and second to-be-executed database statements during the processing, it waits until the other to-be-executed database statements in the database statement queue have finished executing and then processes the third to-be-executed database statement.

By the method provided by the embodiments of the present application, the data dependency relationships of database statements can be identified automatically, achieving automatic pipelined processing, reducing data transmission delay as far as possible, and improving the user experience.
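Deferred evaluation of the count/sum/avg example can be sketched with futures: both queries are submitted without waiting, and the results are demanded only when avg is computed. The function `execute_on_data_node`, the simulated delay, and the returned values (4.0 and 10.0) are all assumptions made for the example, not data from the embodiments.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def execute_on_data_node(sql: str) -> float:
    """Hypothetical stand-in for one round trip to a data node."""
    time.sleep(0.05)  # simulated network and execution delay
    canned_results = {
        "select count(t) from test": 4.0,
        "select sum(t) from test": 10.0,
    }
    return canned_results[sql]


with ThreadPoolExecutor(max_workers=2) as pool:
    # Both statements are sent immediately; neither assignment blocks.
    count = pool.submit(execute_on_data_node, "select count(t) from test")
    total = pool.submit(execute_on_data_node, "select sum(t) from test")
    # Evaluation is deferred until the variables are actually needed.
    avg = total.result() / count.result()

print(avg)  # 2.5
```

In this sketch the two simulated round trips overlap, so the wait before avg can be computed is roughly one round trip rather than two.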

In an embodiment, determining the target database statements from the third number of to-be-analyzed database statements according to the data dependency analysis results may alternatively be implemented as follows: determine the statement category of each to-be-analyzed database statement; if the third number of to-be-analyzed database statements include a statement whose data dependency analysis result is the first data dependency analysis result or a statement whose statement category is the serial category, determine the target to-be-analyzed database statement, namely the statement that is foremost in the arrangement order among the to-be-determined database statements, where the to-be-determined database statements comprise the statements whose data dependency analysis result is the first data dependency analysis result and the statements whose statement category is the serial category, and a database statement of the serial category cannot be executed in parallel; and determine, as the target database statements, those of the third number of to-be-analyzed database statements that precede the target to-be-analyzed database statement in the arrangement order. In some cases, database statements of particular categories can only be executed serially and cannot be executed in parallel with other database statements. For example, database statements of the commit category and of the exception category imply that the preceding statements have completed: they must wait until all database statements arranged before them have finished executing and are then executed on their own. These two categories of database statements are therefore database statements of the serial category.

When determining the target database statements, the statement category of each of the third number of to-be-analyzed database statements is determined. If the third number of to-be-analyzed database statements include a statement whose data dependency analysis result is the first data dependency analysis result (that is, a statement with a data dependency) or a statement whose statement category is the serial category, the target to-be-analyzed database statement, which is foremost in the arrangement order among the to-be-determined database statements, is determined. For example, the third number of to-be-analyzed database statements are the 1st to 10th database statements in the database statement queue. Analysis shows that the data dependency analysis result of the 6th statement is the first data dependency analysis result and that the statement categories of the 5th and 8th statements are the serial category, so the target to-be-analyzed database statement is the 5th statement. The statements that precede the target to-be-analyzed database statement in the arrangement order, here the 1st to 4th statements, are determined as the target database statements. By the method provided by the embodiments of the present application, both the statement categories and the data dependency analysis results of the database statements are taken into account when determining the target database statements, so the determined target database statements are more reasonable, the database statements executed in parallel can execute normally, and the stability of the distributed database system is ensured.
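The combined rule can be sketched by treating both conditions as blockers and cutting the window at the foremost one. The tuple layout `(category, has_dependency)` and the category labels are assumptions made for this illustration.

```python
from typing import List, Tuple

# Categories that may only run serially, per the commit/exception example.
SERIAL_CATEGORIES = {"commit", "exception"}


def count_targets(window: List[Tuple[str, bool]]) -> int:
    """Return how many leading statements of the window are targets.

    Each entry is (category, has_dependency), a hypothetical encoding of the
    statement category and the data dependency analysis result.
    """
    for position, (category, has_dependency) in enumerate(window):
        if has_dependency or category in SERIAL_CATEGORIES:
            return position  # foremost blocker: only earlier statements qualify
    return len(window)


# 10 statements: the 5th and 8th are serial, the 6th has a data dependency.
window = [("query", False)] * 10
window[4] = ("commit", False)
window[5] = ("query", True)
window[7] = ("exception", False)
print(count_targets(window))  # 4, i.e. the 1st to 4th statements
```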

S504, determining the target data node corresponding to each target database statement from a plurality of data nodes included in the distributed database system.

In this embodiment of the present application, the coordination node holds node description information for each data node in the distributed database system, and the node description information may include information about the data tables stored on the data node (such as table names, the attributes in each table, and object numbers). The coordination node may determine the target data node corresponding to each target database statement from the plurality of data nodes included in the distributed database system. By the method provided by the embodiments of the present application, the target data node corresponding to each target database statement can be determined, so that the target data nodes can subsequently execute the target database statements in parallel, which improves data processing efficiency.
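One way to picture the lookup is a table-name match against the node description information. This is only a sketch: the node names, the table placement, and the regular-expression extraction of the table name are assumptions, and a real coordination node would use a SQL parser rather than a pattern match.

```python
import re
from typing import Dict, Set


def route(statement: str, node_tables: Dict[str, Set[str]]) -> str:
    """Pick the data node whose description lists the statement's table."""
    match = re.search(r"\bfrom\s+(\w+)", statement, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"no table name found in {statement!r}")
    table = match.group(1)
    for node, tables in node_tables.items():
        if table in tables:
            return node
    raise LookupError(f"no data node stores table {table!r}")


# Hypothetical node description information: node name -> stored table names.
node_tables = {"data_node_1": {"a", "test"}, "data_node_2": {"B"}}
print(route("select no from a", node_tables))  # data_node_1
print(route("select no from B", node_tables))  # data_node_2
```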

S505, sending the second number of target database statements to the corresponding target data nodes in parallel, so that the target data nodes process the stored data according to the received target database statements.

In this embodiment of the present application, after determining the target data node corresponding to each target database statement, the coordination node may send the second number of target database statements to the corresponding target data nodes in parallel, so that the target data nodes can process the stored data according to the received target database statements. Compared with the method shown in fig. 1, in which database statements are sent one by one, the method provided by the embodiments of the present application has a smaller data processing delay. By the method provided by the embodiments of the present application, data transmission delay can be effectively reduced, which effectively improves data processing efficiency.
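Parallel sending can be sketched with `asyncio.gather`, which starts all sends before waiting for any of them. The `send` coroutine, the node names, and the simulated delay are assumptions for the example; no real network protocol is implied.

```python
import asyncio
from typing import List, Tuple


async def send(node: str, sql: str) -> Tuple[str, str]:
    """Hypothetical send: one simulated round trip to a data node."""
    await asyncio.sleep(0.01)
    return node, f"result of {sql!r}"


async def send_in_parallel(routed: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    # gather schedules every send at once and returns results in input order.
    return await asyncio.gather(*(send(node, sql) for node, sql in routed))


routed = [
    ("data_node_a", "select no from a"),
    ("data_node_b", "select no from B"),
]
results = asyncio.run(send_in_parallel(routed))
for node, result in results:
    print(node, result)
```

Because the simulated round trips overlap, the total wait in this sketch is close to one round trip, whereas sending the statements one by one would take roughly one round trip per statement.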

In an embodiment, the distributed database system includes a plurality of data nodes, and each data node includes a sending queue and a receiving queue. The sending queue of a data node receives the target database statements sent by the coordination node, and the database statements in the sending queue are arranged in order of receiving time. The data node takes a database statement from the head of the sending queue, executes it to obtain its execution result, and stores the execution result in the receiving queue; the data node may then send the execution results in the receiving queue to the coordination node. If a data node receives a plurality of target database statements sent in parallel by the coordination node, it stores them in the sending queue according to their arrangement order in the database statement queue, takes the database statements from the sending queue and executes them in turn, and stores their execution results in the receiving queue in time order. By the method provided by the embodiments of the present application, the transmission delay of database statements can be effectively reduced and the processing efficiency of database statements improved.
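The two queues of a data node can be sketched with `collections.deque`. The class `DataNode`, its method names, and the use of `str.upper` as a placeholder for statement execution are assumptions made for the illustration.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Tuple


class DataNode:
    """Minimal model of a data node with a sending and a receiving queue."""

    def __init__(self, execute: Callable[[str], str]) -> None:
        self.send_queue: Deque[str] = deque()                # received statements
        self.receive_queue: Deque[Tuple[str, str]] = deque()  # execution results
        self._execute = execute

    def receive(self, statements: Iterable[str]) -> None:
        # Statements are enqueued in the order in which they arrive.
        self.send_queue.extend(statements)

    def process_all(self) -> None:
        # Take statements from the head of the sending queue, one at a time.
        while self.send_queue:
            statement = self.send_queue.popleft()
            self.receive_queue.append((statement, self._execute(statement)))

    def drain_results(self) -> List[Tuple[str, str]]:
        # Hand the results back to the coordination node in time order.
        results = list(self.receive_queue)
        self.receive_queue.clear()
        return results


node = DataNode(execute=str.upper)  # placeholder for real statement execution
node.receive(["sql1", "sql2", "sql3"])
node.process_all()
print(node.drain_results())
# [('sql1', 'SQL1'), ('sql2', 'SQL2'), ('sql3', 'SQL3')]
```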

In an embodiment, when the second number is smaller than the first number, after the second number of target database statements have been sent to their corresponding target data nodes, the database statement queue still includes a third number of to-be-executed database statements (the sum of the third number and the second number is the first number). For these remaining statements, the method provided by the embodiments of the present application may be applied again to determine target database statements from among them and send those target database statements to the corresponding target data nodes. These steps are repeated until the database statement queue contains no to-be-executed database statement, that is, until all to-be-executed database statements have been executed.

For example, the database statement queue includes 15 to-be-executed database statements. Applying the method provided by the embodiments of the present application for the first time, the 1st to 5th to-be-executed database statements are determined as target database statements, these 5 target database statements are sent to the corresponding target data nodes, and the execution results returned by the target data nodes are received. Applying the method for the second time, the 6th to 14th to-be-executed database statements are determined as target database statements, these 9 target database statements are sent to the corresponding target data nodes, and the execution results returned by the target data nodes are received. Applying the method for the third time, the 15th to-be-executed database statement is determined as the target database statement, it is sent to the corresponding target data node, and the execution result returned by the target data node is received. The coordination node can determine processing result information according to the execution results of the to-be-executed database statements. Because the database statement queue is determined according to the data processing request, the processing result information is associated with the data processing request; if a terminal device sent the data processing request to the coordination node, the coordination node can send the corresponding processing result information to the terminal device. By the method provided by the embodiments of the present application, pipelined processing of the database statements in the database statement queue can be realized, which effectively reduces data transmission delay and improves data processing efficiency; at the same time, the optimization is transparent to the user, which improves the user experience.
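The three rounds in this example can be reproduced with a short simulation. The dependency map is an assumption chosen so that the batches come out as 5, 9, and 1 statements (the 6th statement is taken to depend on the 3rd, and the 15th on the 10th); `run_rounds` is a hypothetical name.

```python
from typing import Dict, List, Set


def run_rounds(depends_on: Dict[int, Set[int]], total: int) -> List[List[int]]:
    """Split statements 1..total into rounds of parallelizable statements.

    depends_on maps a 1-based position to the earlier positions whose
    execution results it needs (hypothetical encoding for this sketch).
    """
    done: Set[int] = set()        # statements whose results were received
    batches: List[List[int]] = []
    next_position = 1
    while next_position <= total:
        batch: List[int] = []
        position = next_position
        # Extend the batch until a statement needs a result not yet received.
        while position <= total and depends_on.get(position, set()) <= done:
            batch.append(position)
            position += 1
        if not batch:
            raise ValueError(f"statement {position} can never be satisfied")
        batches.append(batch)
        done.update(batch)        # results come back before the next round
        next_position = position
    return batches


batches = run_rounds({6: {3}, 15: {10}}, total=15)
print([len(batch) for batch in batches])  # [5, 9, 1]
```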

It should be noted that, in some cases, all database statements in the database statement queue that can be executed in parallel are determined first, and then all of them are sent to their corresponding data nodes, which reduces data transmission delay to the greatest extent. For example, the database statement queue includes 100 database statements to be executed, of which 90 can be executed in parallel; after these 90 database statements are determined, they are sent to the corresponding data nodes. In other cases, only a part of the database statements that can be executed in parallel is determined, and that part is sent to the corresponding data nodes, which keeps the execution of database statements smooth. For example, the database statement queue includes 100 database statements to be executed, of which 90 can be executed in parallel; 10 consecutive database statements may be taken from the queue for parallel analysis, and if it is determined that 5 of these 10 database statements can be executed in parallel, those 5 database statements may be sent to the corresponding data nodes. Different sending modes may be adopted according to different application requirements, so the method provided by the embodiment of the present application can meet different application requirements and has good universality.

Please refer to fig. 7, which is a schematic diagram of a data processing method according to an embodiment of the present application. The distributed database system shown in fig. 7 includes a coordination node and two data nodes (namely data node A and data node B). Each of the two data nodes includes a processing module, a sending queue, and a receiving queue: the processing module is configured to execute database statements and process the data stored in the data node; the sending queue is used for storing received database statements; and the receiving queue is used for storing the execution results of database statements generated by the processing module. The coordination node may determine a database statement queue according to a data processing request, where the data processing request is used for requesting to process data stored in the data nodes, and the database statement queue includes 6 database statements to be executed (assumed to be SQL1-SQL6). The coordination node may sequentially determine 4 database statements to be analyzed from the database statement queue according to the arrangement order of the database statements to be executed in the queue; the 4 database statements to be analyzed are SQL1-SQL4. The coordination node may perform parallel analysis processing on the 4 database statements to be analyzed to determine a data dependency analysis result for each of them. If the data dependency analysis results of SQL1-SQL3 are all the second data dependency analysis result (that is, none of SQL1-SQL3 has a data dependency) and the data dependency analysis result of SQL4 is the first data dependency analysis result, it may be determined that SQL1-SQL3 are database statements that can be executed in parallel (as shown in fig. 7, three database statements, namely SQL1, SQL2, and SQL3, are determined from the database statement queue in the coordination node, and these three database statements are the target database statements). The coordination node may determine SQL1-SQL3 as the target database statements and may determine, from the two data nodes, that the target data node corresponding to SQL1 is data node A, the target data node corresponding to SQL2 is data node A, and the target data node corresponding to SQL3 is data node B; the coordination node may then send SQL1, SQL2, and SQL3 in parallel to their corresponding target data nodes.

The data node A stores the received SQL1 and SQL2 into a sending queue, a processing module in the data node A acquires a database statement SQL1 from the sending queue, executes the database statement, and processes the stored data to obtain an execution result of the database statement SQL1 (namely result1 in fig. 7); the processing module in the data node a stores the execution result of the database statement SQL1 in the receive queue, and the data node a may send the execution result of the database statement SQL1 in the receive queue to the coordinator node. The processing module in the data node A acquires the database statement SQL2 from the sending queue, executes the database statement, and processes the stored data to obtain an execution result of the database statement SQL2 (namely result2 in FIG. 7); the processing module in the data node a stores the execution result of the database statement SQL2 in the receive queue, and the data node a may send the execution result of the database statement SQL2 in the receive queue to the coordinator node.

The data node B stores the received SQL3 into a transmission queue, a processing module in the data node B acquires a database statement SQL3 from the transmission queue and executes the database statement, and the stored data is processed to obtain an execution result of the database statement SQL3 (namely result3 in fig. 7); the processing module in the data node B stores the execution result of the database statement SQL3 into the receiving queue, and the data node B can send the execution result of the database statement SQL3 in the receiving queue to the coordination node.
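The data-node behaviour described for fig. 7 can be illustrated with a small simulation. This is a sketch under assumptions, not the patented implementation: the `DataNode` class, its `handle` method, and the `routing` table are hypothetical, the result strings merely mimic result1-result3 in fig. 7, and the queue names follow the description (the sending queue holds received statements, the receiving queue holds execution results).

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class DataNode:
    """Toy data node with the two queues named in the description."""

    def __init__(self, name):
        self.name = name
        self.send_queue = queue.Queue()     # stores received statements
        self.receive_queue = queue.Queue()  # stores execution results

    def handle(self, statements):
        # Store the received statements in arrival order.
        for stmt in statements:
            self.send_queue.put(stmt)
        # Processing module: execute statements one by one (simulated).
        while not self.send_queue.empty():
            stmt = self.send_queue.get()
            result = "result" + stmt[len("SQL"):]
            self.receive_queue.put((stmt, result))
        # Return the execution results to the coordination node.
        returned = []
        while not self.receive_queue.empty():
            returned.append(self.receive_queue.get())
        return returned


nodes = {"A": DataNode("A"), "B": DataNode("B")}
# Target data nodes as in fig. 7: SQL1 and SQL2 go to A, SQL3 goes to B.
routing = {"SQL1": "A", "SQL2": "A", "SQL3": "B"}

per_node = {}
for stmt, node_name in routing.items():
    per_node.setdefault(node_name, []).append(stmt)

# The coordination node contacts both data nodes at the same time.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(nodes[name].handle, stmts)
               for name, stmts in per_node.items()]
    results = {}
    for future in futures:
        results.update(future.result())

print(results)
```

Each node is driven by a single worker thread here, so checking `empty()` before `get()` is safe; a node with several processing threads would need a blocking `get` with a shutdown signal instead.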

The coordination node receives the execution results returned by data node A and data node B, then processes the remaining database statements to be executed in the database statement queue, determines target database statements, and sends the target database statements to the corresponding target data nodes in parallel.

Compared with sending database statements one by one, the method provided by the embodiment of the present application can automatically determine the data dependency relationships of the database statements to be executed, and thereby determine a plurality of database statements that can be executed in parallel. These database statements can be sent in parallel to their corresponding target data nodes, so that the data transmission delay incurred by sending the plurality of database statements is the same as that incurred by sending one database statement, which greatly reduces data transmission delay. A plurality of data nodes in the distributed database system can also execute the received database statements simultaneously, which makes full use of the computing resources of the distributed database system, effectively improves data processing efficiency, and improves the user experience.
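The delay comparison can be made concrete with a deliberately simplified model. The assumptions are not taken from the specification: every round trip between the coordination node and a data node costs the same fixed delay, statement execution time is ignored, and the function names are hypothetical.

```python
def sequential_delay_ms(num_statements, round_trip_ms):
    """One round trip per statement when statements are sent one by one."""
    return num_statements * round_trip_ms


def parallel_delay_ms(batch_sizes, round_trip_ms):
    """One round trip per batch when each batch is sent in parallel."""
    return len(batch_sizes) * round_trip_ms


# Fig. 7: SQL1-SQL3 form one batch; assume a 2 ms round trip.
print(sequential_delay_ms(3, 2))            # 6
print(parallel_delay_ms([3], 2))            # 2

# The 15-statement example: batches of 5, 9, and 1 statements.
print(sequential_delay_ms(15, 2))           # 30
print(parallel_delay_ms([5, 9, 1], 2))      # 6
```

Under this model the transmission delay of a batch equals that of a single statement regardless of the batch size, which is the effect described above; real savings would also depend on execution time and network conditions.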

The data processing method provided by the embodiment of the present application can determine the database statement queue according to the data processing request, so the method can be applied in a variety of scenarios and has good universality; it achieves transparent optimization without manual modification of the data processing procedure, which greatly improves the user experience. Target database statements that can be executed in parallel can be determined from the database statement queue, which realizes pipelined processing of database statements and greatly improves the efficiency of sending them. The data dependency relationships of database statements can be determined automatically, and as many parallel-executable database statements as possible can be identified, which minimizes data transmission delay and improves data processing performance. A plurality of database statements that can be executed in parallel can be sent in parallel to their corresponding target data nodes, so that the data transmission delay incurred by sending the plurality of database statements is the same as that incurred by sending one database statement, which greatly reduces data transmission delay. A plurality of data nodes in the distributed database system can execute the received database statements simultaneously, which makes full use of the computing resources of the distributed database system, effectively improves data processing efficiency, and improves the user experience.

Referring to fig. 8, fig. 8 is a block diagram of a data processing apparatus according to an embodiment of the present application. The device comprises:

a determining unit 801, configured to determine a database statement queue according to a data processing request, where the data processing request is used to request processing of data stored in a data node, the database statement queue includes a first number of database statements to be executed, and the data node is included in a distributed database system;

a processing unit 802, configured to determine a target database statement from a first number of database statements to be executed included in the database statement queue; the number of the target database sentences is a second number, the second number is smaller than or equal to the first number, and the second number of the target database sentences can be executed in parallel;

the determining unit 801 is further configured to determine a target data node corresponding to each target database statement from a plurality of data nodes included in the distributed database system;

and a sending unit 803, configured to send the second number of target database statements to respective corresponding target data nodes in parallel, so that the target data nodes process the stored data according to the received target database statements.

In an embodiment, the processing unit 802 is specifically configured to, when determining the target database statement from the first number of database statements to be executed included in the database statement queue: sequentially determining a third number of database sentences to be analyzed from the database sentence queue according to the arrangement sequence of the database sentences to be executed in the database sentence queue, wherein the arrangement sequence of the third number of database sentences to be analyzed is continuous; carrying out parallel analysis processing on the third number of database sentences to be analyzed, and determining the data dependency analysis result of each database sentence to be analyzed; and determining a target database statement from a third number of database statements to be analyzed according to the data dependence analysis result.

In an embodiment, the data-dependent analysis result is a first data-dependent analysis result or a second data-dependent analysis result, where the first data-dependent analysis result is used to indicate that the database statement to be analyzed has a data dependency, and the second data-dependent analysis result is used to indicate that the database statement to be analyzed has no data dependency; the processing unit 802 is specifically configured to, when determining a target database statement from a third number of database statements to be analyzed according to the data-dependent analysis result: if the third number of database sentences to be analyzed has the database sentences to be analyzed with the data dependency analysis result being the first data dependency analysis result, determining a first database sentence to be analyzed with the forefront arrangement sequence from the database sentences to be analyzed with the data dependency analysis result being the first data dependency analysis result; and determining a second database statement to be analyzed, which is arranged in sequence before the first database statement to be analyzed, from the third number of database statements to be analyzed as a target database statement.

In an embodiment, the processing unit 802 is further configured to: if the data dependency analysis result of each database statement to be analyzed is the second data dependency analysis result, determining each database statement to be analyzed as a target database statement; according to the arrangement sequence, determining a third number of new database sentences to be analyzed from the database sentences to be executed except for the first database sentences to be executed in the database sentence queue, wherein the first database sentences to be executed refer to the database sentences to be executed which are determined to be target database sentences; carrying out parallel analysis processing on the third number of new database sentences to be analyzed, and determining the data dependent analysis result of each new database sentence to be analyzed; and determining a target database statement from the third number of new database statements to be analyzed according to the data-dependent analysis result of each new database statement to be analyzed.
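The window-based selection described in the preceding embodiments can be sketched as follows. The function names and the `has_dependency` callback are hypothetical; the callback returns True for the first data dependency analysis result (an unresolved dependency) and False for the second. What happens when the very first statement of a window has a dependency is not stated here, so the sketch simply returns no targets in that case.

```python
def select_targets(window, has_dependency):
    """Return the statements arranged before the first dependent statement."""
    targets = []
    for stmt in window:
        if has_dependency(stmt):
            break
        targets.append(stmt)
    return targets


def collect_targets(statement_queue, has_dependency, window_size):
    """Analyze consecutive windows until a dependent statement is met."""
    targets = []
    position = 0
    while position < len(statement_queue):
        window = statement_queue[position:position + window_size]
        picked = select_targets(window, has_dependency)
        targets.extend(picked)
        if len(picked) < len(window):
            break  # a dependent statement was found; stop selecting
        position += len(window)  # whole window is independent; take the next
    return targets


statement_queue = ["SQL1", "SQL2", "SQL3", "SQL4", "SQL5", "SQL6"]

# Fig. 7 scenario: only SQL4 has an unresolved dependency.
print(collect_targets(statement_queue, lambda s: s == "SQL4", 4))
# ['SQL1', 'SQL2', 'SQL3']

# If the first window is fully independent, a new window is analyzed.
print(collect_targets(statement_queue, lambda s: s == "SQL6", 4))
# ['SQL1', 'SQL2', 'SQL3', 'SQL4', 'SQL5']
```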

In an embodiment, when performing parallel analysis processing on the third number of database statements to be analyzed, the processing unit 802 is specifically configured to: determine whether a dependent database statement of a third database statement to be analyzed exists among the database statements to be executed that are arranged before the third database statement to be analyzed in the database statement queue, where the third database statement to be analyzed is any one of the third number of database statements to be analyzed, and execution of the third database statement to be analyzed depends on the execution result of the dependent database statement; and if the dependent database statement of the third database statement to be analyzed exists and the execution result of the dependent database statement has not been obtained, determine that the data dependency analysis result of the third database statement to be analyzed is the first data dependency analysis result.

In an embodiment, the processing unit 802 is further configured to: if a dependent database statement of the third database statement to be analyzed exists and an execution result of the dependent database statement is obtained, determining that a data dependent analysis result of the third database statement to be analyzed is the second data dependent analysis result; or if the dependent database statement of the third database statement to be analyzed does not exist, determining the data dependent analysis result of the third database statement to be analyzed as the second data dependent analysis result.
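The three cases above (dependency exists and is unresolved, dependency exists and is resolved, no dependency) can be sketched as a single function. How a dependency between two statements is detected is not described in this passage, so the sketch assumes a caller-supplied `depends_on` mapping; that mapping, the `completed` set, and the function name `analyze` are all hypothetical.

```python
FIRST = "first data dependency analysis result"    # unresolved dependency
SECOND = "second data dependency analysis result"  # no unresolved dependency


def analyze(index, statement_queue, depends_on, completed):
    """Classify the statement at `index` against earlier queue entries.

    depends_on: maps a statement to the statements whose results it needs.
    completed:  statements whose execution results have been obtained.
    """
    stmt = statement_queue[index]
    earlier = statement_queue[:index]
    dependencies = [e for e in earlier if e in depends_on.get(stmt, set())]
    if any(dep not in completed for dep in dependencies):
        return FIRST
    return SECOND


statement_queue = ["SQL1", "SQL2", "SQL3", "SQL4"]
depends_on = {"SQL4": {"SQL2"}}  # assumption: SQL4 needs the result of SQL2

before = [analyze(i, statement_queue, depends_on, completed=set())
          for i in range(len(statement_queue))]
after = analyze(3, statement_queue, depends_on, completed={"SQL2"})
print(before[3] == FIRST, after == SECOND)  # True True
```

Because each statement is classified independently of the others, the calls can be run concurrently, which is consistent with the parallel analysis processing described above.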

In an embodiment, the determining unit 801 is specifically configured to, when determining, from a plurality of data nodes included in the distributed database system, a target data node corresponding to each target database statement: determining node description information of each data node in the distributed database system, wherein the node description information is used for describing data stored by the data node; performing data analysis processing on each target database statement to obtain data demand information of each target database statement; and carrying out matching processing on the data demand information of each target database statement and the node description information of each data node, and determining the target data node corresponding to each target database statement from the plurality of data nodes according to a matching result.
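The matching of data demand information against node description information might look like the sketch below. The specification does not fix the form of either kind of information, so this sketch assumes that a node is described by the set of table names it stores and that a statement's data demand is the first table it references; `NODE_TABLES`, `data_demand`, and `match_node` are hypothetical names, and the regular expression is a crude stand-in for real SQL parsing.

```python
import re

# Assumed node description information: the tables stored on each data node.
NODE_TABLES = {
    "A": {"orders", "users"},
    "B": {"payments"},
}


def data_demand(statement):
    """Extract the first referenced table name (simplified analysis)."""
    match = re.search(r"\b(?:from|into|update)\s+(\w+)", statement,
                      re.IGNORECASE)
    return match.group(1).lower() if match else None


def match_node(statement, node_tables):
    """Return the data node whose description matches the statement."""
    table = data_demand(statement)
    for node, tables in node_tables.items():
        if table in tables:
            return node
    return None  # no data node stores the requested data


print(match_node("SELECT name FROM users WHERE id = 1", NODE_TABLES))   # A
print(match_node("UPDATE payments SET status = 'ok' WHERE id = 7",
                 NODE_TABLES))                                           # B
```

A sharded deployment would more likely match on partition keys or key ranges than on whole tables; the table-level match is used here only to keep the example short.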

In an embodiment, the data processing request includes a data processing identifier and a data processing parameter, the data processing identifier being used to indicate a type of processing performed on data stored in the distributed database system; the determining unit 801 is specifically configured to, when determining a database statement queue according to a data processing request: determining a first number of initial database statements according to a data processing identifier in the data processing request; and carrying out combination processing on the data processing parameters in the data processing request and the first number of initial database sentences to obtain a first number of database sentences to be executed, and adding the first number of database sentences to be executed into a database sentence queue.
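Building the database statement queue from a data processing identifier and data processing parameters can be sketched as follows. The `INITIAL_STATEMENTS` table, the "transfer" identifier, the request layout, and the SQL text are all invented for illustration; the specification only states that initial statements are selected by the identifier and combined with the parameters.

```python
from collections import deque

# Hypothetical mapping from a data processing identifier to its
# initial (parameterless) database statements.
INITIAL_STATEMENTS = {
    "transfer": [
        "UPDATE accounts SET balance = balance - {amount} WHERE id = {src}",
        "UPDATE accounts SET balance = balance + {amount} WHERE id = {dst}",
    ],
}


def build_statement_queue(request):
    """Combine the request parameters with the initial statements."""
    templates = INITIAL_STATEMENTS[request["identifier"]]
    return deque(t.format(**request["parameters"]) for t in templates)


request = {
    "identifier": "transfer",
    "parameters": {"amount": 10, "src": 1, "dst": 2},
}
statement_queue = build_statement_queue(request)
for statement in statement_queue:
    print(statement)
```

String formatting is used only to make the combination step visible; a production system would normally bind parameters through the database driver to avoid SQL injection.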

It may be understood that the functions of each functional unit of the data processing apparatus in the embodiments of the present application may be specifically implemented according to the data processing method in the embodiments of the method, and the specific implementation process may refer to the relevant description in the embodiments of the data processing method, which is not repeated herein. In the present embodiment, the term "module" or "unit" refers to a computer program or a part of a computer program having a predetermined function, and works together with other relevant parts to achieve a predetermined object, and may be implemented in whole or in part by using software, hardware (such as a processing circuit or a memory), or a combination thereof. Also, a processor (or multiple processors or memories) may be used to implement one or more modules or units. Furthermore, each module or unit may be part of an overall module or unit that incorporates the functionality of the module or unit.

The data processing apparatus provided by the embodiment of the present application can determine the database statement queue according to the data processing request, so the method can be applied in a variety of scenarios and has good universality; it achieves transparent optimization without manual modification of the data processing procedure, which greatly improves the user experience. Target database statements that can be executed in parallel can be determined from the database statement queue, which realizes pipelined processing of database statements and greatly improves the efficiency of sending them. The data dependency relationships of database statements can be determined automatically, and as many parallel-executable database statements as possible can be identified, which minimizes data transmission delay and improves data processing performance. A plurality of database statements that can be executed in parallel can be sent in parallel to their corresponding target data nodes, so that the data transmission delay incurred by sending the plurality of database statements is the same as that incurred by sending one database statement, which greatly reduces data transmission delay. A plurality of data nodes in the distributed database system can execute the received database statements simultaneously, which makes full use of the computing resources of the distributed database system, effectively improves data processing efficiency, and improves the user experience.

Referring to fig. 9, fig. 9 is a block diagram of a computer device according to an embodiment of the present application. The computer device described in the embodiment of the present application includes a processor 901, a communication interface 902, and a memory 903. The processor 901, the communication interface 902, and the memory 903 may be connected by a bus or in other ways; connection by a bus is taken as an example in the embodiment of the present application.

The processor 901 (or central processing unit, CPU) is the computing core and control core of the computer device; it can parse various instructions in the computer device and process various data of the computer device. For example, the CPU may parse a power-on/power-off instruction sent by a user to the computer device and control the computer device to power on or off; as another example, the CPU may transfer various types of interaction data between internal structures of the computer device, and so on. The communication interface 902 may optionally include a standard wired interface and a wireless interface (e.g., Wi-Fi, a mobile communication interface, etc.), and is controlled by the processor 901 to send and receive data. The memory 903 is a storage device in the computer device for storing programs and data. It will be appreciated that the memory 903 here may include both the built-in memory of the computer device and extended memory supported by the computer device. The memory 903 provides storage space that stores the operating system of the computer device, which may include, but is not limited to, the Android system, the iOS system, the Windows Phone system, and the like, which is not limited in this application.

In the present embodiment, the processor 901 performs the following operations by executing executable program code in the memory 903:

In an embodiment, the processor 901 is specifically configured to, when determining a target database statement from a first number of database statements to be executed included in the database statement queue: sequentially determining a third number of database sentences to be analyzed from the database sentence queue according to the arrangement sequence of the database sentences to be executed in the database sentence queue, wherein the arrangement sequence of the third number of database sentences to be analyzed is continuous; carrying out parallel analysis processing on the third number of database sentences to be analyzed, and determining the data dependency analysis result of each database sentence to be analyzed; and determining a target database statement from a third number of database statements to be analyzed according to the data dependence analysis result.

In an embodiment, the data-dependent analysis result is a first data-dependent analysis result or a second data-dependent analysis result, where the first data-dependent analysis result is used to indicate that the database statement to be analyzed has a data dependency, and the second data-dependent analysis result is used to indicate that the database statement to be analyzed has no data dependency; the processor 901 is specifically configured to, when determining a target database statement from a third number of database statements to be analyzed according to the data dependency analysis result: if the third number of database sentences to be analyzed has the database sentences to be analyzed with the data dependency analysis result being the first data dependency analysis result, determining a first database sentence to be analyzed with the forefront arrangement sequence from the database sentences to be analyzed with the data dependency analysis result being the first data dependency analysis result; and determining a second database statement to be analyzed, which is arranged in sequence before the first database statement to be analyzed, from the third number of database statements to be analyzed as a target database statement.

In an embodiment, the processor 901 is further configured to: if the data dependency analysis result of each database statement to be analyzed is the second data dependency analysis result, determining each database statement to be analyzed as a target database statement; according to the arrangement sequence, determining a third number of new database sentences to be analyzed from the database sentences to be executed except for the first database sentences to be executed in the database sentence queue, wherein the first database sentences to be executed refer to the database sentences to be executed which are determined to be target database sentences; carrying out parallel analysis processing on the third number of new database sentences to be analyzed, and determining the data dependent analysis result of each new database sentence to be analyzed; and determining a target database statement from the third number of new database statements to be analyzed according to the data-dependent analysis result of each new database statement to be analyzed.

In an embodiment, when performing parallel analysis processing on the third number of database statements to be analyzed, the processor 901 is specifically configured to: determine whether a dependent database statement of a third database statement to be analyzed exists among the database statements to be executed that are arranged before the third database statement to be analyzed in the database statement queue, where the third database statement to be analyzed is any one of the third number of database statements to be analyzed, and execution of the third database statement to be analyzed depends on the execution result of the dependent database statement; and if the dependent database statement of the third database statement to be analyzed exists and the execution result of the dependent database statement has not been obtained, determine that the data dependency analysis result of the third database statement to be analyzed is the first data dependency analysis result.

In an embodiment, the processor 901 is further configured to: if a dependent database statement of the third database statement to be analyzed exists and an execution result of the dependent database statement is obtained, determining that a data dependent analysis result of the third database statement to be analyzed is the second data dependent analysis result; or if the dependent database statement of the third database statement to be analyzed does not exist, determining the data dependent analysis result of the third database statement to be analyzed as the second data dependent analysis result.

In an embodiment, when determining, from a plurality of data nodes included in the distributed database system, a target data node corresponding to each target database statement, the processor 901 is specifically configured to: determining node description information of each data node in the distributed database system, wherein the node description information is used for describing data stored by the data node; performing data analysis processing on each target database statement to obtain data demand information of each target database statement; and carrying out matching processing on the data demand information of each target database statement and the node description information of each data node, and determining the target data node corresponding to each target database statement from the plurality of data nodes according to a matching result.

In an embodiment, the data processing request includes a data processing identifier and a data processing parameter, the data processing identifier being used to indicate a type of processing performed on data stored in the distributed database system; the processor 901 is configured to: determining a first number of initial database statements according to a data processing identifier in the data processing request; and carrying out combination processing on the data processing parameters in the data processing request and the first number of initial database sentences to obtain a first number of database sentences to be executed, and adding the first number of database sentences to be executed into a database sentence queue.

In a specific implementation, the processor 901, the communication interface 902, and the memory 903 described in the embodiments of the present application may execute an implementation manner of a coordination node described in a data processing method provided in the embodiments of the present application, or may execute an implementation manner described in a data processing apparatus provided in the embodiments of the present application, which is not described herein again.

The computer device provided by the embodiment of the present application can determine the database statement queue according to the data processing request, so the method can be applied in a variety of scenarios and has good universality; it achieves transparent optimization without manual modification of the data processing procedure, which greatly improves the user experience. Target database statements that can be executed in parallel can be determined from the database statement queue, which realizes pipelined processing of database statements and greatly improves the efficiency of sending them. The data dependency relationships of database statements can be determined automatically, and as many parallel-executable database statements as possible can be identified, which minimizes data transmission delay and improves data processing performance. A plurality of database statements that can be executed in parallel can be sent in parallel to their corresponding target data nodes, so that the data transmission delay incurred by sending the plurality of database statements is the same as that incurred by sending one database statement, which greatly reduces data transmission delay. A plurality of data nodes in the distributed database system can execute the received database statements simultaneously, which makes full use of the computing resources of the distributed database system, effectively improves data processing efficiency, and improves the user experience.

Embodiments of the present application also provide a computer-readable storage medium having a computer program stored therein, which when run on a computer, causes the computer to perform the data processing method according to the embodiments of the present application. The specific implementation manner may refer to the foregoing description, and will not be repeated here.

Embodiments of the present application also provide a computer program product comprising a computer program or computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer program or computer instructions from the computer readable storage medium, the processor executing the computer program or computer instructions, causing the computer device to perform a data processing method as described in embodiments of the present application. The specific implementation manner may refer to the foregoing description, and will not be repeated here.

It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the described order of action, as some steps may take other order or be performed simultaneously according to the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required in the present application.

In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments. The technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a computer, a server, or a network device, or specifically a processor in a computer device) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium may include: a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.

The above embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, it should be understood that the technical solutions described in the foregoing embodiments may be modified, or some of their technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims

1. A method of data processing, the method comprising:

sending the second number of target database statements to the corresponding target data nodes in parallel, so that the target data nodes process stored data according to the received target database statements;

wherein determining the target database statement from the first number of database statements to be executed included in the database statement queue includes:

sequentially determining a third number of database statements to be analyzed from the database statement queue according to the arrangement order of the database statements to be executed in the database statement queue, wherein the third number of database statements to be analyzed are consecutive in the arrangement order; performing parallel analysis on the third number of database statements to be analyzed, to determine a data dependency analysis result of each database statement to be analyzed; wherein the data dependency analysis result is a first data dependency analysis result or a second data dependency analysis result, the first data dependency analysis result indicates that the database statement to be analyzed has a data dependency, and the second data dependency analysis result indicates that the database statement to be analyzed has no data dependency;

if the third number of database statements to be analyzed include a database statement to be analyzed whose data dependency analysis result is the first data dependency analysis result, determining, from the database statements to be analyzed whose data dependency analysis result is the first data dependency analysis result, a first database statement to be analyzed that is foremost in the arrangement order; and determining, from the third number of database statements to be analyzed, a second database statement to be analyzed that precedes the first database statement to be analyzed in the arrangement order as a target database statement.
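The selection and parallel sending steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names `select_targets` and `dispatch`, the `has_dependency` predicate, and the `route` and `send` callables are all hypothetical stand-ins introduced here for illustration. The sketch also returns every statement in the window when none has a data dependency, and returns an empty list when the very first statement has one; the latter case is not addressed by the claim text shown here and is only a choice made for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def select_targets(window: Sequence[str],
                   has_dependency: Callable[[str], bool]) -> List[str]:
    """Return the statements that precede the first dependent one.

    `window` plays the role of the consecutive statements to be analyzed;
    `has_dependency` is a hypothetical analysis callback.
    """
    if not window:
        return []
    # Analyze every statement of the window in parallel.
    # Executor.map() yields results in the order of its inputs.
    with ThreadPoolExecutor(max_workers=len(window)) as pool:
        flags = list(pool.map(has_dependency, window))
    for index, dependent in enumerate(flags):
        if dependent:
            # Keep only the statements ahead of the first dependent one.
            return list(window[:index])
    # No statement has a data dependency: all of them are targets.
    return list(window)


def dispatch(targets: Sequence[str],
             route: Callable[[str], str],
             send: Callable[[str, str], Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Send each target statement to its data node in parallel.

    `route` maps a statement to a node name and `send` performs the
    transfer; both are assumptions made for this sketch.
    """
    if not targets:
        return []
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        return list(pool.map(lambda stmt: send(route(stmt), stmt), targets))
```

For example, with a window of four statements in which only the third has a data dependency, `select_targets` keeps the first two, and `dispatch` then sends those two concurrently instead of one after another.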

2. The method according to claim 1, wherein the method further comprises:

if the data dependency analysis result of each database statement to be analyzed is the second data dependency analysis result, determining each database statement to be analyzed as a target database statement;

3. The method of claim 1, wherein performing parallel analysis on the third number of database statements to be analyzed to determine a data dependency analysis result of each database statement to be analyzed comprises:

determining, from the database statements to be executed that precede a third database statement to be analyzed in the arrangement order of the database statement queue, whether a dependent database statement of the third database statement to be analyzed exists, wherein the third database statement to be analyzed is any one of the third number of database statements to be analyzed, and execution of the third database statement to be analyzed depends on an execution result of the dependent database statement;

4. The method according to claim 3, wherein the method further comprises:

if a dependent database statement of the third database statement to be analyzed exists and an execution result of the dependent database statement has been obtained, determining that the data dependency analysis result of the third database statement to be analyzed is the second data dependency analysis result; or,
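The per-statement dependency check described above can be sketched as follows. This is only an illustration under stated assumptions: the `Statement` type with `reads` and `writes` sets, the read-after-write rule used to detect a dependent statement, and the `finished` set of statement identifiers whose execution results are available are all hypothetical. The text shown here states only one outcome explicitly (a dependent statement exists and its result has been obtained); the other two outcomes in the sketch, namely no dependency when no dependent statement exists and a dependency when the result is not yet available, are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Sequence, Set


class DependencyResult(Enum):
    HAS_DEPENDENCY = 1  # the "first data dependency analysis result"
    NO_DEPENDENCY = 2   # the "second data dependency analysis result"


@dataclass(frozen=True)
class Statement:
    # Hypothetical model: names a statement reads and writes.
    sid: str
    reads: FrozenSet[str] = frozenset()
    writes: FrozenSet[str] = frozenset()


def analyze(queue: Sequence[Statement], position: int,
            finished: Set[str]) -> DependencyResult:
    """Classify queue[position] by looking only at statements ahead of it."""
    current = queue[position]
    for earlier in queue[:position]:
        # Assumed rule: reading what an earlier statement writes makes
        # that earlier statement a dependent database statement.
        depends_on_earlier = bool(current.reads & earlier.writes)
        if depends_on_earlier and earlier.sid not in finished:
            # Needed execution result is not available yet (assumption).
            return DependencyResult.HAS_DEPENDENCY
    # Either nothing is depended on (assumption), or every needed
    # execution result has already been obtained.
    return DependencyResult.NO_DEPENDENCY
```

Because `analyze` only reads the queue and the `finished` set, calls for different positions are independent of one another, which is what allows the statements in a window to be analyzed in parallel.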

5. The method of claim 1, wherein said determining a target data node corresponding to each of said target database statements from a plurality of data nodes comprised by said distributed database system comprises:

determining node description information of each data node in the distributed database system, wherein the node description information is used for describing data stored by the data node;
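As a rough illustration of how node description information might be used to pick a target data node, the sketch below assumes that each description is a half-open key range `[low, high)` held by the node and that a statement carries a single integer routing key. Both the format and the matching rule are hypothetical: the text shown here says only that the description describes the data stored by the data node.

```python
from typing import Dict, Tuple

# Hypothetical node description: node name -> half-open key range [low, high).
NodeDescriptions = Dict[str, Tuple[int, int]]


def target_node(routing_key: int, descriptions: NodeDescriptions) -> str:
    """Return the node whose described key range contains routing_key."""
    for node_id, (low, high) in descriptions.items():
        if low <= routing_key < high:
            return node_id
    # No description covers the key; the sketch treats this as an error.
    raise LookupError(f"no data node stores key {routing_key}")
```

With descriptions such as `{"dn1": (0, 100), "dn2": (100, 200)}`, a statement whose routing key is 150 would be directed to `dn2`.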

6. The method of any of claims 1-5, wherein the data processing request includes a data processing identifier and a data processing parameter, the data processing identifier indicating a type of processing performed on data stored in the distributed database system;

the determining a database statement queue according to the data processing request comprises the following steps:

determining a first number of initial database statements according to the data processing identifier in the data processing request;

7. A data processing apparatus, the apparatus comprising:

a sending unit, configured to send the second number of target database statements to the corresponding target data nodes in parallel, so that the target data nodes process stored data according to the received target database statements;

wherein, when determining a target database statement from a first number of database statements to be executed included in the database statement queue, the processing unit is specifically configured to:

8. A computer device, comprising a processor, a communication interface, and a memory, the processor, the communication interface, and the memory being interconnected, wherein the memory stores executable program code, and the processor is configured to invoke the executable program code to implement the data processing method according to any one of claims 1-6.

9. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to implement the data processing method according to any one of claims 1-6.

10. A computer program product, wherein the computer program product comprises a computer program or computer instructions which, when executed by a processor, implement the data processing method according to any one of claims 1-6.