WO2021147815A1

WO2021147815A1 - Data calculation method and related device

Info

Publication number: WO2021147815A1
Application number: PCT/CN2021/072472
Authority: WO
Inventors: 胡梦春; 李茂增
Original assignee: 华为技术有限公司
Priority date: 2020-01-22
Filing date: 2021-01-18
Publication date: 2021-07-29
Also published as: CN111324433A; CN111324433B

Abstract

A data calculation method and a related device. The method comprises: a target data node in a distributed database (100) receiving data sent by other data nodes in the distributed database (100) and related to a query statement (S310); the target data node sorting local data and the data received from the other data nodes (S320); and the target data node sending a plurality of pieces of sorted data to at least one data node in the distributed database (100), so that the at least one data node carries out calculation related to the query statement on data respectively received thereby (S330). By means of the method, full use can be made of a distributed calculation capability, thereby preventing a bottleneck caused by a single data node carrying out data calculation, and improving the calculation efficiency.

Description

A method of data calculation and related equipment

This application claims priority to Chinese patent application No. 202010076105.0, filed with the Chinese Patent Office on January 22, 2020 and entitled "a method of data calculation and related equipment", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of distributed storage technology, and in particular to a method of data calculation and related equipment.

Background technique

A window function is a special type of function in structured query language (SQL). Like an aggregate function, a window function takes multiple rows as input. The window function acts on a window, which is a set of rows defined by an OVER expression, so the two are always used together. The OVER expression groups the data and sorts the elements within each group; the window function then processes the values in each group, for example by aggregating them or generating sequence numbers.
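By way of illustration only, the division of labor between the OVER expression (grouping and sorting) and the window function (processing the values of each group) can be sketched in Python. The function name `running_sum_over`, the column names, and the sample rows are hypothetical, and the sketch assumes the ordering keys are unique within each group.

```python
from itertools import groupby
from operator import itemgetter

def running_sum_over(rows, partition_key, order_key, value_key):
    """Emulate a running SUM(value) OVER (PARTITION BY ... ORDER BY ...)."""
    out = []
    # The OVER expression: group the rows and sort the rows inside each group.
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        total = 0
        # The window function: process the values of one group in order.
        for row in group:
            total += row[value_key]
            out.append({**row, "running_sum": total})
    return out

rows = [
    {"dept": "a", "day": 2, "sales": 5},
    {"dept": "b", "day": 1, "sales": 7},
    {"dept": "a", "day": 1, "sales": 3},
]
result = running_sum_over(rows, "dept", "day", "sales")
```

Each output row keeps its own identity, unlike an aggregate function, which would collapse every group into a single row.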

In a distributed database, data is distributed across and stored in multiple data nodes. At present, when data in a distributed database is computed, a single data node performs the data collection, sorting, and calculation. Because the computing resources of a single data node are limited, this creates a computing bottleneck and reduces computing efficiency.

Therefore, how to avoid the computing bottleneck caused by a single data node and improve the computing efficiency and overall execution efficiency is an urgent problem to be solved at present.

Summary of the invention

The embodiment of the present invention discloses a data calculation method and related equipment, which can make full use of distributed calculation capabilities, avoid calculation bottlenecks caused by a single data node, and improve calculation efficiency.

In a first aspect, the present application provides a data calculation method. The method includes: a target data node in a distributed database receives data that is related to a query statement and sent by other data nodes in the distributed database; the target data node sorts its local data and the data received from the other data nodes; and the target data node sends a plurality of pieces of sorted data to at least one data node in the distributed database, so that the at least one data node performs calculations related to the query statement on the data that each of them receives.

In the solution provided by this application, the target data node collects and sorts the data related to the query statement, and sends the resulting ordered data to other data nodes in the distributed database, so that the other data nodes can perform the calculations related to the query statement in parallel. This avoids the bottleneck caused by calculation on a single data node and makes full use of distributed computing capabilities, improving resource utilization and computing efficiency.

In a possible implementation manner, the other data nodes each sort the local data, and send the sorted data to the target data node.

In the solution provided by this application, other data nodes sort the local data before sending data to the target data node. This can reduce the sorting pressure of the target data node, reduce the memory overhead of the target data node, and improve execution efficiency.

In a possible implementation manner, the target data node sends at least one different data to different data nodes among the multiple data nodes in the distributed database.

In the solution provided by this application, the target data node sends the sorted data to different data nodes, and the data received by the different data nodes is not exactly the same. This ensures that every data node that receives data can participate in the calculations related to the query statement, which improves calculation efficiency.

In a possible implementation manner, the target data node performs query sentence-related calculations on data that is not sent to the at least one data node in the sorted data.

In the solution provided by this application, the target data node can also participate in calculations related to query sentences, which can further improve the calculation efficiency and make full use of the computing resources of the distributed database.

In a possible implementation manner, the target data node determines N partitions based on the sorted data of the target data node, different partitions of the N partitions include at least one different data, N is an integer greater than 1, and N is less than or equal to the number of data nodes in the distributed database; the target data node sends data of one of the N partitions to each of the N data nodes of the distributed database except the target data node.

In the solution provided by this application, the target data node composes the sorted data into N partitions and sends the data of one of the N partitions to each data node participating in the calculation, ensuring that each such data node receives one partition and performs the calculation on it.

In a possible implementation manner, the target data node obtains the N partitions based on the sorted data of the target data node according to the total amount of data and the data overlap interval of the sorted data of the target data node.

In the solution provided by this application, when the target data node composes N partitions of sorted data, the two factors of the total amount of data and the data overlap interval can be considered at the same time, so as to improve the rationality and accuracy of composing N partitions.

In a possible implementation manner, the target data node sends the plurality of pieces of data sorted by the target data node to at least one data node according to the numbers of physical nodes, where the physical node corresponding to each number includes at least one data node in the distributed database.

In the solution provided by this application, the target data node sends the sorted data to multiple data nodes according to the numbers of the physical nodes, which avoids sending a large amount of data to the same physical node within a short time and improves both the resource utilization of the physical nodes and the execution efficiency of the entire system.

In a possible implementation manner, the at least one data node performs calculation of the window function of the query sentence on the data respectively received.

In the solution provided by this application, each data node can perform various query sentence-related calculations, such as the calculation of the window function of the query sentence, with respect to the data received by each data node.

In a second aspect, the present application provides a data storage device, including: a receiving unit, configured to receive data that is related to a query statement and sent by other data nodes in a distributed database; a processing unit, configured to sort local data and the data received from the other data nodes; and a sending unit, configured to send a plurality of pieces of sorted data to at least one data node in the distributed database, so that the at least one data node performs calculations related to the query statement on the data that each of them receives.

In a possible implementation manner, the receiving unit is specifically configured to: receive data sent after other data nodes sort the local data.

In a possible implementation manner, the sending unit is specifically configured to send at least one different data to different data nodes among the multiple data nodes in the distributed database.

In a possible implementation manner, the processing unit is further configured to perform calculations related to the query sentence on data that is not sent to the at least one data node in the sorted data.

In a possible implementation manner, the processing unit is further configured to determine N partitions based on the sorted data, where different partitions of the N partitions include at least one different piece of data, N is an integer greater than 1, and N is less than or equal to the number of data nodes in the distributed database; the sending unit is specifically configured to send data of one of the N partitions to each of the N data nodes in the distributed database other than the target data node.

In a possible implementation manner, the processing unit is specifically configured to obtain the N partitions based on the sorted data according to the total amount of data and the data overlap interval of the sorted data.

In a possible implementation manner, the sending unit is specifically configured to send the plurality of pieces of sorted data to the at least one data node according to the numbers of physical nodes, where the physical node corresponding to each number includes at least one data node in the distributed database.

In a third aspect, the present application provides a computing device. The computing device includes a processor and a memory, and the processor executes computer instructions stored in the memory, so that the computing device performs the method of the first aspect or of any implementation manner of the first aspect.

In a fourth aspect, the present application provides a computer storage medium that stores a computer program which, when executed by a computing device, implements the method of the first aspect or of any implementation manner of the first aspect.

In a fifth aspect, the present application provides a computer program product. The computer program product includes computer instructions which, when executed by a computing device, enable the computing device to perform the method of the first aspect or of any implementation manner of the first aspect.

Description of the drawings

FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of a data interaction provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of a data calculation method provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of forward overlap of data provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of backward overlap of data provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of data provided by an embodiment of the present application in which forward overlap and backward overlap exist at the same time;

FIG. 7 is a schematic diagram of a data interval division provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of determining a data sending order provided by an embodiment of the present application;

FIG. 9 is a schematic diagram of a data storage provided by an embodiment of the present application;

FIG. 10 is a schematic structural diagram of a data storage device provided by an embodiment of the present application;

FIG. 11 is a schematic structural diagram of a computing device provided by an embodiment of the present application.

Detailed description

The following describes the technical solutions in the embodiments of the present application clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments.

First of all, some terms and related technologies involved in this application will be explained in conjunction with the drawings to facilitate the understanding of those skilled in the art.

Stream broadcast is a data transmission method in distributed databases, which means that data is sent from one data node (source data node) to other data nodes (target data node).

Stream redistribute is also a data transmission method in distributed databases, in which the source data node calculates a hash value according to the join condition and sends the data to the corresponding target data node according to the calculated hash value.
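By way of illustration only, the difference between the two transmission methods can be sketched as follows. The function names, node labels, and the use of `zlib.crc32` as the hash function are assumptions made for this sketch; the application does not specify the hash function that is used.

```python
import zlib

def stream_broadcast(rows, targets):
    """Broadcast: every target data node receives every row."""
    return {node: list(rows) for node in targets}

def stream_redistribute(rows, targets, join_key):
    """Redistribute: each row goes to the one target chosen by hashing its key."""
    out = {node: [] for node in targets}
    for row in rows:
        # crc32 is only a stand-in for the database's hash function (assumption).
        h = zlib.crc32(str(row[join_key]).encode("utf-8"))
        out[targets[h % len(targets)]].append(row)
    return out

targets = ["dn140", "dn150", "dn160"]  # hypothetical node labels
rows = [{"id": i, "v": i * 10} for i in range(6)]
broadcasted = stream_broadcast(rows, targets)
redistributed = stream_redistribute(rows, targets, "id")
```

With broadcast, the volume sent grows with the number of targets; with redistribute, each row is sent once and rows with equal keys always land on the same node.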

FIG. 1 shows a possible application scenario of an embodiment of the present application. In this application scenario, the distributed database 100 includes multiple coordinator nodes (CN), such as the coordinator node 110 and the coordinator node 120, and multiple data nodes (DN), such as the data node 130, the data node 140, the data node 150, and the data node 160. Data nodes are deployed on physical nodes (such as servers), and each physical node can host one or more data nodes. For example, the data node 130 and the data node 140 are deployed on the physical node 170, the data node 150 is deployed on the physical node 180, and the data node 160 is deployed on the physical node 190. All data is distributed across the data nodes, and data is not shared between data nodes. When a service is executed, the coordinator node receives a query request from a client, generates an execution plan, and sends it to each data node. Each data node initializes the operators it needs (for example, stream operators) based on the received plan and then executes the execution plan issued by the coordinator node. The coordinator nodes and the data nodes, as well as data nodes on different physical nodes, are connected through network channels, which can use a communication protocol such as the scalable transmission control protocol (STCP).

During service execution, all data interaction between different data nodes is performed by stream operators. As shown in FIG. 2, the data node 130 includes a service thread 131 and a stream thread 132, and the data node 140 includes a service thread 141 and a stream thread 142. The stream thread 132 can send the data stored in the data node 130 to the service thread 131, which forwards it to the coordinator node 110, or it can send the data directly to the service thread 141. Likewise, the stream thread 142 can send the data stored in the data node 140 to the service thread 141, which forwards it to the coordinator node 110, or it can send the data directly to the service thread 131.

In the application scenarios shown in FIG. 1 and FIG. 2, data is distributed across and stored in the data nodes. When a service requires calculation on the data, all the data involved in the service must first be sorted, and the calculation is then performed on the sorted data. At present, the data is first aggregated on one data node and then sorted. For example, the data node 140, the data node 150, and the data node 160 aggregate their stored data on the data node 130 by broadcasting, and the data node 130 sorts all the data after the aggregation is complete. Because the amount of data is relatively large and the memory resources of the data node 130 are limited, the data node 130 may have to store part of the data on disk while sorting, which incurs substantial input/output (IO) overhead and lowers execution efficiency. After sorting, the data node 130 performs the calculation on the sorted data. This calculation runs only on the data node 130, while the other data nodes remain idle after broadcasting their data, so the load is severely unbalanced. The computing power of the data node 130 becomes the bottleneck for service execution, the execution efficiency of the entire distributed database 100 is greatly reduced, and the distributed execution capability cannot be fully utilized.

To solve the above problems, this application provides a data calculation method and related equipment, in which ordered data is redistributed from a single data node to other data nodes in the distributed database before calculation, so that the other data nodes can perform the calculation in parallel. This makes full use of the computing power of the distributed database and improves computing efficiency and resource utilization.

The technical solutions of the embodiments of the present application can be applied to various scenarios that require data sorting and calculation in a distributed database.

In combination with the application scenarios shown in FIG. 1 and FIG. 2, refer to FIG. 3. FIG. 3 is a schematic flowchart of a data calculation method provided by an embodiment of the present application. As shown in Figure 3, the method includes but is not limited to the following steps:

S310: The target data node receives data related to the query sentence sent by other data nodes.

The query statement may be a statement expressed in a structured query language (SQL), for example, a SQL statement including a window function and an OVER expression.

Specifically, the target data node may be any data node in the distributed database. Correspondingly, other data nodes are data nodes in the distributed database that store data related to the query statement other than the target data node. Optionally, the target data node may be previously designated by the user, or may be a data node selected when executing the service.

For example, the target data node may be any data node in the distributed database shown in FIG. 1, such as the data node 130. Correspondingly, other data nodes include the data node 140, the data node 150, and the data node 160.

Data is stored in each data node in the form of tables. The other data nodes in the distributed database each perform a base table scan on their stored data to determine which rows of which tables need to be sent to the target data node.

In a possible implementation manner, other data nodes in the distributed database respectively sort the local data and send the sorted data to the target data node.

Specifically, when the data node 130 is the target data node, the data node 140, the data node 150, and the data node 160 may each sort the data to be sent in advance and then send the sorted data to the data node 130. They can send all the sorted data to the data node 130 at one time, or they can send it in multiple batches, for example a fixed amount of data per batch, where the amount sent each time can be set as needed. This application does not limit this.

It can be understood that, because the data node 140, the data node 150, and the data node 160 sort their local data before sending it to the data node 130, the overall sorting pressure on the data node 130 is reduced, the memory overhead of the data node 130 is reduced, and execution efficiency is improved.

S320: The target data node sorts the local data and the data received from other data nodes.

For example, after the data node 130 receives the data sent by the data node 140, the data node 150, and the data node 160, it sorts the received data together with its locally stored data as a whole. In this way, all the data related to the SQL query statement (for example, its OVER expression) executed by the distributed database is in order.

Optionally, the data node 130 receives data that the data node 140, the data node 150, and the data node 160 have already sorted, and then sorts the received data as a whole to achieve a global order, which reduces IO overhead and memory overhead and improves sorting efficiency.
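By way of illustration only, one way a target data node could exploit input that is already sorted is a k-way merge. The application does not state which sorting technique is used, so the merge below is an assumption of this sketch, as are the function name and the sample values.

```python
import heapq

def merge_sorted_runs(local_run, remote_runs):
    """Merge already-sorted runs into one globally sorted list."""
    # heapq.merge reads the runs lazily and holds one head item per run in its heap.
    return list(heapq.merge(local_run, *remote_runs))

local_run = [2, 9, 14]                    # sorted locally on data node 130
remote_runs = [[1, 8], [3, 4, 20], [7]]   # sorted runs from data nodes 140, 150, 160
merged = merge_sorted_runs(local_run, remote_runs)
```

A merge of k sorted runs costs O(n log k) comparisons rather than the O(n log n) of a full sort, which is consistent with the reduced sorting pressure described above.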

S330: The target data node sends a plurality of sorted data to at least one data node, so that the at least one data node performs calculations related to the query sentence on the data respectively received.

Specifically, after the target data node finishes sorting the data related to the query statement, it forms multiple ordered data sets from the ordered data. The data within each ordered data set is ordered. In addition, the data in different ordered data sets can partially repeat (this is also called data overlap: some data in two ordered data sets is the same), but cannot repeat entirely (that is, the data in two ordered data sets cannot completely overlap).

The target data node sends the data, one ordered data set at a time, to data nodes in the distributed database other than the target data node. For example, if the data node 130 itself processes one ordered data set, it sends each of the remaining ordered data sets to a different data node; if the data node 130 does not process any ordered data set, it sends every ordered data set to a different data node. In this way, the number of ordered data sets equals the number of data nodes that process them, and each data node that receives an ordered data set is responsible for performing the calculations related to the query statement on that set.

If the number of ordered data sets is equal to the number of all data nodes in the distributed database, each data node will obtain an ordered data set. For example, the data node 130 sends data to all data nodes in the distributed database except the data node 130 according to an ordered data set, for example, sends the data to the data node 140, the data node 150, and the data node 160.

If the number of ordered data sets is less than the number of all data nodes in the distributed database, some data nodes will obtain ordered data sets. For example, the data node 130 sends data to some data nodes other than the data node 130 in the distributed database according to an ordered data set, for example, sends the data to the data node 140 and the data node 150.

Optionally, before sending data, the data node 130 can check the load of a data node, or of the physical node on which that data node is located, to decide whether to send data to it. This avoids sending data to an overloaded data node, which would lower calculation efficiency and execution efficiency.

After the at least one data node receives the data sent by the target data node, it can perform various calculations on the data, such as window function calculations and aggregate function calculations. For example, after the data node 140 receives an ordered data set sent by the data node 130, it directly performs a window function calculation (for example, summation) on the data contained in that ordered data set. After the data node 150 receives an ordered data set sent by the data node 130, it likewise starts the window function calculation on the data contained in the ordered data set it received. The data node 140 and the data node 150 then perform the window function calculation in parallel.

It can be understood that the target data node sends ordered data to other data nodes so that other data nodes can perform calculations after receiving the data, which can make full use of the computing power of the distributed database and improve the calculation efficiency of the entire system.

In a possible implementation manner, the target data node sends at least one different data to different data nodes among the multiple data nodes of the distributed database.

Specifically, when the target data node divides the ordered data into ordered data sets, adjacent ordered data sets can be exactly contiguous, with no shared data. Some service requirements, however, make the calculation for the current row depend on several rows before it or several rows after it. In that case, when the target data node divides the ordered data sets, adjacent ordered data sets may contain partially repeated data, that is, some data exists in two ordered data sets at the same time. Two adjacent ordered data sets must nevertheless differ in at least one piece of data, which avoids repeating calculations on the same data, improves the utilization of computing resources in the distributed system, and improves calculation efficiency.
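By way of illustration only, the following sketch shows why a calculation that depends on preceding rows needs repeated data in adjacent ordered data sets. It assumes a moving sum over the current row and the 2 rows before it; the window size, function names, and the use of threads as stand-ins for data nodes are all assumptions of the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

PRECEDING = 2  # each row's window covers itself and the 2 rows before it (assumed)

def moving_sum(values, skip=0):
    """Sum each row with up to PRECEDING earlier rows; drop the first `skip` results."""
    sums = [sum(values[max(0, i - PRECEDING): i + 1]) for i in range(len(values))]
    return sums[skip:]

def split_with_forward_overlap(values, n):
    """Cut sorted values into n sets; each set after the first repeats PRECEDING rows."""
    size = len(values) // n
    sets = []
    for i in range(n):
        start = i * size
        end = len(values) if i == n - 1 else start + size
        lead = min(PRECEDING, start)  # rows borrowed from the previous set
        sets.append((values[start - lead:end], lead))
    return sets

sorted_values = list(range(1, 13))
ordered_sets = split_with_forward_overlap(sorted_values, 3)
with ThreadPoolExecutor(max_workers=3) as pool:  # threads stand in for data nodes
    parts = list(pool.map(lambda s: moving_sum(s[0], s[1]), ordered_sets))
parallel_result = [v for part in parts for v in part]
serial_result = moving_sum(sorted_values)
```

The borrowed rows are transmitted twice but only serve as context: each set emits results solely for its own rows, so no row is calculated twice and the concatenated output matches the single-node result.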

Optionally, the target data node can send all the sorted data to other data nodes so that they complete the calculation, or it can keep part of the sorted data locally and perform the calculation on that part itself. It is easy to understand that when the data node 130 also participates in the calculation, the computing power of the distributed database is used more fully and computing efficiency improves further.

In a possible implementation manner, the target data node determines N partitions based on the sorted data of the target data node, where different partitions of the N partitions include at least one different piece of data, N is an integer greater than 1, and N is less than or equal to the number of data nodes in the distributed database; the target data node sends data of one of the N partitions to each of the N data nodes in the distributed database other than the target data node.

Specifically, the target data node composes the sorted data into N partitions, and the data in each partition is ordered. A partition here differs from the partition concept in data storage: it means that a contiguous part of the sorted data is logically cut out to form a partition, without changing the order of the data that is cut out. The number of partitions is less than or equal to the number of data nodes in the distributed database. Optionally, the target data node cuts the sorted data evenly to obtain N partitions that each contain the same amount of data; alternatively, the cut need not be even, in which case the N partitions do not all contain the same amount of data. The number of partitions N can equal the number of data nodes in the distributed database; in that case the target data node sends data of one of the N partitions to each data node, with different partitions going to different data nodes. The number of partitions N can also be less than the number of data nodes in the distributed database; in that case the target data node can select the N data nodes with the lowest load from the other data nodes and send data of one of the N partitions to each of them. In particular, adjacent partitions may share some repeated data, and the amount of repeated data may be the same or different from one pair of partitions to the next, but the data of any two adjacent partitions cannot be completely the same.

In a possible implementation manner, the target data node composes the sorted data into N partitions according to the total amount of data T and the data overlap interval, and the amount of data contained in each partition is calculated by the target data node.

Specifically, when the data overlap interval is 0, there is no data overlap between adjacent partitions. The target data node then does not need to consider overlap when dividing the data and simply divides the total amount of data T evenly into N partitions, each containing T/N rows. The target data node can obtain the total amount of data T while receiving and sorting the data sent by the other data nodes.

When the data overlap interval is not 0, there is data overlap between adjacent partitions, and the target data node must take the overlap interval between partitions into account when dividing the data. Different overlap intervals yield different sets of N partitions. Several examples of how overlap intervals arise are given below.

1. The data overlaps forward, and the overlap interval is x rows.

Specifically, when the target data node partitions the data, every partition except the first must overlap the previous partition by x rows. As shown in FIG. 4, the first partition therefore holds less data than the other partitions. The amount of data allocated to each partition is calculated according to the following formula 1:

2. The data overlaps backward, and the overlap interval is y rows.

Specifically, when the target data node partitions the data, every partition except the last must overlap the next partition by y rows. As shown in FIG. 5, the last partition therefore holds less data than the other partitions. The amount of data allocated to each partition is calculated according to the following formula 2:

3. The data has both forward and backward overlap. The forward overlap interval is x rows and the backward overlap interval is y rows.

Specifically, when the target data node is partitioned, for the first partition, it needs to consider that it overlaps with the next partition by y rows, for the last partition, it needs to consider that it overlaps with the previous partition by x rows, and other partitions need to consider both. As shown in Figure 6, the amount of data allocated by each partition is calculated according to the following formula 3:
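The bodies of formulas 1 to 3 are not reproduced in this text. A reconstruction of formula 2 that agrees with the five-node backward-overlap partitions listed for FIG. 5, namely [1, T/5+y] through [4T/5+1, T], is given below, together with formulas 1 and 3 inferred by symmetry from the three case descriptions. The symbol $R_i$ (the number of rows in partition $i$, with $1 \le i \le N$) is introduced here for illustration, T is assumed to be divisible by N, and the expressions should be read as assumptions rather than as the formulas of the application.

$$
\text{Formula 1 (forward overlap):}\quad
R_i = \begin{cases} T/N, & i = 1 \\ T/N + x, & 2 \le i \le N \end{cases}
$$

$$
\text{Formula 2 (backward overlap):}\quad
R_i = \begin{cases} T/N + y, & 1 \le i \le N-1 \\ T/N, & i = N \end{cases}
$$

$$
\text{Formula 3 (both):}\quad
R_i = \begin{cases} T/N + y, & i = 1 \\ T/N + x + y, & 2 \le i \le N-1 \\ T/N + x, & i = N \end{cases}
$$

Setting x = 0 in formula 3 gives formula 2, and setting y = 0 gives formula 1, so the three cases are mutually consistent.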

The above covers the case of a single function. When several functions are present at the same time, there may be several forward overlap intervals or backward overlap intervals of different sizes; in that case the target data node partitions the data using the largest x value or the largest y value.
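By way of illustration only, the three overlap cases and the choice of the largest overlap can be sketched as follows. The function names and the dictionary layout describing each function's overlap are hypothetical, row numbers are 1-based and inclusive, and the handling of a total that is not divisible by N (integer division) is an assumption of the sketch.

```python
def partition_bounds(total_rows, n, x=0, y=0):
    """Return 1-based inclusive (first_row, last_row) for each of n partitions.

    x: rows shared with the previous partition (forward overlap).
    y: rows shared with the next partition (backward overlap).
    """
    bounds = []
    for i in range(n):
        core_first = i * total_rows // n + 1     # first row of the non-overlapping core
        core_last = (i + 1) * total_rows // n    # last row of the non-overlapping core
        # The first partition has no predecessor; the last has no successor.
        first = core_first if i == 0 else max(1, core_first - x)
        last = core_last if i == n - 1 else min(total_rows, core_last + y)
        bounds.append((first, last))
    return bounds

def widest_overlap(window_functions):
    """With several functions, take the largest x and the largest y among them."""
    x = max((f["x"] for f in window_functions), default=0)
    y = max((f["y"] for f in window_functions), default=0)
    return x, y

backward = partition_bounds(100, 5, y=3)
forward = partition_bounds(100, 5, x=2)
x, y = widest_overlap([{"x": 1, "y": 0}, {"x": 3, "y": 2}])
```

With T = 100, N = 5, and y = 3, the result has the same shape as the backward-overlap partitions [1, T/5+y], [T/5+1, 2T/5+y], and so on, ending with [4T/5+1, T].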

It should be understood that when there is a data overlap interval between adjacent partitions, the data of the same row may need to be sent to multiple other data nodes at the same time. Therefore, it is necessary to divide the data interval, calculate which data interval each row of data falls into, so as to determine the partition corresponding to the row of data, and finally determine which data node should be sent to.

For example, taking FIG. 5 and assuming that the distributed database has 5 data nodes, the partitions are [1, T/5+y], [T/5+1, 2T/5+y], [2T/5+1, 3T/5+y], [3T/5+1, 4T/5+y], and [4T/5+1, T], and the overlap intervals are [T/5+1, T/5+y], [2T/5+1, 2T/5+y], [3T/5+1, 3T/5+y], and [4T/5+1, 4T/5+y]. Based on the overlap intervals, the target data node divides the sorted data into multiple data intervals; as shown in FIG. 7, all the data is divided into 9 data intervals. For each data interval, the target data node compares the first row of the data interval with the first row of each partition that the interval may overlap. If the value corresponding to the first row of the data interval is greater than or equal to the value corresponding to the first row of a partition, the data interval is sent to the data node that receives that partition; if it is less, the data interval does not need to be sent to that data node. For example, data interval 2, [T/5+1, T/5+y], is the overlap interval of partition 1 and partition 2, and the value corresponding to its first row is T/5+1. The value corresponding to the first row of partition 1 is 1, so data interval 2 is sent to the data node of partition 1. The value corresponding to the first row of partition 2 is T/5+1, which equals the value corresponding to the first row of data interval 2, so data interval 2 is also sent to the data node of partition 2.
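By way of illustration only, the division into data intervals and the choice of destination partitions can be sketched as follows. Instead of the first-row comparison described above, this sketch uses a simpler containment test (an interval goes to every partition whose row range fully contains it), which is an assumption of the sketch; the function name and the sample values T = 100, N = 5, y = 3 are also hypothetical.

```python
def route_intervals(bounds):
    """Split the rows into data intervals and list the partitions each one belongs to.

    bounds: 1-based inclusive (first_row, last_row) per partition, in order.
    Returns a list of ((first_row, last_row), [1-based partition numbers]).
    """
    # Each partition start, and each row just past a partition end, opens a new interval.
    cuts = sorted({first for first, _ in bounds} | {last + 1 for _, last in bounds})
    routed = []
    for lo, nxt in zip(cuts, cuts[1:]):
        hi = nxt - 1
        owners = [
            number
            for number, (first, last) in enumerate(bounds, start=1)
            if first <= lo and hi <= last
        ]
        routed.append(((lo, hi), owners))
    return routed

# Backward overlap with T = 100, N = 5, y = 3 (values chosen for illustration).
bounds = [(1, 23), (21, 43), (41, 63), (61, 83), (81, 100)]
routed = route_intervals(bounds)
```

For these values the sketch yields 9 data intervals, and the second interval, rows 21 to 23, is routed to both partition 1 and partition 2, mirroring the treatment of data interval 2 in the example above.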

It should be understood that for forward overlap, the target data node divides the sorted data into multiple data intervals in the same way as described above. For each data interval, the target data node compares the last row of the data interval with the last row of each partition that overlaps it. If the value corresponding to the last row of the data interval is less than or equal to the value corresponding to the last row of a partition, the data interval is sent to the data node of that partition. In the same way, in other situations, such as when there are both a forward overlap interval and a backward overlap interval, the target data node can divide the data into data intervals and determine the destination of each interval by comparison using the same method. For the sake of brevity, details are not repeated here.

It is worth noting that the size of the overlap interval, that is, the value of x or y mentioned above, should be much smaller than T divided by N. If x or y is close to, or even greater than, T divided by N, the system overhead and the network transmission overhead increase, and sending the sorted data to other data nodes for processing is no longer suitable. In this case, other solutions can be used to perform the calculation on the sorted data; for example, the target data node itself calculates the sorted data.

In a possible implementation manner, the target data node determines the data sending order according to the numbers of the physical nodes, and sends the sorted data to the other data nodes in that order. The physical node corresponding to each physical node number includes at least one data node in the distributed database.

Specifically, after the target data node performs partition processing on the sorted data, it needs to further determine the partition sending order to ensure that all partitions can be accurately sent to other data nodes in the determined order.

In a distributed database, multiple data nodes are usually deployed on one physical machine. If the target data node sent the partitions in order of data node number, then for a period of time all the data nodes receiving partitions might be deployed on the same physical machine. That physical machine would be overloaded and execute slowly while other physical machines sat idle, so the resources of the distributed system would not be fully used and the execution efficiency of the entire system would suffer.

Therefore, when the target data node determines the partition sending order, it is determined according to the number of the physical node. When the target data node sends the partition, it is sent to all other data nodes in the order determined by the number of the physical node.

Exemplarily, as shown in FIG. 8, there are a physical machine 810, a physical machine 820, and a physical machine 830. A data node 811 and a data node 812 are deployed in the physical machine 810, a data node 821 and a data node 822 are deployed in the physical machine 820, and a data node 831 and a data node 832 are deployed in the physical machine 830. The target data node determines the sending order according to the numbers of the physical machines. To keep every physical machine in the distributed system as busy as possible and improve the execution efficiency, the determined sending order is: data node 811, data node 821, data node 831, data node 812, data node 822, and data node 832. That is, the target data node first sends partition 1 to the data node 811, then sends partition 2 to the data node 821, and so on, until all the partitions have been sent to the corresponding data nodes in the order determined above.

It should be understood that Figure 8 shows a scenario in which data nodes are evenly distributed across physical nodes. When the data nodes are not evenly distributed, some physical nodes host more data nodes and some host fewer. In that case, the target data node still first sends partitions to the data nodes deployed in each physical node in order of physical node number. Once all the data nodes in the physical nodes with fewer data nodes have received a partition, the target data node continues to send partitions to the data nodes that have not yet received data in the physical nodes with more data nodes, until all the partitions are sent. Of course, the partition sending order can also be determined in other ways, which is not limited in this application.
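One way to derive such a sending order is a round-robin interleave across physical machines. The following Python sketch is illustrative only: the function name and the dictionary layout (physical machine number mapped to its list of data nodes) are assumptions made for this example, and data node identifiers are assumed never to be None.

```python
from itertools import zip_longest


def send_order(nodes_by_machine):
    """Interleave data nodes across physical machines, lowest machine number first."""
    machines = [nodes_by_machine[m] for m in sorted(nodes_by_machine)]
    order = []
    # Each round takes at most one data node from every physical machine;
    # machines that have run out of data nodes are skipped in later rounds.
    for round_nodes in zip_longest(*machines):
        order.extend(node for node in round_nodes if node is not None)
    return order
```

For the deployment of FIG. 8, send_order({810: [811, 812], 820: [821, 822], 830: [831, 832]}) returns [811, 821, 831, 812, 822, 832]. With an uneven deployment, the machines with fewer data nodes simply drop out of the later rounds.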

In a possible implementation manner, the target data node respectively sends a plurality of sorted data to other data nodes, so that the other data nodes perform the calculation of the window function of the query statement on the data they respectively receive.

Specifically, after receiving the partition sent by the target data node, the other data nodes calculate the window function of the query statement for the data in the partition. The window function may be a sum function (sum), an average function (avg), or the like, which is not limited in this application.

It can be seen that when there is data overlap, the amount of data received by each data node may differ, but the amount of data each data node actually calculates is the same, namely T divided by N rows. In this way, the computing power of the distributed system is fully used and the same data is not calculated repeatedly. In addition, when there is data overlap, the state information of the running window function also records the sizes of the forward overlap interval and the backward overlap interval, that is, the values of x and y. When the stream thread in the target data node sends a partition to another data node, it also sends the identifiers of the first data node and the last data node in the determined sending order, and the amount of data that each data node needs to process (that is, T divided by N rows). For example, when there is forward data overlap, every data node except the first one skips the overlap interval (for example, x rows) after receiving its partition and then starts the calculation. It should be understood that although the overlap interval itself does not need to be calculated, the calculation of the subsequent rows relies on the forward overlap interval (x rows). When there is backward data overlap, each data node calculates only the first T/N rows after receiving its partition. The overlap interval (for example, y rows) does not need to be calculated, but the calculation of the preceding rows relies on the backward overlap interval (y rows).

In particular, when an SQL statement executed by the distributed database contains multiple window functions, the target data node needs to partition the sorted data according to the forward overlap intervals and the backward overlap intervals of all the window functions. After receiving the partition sent by the target data node, each of the other data nodes also needs to process the different window functions separately.

For example, the SQL statement contains 3 window functions. The forward overlap interval of window function 1 is 2 and its backward overlap interval is 0; the forward overlap interval of window function 2 is 5 and its backward overlap interval is 0; the forward overlap interval of window function 3 is 0 and its backward overlap interval is 4. When partitioning, the target data node uses the maximum forward overlap interval and the maximum backward overlap interval among the three window functions, that is, the forward overlap interval of window function 2 and the backward overlap interval of window function 3. The data volume of the first partition is T/N+4 rows, the data volume of the last partition is T/N+5 rows, and the data volume of each other partition is T/N+5+4 rows. After the target data node completes the partitioning, it sends each partition to the corresponding data node, together with the amount of data that the data node needs to process. Each data node starts to calculate the window functions after receiving its partition. When window function 1 is calculated, the overlap interval in the partition is not 2 but 5, so the calculation starts after skipping 5 rows and ends after T/N rows have been calculated. To ensure that the subsequent window functions are calculated correctly, the forward overlap interval and the backward overlap interval are retained when the result of window function 1 is output. When window function 2 is calculated, the calculation again starts after skipping 5 rows, according to the value of the forward overlap interval, and ends after T/N rows have been calculated. Since the remaining window function does not need the forward overlap interval, the forward overlap interval is dropped when the result of window function 2 is output, and only the backward overlap interval is kept. When window function 3 is calculated, since the forward overlap interval has already been dropped, the calculation starts from the first row of each partition and ends after T/N rows have been calculated. Since there is no subsequent window function to calculate, the backward overlap interval is dropped when the result of window function 3 is output.
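The partition sizes in this example follow from taking the largest forward and backward overlap over all window functions. A minimal Python sketch, assuming T is divisible by N and using a hypothetical function name and a hypothetical (forward, backward) tuple per window function:

```python
def shared_partition_sizes(total_rows, num_nodes, overlaps):
    """Return (x, y, sizes) for partitions shared by several window functions.

    overlaps holds one (forward_overlap, backward_overlap) pair per window function.
    """
    x = max(forward for forward, _ in overlaps)    # largest forward overlap
    y = max(backward for _, backward in overlaps)  # largest backward overlap
    base = total_rows // num_nodes                 # assumption: divides evenly
    sizes = []
    for i in range(num_nodes):
        fwd = x if i > 0 else 0              # first partition has no forward overlap
        bwd = y if i < num_nodes - 1 else 0  # last partition has no backward overlap
        sizes.append(base + fwd + bwd)
    return x, y, sizes
```

For instance, with T = 300, N = 3, and the overlaps (2, 0), (5, 0), and (0, 4), the sizes are 104, 109, and 105 rows, that is, T/N+4, T/N+5+4, and T/N+5.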

The state information corresponding to the window function records the changes in the forward overlap interval and the backward overlap interval while the window functions run. In the process of executing the three window functions included in the SQL statement, the state information changes as shown in Table 1:

Table 1

It can be seen that the state information corresponding to the window function ensures that each window function can be calculated and executed correctly by recording control information (that is, the forward overlap interval and the backward overlap interval, and whether it is deleted after the calculation is completed).

It should be understood that when the amount of data distributed to each data node for calculation (that is, T/N rows) is large, and there is no data overlap interval or the data overlap interval (that is, x or y) is small, the method shown in Figure 3 achieves parallel computing, makes full use of the computing power and system resources of the distributed system, and improves the computing efficiency. However, when the amount of data distributed to each data node is small and the data overlap interval is large, a large amount of additional network transmission overhead is generated, and the transmission time may exceed the calculation time of each data node, which seriously affects the computing efficiency. Therefore, before the method shown in FIG. 3 is implemented, a cost estimation needs to be performed for the actual application scenario, that is, it needs to be determined whether the method provided in this application is better than the existing solution.

Specifically, the execution time is used to characterize the cost. In the existing solution, one data node executes the entire calculation, and the cost is the time required for a single data node to execute the window function. This application partitions the sorted data and sends the partitions to multiple data nodes for parallel calculation, so the cost only needs to consider the time for the target data node to send the partitions and the time for the other data nodes to receive and calculate them. The two costs are then compared: when the cost of the existing solution minus the cost of this application is greater than 0, the solution provided by this application should be selected; otherwise, the existing solution should be selected. The difference between the costs of the two solutions can be calculated according to the following formula 4:

ΔA=A-B-(C+A)/N Formula 4

Among them, ΔA represents the difference between the costs of the two solutions, A represents the cost of the existing solution (that is, the time required for a single data node to process the entire data volume T), B represents the time required for the target data node to send all the partitions, C represents the time required for the other data nodes to receive the partitions, and N represents the number of data nodes in the distributed database.

When there are multiple window functions that can share partition data and perform calculations, for example, in the scenario corresponding to Table 1 above, the difference between the costs of the two solutions can be calculated according to the following formula 5:

ΔA=n*A-B-(C+A)/N-(n-1)*A/N Formula 5

Among them, the meanings represented by parameters such as ΔA, A, B, C, and N are consistent with those in the above formula 4, and n represents the number of window functions in the SQL statement.

It can be seen that the above formula 4 and formula 5 can be used to estimate the cost, and an appropriate solution can be selected for calculation according to the estimation result, so that the calculation efficiency of the entire system is guaranteed.
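Formula 4 and formula 5 can be evaluated with a single helper, since formula 5 reduces to formula 4 when n is 1. The Python sketch below is illustrative; the function and parameter names are hypothetical, and the time values are assumed to be estimates expressed in the same unit.

```python
def cost_difference(a, b, c, num_nodes, num_window_functions=1):
    """Formula 5; with num_window_functions == 1 it is exactly formula 4.

    a: time for a single data node to process all T rows
    b: time for the target data node to send all partitions
    c: time for the other data nodes to receive the partitions
    """
    n = num_window_functions
    return n * a - b - (c + a) / num_nodes - (n - 1) * a / num_nodes


def use_distributed_plan(a, b, c, num_nodes, num_window_functions=1):
    """Select the partitioned plan only when the estimated saving is positive."""
    return cost_difference(a, b, c, num_nodes, num_window_functions) > 0
```

For example, with A = 100, B = 10, C = 10, and N = 5, the difference is 68 for one window function and 228 for three, so the partitioned plan would be selected in both cases.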

In order to further illustrate the data calculation method described in FIG. 3, a specific SQL query statement is used as an example below. Assume that there are two data nodes in the cluster, DN1 and DN2, deployed on different physical machines. The data table (tt01) and the way its data is stored in DN1 and DN2 are shown in Figure 9. The SQL statement that the distributed database needs to execute is: select a, b, c, sum(b) over (order by b rows 2 preceding) from tt01. That is, to execute the SQL statement, all the data in the data table tt01 first needs to be sorted by column b, the forward overlap interval is 2, and then, for each row, the sum of column b over the two preceding rows and the current row is calculated.

First, DN1 and DN2 each scan their data and sort it locally by column b. The sorting results are shown in Table 2 below:

Table 2

Data node	a	b	c
DN1	1	2	3
DN1	1	4	5
DN2	2	6	7
DN2	2	8	9

Then, each DN sends its data to the target data node. Assuming that DN1 is randomly selected as the target data node, DN2 sends its sorted data to DN1, and DN1 merge-sorts the received data with its local data so that all the data that needs to participate in the calculation is in order. The result of DN1 sorting by column b is shown in Table 3 below:

Table 3

Data node	a	b	c
DN1	1	2	3
DN1	1	4	5
DN1	2	6	7
DN1	2	8	9
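Because each data node has already sorted its local data, the target data node only needs a merge rather than a full sort. The following Python sketch illustrates this with the standard-library heapq.merge; the function name is hypothetical, and the rows are written as (a, b, c) tuples, which is an assumed reading of the table contents.

```python
import heapq
from operator import itemgetter

# Locally sorted rows (a, b, c) held by each data node before redistribution.
dn1_rows = [(1, 2, 3), (1, 4, 5)]
dn2_rows = [(2, 6, 7), (2, 8, 9)]


def merge_sorted_runs(*runs, key_column=1):
    """Merge runs that are each already sorted on key_column (column b here)."""
    return list(heapq.merge(*runs, key=itemgetter(key_column)))


globally_sorted = merge_sorted_runs(dn1_rows, dn2_rows)
```

The merge visits each row once, so its cost grows linearly with the total number of rows for a fixed number of data nodes.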

Next, DN1 partitions the sorted data and sends the partitions to the corresponding data nodes in order. Since the total amount of data T is 4 rows, the number of data nodes N is 2, and the forward overlap interval is 2 rows, the amount of data to be calculated by each data node is T/N, that is, 2 rows. Partitioning the data yields two partitions: partition 1 consists of the first and second rows, partition 2 consists of the first to fourth rows (that is, all rows), and the overlap interval is the first and second rows. After partitioning, DN1 determines the partition sending order. Since DN1 and DN2 are deployed on different physical machines, the determined sending order is: partition 1 to DN1, then partition 2 to DN2.
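The partition boundaries in this example can be reproduced with a short calculation. The Python sketch below is illustrative, with a hypothetical function name; it assumes that T is divisible by N and uses 1-based, inclusive row positions in the sorted data.

```python
def forward_overlap_partitions(total_rows, num_nodes, x):
    """Return (start, end) row ranges; each partition reaches back x rows."""
    base = total_rows // num_nodes  # rows each data node actually calculates
    parts = []
    for i in range(num_nodes):
        # Extend the start backwards by x rows, but never before row 1.
        start = max(i * base + 1 - x, 1)
        end = (i + 1) * base
        parts.append((start, end))
    return parts
```

With T = 4, N = 2, and x = 2, the result is rows 1 to 2 for partition 1 and rows 1 to 4 for partition 2.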

Due to the forward overlap interval, DN1 needs to determine the data node to which each row of data needs to be sent. DN1 divides all data into two data intervals according to the forward overlap interval, as shown in Table 4 below:

Table 4

For data interval 1, the row number of the first row is 1, which is smaller than the row number 2 of the last row of partition 1 and smaller than the row number 4 of the last row of partition 2, so data interval 1 is sent to DN1 and DN2. For data interval 2, the row number of the first row is 3, which is greater than the row number 2 of the last row of partition 1 but smaller than the row number 4 of the last row of partition 2, so data interval 2 is sent only to DN2.

After DN1 finishes sending data, the data received by each DN is shown in Table 5 below:

Table 5

DN1 and DN2 perform parallel calculations on the received data and calculate, for each row, the sum of column b over the two preceding rows and the current row. The calculation results are shown in Table 6 below:

Table 6

Data node	a	b	c	sum(b)
DN1	1	2	3	2
DN1	1	4	5	6
DN2	2	6	7	12
DN2	2	8	9	18

It can be seen that DN2 refers to the data in the forward overlap interval during the calculation but does not calculate results for that data, which avoids repeated calculation, saves the computing resources of the distributed system, and improves the computing efficiency.
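The per-node calculation can be sketched as follows. This Python example is illustrative only: the function name and parameters are hypothetical, and it handles only the sum over "rows 2 preceding" used in this example. Each data node receives the values of column b in its partition, skips the forward overlap rows, and outputs results for T/N rows while still reading the skipped rows as window input.

```python
def window_sum(partition_b, skip, count, preceding=2):
    """Sum of the current row and up to `preceding` rows before it.

    Results are produced only for rows skip .. skip + count - 1; rows before
    `skip` (the forward overlap interval) are read but produce no output.
    """
    results = []
    for i in range(skip, skip + count):
        lo = max(i - preceding, 0)
        results.append(sum(partition_b[lo:i + 1]))
    return results


dn1_result = window_sum([2, 4], skip=0, count=2)        # partition 1, no overlap
dn2_result = window_sum([2, 4, 6, 8], skip=2, count=2)  # partition 2, 2 overlap rows
```

Here dn1_result is [2, 6] and dn2_result is [12, 18], which together give the four sums of the example.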

Finally, after all the DNs are calculated, the calculation results can be sent to the coordinating node CN to obtain the final execution results of the SQL statement, as shown in Table 7 below:

Table 7

It is easy to understand that this application sends globally ordered data to each data node to make full use of the computing power of each data node in the distributed system. This avoids the bottleneck caused by calculation on a single data node, so that the data can be calculated in parallel, which improves calculation and execution efficiency.

The foregoing describes the methods of the embodiments of the present application in detail. In order to facilitate better implementation of the above solutions of the embodiments of the present application, correspondingly, the following also provides related equipment for cooperating with the implementation of the foregoing solutions.

Refer to FIG. 10, which is a schematic structural diagram of a data storage device provided by an embodiment of the present application. As shown in FIG. 10, the data storage device 10 includes a receiving unit 11, a processing unit 12, and a sending unit 13, where:

The receiving unit 11 is configured to receive data that is related to a query statement and sent by other data nodes in the distributed database.

Specifically, the receiving unit 11 shown is configured to perform the foregoing step S310, and optionally perform optional methods in the foregoing steps.

The processing unit 12 is configured to sort the local data and the data received from the other data nodes.

Specifically, the processing unit 12 shown is configured to execute the aforementioned step S320, and optionally execute optional methods in the aforementioned steps.

The sending unit 13 is configured to send a plurality of pieces of sorted data to at least one data node in the distributed database, so that the at least one data node performs calculations related to the query statement on the data respectively received.

Specifically, the sending unit 13 shown is configured to perform the foregoing step S330, and optionally perform optional methods in the foregoing steps.

In a possible implementation manner, the receiving unit 11 is specifically configured to: receive data sent by other data nodes after sorting the local data.

In a possible implementation manner, the sending unit 13 is specifically configured to send at least one different data to different data nodes among the multiple data nodes in the distributed database.

In a possible implementation manner, the processing unit 12 is further configured to perform calculations related to the query statement on the data in the sorted data that is not sent to the at least one data node.

In a possible implementation manner, the processing unit 12 is further configured to determine N partitions based on the sorted data, where different partitions of the N partitions include at least one different piece of data, N is an integer greater than 1, and N is less than or equal to the number of data nodes in the distributed database; the sending unit 13 is specifically configured to send data of one of the N partitions to each data node, other than the target data node, among N data nodes of the distributed database.

In a possible implementation manner, the processing unit 12 is specifically configured to obtain the N partitions based on the sorted data according to the total amount of data and the data overlap interval of the sorted data.

In a possible implementation manner, the sending unit 13 is specifically configured to send the sorted multiple pieces of data to the at least one data node according to the number of the physical node, and the physical node corresponding to the number of the physical node includes at least one data node in the distributed database.

It should be noted that the structure of the above-mentioned data storage device and the process of using the data storage device to redistribute data to achieve parallel calculation of data are only an example, and should not constitute a specific limitation. The units of the data storage device can be added, removed, or merged as needed. In addition, the operation and/or function of each module in the data storage device is to implement the corresponding process of the method described in FIG. 3 above, and is not repeated here for brevity.

Refer to FIG. 11, which is a schematic structural diagram of a computing device provided by an embodiment of the present application. As shown in FIG. 11, the computing device 20 includes a processor 21, a communication interface 22 and a memory 23. The processor 21, the communication interface 22 and the memory 23 are connected to each other through an internal bus 24. It should be understood that the computing device may be a database server.

The computing device 20 may be the physical node 170 where the data node 130 and the data node 140 are deployed in FIG. 1. The functions performed by the target data node in FIGS. 1, 2 and 3 are actually performed by the processor 21 of the computing device.

The processor 21 may be composed of one or more general-purpose processors, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip. The aforementioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

The bus 24 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus 24 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used to represent the bus in FIG. 11, but this does not mean that there is only one bus or one type of bus.

The memory 23 may include a volatile memory, such as a random access memory (RAM); the memory 23 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 23 may also include a combination of the above types. The program code may be used to implement the functional units shown in the data storage device 10, or to implement the method steps in the method embodiment shown in FIG. 3 with the target data node as the execution subject.

The embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, it can implement part or all of the steps of any one of the above method embodiments, and realize the function of any one of the functional units described in Figure 10 above.

The embodiments of the present application also provide a computer program product, which when it runs on a computer or a processor, enables the computer or the processor to execute one or more steps in any of the foregoing methods. If each component unit of the aforementioned equipment is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in the computer readable storage medium.

In the above-mentioned embodiments, the description of each embodiment has its own focus. For parts that are not described in detail in an embodiment, reference may be made to related descriptions of other embodiments.

It should also be understood that in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.

Those skilled in the art can clearly understand that, for the convenience and conciseness of description, the specific working process of the system, device and unit described above can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.

If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application essentially, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to make a computer device (which may be a personal computer, a server, a network device, or the like) execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage media include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

As mentioned above, the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of the technical features can be equivalently replaced, and that these modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present application.

Claims

A method of data calculation, characterized in that it includes:

The target data node in the distributed database receives data that is related to a query statement and sent by other data nodes in the distributed database;

The target data node sorts the local data and the data received from the other data nodes;

The target data node sends a plurality of pieces of sorted data to at least one data node in the distributed database, so that the at least one data node performs calculations related to the query statement on the data respectively received.
The method of claim 1, wherein the method comprises:

Each of the other data nodes sorts the local data, and sends the sorted data to the target data node.
The method according to claim 1 or 2, wherein the sending, by the target data node, a plurality of sorted data to the at least one data node comprises:

The target data node sends at least one different piece of data to different data nodes among the multiple data nodes in the distributed database.
The method according to any one of claims 1 to 3, wherein the method further comprises:

The target data node performs calculations related to the query statement on the data in the sorted data that is not sent to the at least one data node.
The method according to any one of claims 1 to 4, wherein the method comprises:

The target data node determines N partitions based on the sorted data of the target data node, different partitions in the N partitions include at least one different piece of data, N is an integer greater than 1, and N is less than or equal to the number of data nodes in the distributed database;

The sending, by the target data node, the sorted data to the at least one data node includes:

The target data node sends data of one of the N partitions to each of the N data nodes of the distributed database except the target data node.
The method of claim 5, wherein the target data node determining N partitions based on the sorted data of the target data node comprises:

The target data node obtains the N partitions based on the sorted data of the target data node according to the total amount of data and the data overlap interval of the sorted data of the target data node.
The method according to any one of claims 1 to 6, wherein the sending, by the target data node, a plurality of sorted data to the other data node comprises:

The target data node sends a plurality of pieces of sorted data of the target data node to the at least one data node according to the number of the physical node, and the physical node corresponding to the number of the physical node includes at least one data node in the distributed database.
The method according to any one of claims 1 to 7, wherein the at least one data node performing calculations related to the query statement on the data respectively received comprises:

The at least one data node performs calculation of the window function of the query statement on the data respectively received.
A data storage device is characterized in that it comprises:

The receiving unit is used to receive data that is related to a query statement and sent by other data nodes in the distributed database;

A processing unit for sorting the local data and the data received from the other data nodes;

The sending unit is configured to send a plurality of pieces of sorted data to at least one data node in the distributed database, so that the at least one data node performs calculations related to the query statement on the data respectively received.
The data storage device according to claim 9, wherein the receiving unit is specifically configured to:

Receive data sent after other data nodes sort the local data.
The data storage device according to claim 9 or 10, wherein the sending unit is specifically configured to:

At least one different piece of data is sent to different data nodes among the multiple data nodes in the distributed database.
The data storage device according to any one of claims 9 to 11, wherein:

The processing unit is further configured to perform calculations related to the query statement on the data in the sorted data that is not sent to the at least one data node.
The data storage device according to any one of claims 9 to 12, wherein:

The processing unit is further configured to determine N partitions based on the sorted data, different partitions of the N partitions include at least one different piece of data, N is an integer greater than 1, and N is less than or equal to the number of data nodes in the distributed database;

The sending unit is specifically used for:

Sending the data of one of the N partitions to each of the N data nodes of the distributed database except the target data node.
The data storage device according to claim 13, wherein the processing unit is specifically configured to:

According to the total amount of data and the data overlap interval of the sorted data, the N partitions are obtained based on the sorted data.
The data storage device according to any one of claims 9 to 14, wherein the sending unit is specifically configured to:

Send the sorted multiple data to the at least one data node according to the serial number of the physical node, and the physical node corresponding to the serial number of the physical node includes at least one data node in the distributed database.
The data storage device according to any one of claims 9 to 15, wherein the at least one data node calculates the window function of the query statement on the data respectively received.
A computing device, wherein the computing device includes a processor and a memory, and the processor executes computer instructions stored in the memory, so that the computing device executes the method according to any one of claims 1 to 8.
A computer storage medium, wherein the computer storage medium stores a computer program, and the computer program implements the method according to any one of claims 1 to 8 when executed by a computing device.
A computer program product, the computer program product comprising computer instructions, and when the computer instructions are executed by a computing device, the computing device can execute the method according to any one of claims 1 to 8.