WO2021147815A1 - Data calculation method and related device - Google Patents

Data calculation method and related device

Info

Publication number
WO2021147815A1
WO2021147815A1 PCT/CN2021/072472 CN2021072472W WO2021147815A1 WO 2021147815 A1 WO2021147815 A1 WO 2021147815A1 CN 2021072472 W CN2021072472 W CN 2021072472W WO 2021147815 A1 WO2021147815 A1 WO 2021147815A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
node
data node
nodes
sorted
Prior art date
Application number
PCT/CN2021/072472
Other languages
French (fr)
Chinese (zh)
Inventor
胡梦春
李茂增
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2021147815A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Definitions

  • This application relates to the field of distributed storage technology, and in particular to a data calculation method and related device.
  • A window function is a special type of function in structured query language (SQL). Like an aggregate function, a window function takes multiple rows as input. It acts on a window, which is a set of rows defined by an OVER expression, and it is always used together with that expression. The OVER expression groups the data and sorts the elements within each group; the window function then processes the values in each group, for example by aggregating them or generating sequence numbers.
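  • The division of labour between the OVER expression and the window function can be illustrated with a small sketch. It emulates a running sum in plain Python rather than in SQL; the function name, the column names, and the sample rows are hypothetical, and the sketch assumes that the ordering key is unique within each group.

```python
from itertools import groupby
from operator import itemgetter


def running_sum_over(rows, partition_key, order_key, value_key):
    """Emulate a running SUM(value) OVER (PARTITION BY ... ORDER BY ...)."""
    # Role of the OVER expression: group the rows and sort inside each group.
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    result = []
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        total = 0
        for row in group:
            # Role of the window function: process the values of the group.
            total += row[value_key]
            result.append({**row, "running_sum": total})
    return result


# Hypothetical sample rows.
rows = [
    {"dept": "a", "day": 2, "amount": 5},
    {"dept": "b", "day": 1, "amount": 7},
    {"dept": "a", "day": 1, "amount": 3},
]
for row in running_sum_over(rows, "dept", "day", "amount"):
    print(row)
```

  • Every input row yields one output row, which is what distinguishes a window function from an aggregate function that collapses each group into a single row.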
  • In a distributed database, data is distributed and stored across the data nodes.
  • When a single data node completes all of the data collection, sorting, and calculation, its limited computing resources create a computing bottleneck and reduce computing efficiency.
  • The embodiments of this application disclose a data calculation method and related device that make full use of distributed computing capabilities, avoid the computing bottleneck caused by a single data node, and improve computing efficiency.
  • In a first aspect, the present application provides a data calculation method. The method includes: a target data node in a distributed database receives data related to a query statement sent by other data nodes in the distributed database; the target data node sorts its local data and the data received from the other data nodes; and the target data node sends a plurality of sorted data to at least one data node in the distributed database, so that the at least one data node performs calculations related to the query statement on the data it receives.
  • In this solution, the target data node collects and sorts the data related to the query statement, and sends the sorted data to other data nodes in the distributed database so that those nodes can perform the calculations related to the query statement in parallel. This avoids the bottleneck caused by calculation on a single data node and makes full use of distributed computing capabilities, improving resource utilization and computing efficiency.
  • The other data nodes each sort their local data and send the sorted data to the target data node.
  • Because the other data nodes sort their local data before sending it, the sorting pressure and memory overhead on the target data node are reduced and execution efficiency improves.
  • The target data node sends data to multiple data nodes in the distributed database, and the data sent to any two different data nodes differs in at least one item.
  • Because the target data node sends the sorted data to different data nodes and the data received by each data node is not exactly the same, every data node that receives data can participate in the calculations related to the query statement, which improves calculation efficiency.
  • The target data node performs the calculations related to the query statement on the sorted data that is not sent to the at least one data node.
  • The target data node can thus also participate in the calculations related to the query statement, which further improves calculation efficiency and makes full use of the computing resources of the distributed database.
  • The target data node determines N partitions based on its sorted data, where different partitions of the N partitions differ in at least one item of data, N is an integer greater than 1, and N is less than or equal to the number of data nodes in the distributed database. The target data node sends the data of one of the N partitions to each of the N data nodes of the distributed database other than the target data node.
  • The target data node divides the sorted data into N partitions and sends the data of one partition to each data node participating in the calculation, ensuring that each such data node receives one partition and performs the calculation on it.
  • The target data node obtains the N partitions from its sorted data according to the total amount of data and the data overlap interval of the sorted data.
  • When the target data node divides the sorted data into N partitions, it can consider both the total amount of data and the data overlap interval, which makes the partitioning more reasonable and accurate.
  • The target data node sends its multiple sorted data to at least one data node according to physical node numbers, where the physical node corresponding to each number hosts at least one data node in the distributed database.
  • Sending the sorted data to multiple data nodes according to physical node numbers avoids sending a large amount of data to the same physical node in a short time, and improves the resource utilization of the physical nodes and the execution efficiency of the entire system.
  • The at least one data node performs the calculation of the window function of the query statement on the data it receives.
  • Each data node can perform various calculations related to the query statement, such as the calculation of its window function, on the data it receives.
  • The present application provides a data storage device, including: a receiving unit, configured to receive data related to a query statement sent by other data nodes in a distributed database; a processing unit, configured to sort local data and the data received from the other data nodes; and a sending unit, configured to send a plurality of sorted data to at least one data node in the distributed database, so that the at least one data node performs calculations related to the query statement on the data it receives.
  • The receiving unit is specifically configured to receive data that the other data nodes send after sorting their local data.
  • The sending unit is specifically configured to send data to multiple data nodes in the distributed database, where the data sent to any two different data nodes differs in at least one item.
  • The processing unit is further configured to perform the calculations related to the query statement on the sorted data that is not sent to the at least one data node.
  • The processing unit is further configured to determine N partitions based on the sorted data, where different partitions of the N partitions differ in at least one item of data, N is an integer greater than 1, and N is less than or equal to the number of data nodes in the distributed database. The sending unit is specifically configured to send the data of one of the N partitions to each of the N data nodes of the distributed database other than the target data node.
  • The processing unit is specifically configured to obtain the N partitions from the sorted data according to the total amount of data and the data overlap interval of the sorted data.
  • The sending unit is specifically configured to send the multiple sorted data to the at least one data node according to physical node numbers, where the physical node corresponding to each number hosts at least one data node in the distributed database.
  • The at least one data node performs the calculation of the window function of the query statement on the data it receives.
  • The present application provides a computing device. The computing device includes a processor and a memory, and the processor executes computer instructions stored in the memory, so that the computing device performs the method of the first aspect or of any one of the possible implementations of the first aspect.
  • The present application provides a computer storage medium that stores a computer program which, when executed by a computing device, implements the method of the first aspect or of any one of the possible implementations of the first aspect.
  • The present application provides a computer program product. The computer program product includes computer instructions that enable a computing device to perform the method of the first aspect or of any one of the possible implementations of the first aspect.
  • FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a data interaction provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a data calculation method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of forward overlap of data provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of backward overlap of data provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram, provided by an embodiment of the present application, of data in which forward overlap and backward overlap exist at the same time.
  • FIG. 7 is a schematic diagram of a data interval division provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of determining a data sending order provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a data storage provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a data storage device provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • Stream broadcast is a data transmission method in distributed databases in which data is sent from one data node (the source data node) to other data nodes (the target data nodes).
  • Stream redistribute is also a data transmission method in distributed databases. The source data node calculates a hash value according to the connection (join) condition and sends each piece of data to the target data node that corresponds to the calculated hash value.
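  • Hash-based redistribution can be sketched as follows. This is only an illustration under stated assumptions: the function name, the node identifiers, and the use of CRC32 with a modulo to map a hash value to a node are hypothetical choices, not the scheme used by any particular database.

```python
import zlib


def route_by_hash(rows, key, node_ids):
    """Assign each row to a target node by hashing the value of its join key."""
    outboxes = {node_id: [] for node_id in node_ids}
    for row in rows:
        # CRC32 gives the same value on every node and in every process,
        # so rows with equal keys always meet on the same target node.
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        target = node_ids[digest % len(node_ids)]
        outboxes[target].append(row)
    return outboxes


# Hypothetical rows and node identifiers.
rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "c"}]
for node, batch in route_by_hash(rows, "id", ["dn140", "dn150", "dn160"]).items():
    print(node, batch)
```

  • The property that matters is that rows with the same key value always land on the same node, whichever source node sends them.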
  • FIG. 1 shows a possible application scenario of an embodiment of the present application.
  • The distributed database 100 includes multiple coordinator nodes (CN), such as coordinator node 110 and coordinator node 120, and multiple data nodes (DN), such as data node 130, data node 140, data node 150, and data node 160. Data nodes are deployed on physical nodes (such as servers), and each physical node can host one or more data nodes. For example, data node 130 and data node 140 are deployed on physical node 170, data node 150 is deployed on physical node 180, and data node 160 is deployed on physical node 190. All data is distributed across the data nodes, and data is not shared between data nodes.
  • When executing a service, the coordinator node receives a query request from the client, generates an execution plan, and sends it to each data node. Each data node initializes the operators it needs (for example, stream operators) based on the received plan, and then executes the execution plan issued by the coordinator node.
  • The coordinator nodes and the data nodes, as well as data nodes on different physical nodes, are connected through network channels, which may use the scalable transmission control protocol (STCP) or another communication protocol.
  • Data node 130 includes a service thread 131 and a stream thread 132, and data node 140 includes a service thread 141 and a stream thread 142.
  • The stream thread 132 can send the data stored in data node 130 to the service thread 131, from which it is forwarded to coordinator node 110, or it can send the data directly to the service thread 141. In the same way, the stream thread 142 can send the data stored in data node 140 to the service thread 141, from which it is forwarded to coordinator node 110, or it can send the data directly to the service thread 131.
  • In a distributed database, data is distributed and stored across the data nodes. For some services, all the data involved needs to be sorted, and the calculation is then performed on the sorted data. The data is first aggregated on one data node and then sorted.
  • For example, data node 140, data node 150, and data node 160 aggregate their stored data on data node 130 by broadcasting, and data node 130 sorts all the data once the aggregation is complete. Because the amount of data is relatively large and the memory resources of data node 130 are limited, data node 130 may have to store part of the data on disk while sorting, which incurs a large amount of input/output (IO) overhead and reduces execution efficiency.
  • After the sorting is complete, data node 130 performs the calculation on the sorted data. The calculation runs only on data node 130, and the other data nodes remain idle after broadcasting their data, which causes a seriously uneven load. The computing power of data node 130 becomes the bottleneck for service execution, the execution efficiency of the entire distributed database 100 drops sharply, and the distributed execution capability cannot be fully used.
  • To address these problems, this application provides a data calculation method and related device that redistribute ordered data from a single data node to other data nodes in the distributed database before calculation, so that the other data nodes can perform the calculation in parallel. This makes full use of the computing power of the distributed database and improves computing efficiency and resource utilization.
  • FIG. 3 is a schematic flowchart of a data calculation method provided by an embodiment of the present application. As shown in Figure 3, the method includes but is not limited to the following steps:
  • The target data node receives data related to the query statement sent by other data nodes.
  • The query statement may be a statement expressed in structured query language (SQL), for example, an SQL statement that includes a window function and an OVER expression.
  • The target data node may be any data node in the distributed database.
  • The other data nodes are the data nodes in the distributed database, apart from the target data node, that store data related to the query statement.
  • The target data node may be designated in advance by the user, or may be selected when the service is executed.
  • For example, the target data node may be any data node in the distributed database shown in FIG. 1, such as data node 130.
  • In that case, the other data nodes include data node 140, data node 150, and data node 160.
  • Data is stored in each data node in the form of tables. The other data nodes in the distributed database perform a base table scan on their stored data to determine which rows of which tables need to be sent to the target data node.
  • The other data nodes in the distributed database each sort their local data and send the sorted data to the target data node.
  • For example, data node 140, data node 150, and data node 160 may sort the data to be sent to the target data node in advance, and then send the sorted data to data node 130.
  • When sending, data node 140, data node 150, and data node 160 can send all of their sorted data to data node 130 at one time. Alternatively, they can send the sorted data in multiple transmissions, for example sending a certain amount of data each time, where the amount sent each time can be set as needed. This application does not limit this.
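  • Sending sorted data in several transmissions of a configurable size might look like the sketch below. The helper name and the batch size are hypothetical; the text only states that the amount sent each time can be set as needed.

```python
def iter_batches(sorted_rows, batch_size):
    """Yield consecutive slices of already sorted rows, preserving their order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(sorted_rows), batch_size):
        yield sorted_rows[start:start + batch_size]


# Hypothetical usage: each batch would be handed to the network layer in turn.
for batch in iter_batches([1, 2, 3, 4, 5], batch_size=2):
    print(batch)
```

  • Because the slices are taken in order, the receiver can append the batches as they arrive and still hold a sorted run.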
  • Because data node 140, data node 150, and data node 160 sort their local data before sending it to data node 130, the pressure on data node 130 to sort the data as a whole is reduced, its memory overhead is reduced, and execution efficiency improves.
  • The target data node sorts its local data and the data received from the other data nodes.
  • After data node 130 receives the data sent by data node 140, data node 150, and data node 160, it sorts the received data together with its locally stored data as a whole. In this way, all the data related to the SQL query statement (for example, its OVER expression) executed by the distributed database is in order.
  • Because data node 130 receives data that data node 140, data node 150, and data node 160 have already sorted, and then sorts the received data as a whole to achieve a global order, IO overhead and memory overhead are reduced and sorting efficiency improves.
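  • One way to exploit runs that arrive already sorted is a k-way merge, sketched below with the standard-library `heapq.merge`. The text only says that the target data node sorts the data as a whole; using a merge for that step is an assumption of this sketch, and the function name and sample values are hypothetical.

```python
import heapq


def merge_sorted_runs(local_run, received_runs, key=None):
    """Merge the local sorted run with the sorted runs received from other nodes."""
    # heapq.merge reads its inputs lazily and keeps only one pending item
    # per run in memory, so it never re-sorts the full data set.
    return list(heapq.merge(local_run, *received_runs, key=key))


local = [1, 4, 9]                        # sorted data held by the target node
received = [[2, 3], [5, 8], [6, 7]]      # sorted runs from three other nodes
print(merge_sorted_runs(local, received))
```

  • The merge is correct only if every input run is itself sorted on the same key, which is what the pre-sorting on the sending nodes provides.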
  • The target data node sends a plurality of sorted data to at least one data node, so that the at least one data node performs calculations related to the query statement on the data it receives.
  • After the target data node finishes sorting the data related to the query statement, it forms multiple ordered data sets from the sorted data. The data within each ordered data set is in order, and the ordered data sets are also in order relative to one another.
  • The data in two ordered data sets can be partially repeated (also called data overlap: some of the data in the two sets is the same), but cannot be entirely repeated (the two sets overlapping completely).
  • The target data node sends the data, one ordered data set at a time, to data nodes in the distributed database other than itself. For example, if data node 130 processes one ordered data set itself, it sends each remaining ordered data set to a different data node; if data node 130 does not process any ordered data set, it sends every ordered data set to a different data node. In this way, the number of ordered data sets equals the number of data nodes that process them, and each data node that receives an ordered data set is responsible for performing the calculations related to the query statement on that set.
  • Each data node that takes part in the calculation obtains one ordered data set.
  • In one case, data node 130 sends the ordered data sets to all data nodes in the distributed database other than itself, for example, to data node 140, data node 150, and data node 160.
  • In another case, data node 130 sends the ordered data sets to only some of the data nodes other than itself, for example, to data node 140 and data node 150.
  • When data node 130 sends data, it can check the load of the receiving data node, or of the physical node on which that data node is located, to decide whether to send the data. This avoids sending data to an overloaded data node, which would hurt calculation efficiency and execution efficiency.
  • After the at least one data node receives the data sent by the target data node, it can perform various calculations on the data, such as window function calculations and aggregate function calculations. For example, after data node 140 receives an ordered data set sent by data node 130, it directly performs a window function calculation (for example, a summation) on the data in that set. After data node 150 receives an ordered data set sent by data node 130, it likewise starts the window function calculation on the data in the set it received. Data node 140 and data node 150 thus perform the window function calculation in parallel.
  • Because the target data node sends ordered data to other data nodes and those nodes can calculate as soon as they receive it, the computing power of the distributed database is fully used and the calculation efficiency of the entire system improves.
  • The target data node sends data to multiple data nodes of the distributed database, and the data sent to any two different data nodes differs in at least one item.
  • In some cases, adjacent ordered data sets are exactly contiguous, with each set starting where the previous one ends.
  • In other cases, the calculation for the current row depends on the data of multiple preceding or following rows. When the target data node divides the data into ordered data sets, adjacent ordered data sets may therefore contain partially repeated data.
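  • The need for repeated rows can be checked with a small sketch. It uses a hypothetical window that sums the current row and a fixed number of following rows, cuts the sorted values into chunks that each carry the extra rows they depend on, and confirms that the per-chunk results match a single global calculation. All names and the choice of window are assumptions made for illustration.

```python
def forward_window_sum(values, following):
    """For each row, sum the row and up to `following` rows after it."""
    return [sum(values[i:i + following + 1]) for i in range(len(values))]


def split_with_overlap(values, n_parts, following):
    """Cut values into at most n_parts chunks; each chunk also carries the
    next `following` rows, which it needs but does not own."""
    if not values:
        return []
    size = -(-len(values) // n_parts)  # ceiling division
    chunks = []
    for start in range(0, len(values), size):
        owned = min(size, len(values) - start)
        chunks.append((values[start:start + size + following], owned))
    return chunks


def compute_in_parts(values, n_parts, following):
    """Run the window calculation per chunk and keep only the owned rows."""
    out = []
    for chunk, owned in split_with_overlap(values, n_parts, following):
        out.extend(forward_window_sum(chunk, following)[:owned])
    return out


values = list(range(1, 11))
print(compute_in_parts(values, n_parts=3, following=2) == forward_window_sum(values, 2))
```

  • Without the extra rows, the last rows of every chunk except the final one would be computed from an incomplete window.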
  • The target data node can send all of the sorted data to other data nodes so that they complete the calculation. Alternatively, it can keep part of the sorted data locally and perform the calculation on that part itself. In the latter case data node 130 also participates in the calculation, which makes full use of the computing power of the distributed database and further improves computing efficiency.
  • The target data node determines N partitions based on its sorted data, where different partitions of the N partitions differ in at least one item of data, N is an integer greater than 1, and N is less than or equal to the number of data nodes in the distributed database. The target data node sends the data of one of the N partitions to each of the N data nodes of the distributed database other than the target data node.
  • The target data node divides the sorted data into N partitions, and the data in each partition is in order. A partition here is different from the partition concept in data storage: the sorted data is divided logically, a part of the data is intercepted to form a partition, and the order of the intercepted data does not change. The number of partitions is less than or equal to the number of data nodes in the distributed database.
  • The target data node may intercept the sorted data evenly to obtain N partitions that each contain the same amount of data. The interception need not be even, in which case the N partitions do not all contain the same amount of data.
  • The number of partitions N can be equal to the number of data nodes in the distributed database. In that case, when the target data node sends data, it sends the data of one of the N partitions to each data node, and different data nodes receive different partitions.
  • The number of partitions N can also be less than the number of data nodes in the distributed database. In that case, when the target data node sends data, it can select the N data nodes with the lowest load from the other data nodes and send the data of one of the N partitions to each of them.
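  • Selecting the N data nodes with the lowest load can be sketched with `heapq.nsmallest`. The load metric, the node names, and the function name are hypothetical; the text does not say how load is measured.

```python
import heapq


def pick_least_loaded(load_by_node, n):
    """Return the n node identifiers with the smallest load, lowest first."""
    return heapq.nsmallest(n, load_by_node, key=load_by_node.get)


# Hypothetical load figures, for example a CPU utilisation between 0 and 1.
load = {"dn140": 0.7, "dn150": 0.2, "dn160": 0.4}
print(pick_least_loaded(load, 2))
```

  • With the figures above, the two partitions would go to `dn150` and `dn160`, leaving the busier `dn140` out of this calculation.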
  • Adjacent partitions may share some repeated data, and the amount of repeated data may be the same or different from one pair of partitions to the next, but any two adjacent partitions cannot be completely identical.
  • The target data node divides the sorted data into N partitions according to the total amount of data T and the data overlap interval, and calculates the amount of data contained in each partition.
  • When the data overlap interval is 0, there is no data overlap between adjacent partitions. The target data node then does not need to consider overlap when dividing the data and simply divides the total amount of data T evenly into N partitions of T divided by N rows each. The target data node can obtain the total amount of data T while receiving and sorting the data sent by the other data nodes.
  • When the data overlap interval is not 0, the target data node needs to consider the overlap between partitions when dividing the data, and the resulting N partitions differ according to the overlap interval.
  • The following are several examples of overlap intervals.
  • The data overlaps forward, and the overlap interval is x rows.
  • The data overlaps backward, and the overlap interval is y rows.
  • The data overlaps both forward and backward: the forward overlap interval is x rows and the backward overlap interval is y rows.
  • In this last case, when the target data node partitions the data, the first partition needs to account for overlapping the next partition by y rows, the last partition needs to account for overlapping the previous partition by x rows, and every other partition needs to account for both.
  • the amount of data allocated by each partition is calculated according to the following formula 3:
  • When partitioning, the target data node selects the largest x value or the largest y value.
  • For example, with five partitions and a backward overlap interval of y rows, the partitions are [1, T/5+y], [T/5+1, 2T/5+y], [2T/5+1, 3T/5+y], [3T/5+1, 4T/5+y], and [4T/5+1, T], and the overlap intervals are [T/5+1, T/5+y], [2T/5+1, 2T/5+y], [3T/5+1, 3T/5+y], and [4T/5+1, 4T/5+y].
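  • The partition boundaries in this example follow a regular pattern that can be written down directly. The sketch below generalises the five-partition case to N partitions with a backward overlap of y rows; the function name is hypothetical, and the sketch assumes that T is divisible by N and uses 1-based inclusive row numbers as in the text.

```python
def partition_ranges(total_rows, n_parts, backward_overlap):
    """Return 1-based inclusive (first, last) row ranges for each partition.

    Partition k owns T/N rows and additionally carries the next
    `backward_overlap` rows, except that the last partition stops at row T.
    """
    if total_rows % n_parts != 0:
        raise ValueError("this sketch assumes T is divisible by N")
    size = total_rows // n_parts
    ranges = []
    for k in range(n_parts):
        first = k * size + 1
        last = min((k + 1) * size + backward_overlap, total_rows)
        ranges.append((first, last))
    return ranges


# T = 100, N = 5, y = 3
print(partition_ranges(100, 5, 3))
# [(1, 23), (21, 43), (41, 63), (61, 83), (81, 100)]
```

  • With T = 100 and y = 3, the first overlap interval is rows 21 to 23, which matches [T/5+1, T/5+y].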
  • The target data node divides the sorted data into multiple data intervals. As shown in FIG. 7, all the data is divided into 9 data intervals. For each data interval, the target data node takes the first row of the interval and compares it with the first row of each partition that the interval overlaps. If the value corresponding to the first row of the data interval is greater than or equal to the value corresponding to the first row of a partition, the data interval is sent to the same data node as that partition. If it is less, the data interval does not need to be sent to the data node of that partition.
  • For example, the data interval [T/5+1, T/5+y] is the overlap interval of partition 1 and partition 2, and the value corresponding to its first row is T/5+1. The value corresponding to the first row of partition 1 is 1, so the data interval [T/5+1, T/5+y] is sent to the same data node as partition 1. The value corresponding to the first row of partition 2 is T/5+1, which equals the value corresponding to the first row of the data interval, so the data interval [T/5+1, T/5+y] is also sent to the same data node as partition 2.
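  • The nine data intervals and their destinations can be derived from the partition boundaries, as in the sketch below. It decides the destination with a containment test (an interval goes to every partition whose row range fully covers it), which is an assumption of this sketch and a simplification of the first-row comparison described above. The function name and the concrete numbers (T = 100, y = 3) are hypothetical.

```python
def split_into_intervals(partitions):
    """Cut the rows at every partition boundary and list, for each resulting
    interval, the 1-based numbers of the partitions that cover it."""
    # One cut before each partition's first row, one after each last row.
    cuts = sorted({first for first, _ in partitions}
                  | {last + 1 for _, last in partitions})
    intervals = []
    for lo, next_lo in zip(cuts, cuts[1:]):
        hi = next_lo - 1
        owners = [number for number, (first, last) in enumerate(partitions, start=1)
                  if first <= lo and hi <= last]
        intervals.append(((lo, hi), owners))
    return intervals


# Five partitions for T = 100 rows with a backward overlap of y = 3 rows.
partitions = [(1, 23), (21, 43), (41, 63), (61, 83), (81, 100)]
for interval, owners in split_into_intervals(partitions):
    print(interval, "-> partitions", owners)
```

  • Five partitions with four overlap intervals yield nine data intervals; the interval (21, 23) is sent to the data nodes of both partition 1 and partition 2, in line with the example above.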
  • The target data node can also divide the sorted data into multiple data intervals by the same method as above and compare last rows instead. For each data interval, it compares the last row of the interval with the last row of each partition that the interval overlaps. If the value corresponding to the last row of the data interval is less than or equal to the value corresponding to the last row of a partition, the data interval is sent to the same data node as that partition.
  • In other overlap scenarios, the target data node can likewise divide the data into intervals and make the comparison by the same method. For brevity, this is not repeated here.
  • In the above scheme, the value of the overlap interval, that is, the value of x or y mentioned above, is much smaller than the result of dividing T by N. If x or y is close to, or even greater than, the result of dividing T by N, system overhead and network transmission overhead increase, and sending the sorted data to other data nodes for processing is no longer suitable. In that case other solutions can be used to perform the calculation on the sorted data; for example, the target data node can calculate the sorted data itself.
  • The target data node determines the data sending order according to the physical node numbers, and sends its multiple sorted data to the other data nodes in that order. The physical node corresponding to each number hosts at least one data node in the distributed database.
  • After the target data node partitions the sorted data, it needs to determine the order in which the partitions are sent, so that all partitions are sent to the other data nodes in the determined order.
  • In a distributed database, multiple data nodes are usually deployed on one physical machine. If the target data node sent partitions in data node number order, then for a period of time all the data nodes receiving partitions might be deployed on the same physical machine. That machine would be overloaded and slow while the other physical machines sat idle, so the resources of the distributed system would not be fully used and the execution efficiency of the entire system would suffer.
  • Therefore, the target data node determines the partition sending order according to the physical node numbers, and sends the partitions to all other data nodes in that order.
  • As shown in FIG. 8, there are a physical machine 810, a physical machine 820, and a physical machine 830.
  • a data node 811 and a data node 812 are deployed in the physical machine 810, and a data node 821 and a data node 822 are deployed in the physical machine 820.
  • a data node 831 and a data node 832 are deployed in the physical machine 830.
  • the target data node determines the sending order according to the numbers of the physical machines. To maximize the utilization of each physical machine in the distributed system and improve execution efficiency, the determined sending order is: data node 811, data node 821, data node 831, data node 812, data node 822, and data node 832. That is, the target data node first sends partition 1 to the data node 811, then sends partition 2 to the data node 821, and so on, sending all the partitions to the corresponding data nodes in the order determined above.
  • Figure 8 shows a scenario where data nodes are evenly distributed across physical nodes.
  • When the data nodes are not evenly distributed, some physical nodes are deployed with more data nodes, and some physical nodes are deployed with fewer data nodes.
  • the target data node first sends the partitions to the data nodes deployed in each physical node according to the numbers of the physical nodes. When all the data nodes in the physical nodes with fewer data nodes have received the partitions sent by the target data node, the target data node continues to send partitions to the data nodes that have not yet received data in the physical nodes where more data nodes are deployed, until all the partitions are sent.
  • the partition sending order can also be determined in other ways, which is not limited in this application.
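  • As a minimal sketch of the ordering described for FIG. 8, the Python function below interleaves data nodes across physical machines so that consecutive partitions go to different machines. The function name `sending_order`, the dictionary input format, and the node numbers in the uneven case are assumptions made for illustration; they are not taken from this application.

```python
from itertools import zip_longest


def sending_order(nodes_by_machine):
    """Return data nodes interleaved across physical machines.

    nodes_by_machine: dict mapping a physical machine number to the list of
    data nodes deployed on that machine (hypothetical input format).
    """
    # One list of data nodes per machine, machines in ascending number order.
    per_machine = [nodes_by_machine[m] for m in sorted(nodes_by_machine)]
    order = []
    # Each round takes at most one data node from every machine; machines
    # that have run out of nodes contribute None and are skipped.
    for round_nodes in zip_longest(*per_machine):
        order.extend(node for node in round_nodes if node is not None)
    return order


# Even deployment from FIG. 8: two data nodes on each of three machines.
even = {810: [811, 812], 820: [821, 822], 830: [831, 832]}
print(sending_order(even))    # [811, 821, 831, 812, 822, 832]

# Uneven deployment (hypothetical numbers): machine 1 hosts three data nodes.
uneven = {1: [11, 12, 13], 2: [21]}
print(sending_order(uneven))  # [11, 21, 12, 13]
```

  • In the uneven case, once the machine with fewer data nodes is exhausted, the remaining partitions go to the data nodes that have not yet received data, which matches the behavior described above.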
  • the target data node respectively sends a plurality of sorted data to other data nodes, so that the other data nodes perform the calculation of the window function of the query statement on the data they respectively receive.
  • the window function may be a sum function (sum), an average function (avg), or the like, which is not limited in this application.
  • When the stream thread in the target data node sends the partitions to the other data nodes, it also sends the identifiers of the first data node and the last data node in the determined sending order, and the amount of data that each data node needs to process (that is, T divided by N rows).
  • When there is forward data overlap, after receiving their partitions, all data nodes other than the first one skip the overlap interval (for example, x rows) before starting the calculation. The overlap interval itself does not need to be calculated, but the calculation of the rows that follow relies on the forward overlap interval (x rows).
  • When there is backward data overlap, after receiving their partitions, the data nodes calculate only the first T/N rows. The overlap interval (for example, y rows) does not need to be calculated, but the calculation of the preceding rows relies on the backward overlap interval (y rows).
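  • The skip-and-count behavior can be sketched in Python as follows. This is an illustration under assumptions, not the implementation of this application: the function `window_sum`, its parameters, and the sample values are hypothetical, and a sum over a sliding frame stands in for whatever window function the query uses.

```python
def window_sum(partition, skip, count, preceding=0, following=0):
    """Sum each owned row with `preceding` rows before and `following` rows after.

    partition: values received from the target data node, already sorted.
    skip:      leading overlap rows; read as input, never output.
    count:     rows this data node must calculate (T/N in the text).
    """
    results = []
    for i in range(skip, skip + count):
        lo = max(0, i - preceding)
        hi = min(len(partition), i + following + 1)
        results.append(sum(partition[lo:hi]))
    return results


# Forward overlap x = 2: the node owns 30, 40, 50 and received 10, 20 as overlap.
print(window_sum([10, 20, 30, 40, 50], skip=2, count=3, preceding=2))
# [60, 90, 120]

# Backward overlap y = 1: the node owns 10, 20, 30 and received 40 as overlap.
print(window_sum([10, 20, 30, 40], skip=0, count=3, following=1))
# [30, 50, 70]
```

  • In both calls the overlap rows contribute to the sums of the owned rows but produce no output rows of their own.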
  • When the SQL statement contains multiple window functions at the same time, the target data node needs to partition the sorted data according to the forward overlap intervals and the backward overlap intervals of all the window functions. After the other data nodes receive the partitions sent by the target data node, they also need to process each window function separately.
  • For example, the SQL statement contains three window functions: window function 1 has a forward overlap interval of 2 and a backward overlap interval of 0, window function 2 has a forward overlap interval of 5 and a backward overlap interval of 0, and window function 3 has a forward overlap interval of 0 and a backward overlap interval of 4.
  • The target data node selects the maximum forward overlap interval and the maximum backward overlap interval among the three window functions for partitioning, that is, it partitions using the forward overlap interval of window function 2 and the backward overlap interval of window function 3.
  • the data volume of the first partition is T/N+4 rows
  • the data volume of the last partition is T/N+5 rows
  • the data volume of other partitions is T/N+5+4 rows.
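  • The partition sizes above can be checked with a short Python sketch. The function `partition_bounds`, the choice of T = 30 rows and N = 3 data nodes, and the handling of a remainder when T is not divisible by N are assumptions made for illustration.

```python
def partition_bounds(total_rows, n, overlaps):
    """Return one (start, end, skip) triple per partition.

    overlaps: list of (forward, backward) pairs, one per window function.
    start/end are 0-based, end-exclusive row indexes into the sorted data;
    skip is the number of leading forward-overlap rows in the partition.
    """
    # Partition with the largest forward and backward overlap of any function.
    forward = max(f for f, _ in overlaps)
    backward = max(b for _, b in overlaps)
    base = total_rows // n  # T/N; a remainder goes to the last partition (assumption)
    bounds = []
    for i in range(n):
        own_start = i * base
        own_end = total_rows if i == n - 1 else own_start + base
        start = max(0, own_start - forward)
        end = min(total_rows, own_end + backward)
        bounds.append((start, end, own_start - start))
    return bounds


# Window functions 1-3 from the text: (forward, backward) = (2, 0), (5, 0), (0, 4).
bounds = partition_bounds(total_rows=30, n=3, overlaps=[(2, 0), (5, 0), (0, 4)])
print([end - start for start, end, _ in bounds])  # [14, 19, 15]
print([skip for _, _, skip in bounds])            # [0, 5, 5]
```

  • With T/N = 10, the sizes 14, 19, and 15 correspond to T/N+4 for the first partition, T/N+5+4 for the middle partition, and T/N+5 for the last partition.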
  • each partition is sent to the corresponding other data nodes, and at the same time, the amount of data that each data node needs to process is also sent to each data node.
  • Each data node starts to calculate the window function after receiving the partition.
  • When window function 1 is calculated, the forward overlap interval in the received partition is not 2 but 5, so the data node skips 5 rows before starting the calculation and stops after calculating T/N rows; at the same time, the values of the forward overlap interval and the backward overlap interval are retained when the result is output.
  • When window function 2 is calculated, the data node skips 5 rows according to the value of the forward overlap interval before starting the calculation, and stops after calculating T/N rows.
  • the state information corresponding to the window function will record the changes in the forward overlap interval and the backward overlap interval when the window function is running.
  • The changes in the state information are shown in Table 1:
  • the state information corresponding to the window function ensures that each window function can be calculated and executed correctly by recording control information (that is, the forward overlap interval and the backward overlap interval, and whether it is deleted after the calculation is completed).
  • the execution time is used to characterize the cost.
  • In the existing scheme, one data node executes the entire calculation process, and the cost is the time required for a single data node to execute the window function.
  • This application partitions the sorted data and sends the partitions to multiple data nodes for parallel calculation, so the cost only needs to consider the time for the target data node to send the partitions and the time for the other data nodes to receive and calculate the partitions. The cost required by the existing scheme is compared with the cost required by this application: when the difference between the cost required by the existing scheme and the cost required by this application is greater than 0, the scheme provided by this application should be selected; otherwise, the existing scheme should be selected.
  • the difference between the costs of the two schemes can be calculated according to the following formula 4:
  • ΔA represents the difference between the costs required by the two schemes
  • A represents the cost required by the existing scheme (that is, the time required for a single data node to process all the data volume T)
  • B represents the time required for the target data node to send all partitions.
  • Time represents the time required for other data nodes to receive the partition
  • N represents the number of data nodes contained in the distributed database.
  • n the number of window functions in the SQL statement.
  • the SQL statement that the distributed database needs to execute is: select a, b, c, sum(b) over (order by b rows 2 preceding) from tt01. That is, to execute the SQL statement, the data in the data table tt01 first needs to be fully sorted by column b, with a forward overlap interval of 2, and then, for each row, the sum of column b over the two preceding rows and the current row is calculated.
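  • To make the semantics of this statement concrete, the following sketch runs it on a single node with Python's built-in sqlite3 module. The table contents are hypothetical, SQLite is only a stand-in for the distributed database, and an outer ORDER BY b is added (it is not part of the statement above) so that the printed row order is deterministic. Window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tt01 (a INTEGER, b INTEGER, c INTEGER)")
# Hypothetical rows, inserted out of order on purpose.
conn.executemany(
    "INSERT INTO tt01 VALUES (?, ?, ?)",
    [(1, 3, 0), (2, 1, 0), (3, 4, 0), (4, 2, 0)],
)
# Each row's sum covers the current row and up to two rows before it in b order.
rows = conn.execute(
    "SELECT a, b, c, SUM(b) OVER (ORDER BY b ROWS 2 PRECEDING) AS s "
    "FROM tt01 ORDER BY b"
).fetchall()
conn.close()
print(rows)  # [(2, 1, 0, 1), (4, 2, 0, 3), (1, 3, 0, 6), (3, 4, 0, 9)]
```

  • Each sum depends on at most two earlier rows in the sorted order, which is why a forward overlap interval of 2 is enough for a data node to calculate its own rows.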
  • DN1 and DN2 scan the data, and perform partial sorting according to column b.
  • the sorting results are shown in Table 2 below:
  • each DN sends its data to the target data node. If DN1 is randomly selected as the target data node, DN2 needs to send its sorted data to DN1, and DN1 merges and sorts the received data so that all the data that needs to participate in the calculation is in order.
  • The results after DN1 sorts by column b are shown in Table 3 below:
  • the data is partitioned to obtain two partitions, partition 1 is the first row and the second row, partition 2 is the first row to the fourth row (that is, all rows), and the overlapping interval is the first row and the second row.
  • Next, the partition sending order is determined. Since DN1 and DN2 are deployed on different physical machines, the determined sending order is: send partition 1 to DN1, and send partition 2 to DN2.
  • Due to the forward overlap interval, DN1 needs to determine the data node to which each row of data needs to be sent. DN1 divides all data into two data intervals according to the forward overlap interval, as shown in Table 4 below:
  • For data interval 1, the number corresponding to the first row of data is 1, which is smaller than the number 2 corresponding to the last row of partition 1 and also smaller than the number 4 corresponding to the last row of partition 2, so data interval 1 is sent to DN1 and DN2.
  • For data interval 2, the number corresponding to the first row of data is 3, which is greater than the number 2 corresponding to the last row of partition 1 but less than the number 4 corresponding to the last row of partition 2, so data interval 2 is sent to DN2.
  • DN1 and DN2 perform parallel calculations on the received data, and calculate the sum from the first two rows of column b to the current row.
  • the calculation results are shown in Table 6 below:
  • this application sends globally ordered data to each data node to make full use of the computing power of each data node in the distributed system. This avoids the bottleneck caused by calculation on a single data node, allows the calculation to be executed in parallel, and improves calculation and execution efficiency.
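  • The DN1/DN2 example can be traced end to end with the following Python sketch. The column b values are hypothetical (the tables of the example are not reproduced in this text), threads stand in for the two data nodes, and `sum_2_preceding` is an illustrative helper rather than a function of this application.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Locally sorted values of column b on each data node (hypothetical data).
dn1_local = [1, 4]
dn2_local = [2, 3]

# DN1 is the target data node: merge the two sorted runs into one global order.
merged = list(heapq.merge(dn1_local, dn2_local))  # [1, 2, 3, 4]

# Partitions as described in the text, as (rows, skip): partition 1 is rows 1-2;
# partition 2 is rows 1-4, whose first two rows are forward overlap.
partitions = [(merged[0:2], 0), (merged[0:4], 2)]


def sum_2_preceding(task):
    """sum(b) over (order by b rows 2 preceding) for the rows a node owns."""
    rows, skip = task
    return [sum(rows[max(0, i - 2):i + 1]) for i in range(skip, len(rows))]


# Threads stand in for DN1 and DN2 calculating their partitions in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    per_node = list(pool.map(sum_2_preceding, partitions))

combined = per_node[0] + per_node[1]
single_node = sum_2_preceding((merged, 0))
print(per_node)                 # [[1, 3], [6, 9]]
print(combined == single_node)  # True
```

  • Concatenating the per-node results in sending order gives the same output as calculating over all rows on a single data node, which is the property the partitioning with overlap is meant to preserve.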
  • FIG. 10 is a schematic structural diagram of a data storage device provided by an embodiment of the present application.
  • the data storage device 10 includes a receiving unit 11, a processing unit 12, and a sending unit 13.
  • the receiving unit 11 is configured to receive data that is related to a query statement and sent by other data nodes in the distributed database.
  • the receiving unit 11 shown is configured to perform the foregoing step S310, and optionally perform optional methods in the foregoing steps.
  • the processing unit 12 is configured to sort the local data and the data received from the other data nodes.
  • processing unit 12 shown is configured to execute the aforementioned step S320, and optionally execute optional methods in the aforementioned steps.
  • the sending unit 13 is configured to send a plurality of sorted data to at least one data node in the distributed database, so that the at least one data node performs calculations related to the query statement on the data respectively received.
  • the sending unit 13 shown is configured to perform the foregoing step S330, and optionally perform optional methods in the foregoing steps.
  • the receiving unit 11 is specifically configured to: receive data sent by other data nodes after sorting the local data.
  • the sending unit 13 is specifically configured to send at least one different data to different data nodes among the multiple data nodes in the distributed database.
  • the processing unit 12 is further configured to perform calculations related to the query statement on the data in the sorted data that is not sent to the at least one data node.
  • the processing unit 12 is further configured to determine N partitions based on the sorted data, where different partitions of the N partitions include at least one different piece of data, N is an integer greater than 1, and N is less than or equal to the number of data nodes in the distributed database; the sending unit 13 is specifically configured to send data of one of the N partitions to each of the N data nodes of the distributed database other than the target data node.
  • the processing unit 12 is specifically configured to obtain the N partitions based on the sorted data according to the total amount of data and the data overlap interval of the sorted data.
  • the sending unit 13 is specifically configured to send the sorted multiple data to the at least one data node according to the number of the physical node, where the physical node corresponding to the number of the physical node includes at least one data node in the distributed database.
  • the at least one data node performs calculation of the window function of the query statement on the data respectively received.
  • the units of the data storage device can be added, reduced, or merged as needed.
  • the operation and/or function of each module in the data storage device is to implement the corresponding process of the method described in FIG. 3 above, and is not repeated here for brevity.
  • FIG. 11 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • the computing device 20 includes a processor 21, a communication interface 22 and a memory 23.
  • the processor 21, the communication interface 22 and the memory 23 are connected to each other through an internal bus 24.
  • the computing device may be a database server.
  • the computing device 20 may be the physical node 170 where the data node 130 and the data node 140 are deployed in FIG. 1.
  • the functions performed by the target data node in FIGS. 1, 2 and 3 are actually performed by the processor 21 of the computing device.
  • the processor 21 may be composed of one or more general-purpose processors, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip.
  • the aforementioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • the above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the bus 24 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like.
  • the bus 24 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used to represent in FIG. 11, but it does not mean that there is only one bus or one type of bus.
  • the memory 23 may include a volatile memory, such as a random access memory (RAM); the memory 23 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 23 may also include a combination of the above types.
  • the program code may be used to implement the functional units shown in the data storage device 10, or to implement the method steps in the method embodiment shown in FIG. 3 with the target data node as the execution subject.
  • the embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored.
  • When the program is executed by a processor, it can implement part or all of the steps of any one of the above method embodiments, and realize the function of any one of the functional units described in FIG. 10 above.
  • the embodiments of the present application also provide a computer program product, which when it runs on a computer or a processor, enables the computer or the processor to execute one or more steps in any of the foregoing methods. If each component unit of the aforementioned equipment is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in the computer readable storage medium.
  • the sequence numbers of the above-mentioned processes do not imply an order of execution.
  • the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
  • If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data calculation method and a related device. The method comprises: a target data node in a distributed database (100) receiving data sent by other data nodes in the distributed database (100) and related to a query statement (S310); the target data node sorting local data and the data received from the other data nodes (S320); and the target data node sending a plurality of pieces of sorted data to at least one data node in the distributed database (100), so that the at least one data node carries out calculation related to the query statement on data respectively received thereby (S330). By means of the method, full use can be made of a distributed calculation capability, thereby preventing a bottleneck caused by a single data node carrying out data calculation, and improving the calculation efficiency.

Description

A method of data calculation and related equipment
This application claims priority to Chinese patent application No. 202010076105.0, filed with the Chinese Patent Office on January 22, 2020 and entitled "A method of data calculation and related equipment", the entire content of which is incorporated into this application by reference.
Technical Field
This application relates to the field of distributed storage technologies, and in particular, to a method of data calculation and related equipment.
Background
A window function is a special type of function in structured query language (SQL). Like an aggregate function, a window function takes multiple rows of records as input. A window function acts on a window, which is a set of rows defined by an OVER expression. The window function is used together with the OVER expression: the OVER expression groups the data and sorts the elements within each group, and the window function processes the values within the group, for example by aggregating them or generating sequence numbers.
In a distributed database, data is distributed and stored across the data nodes. When data in the distributed database is calculated, a single data node collects, sorts, and calculates the data. Because the computing resources of a single data node are limited, this causes a computing bottleneck and reduces computing efficiency.
Therefore, how to avoid the computing bottleneck caused by a single data node and improve computing efficiency and overall execution efficiency is an urgent problem to be solved.
Summary
The embodiments of the present invention disclose a data calculation method and related equipment, which can make full use of distributed computing capabilities, avoid the computing bottleneck caused by a single data node, and improve computing efficiency.
In a first aspect, this application provides a data calculation method. The method includes: a target data node in a distributed database receives data that is related to a query statement and sent by other data nodes in the distributed database; the target data node sorts the local data and the data received from the other data nodes; and the target data node sends a plurality of pieces of sorted data to at least one data node in the distributed database, so that the at least one data node performs calculation related to the query statement on the data it receives.
In the solution provided by this application, the target data node collects and sorts the data related to the query statement, and sends the sorted data to other data nodes in the distributed database, so that the other data nodes can perform the calculation related to the query statement in parallel. This avoids the bottleneck caused by calculation on a single data node and makes full use of distributed computing capabilities, improving resource utilization and computing efficiency.
In a possible implementation, the other data nodes each sort their local data and send the sorted data to the target data node.
In the solution provided by this application, the other data nodes sort their local data before sending it to the target data node. This reduces the sorting pressure and memory overhead of the target data node and improves execution efficiency.
In a possible implementation, the target data node sends at least one different piece of data to different data nodes among the multiple data nodes in the distributed database.
In the solution provided by this application, the target data node sends the sorted data to different data nodes, and the data received by each data node is not exactly the same. This ensures that all data nodes that receive data can participate in the calculation related to the query statement, improving computing efficiency.
In a possible implementation, the target data node performs the calculation related to the query statement on the sorted data that is not sent to the at least one data node.
In the solution provided by this application, the target data node can also participate in the calculation related to the query statement, which further improves computing efficiency and makes full use of the computing resources of the distributed database.
In a possible implementation, the target data node determines N partitions based on the data sorted by the target data node, where different partitions of the N partitions include at least one different piece of data, N is an integer greater than 1, and N is less than or equal to the number of data nodes in the distributed database; and the target data node sends data of one of the N partitions to each of the N data nodes of the distributed database other than the target data node.
In the solution provided by this application, the target data node organizes the sorted data into N partitions and sends data of one of the N partitions to each data node participating in the calculation, ensuring that each data node can receive one partition and perform the calculation.
In a possible implementation, the target data node obtains the N partitions from the data sorted by the target data node according to the total data amount and the data overlap interval of that sorted data.
In the solution provided by this application, when organizing the sorted data into N partitions, the target data node can consider both the total data amount and the data overlap interval, which improves the rationality and accuracy of the N partitions.
In a possible implementation, the target data node sends the plurality of pieces of data sorted by the target data node to the at least one data node according to the numbers of physical nodes, where the physical node corresponding to a physical node number includes at least one data node in the distributed database.
In the solution provided by this application, the target data node sends the sorted data to multiple data nodes according to the numbers of the physical nodes. This avoids sending a large amount of data to the same physical node within a short time and improves the resource utilization of the physical nodes and the execution efficiency of the entire system.
In a possible implementation, the at least one data node performs calculation of a window function of the query statement on the data it receives.
In the solution provided by this application, each data node can perform various calculations related to the query statement on the data it receives, for example, calculation of a window function of the query statement.
In a second aspect, this application provides a data storage device, including: a receiving unit, configured to receive data that is related to a query statement and sent by other data nodes in a distributed database; a processing unit, configured to sort the local data and the data received from the other data nodes; and a sending unit, configured to send a plurality of pieces of sorted data to at least one data node in the distributed database, so that the at least one data node performs calculation related to the query statement on the data it receives.
In a possible implementation, the receiving unit is specifically configured to receive the data sent by the other data nodes after each of them sorts its local data.
In a possible implementation, the sending unit is specifically configured to send at least one different piece of data to different data nodes among the multiple data nodes in the distributed database.
In a possible implementation, the processing unit is further configured to perform the calculation related to the query statement on the sorted data that is not sent to the at least one data node.
In a possible implementation, the processing unit is further configured to determine N partitions based on the sorted data, where different partitions of the N partitions include at least one different piece of data, N is an integer greater than 1, and N is less than or equal to the number of data nodes in the distributed database; and the sending unit is specifically configured to send data of one of the N partitions to each of the N data nodes of the distributed database other than the target data node.
In a possible implementation, the processing unit is specifically configured to obtain the N partitions from the sorted data according to the total data amount and the data overlap interval of the sorted data.
In a possible implementation, the sending unit is specifically configured to send the plurality of pieces of sorted data to the at least one data node according to the numbers of physical nodes, where the physical node corresponding to a physical node number includes at least one data node in the distributed database.
In a possible implementation, the at least one data node performs calculation of a window function of the query statement on the data it receives.
In a third aspect, this application provides a computing device. The computing device includes a processor and a memory, and the processor executes computer instructions stored in the memory, so that the computing device performs the method of the first aspect or of any implementation of the first aspect.
In a fourth aspect, this application provides a computer storage medium. The computer storage medium stores a computer program that, when executed by a computing device, implements the method of the first aspect or of any implementation of the first aspect.
In a fifth aspect, this application provides a computer program product. The computer program product includes computer instructions, and when the computer instructions are executed by a computing device, the computing device can perform the method of the first aspect or of any implementation of the first aspect.
Description of the drawings
FIG. 1 is a schematic diagram of an application scenario according to an embodiment of this application;
FIG. 2 is a schematic diagram of data interaction according to an embodiment of this application;
FIG. 3 is a schematic flowchart of a data calculation method according to an embodiment of this application;
FIG. 4 is a schematic diagram of forward data overlap according to an embodiment of this application;
FIG. 5 is a schematic diagram of backward data overlap according to an embodiment of this application;
FIG. 6 is a schematic diagram of data with both forward overlap and backward overlap according to an embodiment of this application;
FIG. 7 is a schematic diagram of the division of data intervals according to an embodiment of this application;
FIG. 8 is a schematic diagram of determining a data sending order according to an embodiment of this application;
FIG. 9 is a schematic diagram of data storage according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of a data storage apparatus according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of a computing device according to an embodiment of this application.
Detailed description
The following describes the technical solutions in the embodiments of this application clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application.
First, some terms and related technologies used in this application are explained with reference to the drawings, to help those skilled in the art understand them.
Broadcast (stream broadcast) is a data transmission mode in a distributed database in which data is sent from one data node (the source data node) to the other data nodes (the target data nodes).
Redistribution (stream redistribute) is also a data transmission mode in a distributed database. The source data node computes a hash value according to the join condition and sends the data to the corresponding target data node according to the computed hash value.
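As a rough illustration of redistribution, the sketch below routes rows to a target node chosen from a hash of the join-key value. It is a minimal sketch under stated assumptions: `zlib.crc32` stands in for whatever hash function a real system uses, the modulo mapping from hash to node is assumed, and the names `target_node` and `redistribute` are hypothetical.

```python
import zlib


def target_node(join_key: str, num_nodes: int) -> int:
    """Pick a target data node index from a hash of the join-key value.

    Assumption: crc32 and the modulo mapping are placeholders for the
    hash function and node mapping of an actual distributed database.
    """
    return zlib.crc32(join_key.encode("utf-8")) % num_nodes


def redistribute(rows, key_of, num_nodes):
    """Group rows by the data node that should receive them."""
    outboxes = {node: [] for node in range(num_nodes)}
    for row in rows:
        outboxes[target_node(key_of(row), num_nodes)].append(row)
    return outboxes
```

Rows that share a join-key value always land on the same node, which is the property redistribution relies on.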
FIG. 1 shows a possible application scenario of an embodiment of this application. In this scenario, the distributed database 100 includes multiple coordinator nodes (CN), for example coordinator node 110 and coordinator node 120, and multiple data nodes (DN), for example data node 130, data node 140, data node 150, and data node 160. Data nodes are deployed on physical nodes (for example, servers), and each physical node may host one or more data nodes. For example, data node 130 and data node 140 are deployed on physical node 170, data node 150 is deployed on physical node 180, and data node 160 is deployed on physical node 190. All data is distributed across the data nodes, and data is not shared between data nodes. When a service is executed, a coordinator node receives a query request from a client, generates an execution plan, and delivers it to each data node. Each data node initializes the operators it needs (for example, the data operation (stream) operator) according to the received plan and then executes the execution plan delivered by the coordinator node. Coordinator nodes and data nodes, as well as data nodes on different physical nodes, are connected through network channels, which may use various communication protocols such as the scalable transmission control protocol (STCP).
During service execution, all data exchange between different data nodes is performed by the stream operator. As shown in FIG. 2, data node 130 includes service thread 131 and stream thread 132, and data node 140 includes service thread 141 and stream thread 142. Stream thread 132 can send data stored in data node 130 to service thread 131, which forwards it to coordinator node 110, or send the data directly to service thread 141. Likewise, stream thread 142 can send data stored in data node 140 to service thread 141, which forwards it to coordinator node 110, or send the data directly to service thread 131.
In the application scenarios shown in FIG. 1 and FIG. 2, data is stored in a distributed manner across the data nodes. When a service requires a calculation on the data, all data involved in the service must be sorted first, and the calculation is then performed on the sorted data. Currently, the data is first gathered onto a single data node and then sorted. For example, data node 140, data node 150, and data node 160 broadcast the data they store to data node 130. After the data has been gathered, data node 130 sorts all of it. Because the data volume is large and the memory of data node 130 is limited, data node 130 may spill part of the data to disk while sorting, which generates a large amount of input/output (IO) overhead and reduces execution efficiency. After sorting, data node 130 performs the calculation on the sorted data. The calculation runs only on data node 130, while the other data nodes remain idle after broadcasting their data. This causes a severe load imbalance: the computing capacity of data node 130 becomes the bottleneck of service execution, the execution efficiency of the whole distributed database 100 drops sharply, and its distributed execution capability cannot be fully used.
To solve the foregoing problem, this application provides a data calculation method and a related device. Before the calculation, the ordered data is redistributed from a single data node to other data nodes in the distributed database, so that those data nodes can perform the calculation in parallel. This makes full use of the computing capability of the distributed database and improves computing efficiency and resource utilization.
The technical solutions of the embodiments of this application can be applied to any scenario in a distributed database that requires data to be sorted and then calculated.
With reference to the application scenarios shown in FIG. 1 and FIG. 2, FIG. 3 is a schematic flowchart of a data calculation method according to an embodiment of this application. As shown in FIG. 3, the method includes but is not limited to the following steps:
S310: The target data node receives data related to a query statement from other data nodes.
The query statement may be a statement expressed in structured query language (SQL), for example, an SQL statement that contains a window function and an OVER expression.
Specifically, the target data node may be any data node in the distributed database. Correspondingly, the other data nodes are the data nodes in the distributed database, other than the target data node, that store data related to the query statement. Optionally, the target data node may be specified in advance by a user, or may be a data node selected when the service is executed.
For example, the target data node may be any data node in the distributed database shown in FIG. 1, such as data node 130. In that case, the other data nodes are data node 140, data node 150, and data node 160.
Each data node stores its data in tables. The other data nodes in the distributed database perform a base table scan on the data they store to determine which rows of which table need to be sent to the target data node.
In a possible implementation, the other data nodes in the distributed database each sort their local data and send the sorted data to the target data node.
Specifically, when data node 130 is the target data node, data node 140, data node 150, and data node 160 may first sort the data that they need to send and then send the sorted data to data node 130. Each of them may send all of its sorted data to data node 130 at once, or send it in multiple batches, for example a fixed amount of data per batch. The amount of data sent each time can be set as required, and this application does not limit it.
It can be understood that, because data node 140, data node 150, and data node 160 sort their local data before sending it, data node 130 has less work to do in the overall sort, which reduces its memory overhead and improves execution efficiency.
S320: The target data node sorts its local data together with the data received from the other data nodes.
For example, after receiving the data sent by data node 140, data node 150, and data node 160, data node 130 sorts the received data together with its locally stored data. As a result, all data related to the SQL query statement (for example, the OVER expression) executed by the distributed database is ordered.
Optionally, data node 130 receives already ordered data from data node 140, data node 150, and data node 160 and then performs an overall sort of the received data to obtain a global order. This reduces IO overhead and memory overhead and improves sorting efficiency.
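When every sender has already sorted its rows, the overall sort on the target data node can be a k-way merge rather than a full re-sort. The sketch below illustrates this with Python's standard `heapq.merge`. It is only an illustration of the idea: the text does not say which algorithm the target data node uses, and the function name `merge_sorted_runs` is hypothetical.

```python
import heapq


def merge_sorted_runs(local_rows, received_runs, key=None):
    """Produce one globally ordered list on the target data node.

    local_rows    -- rows stored on the target node (not yet sorted)
    received_runs -- one already sorted list per sending data node
    """
    local_sorted = sorted(local_rows, key=key)
    # heapq.merge lazily merges inputs that are each already sorted.
    return list(heapq.merge(local_sorted, *received_runs, key=key))


# Example: node 130 holds [5, 1, 9]; nodes 140, 150, 160 send sorted runs.
merged = merge_sorted_runs([5, 1, 9], [[2, 4], [3, 8], [6, 7]])
# merged == [1, 2, 3, 4, 5, 6, 7, 8, 9]
```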
S330: The target data node sends the sorted plurality of data to at least one data node, so that the at least one data node performs the calculation related to the query statement on the data it receives.
Specifically, after sorting the data related to the query statement, the target data node forms multiple ordered data sets from the ordered data. The data within each ordered data set is ordered. Different ordered data sets may partially repeat one another (also called data overlap, meaning that two ordered data sets share some of the same data), but they must not be entirely identical (that is, the data in two ordered data sets must not overlap completely).
The target data node sends the data, one ordered data set at a time, to data nodes in the distributed database other than itself. For example, if data node 130 processes one ordered data set itself, it sends each of the remaining ordered data sets to a different data node. If data node 130 does not process any ordered data set, it sends every ordered data set to a different data node. The number of ordered data sets therefore equals the number of data nodes that process them, and each data node that receives an ordered data set performs the calculation related to the query statement on that set.
If the number of ordered data sets equals the number of data nodes in the distributed database, every data node obtains an ordered data set. For example, data node 130 sends ordered data sets to all the other data nodes in the distributed database, that is, to data node 140, data node 150, and data node 160.
If the number of ordered data sets is less than the number of data nodes in the distributed database, only some of the data nodes obtain an ordered data set. For example, data node 130 sends ordered data sets to some of the other data nodes in the distributed database, such as data node 140 and data node 150.
Optionally, before sending data, data node 130 may check the load of a data node or of the physical node on which the data node is located to decide whether to send data to it. This avoids sending data to an overloaded data node, which would reduce computing efficiency and execution efficiency.
After receiving the data sent by the target data node, the at least one data node can perform various calculations on it, such as window function calculations or aggregate function calculations. For example, after receiving an ordered data set from data node 130, data node 140 directly performs a window function calculation (for example, a sum) on the data in that set. After receiving its ordered data set from data node 130, data node 150 likewise starts the window function calculation on the data in that set. Data node 140 and data node 150 then perform the window function calculation in parallel.
It can be understood that, because the target data node sends the ordered data to other data nodes and those nodes calculate as soon as they receive it, the computing capability of the distributed database is fully used and the computing efficiency of the whole system improves.
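The parallel step can be pictured with a small single-process sketch in which worker threads stand in for data nodes and a running sum stands in for the window function. Both substitutions are assumptions made for illustration; in the scenario described here the work runs on separate data nodes, and the names `running_sum` and `compute_in_parallel` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def running_sum(ordered_rows):
    """Example window function: cumulative sum over one ordered data set."""
    return list(accumulate(ordered_rows))


def compute_in_parallel(ordered_sets):
    """Run the window function on every ordered data set concurrently.

    Each worker thread plays the role of one data node that has received
    one ordered data set from the target data node.
    """
    with ThreadPoolExecutor(max_workers=len(ordered_sets)) as pool:
        return list(pool.map(running_sum, ordered_sets))


results = compute_in_parallel([[1, 2, 3], [4, 5]])
# results == [[1, 3, 6], [4, 9]]
```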
In a possible implementation, the data that the target data node sends to different data nodes among the multiple data nodes of the distributed database differs by at least one data item.
Specifically, when the target data node divides the ordered data into ordered data sets, adjacent ordered data sets are contiguous. For some service requirements, for example when the calculation for the current row depends on several preceding or following rows, adjacent ordered data sets may contain partially repeated data, so that some data belongs to two ordered data sets at the same time. However, any two adjacent ordered data sets must differ in at least one data item. This avoids repeating the calculation on identical data, which improves the utilization of the distributed system's computing resources and the computing efficiency.
Optionally, the target data node may send all of the sorted data to other data nodes so that they complete the calculation, or it may keep part of the sorted data locally and perform the calculation on the data that it does not send. When data node 130 also takes part in the calculation, the computing capability of the distributed database is used more fully and computing efficiency improves further.
In a possible implementation, the target data node determines N partitions from its sorted data, where different partitions among the N partitions differ in at least one data item, N is an integer greater than 1, and N is less than or equal to the number of data nodes in the distributed database. The target data node sends the data of one of the N partitions to each data node, other than the target data node, among N data nodes of the distributed database.
Specifically, the target data node forms N partitions from the sorted data, and the data in each partition is ordered. A partition here differs from the partition concept in data storage: it is a logical slice cut from the sorted data, and the order of the data within the slice does not change. The number of partitions is less than or equal to the number of data nodes in the distributed database. Optionally, the target data node slices the sorted data evenly into N partitions, each containing the same amount of data. The slicing may also be uneven, in which case the N partitions do not all contain the same amount of data. The number of partitions N may equal the number of data nodes in the distributed database, in which case the target data node sends the data of one of the N partitions to each data node, with different partitions going to different data nodes. N may also be less than the number of data nodes in the distributed database, in which case the target data node may select the N data nodes with the lightest load from the other data nodes and send the data of one of the N partitions to each of them. In particular, adjacent partitions may share some repeated data, and the amount of repeated data may be the same or different, but the data in any two adjacent partitions must not be completely identical.
In a possible implementation, the target data node forms the N partitions from the sorted data according to the total data volume T and the data overlap interval, and the target data node calculates the amount of data in each partition.
Specifically, when the data overlap interval is 0, that is, two adjacent partitions share no data, the target data node does not need to consider overlap when dividing the data. It divides the T rows evenly into N partitions, so each partition holds T divided by N rows. The target data node can obtain the total data volume T when it receives and sorts the data sent by the other data nodes.
When the data overlap interval is not 0, that is, two adjacent partitions share data, the target data node must take the overlap interval between partitions into account when dividing the data. Different overlap intervals produce different sets of N partitions. Several example implementations follow.
1. Forward overlap, with an overlap interval of x rows.
Specifically, when partitioning, every partition except the first must overlap the previous partition by x rows, as shown in FIG. 4. The first partition therefore holds less data than the others. The amount of data allocated to each partition is calculated according to Formula 1:
Figure PCTCN2021072472-appb-000001
2. Backward overlap, with an overlap interval of y rows.
Specifically, when partitioning, every partition except the last must overlap the following partition by y rows, as shown in FIG. 5. The last partition therefore holds less data than the others. The amount of data allocated to each partition is calculated according to Formula 2:
Figure PCTCN2021072472-appb-000002
3. Both forward overlap and backward overlap, with a forward overlap interval of x rows and a backward overlap interval of y rows.
Specifically, when partitioning, the first partition must overlap the following partition by y rows, the last partition must overlap the previous partition by x rows, and every other partition must take both overlap intervals into account, as shown in FIG. 6. The amount of data allocated to each partition is calculated according to Formula 3:
Figure PCTCN2021072472-appb-000003
The foregoing applies when there is only one function. When multiple functions exist at the same time, that is, there are multiple forward overlap intervals or backward overlap intervals of different sizes, the target data node uses the largest x value or the largest y value when partitioning.
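The three overlap cases can be sketched in a few lines of Python. This is an illustrative reading of the prose, not a reproduction of Formulas 1 to 3: it assumes each partition starts from T/N base rows, extends every partition except the first by x rows toward the front and every partition except the last by y rows toward the back, and uses the largest x and y when several window functions are present. Assigning any remainder rows to the last partition is a further assumption that the text does not state, and the names `partition_bounds` and `widest_overlap` are hypothetical.

```python
def partition_bounds(total_rows, num_partitions, forward=0, backward=0):
    """Return 1-based inclusive (first_row, last_row) for each partition.

    forward  -- x, rows shared with the previous partition
    backward -- y, rows shared with the following partition
    """
    base = total_rows // num_partitions
    bounds = []
    for i in range(num_partitions):
        start = i * base + 1
        # Assumption: the last partition absorbs any remainder rows.
        end = total_rows if i == num_partitions - 1 else (i + 1) * base
        if i > 0:
            start = max(1, start - forward)
        if i < num_partitions - 1:
            end = min(total_rows, end + backward)
        bounds.append((start, end))
    return bounds


def widest_overlap(frames):
    """Pick the largest x and largest y over several window functions.

    frames -- iterable of (forward, backward) pairs, one per function.
    """
    frames = list(frames)
    return max(f for f, _ in frames), max(b for _, b in frames)


# T = 100 rows, N = 5 partitions, x = 2, y = 3.
both = partition_bounds(100, 5, forward=2, backward=3)
# both == [(1, 23), (19, 43), (39, 63), (59, 83), (79, 100)]
```

In this example the first partition holds T/N + y = 23 rows, the last holds T/N + x = 22 rows, and each middle partition holds T/N + x + y = 25 rows.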
It should be understood that, when adjacent partitions overlap, the same row of data may need to be sent to more than one data node. The data therefore needs to be divided into data intervals, and for each row the data interval into which it falls is calculated. This determines the partition or partitions to which the row belongs and, in turn, the data nodes to which it should be sent.
For example, taking FIG. 5 and assuming five data nodes in the distributed database, the partitions are, in order, [1, T/5+y], [T/5+1, 2T/5+y], [2T/5+1, 3T/5+y], [3T/5+1, 4T/5+y], and [4T/5+1, T], and the overlap intervals are [T/5+1, T/5+y], [2T/5+1, 2T/5+y], [3T/5+1, 3T/5+y], and [4T/5+1, 4T/5+y]. According to the overlap intervals, the target data node divides the sorted data into multiple data intervals. As shown in FIG. 7, all the data is divided into nine data intervals. For each data interval, the target data node compares the first row of the data interval with the first row of each partition that the interval overlaps. If the value of the first row of the data interval is greater than or equal to the value of the first row of a partition, the data interval is sent to the data node that receives that partition. If it is less, the data interval is not sent to that data node. For example, data interval 2, [T/5+1, T/5+y], has a first-row value of T/5+1 and lies in the overlap of partition 1 and partition 2. The first-row value of partition 1 is 1, so the data interval is sent to the data node that receives partition 1. The first-row value of partition 2 is T/5+1, which equals the first-row value of the data interval, so the data interval is also sent to the data node that receives partition 2.
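The five-node example can be checked with a short sketch. Taking T = 100 and y = 3 as illustrative values, the code cuts the row range at every partition boundary to obtain the data intervals and then lists, for each interval, the partitions that should receive it. Note that the sketch uses a plain range-containment test instead of the first-row comparison described above; for intervals built this way it is assumed to select the same partitions. The names `data_intervals` and `route` are hypothetical.

```python
def data_intervals(bounds):
    """Cut the row range into intervals at every partition boundary.

    bounds -- 1-based inclusive (first_row, last_row) per partition.
    """
    starts = {start for start, _ in bounds}
    after_ends = {end + 1 for _, end in bounds}
    cuts = sorted(starts | after_ends)
    return [(lo, hi - 1) for lo, hi in zip(cuts, cuts[1:])]


def route(bounds):
    """Map each data interval to the indices of partitions containing it."""
    routing = {}
    for lo, hi in data_intervals(bounds):
        routing[(lo, hi)] = [
            idx
            for idx, (start, end) in enumerate(bounds)
            if start <= lo and hi <= end
        ]
    return routing


# Backward overlap with T = 100, N = 5, y = 3 (illustrative values).
example_bounds = [(1, 23), (21, 43), (41, 63), (61, 83), (81, 100)]
routing = route(example_bounds)
# len(routing) == 9; interval (21, 23) goes to partitions 0 and 1.
```

Here partition indices are 0-based, so interval (21, 23) corresponds to data interval 2 in the text and is sent to the data nodes that receive partition 1 and partition 2.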
It should be understood that, for forward overlap, the target data node divides the sorted data into data intervals in the same way. For each data interval, it compares the last row of the data interval with the last row of each partition that the interval overlaps. If the value of the last row of the data interval is less than or equal to the value of the last row of a partition, the data interval is sent to the data node that receives that partition. Similarly, in other cases, for example when both forward and backward overlap intervals exist, the target data node can divide the data into intervals and compare them in the same way. For brevity, details are not repeated here.
Note that the overlap interval, that is, the value of x or y above, should be far smaller than T divided by N. If x or y is close to or even greater than T divided by N, system overhead and network transmission overhead increase, and it is no longer appropriate to send the sorted data to other data nodes for processing. In that case, another approach can be used to calculate on the sorted data, for example having the target data node perform the calculation itself.
In a possible implementation, the target data node determines the data sending order according to physical node numbers. Following the physical node numbers, the target data node sends its sorted plurality of data to the other data nodes in turn, where the physical node corresponding to each physical node number includes at least one data node in the distributed database.
Specifically, after partitioning the sorted data, the target data node needs to determine the order in which the partitions are sent, so that all partitions are sent to the other data nodes correctly and in the determined order.
In a distributed database, one physical machine usually hosts multiple data nodes. If the target data node sent partitions in data node number order, the data nodes receiving partitions during a given period might all be on the same physical machine. That machine would be overloaded and slow while other physical machines sat idle, so the resources of the distributed system would not be fully used and the execution efficiency of the whole system would suffer.
Therefore, the target data node determines the partition sending order according to physical node numbers and sends the partitions to all the other data nodes in that order.
For example, as shown in FIG. 8, there are physical machines 810, 820, and 830. Data nodes 811 and 812 are deployed on physical machine 810, data nodes 821 and 822 on physical machine 820, and data nodes 831 and 832 on physical machine 830. The target data node determines the sending order according to the physical machine numbers. To maximize the utilization of each physical machine in the distributed system and improve execution efficiency, the determined sending order is: data node 811, data node 821, data node 831, data node 812, data node 822, data node 832. That is, the target data node first sends partition 1 to data node 811, then sends partition 2 to data node 821, and so on, until all partitions have been sent to the corresponding data nodes in the determined order.
It should be understood that FIG. 8 shows a scenario in which the data nodes are evenly distributed across the physical nodes. When the data nodes are unevenly distributed, with some physical nodes hosting more data nodes and others hosting fewer, the target data node first sends partitions to the data nodes on each physical node in turn, in physical node number order. Once every data node on the physical nodes with fewer data nodes has received a partition, the target data node continues sending partitions to the data nodes that have not yet received data on the physical nodes with more data nodes, until all partitions have been sent. The partition sending order can of course be determined in other ways, which this application does not limit.
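The interleaved sending order described above can be sketched as follows. This is a minimal illustration in Python, not the implementation of this application; the function name send_order and the input layout (a dictionary mapping each physical machine number to the list of data nodes deployed on it) are assumptions made for the example.

```python
def send_order(nodes_by_machine):
    """Interleave data nodes across physical machines.

    nodes_by_machine (hypothetical layout): maps a physical machine number
    to the list of data nodes deployed on that machine. Machines are visited
    in ascending number order, one data node per machine per round, until
    every data node has been listed. Machines that run out of data nodes
    are simply skipped in later rounds, which covers the uneven case.
    """
    machines = sorted(nodes_by_machine)
    order = []
    round_index = 0
    found = True
    while found:
        found = False
        for machine in machines:
            nodes = nodes_by_machine[machine]
            if round_index < len(nodes):
                order.append(nodes[round_index])
                found = True
        round_index += 1
    return order


# Layout of FIG. 8: three physical machines with two data nodes each.
fig8 = {810: [811, 812], 820: [821, 822], 830: [831, 832]}
fig8_order = send_order(fig8)

# An uneven layout (invented for illustration): machine 1 hosts three nodes.
uneven_order = send_order({1: ["a", "b", "c"], 2: ["d"]})
```

For the FIG. 8 layout this yields 811, 821, 831, 812, 822, 832, matching the order given in the text. Partition k is then sent to the k-th data node in the returned list.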
In a possible implementation, the target data node sends the sorted data to the other data nodes, so that each of the other data nodes computes the window function of the query statement on the data it receives.
Specifically, after receiving a partition sent by the target data node, each of the other data nodes computes the window function of the query statement on the data in that partition. The window function may be a sum function (sum), an average function (avg), or the like, which this application does not limit.
It can be seen that, when data overlap exists, the amount of data each data node receives may differ, but the amount of data each data node computes is uniform, namely T/N rows. The computing power of the distributed system is therefore fully used, and the same data is not computed repeatedly. In addition, when data overlap exists, the state information of the window function records the forward overlap interval and the backward overlap interval (that is, x and y) while the window function runs. Along with the partitions, the stream thread in the target data node also sends the other data nodes the identifiers of the first and last data nodes in the determined sending order, and the amount of data each data node needs to process (that is, T/N rows). For example, when forward data overlap exists, every data node except the first skips the overlap interval (for example, x rows) after receiving its partition and then starts computing. Although the overlap interval itself is not computed, the computation of the subsequent rows depends on the forward overlap interval (x rows). When backward data overlap exists, every data node computes only the first T/N rows after receiving its partition and does not compute the overlap interval (for example, y rows), but the computation of the preceding rows depends on the backward overlap interval (y rows).
In particular, when the distributed database executes an SQL statement that contains multiple window functions, the target data node must partition the sorted data according to the forward and backward overlap intervals of all the window functions, and the other data nodes, after receiving the partitions sent by the target data node, must process each window function separately.
For example, an SQL statement contains three window functions. Window function 1 has a forward overlap interval of 2 and a backward overlap interval of 0; window function 2 has a forward overlap interval of 5 and a backward overlap interval of 0; window function 3 has a forward overlap interval of 0 and a backward overlap interval of 4. When partitioning, the target data node uses the maximum forward overlap interval and the maximum backward overlap interval among the three window functions, that is, the forward overlap interval of window function 2 and the backward overlap interval of window function 3. The first partition therefore holds T/N+4 rows, the last partition holds T/N+5 rows, and every other partition holds T/N+5+4 rows. After partitioning, the target data node sends each partition to the corresponding data node, together with the amount of data that data node needs to process. Each data node starts computing the window functions after receiving its partition. When computing window function 1, the overlap interval in the partition is 5 rather than 2, so the data node skips 5 rows, computes T/N rows, and stops. To keep the subsequent window function computations correct, the forward and backward overlap intervals are retained in the output of window function 1. When computing window function 2, the data node skips 5 rows according to the forward overlap interval, computes T/N rows, and stops. Because the subsequent window function computation no longer needs the forward overlap interval, the forward overlap interval is dropped from the output of window function 2 and only the backward overlap interval is retained. When computing window function 3, the forward overlap interval has already been dropped, so the data node starts from the first row of each partition, computes T/N rows, and stops. Because no window function remains to be computed, the backward overlap interval is dropped from the output of window function 3.
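The partition sizing in this example can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the implementation of this application: the function name partition_ranges is hypothetical, rows are indexed from 0 with an exclusive end, and T is assumed to be evenly divisible by N.

```python
def partition_ranges(total_rows, node_count, overlaps):
    """Return one (start, end) row range per partition.

    overlaps: list of (forward, backward) overlap intervals, one pair per
    window function. The partitions are cut with the largest forward and
    the largest backward overlap, so that every window function finds the
    rows it depends on inside the partition.
    """
    forward = max(f for f, _ in overlaps)
    backward = max(b for _, b in overlaps)
    rows_per_node = total_rows // node_count  # T/N, assumed to divide evenly
    ranges = []
    for i in range(node_count):
        core_start = i * rows_per_node
        core_end = core_start + rows_per_node
        # Extend the T/N core rows by the overlaps, clipped to the data.
        start = max(0, core_start - forward)
        end = min(total_rows, core_end + backward)
        ranges.append((start, end))
    return ranges


# Three window functions as in the example; T = 30 and N = 3 are invented.
ranges = partition_ranges(30, 3, [(2, 0), (5, 0), (0, 4)])
sizes = [end - start for start, end in ranges]
```

With T/N = 10 the sizes are 14, 19, and 15 rows, that is, T/N+4 for the first partition, T/N+5+4 for the middle partition, and T/N+5 for the last partition, as stated in the text.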
The state information of the window function records how the forward and backward overlap intervals change while the window function runs. During execution of the three window functions in the SQL statement above, the state information changes as shown in Table 1:
Table 1
Figure PCTCN2021072472-appb-000004
It can be seen that the state information of the window function records control information (the forward overlap interval, the backward overlap interval, and whether each is deleted after the computation completes), which ensures that each window function is computed and executed correctly.
It should be understood that, when the amount of data distributed to each data node for computation (T/N rows) is large and there is no data overlap interval or the data overlap interval (x or y) is small, the method shown in FIG. 3 enables parallel computation, makes full use of the computing power and resources of the distributed system, and improves computing efficiency. However, when the amount of data distributed to each data node is small and the data overlap interval is large, a large amount of additional network transmission overhead is incurred. The transmission time may then exceed the computation time of each data node, which seriously degrades computing efficiency. Therefore, before the method shown in FIG. 3 is applied to a given scenario, a cost estimate is needed to determine whether the method provided in this application outperforms the existing scheme.
Specifically, cost is characterized by execution time. In the existing scheme, a single data node performs the entire computation, so its cost is the time that a single data node needs to execute the window function. In this application, the sorted data is partitioned and sent to multiple data nodes for parallel computation, so its cost only needs to account for the time the target data node takes to send the partitions and the time the other data nodes take to receive and compute them. The two costs are compared: when the cost of the existing scheme minus the cost of this application is greater than 0, the scheme provided in this application should be selected; otherwise, the existing scheme should be selected. The cost difference between the two schemes can be calculated according to Formula 4:
ΔA = A - B - (C + A)/N      Formula 4
Here, ΔA is the cost difference between the two schemes, A is the cost of the existing scheme (that is, the time a single data node needs to process the total data volume T), B is the time the target data node needs to send all partitions, C is the time the other data nodes need to receive the partitions, and N is the number of data nodes in the distributed database.
When multiple window functions can share the partition data for their computations, as in the scenario of Table 1 above, the cost difference between the two schemes can be calculated according to Formula 5:
ΔA = n*A - B - (C + A)/N - (n - 1)*A/N      Formula 5
Here, ΔA, A, B, C, and N have the same meanings as in Formula 4, and n is the number of window functions in the SQL statement.
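Formulas 4 and 5 can be evaluated with a small helper such as the one below. This is an illustrative Python sketch; the function names cost_difference and use_parallel_scheme and the sample timings are hypothetical and are not taken from this application. Setting n to 1 reduces Formula 5 to Formula 4.

```python
def cost_difference(a, b, c, node_count, window_count=1):
    """Evaluate Formula 5; with window_count == 1 it equals Formula 4.

    a: time for a single data node to process all T rows (existing scheme)
    b: time for the target data node to send all partitions
    c: time for the other data nodes to receive the partitions
    node_count: N, the number of data nodes
    window_count: n, the number of window functions in the SQL statement
    """
    return (window_count * a
            - b
            - (c + a) / node_count
            - (window_count - 1) * a / node_count)


def use_parallel_scheme(a, b, c, node_count, window_count=1):
    """Select the scheme of this application only when the difference is > 0."""
    return cost_difference(a, b, c, node_count, window_count) > 0


# Invented timings in arbitrary units, purely for illustration.
gain_one_window = cost_difference(100, 10, 10, 4)
gain_three_windows = cost_difference(100, 10, 10, 4, window_count=3)
small_job = use_parallel_scheme(10, 20, 5, 2)
```

In the third call the sending time alone exceeds the single-node computation time, so the difference is negative and the existing scheme would be kept.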
It can be seen that Formula 4 and Formula 5 can be used to estimate the cost, and an appropriate scheme can be selected according to the estimate, so that the computing efficiency of the whole system is maintained.
To further illustrate the data calculation method described in FIG. 3, a specific SQL query statement is used below. Assume that the cluster contains two data nodes, DN1 and DN2, deployed on different physical machines. FIG. 9 shows the data table (tt01) and how its data is stored on DN1 and DN2. The SQL statement to be executed by the distributed database is: select a, b, c, sum(b) over (order by b rows 2 preceding) from tt01. To execute this statement, all the data in table tt01 must first be sorted by column b. The forward overlap interval is 2, and for each row the statement computes the sum of column b over the two preceding rows and the current row.
First, DN1 and DN2 scan their data and sort it locally by column b. The sorting results are shown in Table 2:
Table 2
Node   a  b  c
DN1    1  2  3
       1  4  5
DN2    2  6  7
       2  8  9
Then, each DN sends its data to the target data node. If DN1 is randomly selected as the target data node, DN2 sends its sorted data to DN1, and DN1 performs a merge sort on the received data so that all the data involved in the computation is ordered. The result of the merge sort by column b on DN1 is shown in Table 3:
Table 3
Node   a  b  c
DN1    1  2  3
       1  4  5
       2  6  7
       2  8  9
Next, DN1 partitions the sorted data, determines each partition, and sends the partitions in order to the corresponding data nodes. The total data volume T is 4 rows, the number of data nodes N is 2, and the forward overlap interval is 2 rows, so the amount of data each data node needs to compute is T/N = 2 rows. Partitioning the sorted data yields two partitions: partition 1 consists of the first and second rows, and partition 2 consists of the first through fourth rows (that is, all rows), with the first and second rows as its overlap interval. After partitioning, DN1 determines the partition sending order. Because DN1 and DN2 are deployed on different physical machines, the determined sending order is: partition 1 to DN1, partition 2 to DN2.
Because a forward overlap interval exists, DN1 must determine the data node or nodes to which each row is sent. DN1 divides all the data into two data intervals according to the forward overlap interval, as shown in Table 4:
Table 4
Figure PCTCN2021072472-appb-000005
Figure PCTCN2021072472-appb-000006
For data interval 1, the number of its first row is 1, which is less than 2, the number of the last row of partition 1, and less than 4, the number of the last row of partition 2. Data interval 1 is therefore sent to both DN1 and DN2. For data interval 2, the number of its first row is 3, which is greater than 2, the number of the last row of partition 1, but less than 4, the number of the last row of partition 2. Data interval 2 is therefore sent only to DN2.
After DN1 finishes sending the data, the data received by each DN is shown in Table 5:
Table 5
Figure PCTCN2021072472-appb-000007
DN1 and DN2 compute over the received data in parallel, calculating for each row the sum of column b over the two preceding rows and the current row. The results are shown in Table 6:
Table 6
Node   a  b  c   sum(b)
DN1    1  2  3   2
       1  4  5   6
DN2    2  6  7   12
       2  8  9   18
It can be seen that DN2 refers to the data in the forward overlap interval during its computation but does not compute results for that data. This avoids repeated computation and wasted computing resources in the distributed system, and improves computing efficiency.
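The worked example can be reproduced with the following Python sketch. It is an illustration only and not the implementation of this application: the helper name windowed_sum is hypothetical, the rows are read as (a, b, c) tuples from Table 2, and the unsorted input order on each data node is an assumption because FIG. 9 is not reproduced here.

```python
import heapq


def windowed_sum(partition, skip, compute, preceding):
    """Compute sum(b) over the `preceding` earlier rows plus the current row.

    partition: rows (a, b, c) received by one data node, overlap rows first
    skip:      leading overlap rows that are read but produce no output
    compute:   number of rows this data node is responsible for (T/N)
    """
    results = []
    for i in range(skip, skip + compute):
        first = max(0, i - preceding)
        total = sum(row[1] for row in partition[first:i + 1])
        results.append(partition[i] + (total,))
    return results


def by_b(row):
    """Sort key: column b."""
    return row[1]


# Local sort on each data node (input order assumed for illustration).
dn1_sorted = sorted([(1, 4, 5), (1, 2, 3)], key=by_b)
dn2_sorted = sorted([(2, 8, 9), (2, 6, 7)], key=by_b)

# Merge sort on the target data node DN1.
merged = list(heapq.merge(dn1_sorted, dn2_sorted, key=by_b))

# Partition 1 is rows 1-2. Partition 2 is rows 1-4, of which the first
# two rows are the forward overlap interval and are skipped for output.
dn1_result = windowed_sum(merged[0:2], skip=0, compute=2, preceding=2)
dn2_result = windowed_sum(merged[0:4], skip=2, compute=2, preceding=2)
```

The sums come out as 2 and 6 on DN1 and as 12 and 18 on DN2, in agreement with Table 6. DN2 reads the two overlap rows to form its sums but emits no result rows for them.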
Finally, after all the DNs complete their computations, the results can be sent to the coordinating node CN to obtain the final execution result of the SQL statement, as shown in Table 7:
Table 7
Figure PCTCN2021072472-appb-000008
It is easy to understand that this application sends globally ordered data to the data nodes so as to make full use of the computing power of each data node in the distributed system. This avoids the bottleneck caused by computing on a single data node, allows the computation to run in parallel, and improves computation and execution efficiency.
The methods of the embodiments of this application are described in detail above. To facilitate implementation of the foregoing solutions, related devices for implementing them are provided below.
Refer to FIG. 10, which is a schematic structural diagram of a data storage device provided by an embodiment of this application. As shown in FIG. 10, the data storage device 10 includes a receiving unit 11, a processing unit 12, and a sending unit 13, where:
The receiving unit 11 is configured to receive data related to a query statement sent by other data nodes in the distributed database.
Specifically, the receiving unit 11 is configured to perform step S310 above and, optionally, the optional methods in that step.
The processing unit 12 is configured to sort the local data and the data received from the other data nodes.
Specifically, the processing unit 12 is configured to perform step S320 above and, optionally, the optional methods in that step.
The sending unit 13 is configured to send the sorted data to at least one data node in the distributed database, so that the at least one data node performs computations related to the query statement on the data it receives.
Specifically, the sending unit 13 is configured to perform step S330 above and, optionally, the optional methods in that step.
In a possible implementation, the receiving unit 11 is specifically configured to receive the data that each of the other data nodes sends after sorting its local data.
In a possible implementation, the sending unit 13 is specifically configured to send at least one different piece of data to different data nodes among the multiple data nodes in the distributed database.
In a possible implementation, the processing unit 12 is further configured to perform the computations related to the query statement on the sorted data that is not sent to the at least one data node.
In a possible implementation, the processing unit 12 is further configured to determine N partitions based on the sorted data, where different partitions among the N partitions include at least one different piece of data, N is an integer greater than 1, and N is less than or equal to the number of data nodes in the distributed database. The sending unit 13 is specifically configured to send the data of one of the N partitions to each of the N data nodes of the distributed database other than the target data node.
In a possible implementation, the processing unit 12 is specifically configured to obtain the N partitions from the sorted data according to the total data volume of the sorted data and the data overlap interval.
In a possible implementation, the sending unit 13 is specifically configured to send the sorted data to the at least one data node according to the physical node numbers, where the physical node corresponding to each physical node number includes at least one data node of the distributed database.
In a possible implementation, the at least one data node computes the window function of the query statement on the data it receives.
It should be noted that the structure of the data storage device above, and the process of using it to redistribute data for parallel computation, are merely an example and do not constitute a specific limitation. Units of the data storage device can be added, removed, or combined as needed. In addition, the operations and/or functions of the modules in the data storage device implement the corresponding procedures of the method described in FIG. 3; for brevity, they are not repeated here.
Refer to FIG. 11, which is a schematic structural diagram of a computing device provided by an embodiment of this application. As shown in FIG. 11, the computing device 20 includes a processor 21, a communication interface 22, and a memory 23, which are connected to one another through an internal bus 24. It should be understood that the computing device may be a database server.
The computing device 20 may be the physical node 170 in FIG. 1 on which data node 130 and data node 140 are deployed. The functions performed by the target data node in FIG. 1, FIG. 2, and FIG. 3 are actually performed by the processor 21 of the computing device.
The processor 21 may consist of one or more general-purpose processors, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The bus 24 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus 24 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 11, but this does not mean that there is only one bus or one type of bus.
The memory 23 may include volatile memory, such as random access memory (RAM). The memory 23 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD). The memory 23 may also include a combination of the above types. The program code may be used to implement the functional units shown in the data storage device 10, or to implement the method steps performed by the target data node in the method embodiment shown in FIG. 3.
An embodiment of this application further provides a computer-readable storage medium storing a computer program. When executed by a processor, the program can implement some or all of the steps of any one of the method embodiments above, and can implement the function of any one of the functional units described in FIG. 10.
本申请实施例还提供了一种计算机程序产品,当其在计算机或处理器上运行时,使得计算机或处理器执行上述任一个方法中的一个或多个步骤。上述所涉及的设备的各组成单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在所述计算机可读取存储介质中。The embodiments of the present application also provide a computer program product, which when it runs on a computer or a processor, enables the computer or the processor to execute one or more steps in any of the foregoing methods. If each component unit of the aforementioned equipment is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in the computer readable storage medium.
在上述实施例中,对各个实施例的描述各有侧重,某个实施例中没有详述的部分,可以参见其它实施例的相关描述。In the above-mentioned embodiments, the description of each embodiment has its own focus. For parts that are not described in detail in an embodiment, reference may be made to related descriptions of other embodiments.
还应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。It should also be understood that in various embodiments of the present application, the size of the sequence numbers of the above-mentioned processes does not mean the order of execution. The execution order of each process should be determined by its function and internal logic, and should not be implemented in this application. The implementation process of the example constitutes any limitation.
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。Those skilled in the art can clearly understand that, for the convenience and conciseness of description, the specific working process of the system, device and unit described above can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present application essentially or the part that contributes to the existing technology or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage media include: U disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other media that can store program code .
The foregoing embodiments are merely intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.

Claims (19)

  1. A data calculation method, characterized in that it comprises:
    receiving, by a target data node in a distributed database, data that is related to a query statement and sent by other data nodes in the distributed database;
    sorting, by the target data node, local data and the data received from the other data nodes; and
    sending, by the target data node, a plurality of pieces of sorted data to at least one data node in the distributed database, so that the at least one data node performs, on the data it receives, a calculation related to the query statement.
  2. The method according to claim 1, characterized in that the method comprises:
    sorting, by each of the other data nodes, its local data, and sending the sorted data to the target data node.
  3. The method according to claim 1 or 2, characterized in that the sending, by the target data node, of the plurality of pieces of sorted data to the at least one data node comprises:
    sending, by the target data node, at least one different piece of data to different data nodes among a plurality of data nodes in the distributed database.
  4. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
    performing, by the target data node, the calculation related to the query statement on the data, among the sorted data, that is not sent to the at least one data node.
  5. The method according to any one of claims 1 to 4, characterized in that the method comprises:
    determining, by the target data node, N partitions based on the data sorted by the target data node, wherein different partitions among the N partitions include at least one different piece of data, N is an integer greater than 1, and N is less than or equal to the number of data nodes in the distributed database;
    wherein the sending, by the target data node, of the sorted data to the at least one data node comprises:
    sending, by the target data node, the data of one of the N partitions to each data node, other than the target data node, among N data nodes of the distributed database.
  6. The method according to claim 5, characterized in that the determining, by the target data node, of the N partitions based on the data sorted by the target data node comprises:
    obtaining, by the target data node, the N partitions from the data sorted by the target data node according to the total data amount and the data overlap interval of that sorted data.
  7. The method according to any one of claims 1 to 6, characterized in that the sending, by the target data node, of the plurality of pieces of sorted data to the other data nodes comprises:
    sending, by the target data node according to physical node numbers, the plurality of pieces of data sorted by the target data node to the at least one data node, wherein the physical node corresponding to a physical node number includes at least one data node in the distributed database.
  8. The method according to any one of claims 1 to 7, characterized in that the performing, by the at least one data node, of the calculation related to the query statement on the data each of them receives comprises:
    calculating, by the at least one data node, a window function of the query statement on the data each of them receives.
  9. A data storage device, characterized in that it comprises:
    a receiving unit, configured to receive data that is related to a query statement and sent by other data nodes in a distributed database;
    a processing unit, configured to sort local data and the data received from the other data nodes; and
    a sending unit, configured to send a plurality of pieces of sorted data to at least one data node in the distributed database, so that the at least one data node performs, on the data it receives, a calculation related to the query statement.
  10. The data storage device according to claim 9, characterized in that the receiving unit is specifically configured to:
    receive the data that each of the other data nodes sends after sorting its local data.
  11. The data storage device according to claim 9 or 10, characterized in that the sending unit is specifically configured to:
    send at least one different piece of data to different data nodes among a plurality of data nodes in the distributed database.
  12. The data storage device according to any one of claims 9 to 11, characterized in that
    the processing unit is further configured to perform the calculation related to the query statement on the data, among the sorted data, that is not sent to the at least one data node.
  13. The data storage device according to any one of claims 9 to 12, characterized in that
    the processing unit is further configured to determine N partitions based on the sorted data, wherein different partitions among the N partitions include at least one different piece of data, N is an integer greater than 1, and N is less than or equal to the number of data nodes in the distributed database; and
    the sending unit is specifically configured to:
    send the data of one of the N partitions to each data node, other than the target data node, among N data nodes of the distributed database.
  14. The data storage device according to claim 13, characterized in that the processing unit is specifically configured to:
    obtain the N partitions from the sorted data according to the total data amount and the data overlap interval of the sorted data.
  15. The data storage device according to any one of claims 9 to 14, characterized in that the sending unit is specifically configured to:
    send, according to physical node numbers, the plurality of pieces of sorted data to the at least one data node, wherein the physical node corresponding to a physical node number includes at least one data node in the distributed database.
  16. The data storage device according to any one of claims 9 to 15, characterized in that the at least one data node calculates a window function of the query statement on the data each of them receives.
  17. A computing device, characterized in that the computing device comprises a processor and a memory, and the processor executes computer instructions stored in the memory, so that the computing device performs the method according to any one of claims 1 to 8.
  18. A computer storage medium, characterized in that the computer storage medium stores a computer program, and the computer program, when executed by a computing device, implements the method according to any one of claims 1 to 8.
  19. A computer program product, comprising computer instructions, wherein when the computer instructions are executed by a computing device, the computing device can perform the method according to any one of claims 1 to 8.
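
For illustration only, the flow recited in claims 1 to 8 can be sketched in Python as a single-process simulation. This is a minimal sketch under stated assumptions, not the applicant's implementation: data nodes are modeled as plain lists of numbers, network transfer is modeled as function calls, the window function is assumed to be a moving sum over the current row and a fixed number of preceding rows, and the "data overlap interval" of claim 6 is interpreted as the number of preceding rows a partition must carry as context so that it can be computed independently. All function names (`local_sort`, `merge_on_target`, `make_partitions`, `moving_sum`, `run_query`) are hypothetical.

```python
import heapq


def local_sort(rows):
    """A non-target node sorts its local rows before sending them (claim 2)."""
    return sorted(rows)


def merge_on_target(local_rows, received_runs):
    """The target node merges its own rows with the sorted runs it received (claim 1)."""
    return list(heapq.merge(sorted(local_rows), *received_runs))


def make_partitions(sorted_rows, n, overlap):
    """Split sorted rows into n contiguous partitions of roughly equal size.

    Each partition is a (context, body) pair. `body` holds the rows whose
    results the partition owns; `context` holds up to `overlap` preceding
    rows (assumed meaning of the "data overlap interval" in claim 6).
    """
    total = len(sorted_rows)
    size = -(-total // n)  # ceiling division
    parts = []
    for i in range(n):
        start = min(i * size, total)
        end = min(start + size, total)
        ctx_start = max(0, start - overlap)
        parts.append((sorted_rows[ctx_start:start], sorted_rows[start:end]))
    return parts


def moving_sum(context, body, overlap):
    """Assumed window function: current row plus up to `overlap` preceding rows."""
    rows = context + body
    offset = len(context)
    out = []
    for j in range(len(body)):
        k = offset + j
        out.append(sum(rows[max(0, k - overlap):k + 1]))
    return out


def run_query(node_rows, target, overlap):
    """Simulate gather, sort, partition, redistribute, and per-node calculation."""
    received = [local_sort(r) for i, r in enumerate(node_rows) if i != target]
    merged = merge_on_target(node_rows[target], received)
    parts = make_partitions(merged, len(node_rows), overlap)
    # In a real system each partition would be sent to a different node
    # (the target keeps one, as in claim 4); here they are computed in turn.
    results = []
    for context, body in parts:
        results.extend(moving_sum(context, body, overlap))
    return merged, results


nodes = [[5, 1], [4, 2], [6, 3]]
merged, results = run_query(nodes, target=0, overlap=1)
print(merged)   # [1, 2, 3, 4, 5, 6]
print(results)  # [1, 3, 5, 7, 9, 11]
```

Because every partition carries its own context rows, the concatenated per-partition results in this sketch equal what a single node would compute over the whole sorted sequence, which is the property that lets the calculation be spread across nodes.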
PCT/CN2021/072472 2020-01-22 2021-01-18 Data calculation method and related device WO2021147815A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010076105.0 2020-01-22
CN202010076105.0A CN111324433B (en) 2020-01-22 2020-01-22 Data calculation method and related equipment

Publications (1)

Publication Number Publication Date
WO2021147815A1 (en) 2021-07-29

Family

ID=71172843

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/072472 WO2021147815A1 (en) 2020-01-22 2021-01-18 Data calculation method and related device

Country Status (2)

Country Link
CN (1) CN111324433B (en)
WO (1) WO2021147815A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111324433B (en) * 2020-01-22 2023-11-10 华为云计算技术有限公司 Data calculation method and related equipment
CN112257859A (en) * 2020-10-30 2021-01-22 地平线(上海)人工智能技术有限公司 Characteristic data processing method and device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110145404A1 (en) * 2007-06-27 2011-06-16 Computer Associates Think, Inc. Autonomic Control of a Distributed Computing System Using Finite State Machines
CN105740264A (en) * 2014-12-10 2016-07-06 北大方正集团有限公司 Distributed XML database sorting method and apparatus
CN109032766A (en) * 2018-06-14 2018-12-18 阿里巴巴集团控股有限公司 A kind of transaction methods, device and electronic equipment
CN111324433A (en) * 2020-01-22 2020-06-23 华为技术有限公司 Data calculation method and related equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9852184B2 (en) * 2014-11-03 2017-12-26 Sap Se Partition-aware distributed execution of window operator
US11249973B2 (en) * 2018-05-03 2022-02-15 Sap Se Querying partitioned tables in a distributed database

Also Published As

Publication number Publication date
CN111324433A (en) 2020-06-23
CN111324433B (en) 2023-11-10

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21745172

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21745172

Country of ref document: EP

Kind code of ref document: A1