WO2022033696A1 - Apparatus and method for distributing data over a plurality of database nodes

Info

Publication number: WO2022033696A1
Application number: PCT/EP2020/072815
Authority: WO
Inventors: Victor Alvarez; Alexander NOZDRIN
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2020-08-13
Filing date: 2020-08-13
Publication date: 2022-02-17

Abstract

An apparatus (101) for distributing a plurality of data elements to a plurality of nodes (103a-c) of a distributed database system (100) is disclosed. The apparatus (101) is configured to partition an n-dimensional space (201) into a plurality of n-dimensional sub-spaces, wherein each data element is associated with a position in the n-dimensional space (201) and the position of each data element is associated with one of the plurality of n-dimensional sub-spaces. Moreover, the apparatus (101) is configured to determine a space-filling curve extending through the plurality of sub-spaces and defining a linear order of the plurality of sub-spaces. The apparatus (101) is further configured to distribute the data elements to the plurality of nodes (103a-c) based on the linear order of the plurality of sub-spaces defined by the space-filling curve.

Description

APPARATUS AND METHOD FOR DISTRIBUTING DATA OVER A PLURALITY OF DATABASE NODES

TECHNICAL FIELD

The present disclosure relates to distributed database systems. More specifically, the present disclosure relates to an apparatus and method for distributing data over a plurality of database nodes of a distributed database system.

BACKGROUND

In a conventional distributed system of relational databases, data is often distributed among database nodes using a statement such as the known "DISTRIBUTE BY" statement. This statement usually operates on the identifier of a data element and completely ignores any geographic or geometric property that might be associated with the data element, such as the location where the data element was recorded. In this manner, it can be ensured that the data is fairly distributed among the different database nodes of the distributed system, but any sort of spatial locality is lost. As a result, spatial queries have to be issued to all database nodes, because the requested data may be stored on any of the database nodes.

For non-relational database clusters, it is known to use a global spatial index to distribute the data among the database nodes. This global spatial index helps to improve spatial locality, but it is prone to produce empty partitions. As a result, there could be database nodes that are more loaded than others.

SUMMARY

It is an objective of the present disclosure to provide an improved device, system, and method for distributing data over a plurality of database nodes of a distributed database system.

The foregoing and other objectives are achieved by the subject matter of the independent claims. Further implementation forms are apparent from the dependent claims, the description, and the figures.

According to a first aspect, an apparatus for distributing a plurality of data elements to a plurality of nodes of a distributed database system is provided. The apparatus is configured to partition a finite n-dimensional space into a plurality of n-dimensional sub-spaces, wherein each data element is associated with a position in the finite n-dimensional space with n ≥ 2 and each node of the plurality of nodes stores a subset of the plurality of data elements. The position of each data element in the n-dimensional space is associated with one of the plurality of n-dimensional sub-spaces. Moreover, the apparatus is configured to determine a space-filling curve extending through the plurality of sub-spaces, wherein the space-filling curve defines a linear order of the plurality of sub-spaces. Moreover, the apparatus is configured to distribute data elements of the plurality of data elements to nodes of the plurality of nodes based on the linear order of the plurality of sub-spaces defined by the space-filling curve.

Advantageously, the apparatus allows populating a distributed system of database nodes handling spatial data in a manner such that a certain spatial locality is kept, while also distributing the data among the database nodes in a fair manner. By exploiting the geometric/spatial locality of the data in the database nodes, a query can potentially be issued only to the database nodes in charge of the queried region, as opposed to executing the query at every single database node.

In a further possible implementation form, the apparatus is configured to partition the n-dimensional space into the plurality of n-dimensional sub-spaces using a tree data structure comprising a plurality of nodes, wherein each internal node of the tree data structure has 2ⁿ children, i.e. child nodes.

In a further possible implementation form, the apparatus is configured to partition the n-dimensional space into the plurality of n-dimensional sub-spaces using the tree data structure, as long as a number of data elements associated with one of the plurality of sub-spaces is larger than a threshold number. Advantageously, this allows reducing the storage load of each database node to a desired storage load associated with the threshold number.

In a further possible implementation form, the n-dimensional space is a two-dimensional space and the tree data structure is a quadtree data structure.

In a further possible implementation form, the apparatus is further configured to randomly select a subset of the plurality of data elements and generate the tree data structure based on the randomly selected subset of the plurality of data elements. Advantageously, this allows the apparatus to generate the tree data structure in a very efficient manner.

In a further possible implementation form, the space-filling curve is a Hilbert curve or a Z-order curve.

In a further possible implementation form, the apparatus is further configured to distribute a first subset of the plurality of data elements associated with one or more first sub-spaces of the plurality of sub-spaces to a first node of the plurality of nodes and a second subset of the plurality of data elements associated with one or more second sub-spaces of the plurality of sub-spaces to a second node of the plurality of nodes, wherein, based on the linear order of the plurality of sub-spaces defined by the space-filling curve, each of the one or more first sub-spaces of the plurality of sub-spaces is arranged before each of the one or more second sub-spaces of the plurality of sub-spaces.

The apparatus may be configured to apply the same scheme to one or more further nodes, e.g. to additionally distribute a third subset of the plurality of data elements associated with one or more third sub-spaces of the plurality of sub-spaces to a third node of the plurality of nodes, wherein, based on the linear order of the plurality of sub-spaces defined by the space-filling curve, each of the one or more first sub-spaces of the plurality of sub-spaces and each of the one or more second sub-spaces of the plurality of sub-spaces is arranged before each of the one or more third sub-spaces of the plurality of sub-spaces, and so on for a fourth and further nodes.

In a further possible implementation form, the apparatus is configured to distribute the second subset of the plurality of data elements associated with the one or more second sub-spaces of the plurality of sub-spaces to the second node of the plurality of nodes, after a data storage capacity of the first node of the plurality of nodes has been reached. In case of more than two nodes, the apparatus is configured to distribute the third subset of the plurality of data elements associated with the one or more third sub-spaces of the plurality of sub-spaces to the third node of the plurality of nodes, after a data storage capacity of the second node of the plurality of nodes has been reached, and so on for a fourth and further nodes.

In a further possible implementation form, at least two of the plurality of sub-spaces have different sizes.

According to a second aspect, a distributed database system comprising an apparatus according to the first aspect and a plurality of nodes, wherein each node of the plurality of nodes is configured to store a subset of the plurality of data elements, is provided.

In a further possible implementation form of the second aspect, at least two of the plurality of nodes have different data storage capacities. Some of the nodes may have substantially the same data storage capacities.

According to a third aspect, a method is provided for distributing a plurality of data elements to a plurality of nodes of a distributed database system. The method comprises the steps of: partitioning an n-dimensional space into a plurality of n-dimensional sub-spaces, wherein each data element of the plurality of data elements is associated with a position in the n-dimensional space, where n ≥ 2, each node of the plurality of nodes storing a subset of the plurality of data elements, and wherein the position of each data element in the n-dimensional space is associated with one of the plurality of n-dimensional sub-spaces; determining a space-filling curve extending through the plurality of sub-spaces, wherein the space-filling curve defines a linear order of the plurality of sub-spaces; and distributing data elements of the plurality of data elements to nodes of the plurality of nodes based on the linear order of the plurality of sub-spaces defined by the space-filling curve.

In a further possible implementation form of the third aspect, the step of partitioning the n-dimensional space into the plurality of n-dimensional sub-spaces comprises using a tree data structure comprising a plurality of nodes for partitioning the n-dimensional space into the plurality of n-dimensional sub-spaces, wherein each internal node of the tree data structure has 2ⁿ children, i.e. child nodes.

In a further possible implementation form of the third aspect, the step of distributing comprises distributing a first subset of the plurality of data elements associated with one or more first sub-spaces of the plurality of sub-spaces to a first node and a second subset of the plurality of data elements associated with one or more second sub-spaces of the plurality of sub-spaces to a second node, wherein, based on the linear order of the plurality of sub-spaces defined by the space-filling curve, each of the one or more first sub-spaces of the plurality of sub-spaces is arranged before each of the one or more second sub-spaces of the plurality of sub-spaces.

The data distribution method according to the third aspect of the present disclosure can be performed by the apparatus according to the first aspect of the present disclosure and the distributed database system according to the second aspect of the present disclosure. Thus, further features of the data distribution method according to the third aspect of the present disclosure result directly from the functionality of the apparatus according to the first aspect of the present disclosure and/or the distributed database system according to the second aspect of the present disclosure as well as their different implementation forms described above and below.

According to a fourth aspect, a computer program product comprising a non-transitory computer-readable storage medium for storing program code which causes a computer or a processor to perform the method according to the third aspect, when the program code is executed by the computer or the processor, is provided.

Details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

In the following, embodiments of the present disclosure are described in more detail with reference to the attached figures and drawings, in which:

Fig. 1 is a schematic diagram illustrating a distributed database system according to an embodiment, including an apparatus for distributing data over the nodes of the distributed database system;

Fig. 2 is a schematic diagram illustrating different aspects implemented by an apparatus according to an embodiment for an exemplary two-dimensional data space; and

Fig. 3 is a flow diagram illustrating different steps of a method for distributing data over the nodes of a distributed database according to an embodiment.

In the following, identical reference signs refer to identical or at least functionally equivalent features.

DETAILED DESCRIPTION OF THE EMBODIMENTS

In the following description, reference is made to the accompanying figures, which form part of the disclosure, and which show, by way of illustration, specific aspects of embodiments of the present disclosure or specific aspects in which embodiments of the present disclosure may be used. It is understood that embodiments of the present disclosure may be used in other aspects and comprise structural or logical changes not depicted in the figures. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims.

For instance, it is to be understood that a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if one or a plurality of specific method steps are described, a corresponding device may include one or a plurality of units, e.g. functional units, to perform the described one or plurality of method steps (e.g. one unit performing the one or plurality of steps, or a plurality of units each performing one or more of the plurality of steps), even if such one or more units are not explicitly described or illustrated in the figures. On the other hand, for example, if a specific apparatus is described based on one or a plurality of units, e.g. functional units, a corresponding method may include one step to perform the functionality of the one or plurality of units (e.g. one step performing the functionality of the one or plurality of units, or a plurality of steps each performing the functionality of one or more of the plurality of units), even if such one or plurality of steps are not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless specifically noted otherwise.

Figure 1 is a schematic diagram illustrating a distributed database system 100 according to an embodiment. The distributed database system 100 comprises a plurality of database nodes (or "nodes" for short) 103a-c configured to store data, wherein each database node 103a-c is configured to store a subset of the data. By way of example, three different database nodes 103a-c are shown in figure 1. As will be appreciated, however, the distributed database system 100 may comprise more or fewer than the three different nodes 103a-c shown in figure 1. In an embodiment, the different nodes 103a-c of the distributed database system 100 may have different data storage capacities. For instance, the first node 103a may have a data storage capacity that is about twice as large as the data storage capacity of the second node 103b and/or the third node 103c.

The distributed database system 100 further comprises an apparatus 101 for distributing data over the different nodes 103a-c of the distributed database system 100. In an embodiment, the apparatus 101 may be a database node itself, i.e. the apparatus 101 may store a portion of the data itself.

As will be described in more detail below, the data comprises a plurality of data elements or records, wherein each data element is associated with a position in a space having two or more dimensions. An example for such a space is the two-dimensional space 201 shown in figure 2 on the left. Although a detailed embodiment will be described in the following in the context of the two-dimensional space 201 shown in figure 2 on the left, the person skilled in the art will appreciate that the concepts described herein can be applied to a space having more dimensions than the two-dimensional space 201 shown in figure 2.

In an embodiment, the exemplary two-dimensional space 201 shown in figure 2 may be, for instance, a square-shaped or rectangular portion of a map and each data element of the plurality of data elements may be associated with a respective position on this map 201. In an embodiment, each data element may be associated, for instance, with coordinates defining the respective position on the map 201, such as x, y coordinates, or latitude and longitude coordinates.

As illustrated in figure 2, the apparatus 101 is configured to partition the two-dimensional space, i.e. the map 201, into a plurality of two-dimensional sub-spaces 203a-q, i.e. portions or cells of the map 201. For the sake of clarity, not all of the two-dimensional sub-spaces of the map 201 have been provided with reference signs. However, as will be appreciated, the following description may apply to all of the two-dimensional sub-spaces of the map 201 and not only to the two-dimensional sub-spaces 203a-q provided with a reference sign in figure 2.

In the embodiment shown in figure 2, the plurality of two-dimensional sub-spaces 203a-q have the same shape as the two-dimensional space, i.e. the map 201, namely a square shape. In other embodiments, the shape of the space 201 and/or its sub-spaces 203a-q may be different, for instance, rectangular. As will be further appreciated from figure 2, in this embodiment the apparatus 101 is configured to partition the two-dimensional space, i.e. the map 201, into the plurality of two-dimensional sub-spaces 203a-q such that there is no overlap between any of the two-dimensional sub-spaces 203a-q. This allows the apparatus 101 to uniquely assign any position on the map 201 to one of the plurality of two-dimensional sub-spaces 203a-q. In other words, the apparatus 101 is configured to partition the two-dimensional space, i.e. the map 201, into the plurality of two-dimensional sub-spaces 203a-q such that the position of each data element can be associated with, i.e. assigned to, one of the plurality of sub-spaces 203a-q.

As can be taken from figure 2, the apparatus 101 is further configured to determine a space-filling curve 205 (illustrated by the dashed line) starting at one of the plurality of sub-spaces 203a-q (e.g. at a center thereof) and extending through the other sub-spaces of the plurality of sub-spaces 203a-q. For instance, in the exemplary embodiment shown in figure 2, the space-filling curve 205 generated by the apparatus 101 starts at the sub-space 203j in the southwest corner of the map 201, extends to the adjacent sub-space 203i to the north and so on to the other sub-spaces before finally ending at the sub-space 203q in the southeast corner of the map 201. Along the way all of the sub-spaces of the map 201 are traversed by the space-filling curve 205. In doing so, the space-filling curve 205 defines a linear order of all of the sub-spaces of the map 201, i.e. starting with the first sub-space 203j and ending with the last sub-space 203q. For instance, in the exemplary scenario shown in figure 2, the space-filling curve 205 defines the following linear order: (First) sub-space 203j; (Second) sub-space 203i; (Third) sub-space 203k; (Fourth) sub-space 203l; (Fifth) sub-space 203m; (Sixth) sub-space 203n; ...; (Second to last) sub-space 203p; and (Last) sub-space 203q. In an embodiment, the space-filling curve 205 may be a Hilbert curve or a Z-order curve.
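The linear order induced by a Hilbert curve can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the two-dimensional space is overlaid with a uniform grid of 2^order by 2^order cells whose origin lies in the southwest corner. The names `hilbert_index` and `cells_in_curve_order` are hypothetical and not part of the disclosure; the bit manipulation is the commonly used iterative conversion from grid coordinates to a position along the curve.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Return the position of grid cell (x, y) along a Hilbert curve.

    Assumption: the grid has 2**order cells per side, x grows eastwards
    and y grows northwards, so (0, 0) is the southwest corner.
    """
    n = 1 << order   # number of cells per side
    d = 0            # accumulated position along the curve
    s = n >> 1       # side length of the quadrant inspected in this round
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the coordinates so that the sub-curve inside the
        # selected quadrant has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def cells_in_curve_order(order: int) -> list[tuple[int, int]]:
    """List all grid cells sorted by their position along the curve."""
    n = 1 << order
    cells = [(x, y) for x in range(n) for y in range(n)]
    return sorted(cells, key=lambda c: hilbert_index(order, c[0], c[1]))
```

For a 2 by 2 grid (`order = 1`) this sketch yields the order (0, 0), (0, 1), (1, 1), (1, 0), i.e. the curve starts in the southwest cell, moves north, and ends in the southeast cell. Because every aligned square block of grid cells occupies a contiguous range of curve positions, sub-spaces of different sizes could be ordered by the curve position of any grid cell they contain.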

In an embodiment, for the general n-dimensional case, the apparatus 101 is configured to partition the n-dimensional space into the plurality of sub-spaces using a tree data structure comprising a plurality of nodes, wherein each internal node of the tree data structure has 2ⁿ child nodes. Moreover, the apparatus 101 may be configured to partition the n-dimensional space into the plurality of sub-spaces using the tree data structure, as long as the number of data elements associated with one of the plurality of sub-spaces is larger than a threshold number. In other words, the apparatus 101 may adjust the partitioning level in each region of the space 201 based on the number of data elements associated with each respective region of the space. As illustrated in figure 2 for the case of a two-dimensional space 201, this may lead to sub-spaces 203a-q having different sizes. In an embodiment, the apparatus 101 is configured to randomly select a subset of the plurality of data elements and to generate the tree data structure on the basis of the randomly selected subset of the plurality of data elements. In other words, for generating the tree data structure the apparatus 101 does not have to use all of the data elements, but may use a randomly selected subset thereof. This results in a more efficient generation of the tree data structure.
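How a threshold on the full data set might translate to a randomly selected subset can be sketched as follows. The sketch assumes, as an illustration only, that each data element is sampled independently with a known probability `rate`, so that a sub-space containing k sampled elements is expected to contain about k / rate elements overall. The function names and the sampling scheme are assumptions and are not prescribed by the disclosure.

```python
import random


def sample_elements(elements, rate, seed=None):
    """Keep each element independently with probability `rate` (assumed scheme)."""
    rng = random.Random(seed)
    return [e for e in elements if rng.random() < rate]


def expected_count(sampled_in_cell: int, rate: float) -> float:
    """Estimate the total number of elements in a sub-space from its sample count."""
    return sampled_in_cell / rate


def must_split(sampled_in_cell: int, rate: float, threshold: float) -> bool:
    """Decide whether a sub-space should be partitioned further.

    A sub-space is split while its expected number of data elements
    is larger than the threshold number.
    """
    return expected_count(sampled_in_cell, rate) > threshold
```

With a sampling rate of 1% and a threshold of 200 data elements, a sub-space holding 3 sampled elements (about 300 expected) would be split, whereas one holding a single sampled element (about 100 expected) would not.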

For the example of the two-dimensional space, i.e. map 201 shown in figure 2 on the left, the tree data structure is a quadtree data structure. A portion of this quadtree structure 210 is shown in figure 2 on the right. The quadtree structure 210 comprises a root node 212 and a plurality of child nodes, including a plurality of internal nodes 211. For the two-dimensional case shown in figure 2, the root node 212 and each internal node 211 of the quadtree data structure has 4 child nodes. As will be appreciated, the quadtree structure 210 can be considered to define the partitioning level for each region of the map 201.
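A quadtree of this kind can be sketched in a few lines of Python. The class `QuadNode`, the functions `build_quadtree` and `leaves`, and the `max_depth` safeguard are illustrative assumptions; the sketch only demonstrates that every internal node receives four children and that regions with many data elements are partitioned more finely than sparse regions.

```python
from dataclasses import dataclass, field


@dataclass
class QuadNode:
    x0: float                 # west edge of the square cell
    y0: float                 # south edge of the square cell
    size: float               # side length of the cell
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)  # empty for a leaf, else 4 nodes

    def is_leaf(self) -> bool:
        return not self.children


def build_quadtree(node: QuadNode, threshold: int, max_depth: int = 16) -> QuadNode:
    """Split cells recursively while they hold more than `threshold` points."""
    if len(node.points) <= threshold or max_depth == 0:
        return node
    half = node.size / 2
    # Children in the fixed order SW, NW, NE, SE.
    node.children = [
        QuadNode(node.x0, node.y0, half),
        QuadNode(node.x0, node.y0 + half, half),
        QuadNode(node.x0 + half, node.y0 + half, half),
        QuadNode(node.x0 + half, node.y0, half),
    ]
    slot = {(False, False): 0, (False, True): 1, (True, True): 2, (True, False): 3}
    for px, py in node.points:
        east = px >= node.x0 + half
        north = py >= node.y0 + half
        node.children[slot[(east, north)]].points.append((px, py))
    node.points = []  # data elements are kept in the leaves only
    for child in node.children:
        build_quadtree(child, threshold, max_depth - 1)
    return node


def leaves(node: QuadNode) -> list:
    """Collect the leaf cells in fixed child order (not yet in curve order)."""
    if node.is_leaf():
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]
```

Building the tree over six points, five of which lie in the southwest quadrant of an 8 by 8 square, with a threshold of 2 produces leaves with side lengths 4, 2 and 1, which mirrors the differently sized sub-spaces of figure 2.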

The apparatus 101 is further configured to distribute the data elements to the plurality of nodes 103a-c based on the linear order of the plurality of sub-spaces 203a-q defined by the space-filling curve 205. For instance, the apparatus 101 may distribute the data elements associated with positions within the first eight sub-spaces of the map 201 (illustrated by the grey background in figure 2) to the first database node 103a, and the data elements associated with positions within the ninth and the following sub-spaces of the map 201 to the second and third database nodes 103b, c. In other words, in an embodiment, the apparatus 101 is configured to distribute a first subset of the plurality of data elements associated with one or more first sub-spaces (e.g. the first eight subspaces) of the plurality of sub-spaces 203a-q to the first node 103a and a second subset of the plurality of data elements associated with one or more second sub-spaces of the plurality of sub-spaces 203a-q to the second node 103b. Based on the linear order of the plurality of sub-spaces defined by the space-filling curve 205, each of the one or more first sub-spaces of the plurality of sub-spaces 203a-q is arranged before each of the one or more second sub-spaces of the plurality of sub-spaces 203a-q.

In an embodiment, the apparatus 101 is configured to distribute the second subset of the plurality of data elements associated with the one or more second sub-spaces of the plurality of sub-spaces 203a-q to the second node 103b, once a data storage capacity of the first node 103a has been reached. In other words, in the example just described in the context of figure 2, once the first node 103a has been filled up with data elements associated with positions within the first eight sub-spaces (illustrated with the grey background in figure 2), the apparatus 101 will switch to the next node, e.g. the second node 103b.

In an embodiment, the apparatus 101 may be configured to implement the following algorithm:

1. Perform a (uniform) random sampling of the data elements.

2. Construct the quadtree data structure T 210 over the randomly sampled data elements. During the construction of the quadtree data structure T 210, its cells/nodes are refined until no cell contains more than a > 0 data elements in expectation.

3. Once the quadtree data structure T 210 has been generated, its cells are traversed in the linear order given by the space-filling curve C 205 and the data elements associated with each cell are assigned to the database nodes 103a-c in the following way:

3a. The cells of the quadtree data structure T 210 are enumerated as C₁, C₂, C₃, ... , C_k using the space-filling curve C 205. Because of the above stopping condition for the refinement of the cells of the quadtree data structure T 210, no cell contains (in expectation) more than a > 0 data elements.

3b. Let m > 1 be the number of database nodes in the distributed system. Let DN_j denote database node j with 1 ≤ j ≤ m, and let Γ_j denote its capacity (the number of data elements the database node can handle). Starting from cell C₁, assign all data points of as many consecutive cells of T (following the labeling of the cells given by C) as possible to the first database node DN₁ such that the capacity Γ₁ of DN₁ is fulfilled. Do so consecutively for the rest of the database nodes until all data has been loaded into the system of database nodes.

That is:

• Let s ← 1

• For i ← 1, ... , m do

1. Assign data points in cells C_s ∪ ··· ∪ C_r, for some r > 0, to DN_i such that capacity Γ_i is fulfilled

2. Let s ← r + 1

For every visited cell, starting with C₁, all its data elements are gathered before assigning them to the current database node. In this manner, it can be determined exactly how many data elements are assigned and, thus, overloading the respective database node can be avoided.
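The assignment loop above can be sketched in Python as follows. The sketch assumes that the exact number of data elements per cell is already known and listed in curve order, and it simplifies the handling of a cell that does not fit: such a cell is simply deferred to the next database node instead of being refined. The function name `assign_cells` and its return format are hypothetical.

```python
def assign_cells(cell_counts, capacities):
    """Assign consecutive cells (in curve order) to database nodes.

    cell_counts: number of data elements per cell, in curve order.
    capacities:  capacity (number of data elements) of each database node.
    Returns one entry per database node: an inclusive, 0-based range
    (first_cell, last_cell), or None if the node receives no cell.
    """
    assignment = []
    s = 0                       # index of the next unassigned cell
    k = len(cell_counts)
    for capacity in capacities:
        load = 0
        r = s
        # Take as many consecutive cells as fit into the current node.
        while r < k and load + cell_counts[r] <= capacity:
            load += cell_counts[r]
            r += 1
        assignment.append((s, r - 1) if r > s else None)
        s = r
    if s < k:
        # Simplification: a real system might refine the remaining cells instead.
        raise ValueError("capacities exhausted before all cells were assigned")
    return assignment
```

For cell counts `[3, 2, 4, 1, 5, 2, 3]` and capacities `[10, 5, 5]`, where the first database node has about twice the capacity of the others, the first four cells go to the first node, the fifth cell to the second node, and the last two cells to the third node.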

If the exact number of data elements in a cell C_j strongly deviates from what is expected (for instance, by 5% of the expected value) and this number of data elements does not fit into the current database node DN_i, then the cell C_j is further refined into several smaller cells C_j^1, C_j^2, C_j^3, ... , C_j^t for some t ≥ 4 such that the capacity Γ_i of the current database node DN_i is sufficient to store the data elements from the smaller cells.
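The condition that triggers this further refinement can be expressed as a small predicate. The following Python sketch is only one possible reading of the paragraph above: the function name `needs_refinement`, the use of a relative tolerance with a strict comparison, and the notion of a remaining capacity are assumptions made for illustration.

```python
def needs_refinement(actual: int, expected: float,
                     remaining_capacity: int, tolerance: float = 0.05) -> bool:
    """Return True if a cell should be refined into smaller cells.

    A cell is refined when its exact element count deviates from the
    expected count by more than `tolerance` (5% by default) and the cell
    does not fit into the remaining capacity of the current database node.
    """
    deviates = abs(actual - expected) > tolerance * expected
    does_not_fit = actual > remaining_capacity
    return deviates and does_not_fit
```

For example, a cell expected to hold 100 data elements but actually holding 130 would be refined if only 120 elements still fit into the current database node, and would be assigned unchanged if 200 elements still fit.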

As already described above, by using the combination of the quadtree data structure T 210 with the space-filling curve C 205, the apparatus 101 allows maintaining certain geometric/spatial information associated with the data elements assigned to every database node 103a-c, while also respecting the capacities of the database nodes 103a-c.
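Because each database node stores a contiguous run of cells along the curve, the nodes in charge of a queried region can be found with a binary search. The following Python sketch assumes that the curve positions of the cells intersecting the query region have already been determined and that the position of the last cell stored on each database node is kept in a sorted list; the function name `nodes_for_cells` and this bookkeeping are illustrative assumptions rather than part of the disclosure.

```python
from bisect import bisect_left


def nodes_for_cells(cell_positions, last_cell_per_node):
    """Return the indices of the database nodes that must be queried.

    cell_positions:     0-based curve positions of the cells that intersect
                        the query region.
    last_cell_per_node: sorted list whose entry j is the curve position of
                        the last cell stored on database node j.
    """
    return sorted({bisect_left(last_cell_per_node, pos) for pos in cell_positions})
```

With `last_cell_per_node = [3, 4, 6]`, a query touching the cells at positions 2 and 3 would be sent to the first database node only, whereas a query touching positions 3 and 4 would be sent to the first and the second database node.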

Figure 3 is a flow diagram of a corresponding method 300 for distributing a plurality of data elements over the plurality of database nodes 103a-c of the distributed database system 100. As already described above, each data element is associated with a position in an n-dimensional space 201 with n ≥ 2, such as the two-dimensional space 201 shown in figure 2, wherein each database node 103a-c is configured to store a subset of the plurality of data elements. The method 300 comprises the steps of: partitioning, at 301, the n-dimensional space 201 into a plurality of sub-spaces 203a-q, wherein the position of each data element is associated with one of the plurality of sub-spaces 203a-q; generating, at 303, a space-filling curve 205 extending through the plurality of sub-spaces 203a-q, wherein the space-filling curve 205 defines a linear order of the plurality of sub-spaces 203a-q; and distributing, at 305, the data elements to the plurality of nodes 103a-c based on the linear order of the plurality of sub-spaces 203a-q defined by the space-filling curve 205.

The person skilled in the art will understand that the "blocks" ("units") of the various figures (method and apparatus) represent or describe functionalities of embodiments of the present disclosure (rather than necessarily individual "units" in hardware or software) and thus describe equally functions or features of apparatus embodiments as well as method embodiments (unit = step).

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described embodiment of an apparatus is merely exemplary. For example, the unit division is merely logical function division and may be another division in an actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented by using some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, functional units in the embodiments of the invention may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units are integrated into one unit.

Claims

1. An apparatus (101) for distributing a plurality of data elements to a plurality of nodes (103a-c) of a distributed database system (100), wherein the apparatus (101) is configured to: partition an n-dimensional space (201) into a plurality of sub-spaces (203a-q), wherein each data element of the plurality of data elements is associated with a position in the n-dimensional space (201), where n ≥ 2, each node of the plurality of nodes (103a-c) storing a subset of the plurality of data elements, and wherein the position of each data element in the n-dimensional space (201) is associated with one of the plurality of sub-spaces (203a-q); determine a space-filling curve (205) extending through the plurality of sub-spaces (203a-q), wherein the space-filling curve (205) defines a linear order of the plurality of sub-spaces (203a-q); and distribute data elements of the plurality of data elements to nodes of the plurality of nodes (103a-c) based on the linear order of the plurality of sub-spaces (203a-q) defined by the space-filling curve (205).

2. The apparatus (101) of claim 1, wherein the apparatus (101) is configured to partition the n-dimensional space (201) into the plurality of sub-spaces (203a-q) using a tree data structure (210) comprising a plurality of nodes (211, 212), wherein each node (211) of the tree data structure (210) has 2ⁿ child nodes (211).

3. The apparatus (101) of claim 2, wherein the apparatus (101) is configured to partition the n-dimensional space (201) into the plurality of sub-spaces (203a-q) using the tree data structure (210), as long as a number of data elements associated with one of the plurality of sub-spaces (203a-q) is larger than a threshold number.

4. The apparatus (101) of claim 2 or 3, wherein the n-dimensional space (201) is a two-dimensional space (201) and the tree data structure (210) is a quadtree data structure (210).

5. The apparatus (101) of any one of claims 2 to 4, wherein the apparatus (101) is further configured to randomly select a subset of the plurality of data elements and generate the tree data structure (210) based on the randomly selected subset of the plurality of data elements.

6. The apparatus (101) of any one of the preceding claims, wherein the space-filling curve (205) is a Hilbert curve or a Z-order curve.

7. The apparatus (101) of any one of the preceding claims, wherein the apparatus (101) is further configured to distribute a first subset of the plurality of data elements associated with one or more first sub-spaces of the plurality of sub-spaces (203a-q) to a first node (103a) of the plurality of nodes (103a-c) and a second subset of the plurality of data elements associated with one or more second sub-spaces of the plurality of sub-spaces (203a-q) to a second node (103b) of the plurality of nodes (103a-c), wherein, based on the linear order of the plurality of sub-spaces (203a-q) defined by the space-filling curve (205), each of the one or more first sub-spaces of the plurality of sub-spaces (203a-q) is arranged before each of the one or more second sub-spaces of the plurality of sub-spaces (203a-q).

8. The apparatus (101) of claim 7, wherein the apparatus (101) is configured to distribute the second subset of the plurality of data elements associated with the one or more second sub-spaces of the plurality of sub-spaces (203a-q) to the second node (103b) of the plurality of nodes (103a-c), after a data storage capacity of the first node (103a) of the plurality of nodes (103a-c) has been reached.

9. The apparatus (101) of any one of the preceding claims, wherein at least two of the plurality of sub-spaces (203a-q) have different sizes.

10. A distributed database system (100) comprising an apparatus (101) according to any one of the preceding claims and a plurality of nodes (103a-c), wherein each node of the plurality of nodes (103a-c) is configured to store a subset of the plurality of data elements.

11. The distributed database system (100) of claim 10, wherein at least two of the plurality of nodes (103a-c) have different data storage capacities.

12. A method (300) for distributing a plurality of data elements to a plurality of nodes (103a-c) of a distributed database system (100), wherein the method (300) comprises: partitioning (301) an n-dimensional space (201) into a plurality of sub-spaces (203a-q), wherein each data element of the plurality of data elements is associated with a position in the n-dimensional space (201), where n ≥ 2, each node of the plurality of nodes (103a-c) storing a subset of the plurality of data elements, and wherein the position of each data element in the n-dimensional space (201) is associated with one of the plurality of sub-spaces (203a-q); determining (303) a space-filling curve (205) extending through the plurality of sub-spaces (203a-q), wherein the space-filling curve (205) defines a linear order of the plurality of sub-spaces (203a-q); and distributing (305) data elements of the plurality of data elements to nodes of the plurality of nodes (103a-c) based on the linear order of the plurality of sub-spaces (203a-q) defined by the space-filling curve (205).

13. The method (300) of claim 12, wherein the step of partitioning (301) the n-dimensional space (201) into the plurality of sub-spaces (203a-q) comprises using a tree data structure (210) comprising a plurality of nodes (211, 212) for partitioning the n-dimensional space (201) into the plurality of sub-spaces (203a-q), wherein each node (211) of the tree data structure (210) has 2ⁿ child nodes (211).

14. The method (300) of claim 12 or 13, wherein the step of distributing (305) comprises distributing a first subset of the plurality of data elements associated with one or more first sub-spaces of the plurality of sub-spaces (203a-q) to a first node (103a) of the plurality of nodes (103a-c) and a second subset of the plurality of data elements associated with one or more second sub-spaces of the plurality of sub-spaces (203a-q) to a second node (103b) of the plurality of nodes (103a-c), wherein, based on the linear order of the plurality of sub-spaces (203a-q) defined by the space-filling curve (205), each of the one or more first sub-spaces of the plurality of sub-spaces (203a-q) is arranged before each of the one or more second sub-spaces of the plurality of sub-spaces (203a-q).


15. A computer program product comprising a computer-readable storage medium for storing program code which causes a computer or a processor to perform the method (300) of any one of claims 12 to 14, when the program code is executed by the computer or the processor.
