WO2020034818A1 - Partition merging method and database server - Google Patents

Partition merging method and database server

Info

Publication number
WO2020034818A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage unit
data storage
partition
information
unit information
Prior art date
Application number
PCT/CN2019/097559
Other languages
French (fr)
Chinese (zh)
Inventor
谢晓芹
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201811147298.3A external-priority patent/CN110825794B/en
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to EP19849439.5A priority Critical patent/EP3825866A4/en
Publication of WO2020034818A1 publication Critical patent/WO2020034818A1/en
Priority to US17/171,706 priority patent/US11762881B2/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Definitions

  • The present application relates to the field of information technology and, more particularly, to a partition merging method and a database server.
  • In a distributed database system, partitions are usually used to manage the data entries of data tables.
  • Each partition is served by a particular database server, which provides services such as storage management.
  • The ownership relationship between a partition and a database server is specified dynamically by the management server of the distributed database system.
  • Distributed database systems require dynamic load balancing based on cluster size, load, or other policies. Therefore, in some cases, two adjacent partitions need to be merged into a new partition.
  • Existing schemes for merging adjacent partitions read the data entries out of one partition and then write them into the other partition.
  • The number of data entries held in a partition is usually large. Therefore, reading and writing a large number of data entries during the partition merge significantly increases the overhead.
  • This application provides a partition merging method and a database server that can reduce the amount of data read and written.
  • An embodiment of the present application provides a method for merging partitions in a distributed database system.
  • The distributed database system includes a first database server, a second database server, and a management server.
  • The first database server runs a first partition.
  • The second database server runs a second partition. The method includes: the first database server receives a merge instruction sent by the management server, the merge instruction being used to merge the first partition and the second partition into a third partition, where the first partition and the second partition are adjacent partitions; the merge instruction includes the identifier of the current file of the first partition and the identifier of the current file of the second partition; a file identifier of the file storing the metadata of the first partition is recorded in the current file of the first partition; a file identifier of the file storing the metadata of the second partition is recorded in the current file of the second partition; the first database server obtains the metadata of the first partition according to the identifier of the current file of the first partition; the first database server according to the current
  • The above technical solution can merge the first partition and the second partition into the third partition directly from the metadata of the first partition and the metadata of the second partition, without reading the data in the first partition and the second partition and writing it into the merged partition. This reduces the amount of data read and written and speeds up the partition merge.
  • During a partition merge, the service write operations of the two partitions are temporarily frozen until the merge is completed.
  • Because the merge completes faster, the embodiment of the present application also shortens the time for which the service write operations of the partitions are frozen.
  • the load of the first database server is lighter than the load of the second database server.
  • Merging the metadata of the first partition and the metadata of the second partition to generate the metadata of the third partition implements the merging of the first partition and the second partition into the third partition. That is, by accessing the metadata of the third partition, the data entries of the first partition and the data entries of the second partition can be accessed.
  • In other words, the metadata of the third partition is generated by merging the metadata of the first partition and the metadata of the second partition.
  • The metadata of the third partition corresponds to the data entries of the first partition and the data entries of the second partition.
  • The data entries of the first partition and the data entries of the second partition serve as the data entries of the third partition.
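  • The metadata-only merge described above can be illustrated with a minimal sketch. The `PartitionMeta` class, the `merge_metadata` function, and the identifiers are hypothetical names chosen for illustration and do not come from the application; the point is only that the third partition's metadata references the existing data storage units and no data entry is read or rewritten.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionMeta:
    """Hypothetical in-memory view of one partition's metadata."""
    partition_id: str
    # Identifiers of data storage units only; the data itself is never loaded.
    sstable_ids: list = field(default_factory=list)


def merge_metadata(first, second, new_id):
    """Build the third partition's metadata by referencing existing units."""
    return PartitionMeta(new_id, first.sstable_ids + second.sstable_ids)


p1 = PartitionMeta("partition-1", ["sst-11", "sst-12"])
p2 = PartitionMeta("partition-2", ["sst-21"])
p3 = merge_metadata(p1, p2, "partition-3")
print(p3.sstable_ids)
```

  • In this sketch the cost of the merge depends on the number of data storage units, not on the number of data entries they hold.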
  • The metadata of the first partition includes data storage unit information of a secondary column cluster of the first partition.
  • The metadata of the second partition includes data storage unit information of a secondary column cluster of the second partition.
  • The first database server merging the metadata of the first partition and the metadata of the second partition to generate the metadata of the third partition specifically includes: the first database server merges the data storage unit information of the secondary column cluster of the first partition and the data storage unit information of the secondary column cluster of the second partition to generate data storage unit information of a target secondary column cluster, and determines, according to the data storage unit information of the target secondary column cluster, the data storage unit information of the secondary column cluster of the third partition.
  • The distributed database system is a database system that uses a log-structured merge-tree (LSM-tree) algorithm, and the data storage unit information is Sorted String Table (SSTable) information.
  • The data storage unit is an SSTable.
  • The file storing the metadata of the first partition and the file storing the metadata of the second partition are both manifest files.
  • The first database server creates a current file for the third partition, and the current file of the third partition records the file identifier of the file storing the metadata of the third partition.
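  • The indirection from a current file to a manifest file can be sketched as follows. This is an illustrative model only: the dictionary-based file store, the file names, and the manifest layout are assumptions and are not the on-disk format of any particular database.

```python
# Hypothetical shared file store: file identifier -> file content.
files = {
    "CURRENT-p1": "MANIFEST-p1",
    "MANIFEST-p1": {"sstables": ["sst-11"], "wals": ["wal-1"]},
    "CURRENT-p2": "MANIFEST-p2",
    "MANIFEST-p2": {"sstables": ["sst-21"], "wals": ["wal-2"]},
}


def load_metadata(store, current_id):
    """Follow the current file to the manifest file and return its content."""
    manifest_id = store[current_id]
    return store[manifest_id]


def create_third_partition(store, current_1, current_2, name):
    """Write a merged manifest and a current file that points to it."""
    meta_1 = load_metadata(store, current_1)
    meta_2 = load_metadata(store, current_2)
    manifest_id = f"MANIFEST-{name}"
    store[manifest_id] = {
        "sstables": meta_1["sstables"] + meta_2["sstables"],
        "wals": meta_1["wals"] + meta_2["wals"],
    }
    current_id = f"CURRENT-{name}"
    store[current_id] = manifest_id  # the current file records the manifest id
    return current_id


third_current = create_third_partition(files, "CURRENT-p1", "CURRENT-p2", "p3")
print(third_current, load_metadata(files, third_current))
```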
  • The data storage unit information of the secondary column cluster of the first partition includes P1 levels of data storage unit information, where P1 is a positive integer greater than or equal to 2; the data storage unit information of the secondary column cluster of the second partition includes P2 levels of data storage unit information, where P2 is a positive integer greater than or equal to 2; the data storage unit information of the target secondary column cluster includes Q levels of data storage unit information, and the Q levels include the data storage unit information of the secondary column cluster of the first partition and the data storage unit information of the secondary column cluster of the second partition.
  • One level of the Q levels of data storage unit information includes one level of the P1 levels of data storage unit information of the first partition and one level of the P2 levels of data storage unit information of the second partition.
  • Each of the other Q-1 levels of the Q levels of data storage unit information includes either one level of the P1 levels of data storage unit information of the first partition or one level of the P2 levels of data storage unit information of the second partition, where Q is equal to P1 + P2 - 1.
  • Level 0 of the Q levels of data storage unit information includes the level-0 data storage unit information of the secondary column cluster of the first partition and the level-0 data storage unit information of the secondary column cluster of the second partition, and level 2×q-1 of the Q levels of data storage unit information includes data storage unit information of the secondary column cluster of the first partition.
  • Alternatively, level 0 of the Q levels of data storage unit information includes the level-0 data storage unit information of the secondary column cluster of the first partition and the level-0 data storage unit information of the secondary column cluster of the second partition.
  • the unit information includes the layer 2 data storage unit information in the layer 2 data storage unit information and the layer 1 data storage unit information in the layer P data storage unit information, where the value of P is P 1 and P 2 Minus one minus one.
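  • The level layout of the target secondary column cluster, in which level 0 holds the level-0 information of both partitions and Q equals P1 + P2 - 1, can be sketched as below. The function name and the order in which the remaining levels are appended (first partition, then second partition) are assumptions made for illustration; the text allows other arrangements.

```python
def merge_secondary_levels(levels_1, levels_2):
    """Combine per-level SSTable information of two secondary column clusters.

    levels_x[i] is the list of SSTable identifiers recorded for level i.
    """
    assert len(levels_1) >= 2 and len(levels_2) >= 2  # P1 >= 2 and P2 >= 2
    # Level 0 of the target holds the level-0 information of both partitions.
    target = [levels_1[0] + levels_2[0]]
    # Assumption: every other source level becomes its own target level,
    # first partition's levels first, then the second partition's levels.
    target.extend(list(level) for level in levels_1[1:])
    target.extend(list(level) for level in levels_2[1:])
    return target


first = [["a0"], ["a1"], ["a2"]]  # P1 = 3
second = [["b0"], ["b1"]]         # P2 = 2
target = merge_secondary_levels(first, second)
print(len(target), target)
```

  • Keeping the non-zero levels of the two partitions apart is consistent with the rule that entry key values within a level above level 0 must not overlap, since secondary index entries of different partitions can cover the same key range.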
  • The method further includes: the first database server determines the data storage unit information of the secondary column cluster of the third partition according to the data storage unit information of the target secondary column cluster.
  • The level-1 data storage unit information of the secondary column cluster of the third partition is the information of the data storage units obtained by merging and re-sorting the data storage units that correspond to the level-0 data storage unit information of the Q levels of data storage unit information.
  • Each level of the level-2 to level-(P-1) data storage unit information of the secondary column cluster of the third partition is the information of the data storage units obtained by merging and re-sorting the data storage units that correspond to at least two levels among level 1 to level Q-1 of the Q levels of data storage unit information.
  • A prefix of the entry key value in each piece of data storage unit information of the secondary column cluster is a non-partition key value.
  • The metadata of the first partition includes data storage unit information of a secondary column cluster of the first partition.
  • The metadata of the second partition includes data storage unit information of a secondary column cluster of the second partition.
  • The first database server merging the metadata of the first partition and the metadata of the second partition to generate the metadata of the third partition specifically includes: the first database server merges the data storage unit information of the secondary column cluster of the first partition and the data storage unit information of the secondary column cluster of the second partition to generate the data storage unit information of the secondary column cluster of the third partition.
  • For the manner of generating the data storage unit information of the secondary column cluster of the third partition, refer to the manner of generating the data storage unit information of the target secondary column cluster in the foregoing implementations of the first aspect.
  • The metadata of the first partition further includes a write-ahead log information set of the first partition.
  • The metadata of the second partition further includes a write-ahead log information set of the second partition.
  • The method further includes: the database server merges the write-ahead log information set of the first partition and the write-ahead log information set of the second partition to generate a write-ahead log information set of the third partition.
  • The write-ahead log information set of the third partition includes N pieces of write-ahead log information, consisting of the N1 pieces of write-ahead log information in the write-ahead log information set of the first partition and the N2 pieces of write-ahead log information in the write-ahead log information set of the second partition, where N is a positive integer greater than or equal to 2, N1 and N2 are positive integers greater than or equal to 1, and the sum of N1 and N2 is N.
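  • A minimal sketch of merging the two write-ahead log information sets follows. Representing each piece of write-ahead log information as an (identifier, time serial number) tuple and ordering the result by time serial number are assumptions for illustration; the text only requires that the merged set contain all N1 + N2 pieces.

```python
def merge_wal_sets(wals_1, wals_2):
    """Return a WAL information set containing every item of both inputs.

    Each item is a hypothetical (wal_id, time_serial_number) tuple.
    """
    merged = list(wals_1) + list(wals_2)
    # Assumption: ordering by time serial number is convenient for replay.
    return sorted(merged, key=lambda wal: wal[1])


wals_first = [("wal-a", 3)]                 # N1 = 1
wals_second = [("wal-b", 1), ("wal-c", 5)]  # N2 = 2
merged_wals = merge_wal_sets(wals_first, wals_second)
print(merged_wals)
```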
  • The metadata of the first partition further includes data storage unit information of a main column cluster of the first partition.
  • The metadata of the second partition further includes data storage unit information of a main column cluster of the second partition.
  • The method further includes: the database server merges the data storage unit information of the main column cluster of the first partition and the data storage unit information of the main column cluster of the second partition to generate data storage unit information of the main column cluster of the third partition.
  • The data storage unit information of the main column cluster of the first partition includes K1 levels of data storage unit information, where K1 is a positive integer greater than or equal to 1; the data storage unit information of the main column cluster of the second partition includes K2 levels of data storage unit information, where K2 is a positive integer greater than or equal to 1.
  • The data storage unit information of the main column cluster of the third partition includes K levels of data storage unit information, where the k-th level of the K levels includes the k-th level of the K1 levels of data storage unit information and the k-th level of the K2 levels of data storage unit information, and K is the minimum value of K1 and K2; wherein the k-th layer data entry key according to any of the k-th layer data storage layer data K 1 cell information storage
  • A prefix of the entry key value in each piece of data storage unit information of the main column cluster is a partition key value.
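  • Because entry key values in the main column cluster are prefixed by the partition key value, the key ranges of two adjacent partitions do not interleave, so the k-th levels of both partitions can share one level. The sketch below illustrates this under stated assumptions: the function names and the (min key, max key, identifier) tuple layout are hypothetical, and both partitions are assumed to have the same number of levels to keep the example short.

```python
def merge_main_levels(levels_1, levels_2):
    """Merge main column cluster SSTable information level by level.

    levels_x[k] is a list of (min_key, max_key, sstable_id) tuples.
    """
    # Assumption for this sketch: both partitions have the same level count.
    assert len(levels_1) == len(levels_2)
    return [sorted(level_1 + level_2)
            for level_1, level_2 in zip(levels_1, levels_2)]


def overlaps(level):
    """True if two neighbouring SSTables in a sorted level share keys."""
    return any(prev[1] >= nxt[0] for prev, nxt in zip(level, level[1:]))


part_1 = [[(("A", "a", 1), ("A", "m", 9), "sst-1")],
          [(("A", "a", 1), ("A", "z", 3), "sst-2")]]
part_2 = [[(("B", "a", 1), ("B", "k", 2), "sst-3")],
          [(("B", "a", 1), ("B", "z", 7), "sst-4")]]
merged_levels = merge_main_levels(part_1, part_2)
print([[item[2] for item in level] for level in merged_levels])
```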
  • In a second aspect, a database server is provided. The database server includes units for executing the first aspect or any possible implementation manner of the first aspect.
  • In a third aspect, a database server is provided. The database server includes a processor and a communication interface.
  • The processor, in combination with the communication interface, implements the first aspect or any possible implementation manner of the first aspect.
  • An embodiment of the present application provides a computer storage medium storing computer instructions; the database server runs the computer instructions to implement the first aspect or any possible implementation manner of the first aspect.
  • The present application provides a computer program product containing instructions; when the computer instructions in the computer program product are run on a database server, the database server is caused to execute the first aspect or any possible implementation manner of the first aspect.
  • The present application provides a method for merging partitions in a distributed database system.
  • The distributed database system includes a first database server, a second database server, and a management server.
  • The first database server runs a first partition.
  • The second database server runs a second partition.
  • The method includes: the management server creates a third partition and determines to merge the first partition and the second partition into the third partition.
  • The management server sends a merge instruction to the first database server.
  • The merge instruction is used to merge the first partition and the second partition into the third partition, where the first partition and the second partition are adjacent partitions.
  • The merge instruction includes an identifier of the current file of the first partition and an identifier of the current file of the second partition; a file identifier of the file storing the metadata of the first partition is recorded in the current file of the first partition; and a file identifier of the file storing the metadata of the second partition is recorded in the current file of the second partition, and is used to implement merging the first
  • The management server receives a response message sent by the first database server, and the response message includes the identifier of the current file of the third partition.
  • The management server establishes a mapping relationship between the third partition and the first database server. After the management server creates the third partition, the partition routing table is updated.
  • The partition routing table includes the mapping relationship between the third partition and the first database server.
  • In a specific implementation, the mapping relationship may be a mapping between the identifier of the third partition and the address of the database server.
  • The management server determines, according to the loads of the first database server and the second database server, that the first database server is to merge the first partition and the second partition into the third partition, where the load of the first database server is lighter than the load of the second database server.
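  • The management-server side of the flow (choose the more lightly loaded server, send the merge instruction, then update the partition routing table from the response) can be sketched as follows. All function names, message fields, and load values are hypothetical, and the response is a stand-in for the message that the first database server would return.

```python
def choose_merge_executor(loads):
    """Return the address of the server with the lightest load."""
    return min(loads, key=loads.get)


def build_merge_instruction(current_1, current_2, target_partition):
    """Hypothetical merge instruction carrying both current file identifiers."""
    return {"first_current": current_1,
            "second_current": current_2,
            "target": target_partition}


def apply_response(routing_table, instruction, executor, response):
    """Map the third partition to the executing server in the routing table."""
    routing_table[instruction["target"]] = {
        "server": executor,
        "current": response["third_current"],
    }
    return routing_table


loads = {"server-1": 0.2, "server-2": 0.7}
executor = choose_merge_executor(loads)
instruction = build_merge_instruction("CURRENT-p1", "CURRENT-p2", "partition-3")
response = {"third_current": "CURRENT-p3"}  # stand-in for the server's reply
routing = apply_response({}, instruction, executor, response)
print(routing)
```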
  • A management server is provided. The management server includes units for executing the sixth aspect or any possible implementation manner of the sixth aspect.
  • A management server is provided. The management server includes a processor and a communication interface.
  • The processor, in combination with the communication interface, implements the sixth aspect or any possible implementation manner of the sixth aspect.
  • An embodiment of the present application provides a computer storage medium storing computer instructions; the management server runs the computer instructions to implement the sixth aspect or any possible implementation manner of the sixth aspect.
  • The present application provides a computer program product containing instructions; when the computer instructions in the computer program product are run on a management server, the management server is caused to implement the sixth aspect or any possible implementation manner of the sixth aspect.
  • The present application provides a distributed database system, where the distributed database system includes a first database server, a second database server, and a management server; the first database server is configured to implement the first aspect or any possible implementation manner of the first aspect, and the management server is configured to implement the sixth aspect or any possible implementation manner of the sixth aspect.
  • An embodiment of the present application provides a method for merging partitions in a distributed database system.
  • The distributed database system includes a first database server, a second database server, and a management server.
  • The first database server runs a first partition.
  • The second database server runs a second partition. The method includes: the first database server receives a merge instruction sent by the management server, the merge instruction being used to merge the first partition and the second partition into a third partition, where the first partition and the second partition are adjacent partitions; the first database server obtains the metadata of the first partition and the metadata of the second partition according to the merge instruction, and merges the metadata of the first partition and the metadata of the second partition to generate the metadata of the third partition.
  • The merge instruction includes an identifier of the current file of the first partition and an identifier of the current file of the second partition; a file identifier of the file storing the metadata of the first partition is recorded in the current file of the first partition; and a file identifier of the file storing the metadata of the second partition is recorded in the current file of the second partition.
  • The first database server obtaining the metadata of the first partition and the metadata of the second partition specifically includes: obtaining the metadata of the first partition according to the identifier of the current file of the first partition, and obtaining the metadata of the second partition according to the identifier of the current file of the second partition.
  • FIG. 1 is a schematic diagram of a partition.
  • FIG. 2 is a schematic diagram of the KVDB architecture.
  • FIG. 3 is a schematic flowchart of a method for processing a partition according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a layer merging process.
  • FIG. 5 is a structural block diagram of a database server according to an embodiment of the present application.
  • FIG. 6 is a structural block diagram of a database server according to an embodiment of the present invention.
  • FIG. 7 is a structural block diagram of a management server according to an embodiment of the present application.
  • FIG. 8 is a structural block diagram of a management server according to an embodiment of the present invention.
  • “At least one” means one or more, and “multiple” means two or more.
  • “And/or” describes an association between related objects and indicates that three relationships are possible. For example, A and/or B can represent: A exists alone, both A and B exist, or B exists alone, where A and B can be singular or plural.
  • The character “/” generally indicates an “or” relationship between the related objects.
  • “At least one of the following items” or a similar expression refers to any combination of these items, including any combination of single or plural items.
  • For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.
  • the words “first”, “second” and the like do not limit the number and execution order.
  • KVDB is a database that uses a key-value store. The data in the database is organized, indexed, and stored in the form of key-value pairs. Common KVDBs include RocksDB, LevelDB, and so on.
  • Partition key value: In partitioning technology, the value of a certain column field of a data entry in the data table, or a sequential combination of the values of several column fields, is usually used to determine the partition. This value is called the partition key value. The partition in which a data entry is located can be uniquely determined from the entry's partition key value.
  • Table 1 shows the five column fields {bn, on, ver, crt, dlc} of three data entries.
  • The values of the five column fields of the first data entry are {A, d, 1, 1501232852986, 20}.
  • The partition key value may be the value of one of the above five column fields, or a sequential combination of the values of several column fields.
  • For example, suppose the partition key value is the combined value of the column fields {bn, on}. Among the three data entries, the column fields {bn, on} of the first data entry and the second data entry are both {A, d}. In other words, the partition key values of the first and second data entries are both {A, d}.
  • The column fields {bn, on} of the third data entry are {B, x}. In other words, the partition key value of the third data entry is {B, x}.
  • The value of the column field on is represented by a letter in Table 1.
  • The value of the column field on may also be a specific numeric value, such as 0109 or 0208.
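  • Extracting the partition key value {bn, on} from a data entry can be sketched as below. The first entry uses the values given for Table 1; the `ver`, `crt`, and `dlc` values of the second and third entries are placeholders, because only their {bn, on} values are stated in the text. The function and constant names are illustrative.

```python
PARTITION_KEY_FIELDS = ("bn", "on")

entries = [
    {"bn": "A", "on": "d", "ver": 1, "crt": 1501232852986, "dlc": 20},
    # Placeholder values below: only bn and on are given for these entries.
    {"bn": "A", "on": "d", "ver": 2, "crt": 0, "dlc": 0},
    {"bn": "B", "on": "x", "ver": 1, "crt": 0, "dlc": 0},
]


def partition_key(entry):
    """Sequential combination of the partition key column fields."""
    return tuple(entry[name] for name in PARTITION_KEY_FIELDS)


keys = [partition_key(entry) for entry in entries]
print(keys)
```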
  • FIG. 1 is a schematic diagram of multiple partitions with a partition key value of {bn, on}.
  • The first data entry and the second data entry are data entries in partition 1, and the third data entry is a data entry in partition 2.
  • The data table can be split by using a range partition method. Other splitting methods may also cause the data entries in a partition to be arranged in the natural order of the partition key values. Therefore, the embodiments of the present application do not limit the method of splitting the data table, as long as, after the data table is split, the data entries in each partition are arranged in the natural order of the partition key values.
  • KVDB manages the data entries of a data table at partition granularity, and the partitions are managed by different database servers. That is, a database server provides access to the data entries of the partitions it manages.
  • the partition routing table can include the following information: the partition ID and the address of the database server to which it belongs, and it can also include the partition root index file ID, the left boundary of the partition, the right boundary of the partition, and the partition status.
  • the address of the database server may be an Internet Protocol (IP) address, or an identifier of the database server, etc. This is not limited in this embodiment of the present invention, and the partition routing table is also referred to as a partitioned view.
  • Figure 1 is a schematic diagram of a partition.
  • Table 2 is a partition routing table based on the multiple partitions shown in FIG. 1.
  • the partition root index file identifier identifies the file name of the inventory file for each partition.
  • the partition status indicates the current status of the partition, such as normal service status, split status, merge status, isolation status, and so on. Normal shown in Table 2 indicates that the current status of the partition is normal service status.
  • For example, suppose the partition key value of a data entry is {B, x}.
  • According to the partition routing table shown in Table 2, it can be determined that the data entry belongs to partition 2 and that the address of the owning database server is 8.11.234.2:27021.
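  • A lookup in a range-partitioned routing table can be sketched with a binary search over the left boundaries. The boundaries and the addresses of partitions 1 and 3 below are hypothetical placeholders; only the mapping of {B, x} to partition 2 at 8.11.234.2:27021 comes from the text.

```python
from bisect import bisect_right

# Rows: (left boundary, inclusive; partition id; database server address),
# sorted by left boundary. Boundaries and two addresses are placeholders.
ROUTING_TABLE = [
    (("A", "a"), "partition 1", "hypothetical-host-1:27021"),
    (("B", "a"), "partition 2", "8.11.234.2:27021"),
    (("C", "a"), "partition 3", "hypothetical-host-3:27021"),
]


def route(partition_key):
    """Return (partition id, server address) for a partition key value."""
    lefts = [row[0] for row in ROUTING_TABLE]
    index = bisect_right(lefts, partition_key) - 1
    if index < 0:
        raise KeyError(partition_key)
    _, partition_id, address = ROUTING_TABLE[index]
    return partition_id, address


print(route(("B", "x")))
```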
  • the partition merges referred to in the embodiments of the present application are all merges of adjacent partitions, for example, merge partition 1 and partition 2, merge partition 3 and partition 4, and merge partition 4 and partition 5.
  • The primary index entry includes the partition key value.
  • The primary index entry can also include the values of several other column fields of the data entry. The partition key value, together with the values of one or more of the other column fields, can make up the primary index entry.
  • For example, the primary index entry may include the five column fields {bn, on, ver, crt, dlc} shown in Table 1, where the key value of the primary index entry corresponds to the column fields {bn, on, ver}. {bn, on} can be called the prefix of the primary index entry. Because {bn, on} is the partition key value, the primary index entry is prefixed by the partition key value.
  • An entry key is a unique index into a data entry and can include multiple column fields for that data entry.
  • Secondary index entries can also be called secondary index entries.
  • A secondary index entry consists of the secondary index column field and the column fields corresponding to the key value of the primary index entry.
  • The format of a secondary index entry can be: secondary index column field + column fields of the primary index entry key value.
  • In this case, the secondary index entry is prefixed with a non-partition key value.
  • The format of a secondary index entry can also be: column fields of the primary index entry key value + secondary index column field.
  • In this case, the secondary index entry is prefixed by the partition key value.
  • When a data entry has multiple secondary index entries, the secondary index column fields of the different secondary index entries are different. For example, suppose each data entry includes two secondary index entries; the secondary index column field of one can be dlc and that of the other can be crt.
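  • The two secondary index entry formats can be illustrated by building entry key values as tuples. The function names are hypothetical; the column fields and the example values are those given for the first data entry of Table 1.

```python
PRIMARY_KEY_FIELDS = ("bn", "on", "ver")


def primary_key(entry):
    """Primary index entry key value, prefixed by the partition key {bn, on}."""
    return tuple(entry[name] for name in PRIMARY_KEY_FIELDS)


def secondary_key(entry, index_field, partition_prefixed=False):
    """Secondary index entry key value in either of the two formats."""
    if partition_prefixed:
        # primary key column fields + secondary index column field
        return primary_key(entry) + (entry[index_field],)
    # secondary index column field + primary key column fields
    return (entry[index_field],) + primary_key(entry)


entry = {"bn": "A", "on": "d", "ver": 1, "crt": 1501232852986, "dlc": 20}
print(primary_key(entry))
print(secondary_key(entry, "dlc"))
print(secondary_key(entry, "dlc", partition_prefixed=True))
```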
  • Each partition may include a primary index entry set and one or more secondary index entry sets.
  • The primary index entry set consists of all the primary index entries in the partition.
  • The primary index entries in the primary index entry set are sorted by primary index entry key value.
  • Each secondary index entry set consists of the secondary index entries that have the same secondary index column field. Assume that each data entry includes two secondary index entries, one on dlc and the other on crt. The partition can then include two secondary index entry sets: one includes the key values of all secondary index entries in the partition whose secondary index column field is crt, and the other includes the key values of all secondary index entries in the partition whose secondary index column field is dlc.
  • A primary index entry set can be called a column cluster, and this column cluster can be called a primary index column cluster.
  • The secondary index entries included in a secondary index entry set also have the same column fields. Therefore, a secondary index entry set can also be treated as a column cluster, which can be called a secondary index column cluster.
  • Log-structured merge-tree (LSM-tree).
  • Write-ahead logging (WAL).
  • MemTable: A MemTable corresponds to a WAL file and is an ordered in-memory organization of the contents of that WAL file. The MemTable provides a structure for writing, deleting, and reading key-value data (data entries). Internally, the MemTable stores data entries in order of entry key value.
  • Immutable MemTable: When the memory space occupied by a MemTable reaches an upper limit, the data entries stored in memory in order of entry key value need to be dumped to a Sorted String Table (SSTable), and no new data entries are written to the corresponding WAL file. At this point the MemTable is frozen into an immutable MemTable and a new MemTable is generated. New data entries are recorded in a new WAL file and in the newly generated MemTable. The data entries in an immutable MemTable cannot be changed; they can only be read, not written or deleted.
  • Sorted String Table (SSTable).
  • An SSTable is the unit in which KVDB data is stored. The entry key values in each SSTable are ordered. Each immutable MemTable is merged to obtain an SSTable. The process of merging an immutable MemTable to obtain an SSTable can be called a minor compaction.
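  • The write path described above (append to the WAL, update the MemTable, freeze it when it is full, and dump it as a sorted SSTable) can be sketched as follows. The class is a toy model: the name `MemTableStore`, the entry-count limit standing in for the memory upper limit, and the in-memory lists standing in for files are all assumptions.

```python
class MemTableStore:
    """Toy model of WAL + MemTable + minor compaction."""

    def __init__(self, limit):
        self.limit = limit    # assumption: limit counted in entries, not bytes
        self.wal = []         # stand-in for the current WAL file
        self.memtable = {}    # key -> value
        self.sstables = []    # each SSTable is a sorted list of (key, value)

    def put(self, key, value):
        self.wal.append((key, value))  # record the write in the log first
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self._minor_compaction()

    def _minor_compaction(self):
        immutable = self.memtable           # frozen: receives no more writes
        self.memtable, self.wal = {}, []    # new MemTable and new WAL
        self.sstables.append(sorted(immutable.items()))


store = MemTableStore(limit=2)
store.put("b", 1)
store.put("a", 2)   # reaches the limit and triggers the dump
store.put("c", 3)
print(store.sstables, store.memtable)
```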
  • KVDB divides the storage of the SSTable file into different levels, level 0 to level n, where n is a positive integer greater than or equal to 1.
  • Level 0 includes multiple SSTable files. Among the multiple SSTables, one SStable is obtained by sub-merging an immutable MemTable. In other words, sub-merging multiple immutable MemTables to obtain corresponding SSTables. The entry keys of different SSTables in the multiple SSTables will overlap. After meeting certain conditions, the SSTable in level 0 and SSTable in level 1 are merged, and the SSTable obtained after the merge is the SSTable stored in level 1.
  • Each level in level 1 to level n maintains the specified number of SSTables, and the entry key values between all SSTables in each level do not overlap.
  • an SSTable in a level can be selected and merged into the next level (that is, the level whose number is the current level plus 1; for example, the next level of level 1 is level 2, the next level of level 2 is level 3, and so on). After merging, the selected SSTable is deleted.
  • the merge processing of the SSTable between the two levels can be referred to as a major compaction.
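  • The difference between level 0 and the other levels can be checked mechanically. The sketch below assumes each SSTable is summarized by an inclusive `(min_key, max_key)` pair; the helper names are hypothetical.

```python
def ranges_overlap(a, b):
    """True if two inclusive (min_key, max_key) ranges share at least one key."""
    return a[0] <= b[1] and b[0] <= a[1]


def level_is_disjoint(sstables):
    """True if no two SSTable key ranges in the level overlap.

    After sorting by min_key it is enough to compare neighbours.
    """
    ordered = sorted(sstables)
    return all(not ranges_overlap(x, y) for x, y in zip(ordered, ordered[1:]))


level0 = [("a", "m"), ("h", "z")]              # overlap is allowed in level 0
level1 = [("a", "f"), ("g", "m"), ("n", "z")]  # levels 1..n must be disjoint
```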
  • the manifest file is used to record WAL file information and SSTable information. More specifically, the WAL file information recorded in the manifest file includes an identification of the WAL file and a time serial number of the WAL file.
  • the SSTable information recorded in the manifest file includes one or more of the following: the column cluster to which the SSTable belongs, the level to which the SSTable belongs, the identifier of the SSTable, the time sequence number of the SSTable, the size of the SSTable, the minimum entry key value of the SSTable, and the maximum entry key value of the SSTable.
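  • As an illustration, the records kept in a manifest file could be modelled as below. The field list follows the description above; the class and field names, and the `by_level` helper, are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WalInfo:
    wal_id: str          # identifier of the WAL file
    time_seq: int        # time sequence number of the WAL file


@dataclass(frozen=True)
class SSTableInfo:
    column_cluster: str  # column cluster the SSTable belongs to
    level: int           # level the SSTable belongs to
    sstable_id: str      # identifier of the SSTable
    time_seq: int        # time sequence number of the SSTable
    size: int            # size of the SSTable
    min_key: str         # minimum entry key value
    max_key: str         # maximum entry key value


@dataclass
class Manifest:
    wals: list = field(default_factory=list)
    sstables: list = field(default_factory=list)

    def by_level(self, column_cluster):
        """Group the SSTable ids of one column cluster by level."""
        levels = {}
        for info in self.sstables:
            if info.column_cluster == column_cluster:
                levels.setdefault(info.level, []).append(info.sstable_id)
        return levels


manifest = Manifest(
    wals=[WalInfo("wal-1", 10)],
    sstables=[
        SSTableInfo("secondary", 0, "sst-1", 11, 4096, "a", "k"),
        SSTableInfo("secondary", 0, "sst-2", 12, 4096, "c", "z"),
        SSTableInfo("secondary", 1, "sst-3", 5, 8192, "a", "z"),
        SSTableInfo("main", 0, "sst-4", 13, 4096, "a", "b"),
    ],
)
```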
  • the KVDB will be described below with reference to FIG. 2.
  • the KVDB 200 shown in FIG. 2 includes a management server 210, a database server 221, a database server 222, and a database server 223.
  • the KVDB 200 shown in FIG. 2 may further include a storage server 231, a storage server 232, and a storage server 233.
  • the database server 221, the database server 222, and the database server 223 may be collectively referred to as a distributed database service cluster.
  • the storage server 231, the storage server 232, and the storage server 233 may provide a distributed shared storage pool for KVDB.
  • KVDB can also be provided with storage resources by centralized storage, for example, storage arrays can provide storage resources for KVDB.
  • the management server 210 is responsible for specifying the ownership relationship between the partition and the database server.
  • the partition routing table is also maintained by the management server 210.
  • the above operations of freezing a MemTable into an immutable MemTable, minor compaction, and major compaction can all be performed by a database server.
  • a database server is responsible for the storage management of data items in a partition. Therefore, taking the storage server as an example, the WAL file, SSTable, and manifest file of the corresponding partition generated by the database server can be persisted in the storage server.
  • the database server can access the WAL file, SSTable, and manifest file of the corresponding partition stored in the storage server. The MemTable and the immutable MemTable are stored in the memory of the database server.
  • FIG. 3 is a schematic flowchart of a method for processing a partition according to an embodiment of the present application.
  • the method shown in FIG. 3 can be applied to a KVDB of a merge tree based on a log structure.
  • partition 1 is served by database server 1
  • partition 2 is served by database server 2.
  • the management server determines, according to a balancing policy, to merge the adjacent first partition (partition 1) and second partition (partition 2), marks partition 1 and partition 2 as being in a merging state in the partition routing table, and persists this state.
  • the balancing strategy can be the number of partition data entries, access popularity, etc., or the load of the database server running the partition.
  • the management server notifies database server 1 to prepare the first partition for merging.
  • the management server notifies database server 2 to prepare the second partition for merging.
  • the database server 2 stops the merge task and sets the second partition as read-only, suspends the write request, and sends a successful response to the management server.
  • database server 2 stops performing merge tasks. If database server 2 is performing a merge task when the notification arrives, it completes the ongoing merge task and then stops performing merge tasks. In other words, database server 2 makes no further changes to the content saved in the second partition after receiving the prepare-for-merge notification sent by the management server.
  • the database server 1 stops the merge task and sets the first partition as read-only, suspends the write request, and sends a successful response to the management server.
  • Step 305 is similar to step 304, and it is unnecessary to repeat it here.
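  • The prepare step on a database server can be pictured as a small state change. The sketch below is an assumption-laden illustration: `PartitionState`, its attributes, and the `"success"` return value are hypothetical, and finishing an ongoing merge task is reduced to clearing a flag.

```python
class PartitionState:
    def __init__(self):
        self.read_only = False
        self.merge_task_running = False
        self.merge_tasks_enabled = True
        self.suspended_writes = []
        self.data = {}

    def prepare_merge(self):
        """Handle the prepare-for-merge notification from the management server."""
        if self.merge_task_running:
            # Stand-in for waiting until the ongoing merge task completes.
            self.merge_task_running = False
        self.merge_tasks_enabled = False   # no new merge tasks are started
        self.read_only = True              # the partition becomes read-only
        return "success"

    def write(self, key, value):
        """Apply a write, or suspend it if the partition is read-only."""
        if self.read_only:
            self.suspended_writes.append((key, value))
            return False
        self.data[key] = value
        return True


partition = PartitionState()
partition.write("k1", "v1")
partition.merge_task_running = True
reply = partition.prepare_merge()
accepted = partition.write("k2", "v2")
```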
  • the management server creates a third partition in the distributed database system, and marks the third partition as an initial state in the partition routing table.
  • the management server creates a third partition in the distributed database system.
  • the specific implementation may be to generate the identifier of the third partition and add the identifier of the third partition to the partition routing table.
  • the management server may determine that the third partition is run by the lightly loaded database server 1 based on the load of the database server 1 and the database server 2, for example, the load of the database server 1 is small.
  • the third partition is run on the database server 1, and the database server 1 provides services for the third partition.
  • the management server may also instruct another database server (such as database server 2 or database server 3) to merge the first partition and the second partition into the third partition, that is, the third partition runs on that other database server.
  • one of database server 1 and database server 2 is selected at random to run the third partition, or a database server is selected from database server 1 and database server 2 by a specific algorithm to run the third partition.
  • a database server may perform the third partition based on a partition identifier (such as a partition number) managed by the database server to determine the total number of partitions.
  • the management server determines the database server running the third partition according to the foregoing implementation manner.
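  • Two of the selection strategies mentioned above, picking the lightly loaded server and picking at random, can be sketched as follows. The function names and the numeric load values are hypothetical; how load is measured is not specified here.

```python
import random


def pick_by_load(loads):
    """Return the server with the smallest load value."""
    return min(loads, key=loads.get)


def pick_at_random(candidates, rng):
    """Return one candidate chosen uniformly at random."""
    return rng.choice(sorted(candidates))


loads = {"database server 1": 0.2, "database server 2": 0.7}
by_load = pick_by_load(loads)
at_random = pick_at_random(loads.keys(), random.Random(0))
```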
  • the management server sends a merge instruction to the database server 1, and the merge instruction is used to implement merging the first partition and the second partition into a third partition.
  • merging the first partition and the second partition into the third partition is implemented by combining the metadata of the first partition and the metadata of the second partition to generate metadata of the third partition; that is, the data entries of the first partition and the data entries of the second partition can be accessed through the metadata of the third partition.
  • the merge instruction includes an identifier of a current file of the first partition and an identifier of a current file of the second partition. In the current file of the first partition, a file identifier of a file storing metadata of the first partition is recorded. A file identifier of a file storing metadata of the second partition is recorded in the current file of the second partition.
  • in a KVDB based on the LSM-tree algorithm, the manifest file stores the partition metadata, so the file identifier recorded in the current file, which identifies the file storing the partition metadata, may be the identifier of the manifest file in the LSM-tree-based KVDB.
  • the database server 1 creates a database corresponding to the third partition.
  • One implementation is to start a new database process or database instance, and another implementation may be to use the currently running database process or database instance as the database of the third partition.
  • the database server 1 obtains metadata of the first partition and metadata of the second partition.
  • database server 1 reads the current file of the first partition to obtain the metadata of the first partition, reads the current file of the second partition to obtain the metadata of the second partition, and loads the metadata of the first partition and the metadata of the second partition into the memory of database server 1.
  • database server 1 obtains, from the management server, the address of database server 2, which runs the second partition; database server 1 then obtains the metadata of the second partition from database server 2, or obtains information of the metadata of the second partition from database server 2.
  • the database server 1 obtains the metadata of the second partition according to the information of the metadata of the second partition.
  • the information of the metadata of the second partition may be a file identifier of a file storing the metadata of the second partition.
  • the database server 1 merges metadata of the first partition with metadata of the second partition to generate metadata of the third partition.
  • the database server 1 generates metadata of the third partition. Further, the database server 1 creates a current file for the third partition, and the current file of the third partition records a file identifier of a file storing metadata of the third partition.
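  • The indirection from a current file to the file holding the metadata can be sketched as below. A plain dictionary stands in for the shared storage, the file names are hypothetical, and concatenating SSTable lists is only a placeholder for the metadata merge itself.

```python
def read_metadata(storage, current_file_id):
    """Follow a partition's current file to its manifest and return the metadata."""
    manifest_id = storage[current_file_id]   # the current file records the manifest id
    return storage[manifest_id]


def publish_metadata(storage, partition, metadata):
    """Write a manifest for `partition`, then a current file that points to it."""
    manifest_id = f"{partition}/MANIFEST"
    storage[manifest_id] = metadata
    current_id = f"{partition}/CURRENT"
    storage[current_id] = manifest_id
    return current_id


storage = {
    "p1/CURRENT": "p1/MANIFEST-7",
    "p1/MANIFEST-7": {"sstables": ["s1", "s2"]},
    "p2/CURRENT": "p2/MANIFEST-3",
    "p2/MANIFEST-3": {"sstables": ["s3"]},
}
meta1 = read_metadata(storage, "p1/CURRENT")
meta2 = read_metadata(storage, "p2/CURRENT")
merged = {"sstables": meta1["sstables"] + meta2["sstables"]}  # placeholder merge
current3 = publish_metadata(storage, "p3", merged)
```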
  • the metadata of the first partition includes data storage unit information of a secondary column cluster of the first partition.
  • the metadata of the second partition includes data storage unit information of the secondary column clusters of the second partition.
  • the data storage unit information is SSTable information
  • the data storage unit is SSTable.
  • the index (including the primary index and the secondary index) prefixed by the partition key value and the secondary index entry prefixed by the non-partition key value are stored in different column clusters, respectively.
  • the data storage unit information of the primary column cluster refers to the related information organized and stored by the index (including the primary index and the secondary index) entries prefixed by the partition key value.
  • the data storage unit information of the secondary column cluster refers to related information organized and stored by the secondary index entries whose prefixes are non-partitioned key values.
  • the database server 1 may combine data storage unit information of the secondary column clusters of the first partition with data storage unit information of the secondary column clusters of the second partition to generate data storage unit information of the target secondary column clusters.
  • the data storage unit information of the target secondary column cluster may be used as the data storage unit information of the secondary column cluster of the third partition.
  • database server 1 may determine the data storage unit information of the secondary column cluster of the third partition according to the data storage unit information of the target secondary column cluster.
  • the data storage unit information of the secondary column cluster of the first partition includes $P_1$ layers of data storage unit information, where $P_1$ is a positive integer greater than or equal to 2; the data storage unit information of the secondary column cluster of the second partition includes $P_2$ layers of data storage unit information, where $P_2$ is a positive integer greater than or equal to 2.
  • the data storage unit information of the target secondary column cluster includes $Q$ layers of data storage unit information. Each layer in $Q-1$ layers of the $Q$ layers of data storage unit information includes one layer of the $P_1$ layers of data storage unit information or one layer of the $P_2$ layers of data storage unit information, where $Q = P_1 + P_2 - 1$.
  • the layer 0 data storage unit information in the $Q$ layers of data storage unit information includes the layer 0 data storage unit information of the secondary column cluster of the first partition and the layer 0 data storage unit information of the secondary column cluster of the second partition. The layer $2 \times q - 1$ data storage unit information in the $Q$ layers includes the layer $q$ data storage unit information of the $P$ layers of data storage unit information in the $P_1$ layers, and the layer $2 \times q$ data storage unit information in the $Q$ layers includes the layer $q$ data storage unit information of the $P$ layers of data storage unit information in the $P_2$ layers.
  • the $P$ layers of data storage unit information in the $P_1$ layers may be layer 0 to layer $P-1$ of the $P_1$ layers of data storage unit information.
  • the $P$ layers of data storage unit information in the $P_2$ layers may be the first layer to the $P$-th layer of the $P_2$ layers of data storage unit information.
  • the $P$ layers of data storage unit information in the $P_1$ layers may be the last layer to the $P$-th-from-last layer of the $P_1$ layers of data storage unit information.
  • the $P$ layers of data storage unit information in the $P_2$ layers may be the last layer to the $P$-th-from-last layer of the $P_2$ layers of data storage unit information.
  • the $P$ layers of data storage unit information in the $P_1$ layers may be the middle $P$ layers of the $P_1$ layers of data storage unit information.
  • the data storage layer P 1 P layer data unit information storing unit of the information layer of the P-2 to P + 1 layer 1 layer data storage means data information storing cell information.
  • the $P$ layers of data storage unit information in the $P_2$ layers may be the middle $P$ layers of the $P_2$ layers of data storage unit information.
  • the data storage unit information is SSTable information
  • the data storage unit is SSTable.
  • the manifest file of the first partition includes $P_1$ layers of SSTable information of the secondary column cluster, and the index entries of the $P_1$ layers of SSTable information are secondary index entries prefixed by a non-partition key value.
  • the manifest file of the second partition includes $P_2$ layers of SSTable information of the secondary column cluster, and the index entries of the $P_2$ layers of SSTable information are secondary index entries prefixed by a non-partition key value.
  • the target manifest file includes $Q$ layers of SSTable information of the secondary column cluster, and the index entries of the $Q$ layers of SSTable information are secondary index entries prefixed by a non-partition key value.
  • the target manifest file may be a manifest file of the third partition.
  • the target manifest file may be used to determine the manifest file of the third partition.
  • the manifest file of the first partition includes two layers of SSTable information of the secondary column cluster, namely the layer 0 SSTable information and the layer 1 SSTable information.
  • the manifest file of the second partition includes two layers of SSTable information of the secondary column cluster, namely the layer 0 SSTable information and the layer 1 SSTable information.
  • the target manifest file includes three layers of SSTable information of the secondary column cluster, namely the layer 0 SSTable information to the layer 2 SSTable information.
  • the layer 0 SSTable information in the target manifest file includes the layer 0 SSTable information in the manifest file of the first partition and the layer 0 SSTable information in the manifest file of the second partition.
  • the layer 1 SSTable information in the target manifest file includes the layer 1 SSTable information in the manifest file of the first partition.
  • the layer 2 SSTable information in the target manifest file includes the layer 1 SSTable information in the manifest file of the second partition.
  • Table 3 shows the two layers of data storage unit information included in the data storage unit information of the secondary column cluster of the first partition.
  • Each of the two layers of data storage unit information includes two pieces of data storage unit information.
  • Table 3 is obtained by taking a KVDB that uses the LSM-tree algorithm as an example. Therefore, the corresponding data storage unit information can also be called SSTable information.
  • Table 4 shows the two layers of data storage unit information included in the data storage unit information of the secondary column cluster of the second partition.
  • Each of the two layers of data storage unit information includes two pieces of data storage unit information.
  • Table 4 is obtained by taking a KVDB that uses the LSM-tree algorithm as an example. Therefore, the corresponding data storage unit information can also be called SSTable information.
  • Table 5 shows the three layers of data storage unit information included in the data storage unit information of the target secondary column cluster.
  • The layer 0 data storage unit information in the three layers of data storage unit information includes four pieces of data storage unit information.
  • The layer 1 and layer 2 data storage unit information each include two pieces of data storage unit information.
  • Table 5 is obtained by taking a KVDB that uses the LSM-tree algorithm as an example. Therefore, the corresponding data storage unit information can also be called SSTable information. It can be seen that the SSTable information of level 0 shown in Table 5 includes the SSTable information of level 0 shown in Table 3 and Table 4, and the SSTable information of level 1 shown in Table 5 includes the SSTable information of level 1 shown in Table 3.
  • the SSTable information of level 2 shown in Table 5 includes the SSTable information of level 1 shown in Table 4.
  • the SSTable information of level 0 in the target manifest file after merging includes the SSTable information of level 0 in the manifest file of the first partition and the SSTable information of level 0 in the manifest file of the second partition;
  • the SSTable information of level 1 in the target manifest file includes the SSTable information of level 1 in the manifest file of the first partition;
  • the SSTable information of level 2 in the target manifest file after merging includes the SSTable information of level 1 in the manifest file of the second partition.
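  • The two-layer example above can be reproduced with a short sketch. It assumes both partitions have the same number of layers and represents each layer as a list of SSTable identifiers; the identifiers and the function name `interleave_merge` are hypothetical.

```python
def interleave_merge(first, second):
    """Merge two equally deep secondary column clusters layer by layer.

    first, second: lists of layers; each layer is a list of SSTable ids.
    Layer 0 of both partitions is combined into target layer 0; layer q of
    `first` lands on target layer 2*q - 1 and layer q of `second` on 2*q.
    """
    if len(first) != len(second):
        raise ValueError("this sketch only covers equal layer counts")
    target = [first[0] + second[0]]
    for q in range(1, len(first)):
        target.append(list(first[q]))    # target layer 2*q - 1
        target.append(list(second[q]))   # target layer 2*q
    return target


# Two layers with two SSTables each, mirroring the shape of Tables 3 and 4.
partition1 = [["a0-1", "a0-2"], ["a1-1", "a1-2"]]
partition2 = [["b0-1", "b0-2"], ["b1-1", "b1-2"]]
target = interleave_merge(partition1, partition2)
```

  • Only metadata is rearranged here: no SSTable is rewritten, and the target ends up with 2 + 2 - 1 = 3 layers holding four, two, and two SSTable entries.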
  • the layer $2 \times q - 1$ data storage unit information in the $Q$ layers of data storage unit information includes the layer $q$ data storage unit information of the $P$ layers of data storage unit information of the secondary column cluster of the second partition.
  • the layer $2 \times q$ data storage unit information in the $Q$ layers of data storage unit information includes the layer $q$ data storage unit information of the $P$ layers of data storage unit information of the secondary column cluster of the first partition.
  • the manifest file of the first partition includes layer 0 SSTable information and layer 1 SSTable information.
  • the manifest file of the second partition includes layer 0 SSTable information and layer 1 SSTable information.
  • the target manifest file includes three layers of SSTable information of the secondary column cluster, namely the layer 0 SSTable information to the layer 2 SSTable information.
  • the layer 0 SSTable information in the target manifest file includes the layer 0 SSTable information in the manifest file of the first partition and the layer 0 SSTable information in the manifest file of the second partition.
  • the layer 1 SSTable information in the target manifest file includes the layer 1 SSTable information in the manifest file of the second partition.
  • the layer 2 SSTable file information in the target manifest file includes the layer 1 SSTable information in the manifest file of the first partition.
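  • The reversed placement in this example, with the second partition on the odd target layers, can be sketched with a flag. The `second_leads` parameter and the identifiers are hypothetical, and equal layer counts are assumed.

```python
def interleave_merge(first, second, second_leads=False):
    """Interleave layers 1.. of two equally deep partitions.

    With second_leads=True, layer q of `second` goes to target layer 2*q - 1
    and layer q of `first` goes to target layer 2*q.
    """
    if len(first) != len(second):
        raise ValueError("this sketch only covers equal layer counts")
    odd, even = (second, first) if second_leads else (first, second)
    target = [first[0] + second[0]]          # layer 0 always holds both layer 0s
    for q in range(1, len(first)):
        target.append(list(odd[q]))
        target.append(list(even[q]))
    return target


first = [["A0"], ["A1"]]
second = [["B0"], ["B1"]]
target = interleave_merge(first, second, second_leads=True)
```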
  • the SSTable information from the first partition and the SSTable information from the second partition are superimposed on each other.
  • the SSTable information from the first partition and the SStable information from the second partition may be stacked.
  • the layer 0 data storage unit information in the $Q$ layers of data storage unit information includes the layer 0 data storage unit information of the first partition and the layer 0 data storage unit information of the second partition.
  • the layer 1 to layer $P-1$ data storage unit information in the $Q$ layers of data storage unit information is, respectively, the layer 1 to layer $P-1$ data storage unit information in the $P$ layers of data storage unit information of the first partition; the layer $P$ to layer $Q-1$ data storage unit information in the $Q$ layers of data storage unit information is, respectively, the layer 1 to layer $P-1$ data storage unit information in the $P$ layers of data storage unit information of the second partition.
  • the manifest file of the first partition includes four layers of SSTable information of the secondary column cluster, namely the layer 0 SSTable information to the layer 3 SSTable information.
  • the manifest file of the second partition includes four layers of SSTable information of the secondary column cluster, namely the layer 0 SSTable information to the layer 3 SSTable information.
  • the target manifest file includes seven layers of SSTable information of the secondary column cluster, namely the layer 0 SSTable information to the layer 6 SSTable information.
  • the layer 0 SSTable information in the target manifest file includes the layer 0 SSTable information in the manifest file of the first partition and the layer 0 SSTable information in the manifest file of the second partition.
  • the layer 1 SSTable information in the target manifest file includes the layer 1 SSTable information in the manifest file of the first partition.
  • the layer 2 SSTable information in the target manifest file includes the layer 2 SSTable information in the manifest file of the first partition.
  • the layer 3 SSTable information in the target manifest file includes the layer 3 SSTable information in the manifest file of the first partition.
  • the layer 4 SSTable information in the target manifest file includes the layer 1 SSTable information in the manifest file of the second partition.
  • the layer 5 SSTable information in the target manifest file includes the layer 2 SSTable information in the manifest file of the second partition.
  • the layer 6 SSTable information in the target manifest file includes the layer 3 SSTable information in the manifest file of the second partition.
  • the layer 0 data storage unit information in the $Q$ layers of data storage unit information includes the layer 0 data storage unit information of the first partition and the layer 0 data storage unit information of the second partition.
  • the layer 1 to layer $P-1$ data storage unit information in the $Q$ layers of data storage unit information is, respectively, the layer 1 to layer $P-1$ data storage unit information in the $P$ layers of data storage unit information of the second partition; the layer $P$ to layer $Q-1$ data storage unit information in the $Q$ layers of data storage unit information is, respectively, the layer 1 to layer $P-1$ data storage unit information in the $P$ layers of data storage unit information of the first partition.
  • the manifest file of the first partition includes layer 0 SSTable information to layer 3 SSTable information.
  • the manifest file of the second partition includes layer 0 SSTable information to layer 3 SSTable information.
  • the target manifest file includes seven layers of SSTable information of the secondary column cluster, namely the layer 0 SSTable information to the layer 6 SSTable information.
  • the layer 0 SSTable information in the target manifest file includes the layer 0 SSTable information in the manifest file of the first partition and the layer 0 SSTable information in the manifest file of the second partition.
  • the layer 1 SSTable information in the target manifest file includes the layer 1 SSTable information in the manifest file of the second partition.
  • the layer 2 SSTable information in the target manifest file includes the layer 2 SSTable information in the manifest file of the second partition.
  • the layer 3 SSTable information in the target manifest file includes the layer 3 SSTable information in the manifest file of the second partition.
  • the layer 4 SSTable information in the target manifest file includes the layer 1 SSTable information in the manifest file of the first partition.
  • the layer 5 SSTable information in the target manifest file includes the layer 2 SSTable information in the manifest file of the first partition.
  • the layer 6 SSTable information in the target manifest file includes layer 3 SSTable information in the manifest file of the first partition.
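  • Both stacking orders from the two four-layer examples can be sketched with one helper. The name `stacked_merge` and the one-identifier-per-layer test data are hypothetical.

```python
def stacked_merge(lower, upper):
    """Combine layer 0 of both partitions, then stack the remaining layers.

    Layers 1.. of `lower` come first in the target, followed by layers 1..
    of `upper`.
    """
    layer0 = lower[0] + upper[0]
    return [layer0] + [list(l) for l in lower[1:]] + [list(l) for l in upper[1:]]


def layers(prefix, depth):
    """Build `depth` layers holding one hypothetical SSTable id each."""
    return [[f"{prefix}{level}"] for level in range(depth)]


first, second = layers("A", 4), layers("B", 4)
first_then_second = stacked_merge(first, second)   # first partition on layers 1..3
second_then_first = stacked_merge(second, first)   # second partition on layers 1..3
```

  • With four layers per partition, either order yields 4 + 4 - 1 = 7 target layers.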
  • the layer 0 data storage unit information is merged by directly merging the layer 0 data storage unit information of the secondary column clusters of the two partitions into the layer 0 data storage unit information of the same secondary column cluster.
  • the merging method is to add the layer 0 data storage unit information in the second partition directly to the layer 0 data storage unit information in the first partition.
  • This merge method can be called append merge.
  • the data storage unit information other than the layer 0 data storage unit information is merged in a superimposed manner.
  • This merge method can be called overlay merge.
  • $2 \times P$ layers of data storage unit information in the $Q$ layers of data storage unit information may include the $P$ layers of data storage unit information in the $P_1$ layers of data storage unit information and the $P$ layers of data storage unit information in the $P_2$ layers of data storage unit information.
  • the $Q - 2 \times P$ layers of data storage unit information of the $Q$ layers of data storage unit information may include $P'$ layers of data storage unit information, where $P' = \max(P_1, P_2) - \min(P_1, P_2)$, $\max(P_1, P_2)$ is the maximum of $P_1$ and $P_2$, and $\min(P_1, P_2)$ is the minimum of $P_1$ and $P_2$.
  • in other words, $P'$ is equal to the maximum of $P_1$ and $P_2$ minus the minimum of $P_1$ and $P_2$.
  • that is, the value of P' is the absolute value of the difference between P 1 and P 2 .
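  • As a consistency check, the layer count can be worked out by counting target layers, under the assumption that the two layer 0s occupy a single target layer and that $P = \min(P_1, P_2)$; the grouping into three terms is this sketch's own bookkeeping rather than wording taken from the description.

```latex
\begin{aligned}
P  &= \min(P_1, P_2), \qquad P' = \max(P_1, P_2) - \min(P_1, P_2) = |P_1 - P_2|,\\
Q  &= \underbrace{1}_{\text{merged layer 0}}
    + \underbrace{2\,(P - 1)}_{\text{layers } 1..P-1 \text{ of both partitions}}
    + \underbrace{P'}_{\text{layers of the deeper partition only}}\\
   &= \min(P_1, P_2) + \max(P_1, P_2) - 1 \;=\; P_1 + P_2 - 1,\\
\text{e.g. } & P_1 = 5,\; P_2 = 3 \;\Rightarrow\; P = 3,\; P' = 2,\; Q = 1 + 4 + 2 = 7 .
\end{aligned}
```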
  • the manifest file of the first partition includes five layers of SSTable information of the secondary column cluster, namely the layer 0 SSTable information to the layer 4 SSTable information.
  • the manifest file of the second partition includes three layers of SSTable information of the secondary column cluster, namely the layer 0 SSTable information to the layer 2 SSTable information.
  • the target manifest file includes seven layers of SSTable information of the secondary column cluster, namely the layer 0 SSTable information to the layer 6 SSTable information.
  • the layer 0 SSTable file information in the target manifest file includes the layer 0 SSTable information in the manifest file of the first partition and the layer 0 SSTable information in the manifest file of the second partition.
  • the layer 1 SSTable information in the target manifest file includes the layer 1 SSTable information in the manifest file of the first partition.
  • the layer 2 SSTable information in the target manifest file includes the layer 1 SSTable information in the manifest file of the second partition.
  • the layer 3 SSTable information in the target manifest file includes the layer 2 SSTable information in the manifest file of the first partition.
  • the layer 4 SSTable information in the target manifest file includes the layer 2 SSTable information in the manifest file of the second partition.
  • the layer 5 SSTable information in the target manifest file includes layer 3 SSTable information in the manifest file of the first partition.
  • the layer 6 SSTable information in the target manifest file includes the layer 4 SSTable information in the manifest file of the first partition.
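  • The five-layer and three-layer example above can be reproduced as follows. This is a sketch under the assumption that the first partition takes the odd target layers during interleaving; the function name and identifiers are hypothetical.

```python
def interleave_tail_last(first, second):
    """Interleave the shared layers, then append the deeper partition's extras.

    first, second: lists of layers; each layer is a list of SSTable ids.
    """
    shared = min(len(first), len(second))            # P
    target = [first[0] + second[0]]                  # merged layer 0
    for q in range(1, shared):
        target.append(list(first[q]))                # target layer 2*q - 1
        target.append(list(second[q]))               # target layer 2*q
    longer = first if len(first) >= len(second) else second
    target.extend(list(layer) for layer in longer[shared:])  # the last P' layers
    return target


first = [[f"A{i}"] for i in range(5)]    # layers 0..4 of the first partition
second = [[f"B{i}"] for i in range(3)]   # layers 0..2 of the second partition
target = interleave_tail_last(first, second)
```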
  • the $P'$ layers of data storage unit information may be the last $P'$ layers of data storage unit information in the $Q$ layers of data storage unit information.
  • the $P'$ layers of data storage unit information may also be the first $P'$ layers of data storage unit information in the $Q$ layers of data storage unit information.
  • the manifest file of the first partition includes five layers of SSTable information of the secondary column cluster, namely the layer 0 SSTable information to the layer 4 SSTable information.
  • the manifest file of the second partition includes three layers of SSTable information of the secondary column cluster, namely the layer 0 SSTable information to the layer 2 SSTable information.
  • the target manifest file includes seven layers of SSTable information of the secondary column cluster, namely the layer 0 SSTable information to the layer 6 SSTable information.
  • the layer 0 SSTable information in the target manifest file includes the layer 0 SSTable information in the manifest file of the first partition and the layer 0 SSTable information in the manifest file of the second partition.
  • the layer 1 SSTable information in the target manifest file includes the layer 1 SSTable information in the manifest file of the first partition.
  • the layer 2 SSTable information in the target manifest file includes the layer 2 SSTable information in the manifest file of the first partition.
  • the layer 3 SSTable information in the target manifest file includes the layer 3 SSTable information in the manifest file of the first partition.
  • the layer 4 SSTable information in the target manifest file includes the layer 1 SSTable information in the manifest file of the second partition.
  • the layer 5 SSTable information in the target manifest file includes the layer 4 SSTable information in the manifest file of the first partition.
  • the layer 6 SSTable information in the target manifest file includes the layer 2 SSTable information in the manifest file of the second partition.
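  • In this variant the extra layers of the deeper partition are placed directly after layer 0 and the interleaving follows. The sketch below reproduces the example; placing the deeper partition's layer first within each interleaved pair is an assumption read off the example, and the function name is hypothetical.

```python
def interleave_head_first(first, second):
    """Place the deeper partition's P' extra layers first, then interleave."""
    extra = abs(len(first) - len(second))            # P'
    if len(first) >= len(second):
        longer, shorter = first, second
    else:
        longer, shorter = second, first
    target = [first[0] + second[0]]                  # merged layer 0
    target.extend(list(layer) for layer in longer[1:1 + extra])
    for a, b in zip(longer[1 + extra:], shorter[1:]):
        target.append(list(a))                       # layer from the deeper partition
        target.append(list(b))                       # layer from the shallower partition
    return target


first = [[f"A{i}"] for i in range(5)]    # layers 0..4 of the first partition
second = [[f"B{i}"] for i in range(3)]   # layers 0..2 of the second partition
target = interleave_head_first(first, second)
```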
  • the metadata of the first partition further includes data storage unit information of a main column cluster of the first partition.
  • the metadata of the second partition includes data storage unit information of a main column cluster of the second partition.
  • the database server 1 may combine data storage unit information of the main column cluster of the first partition with data storage unit information of the main column cluster of the second partition to generate data storage unit information of the main column cluster of the third partition.
  • the first column of the primary partition cluster data storage means K 1 information includes cell information storage layer data, wherein K 1 is greater than or equal to a positive integer.
  • the data storage unit information of the main column cluster of the second partition includes K 2 layer data storage unit information, where K 2 is a positive integer greater than or equal to 1.
  • the metadata of the third partition includes data storage unit information of the main column cluster of the third partition, and the data storage unit information of the main column cluster of the third partition includes K layers of data storage unit information, where K is the minimum of K1 and K2.
  • the k-th layer of the data storage unit information of the main column cluster of the third partition includes the k-th layer of the K1 layers of data storage unit information and the k-th layer of the K2 layers of data storage unit information. The entry key values of any data storage unit information in the k-th layer of the K1 layers do not overlap with the entry key values of any data storage unit information in the k-th layer of the K2 layers.
  • Mk1 represents the number of pieces of data storage unit information included in the k-th layer of the data storage unit information of the main column cluster of the first partition.
  • Mk2 represents the number of pieces of data storage unit information included in the k-th layer of the data storage unit information of the main column cluster of the second partition.
  • the number of pieces of data storage unit information included in the k-th layer of the data storage unit information of the main column cluster of the third partition is the sum of Mk1 and Mk2.
  • level 0 in the data storage unit information of the main column cluster of the third partition includes 4 data storage unit information.
  • the SSTable information stored in the manifest file corresponds to the K-layer SSTable information.
  • the following describes the combination of the data storage unit information of the main column cluster with reference to Table 6, Table 7, and Table 8.
  • Table 6 shows that the data storage unit information of the main column cluster of the first partition includes two layers of data storage unit information, and each layer of data storage unit information in the two-layer data storage unit information includes 2 data storage unit information.
  • Table 6 is a table obtained by taking KVDB using the LSM-tree algorithm as an example. Therefore, the corresponding data storage unit information can also be called SSTable information.
  • Table 7 shows that the data storage unit information of the main column cluster of the second partition includes two layers of data storage unit information, and each layer of data storage unit information in the two-layer data storage unit information includes 2 data storage unit information.
  • Table 7 is a table obtained by taking KVDB using the LSM-tree algorithm as an example. Therefore, the corresponding data storage unit information can also be called SSTable information.
  • Table 8 shows that the data storage unit information of the main column cluster of the third partition includes two layers of data storage unit information, and each layer of data storage unit information in the two-layer data storage unit information includes 4 data storage unit information.
  • Table 8 is a table obtained by taking KVDB using the LSM-tree algorithm as an example. Therefore, the corresponding data storage unit information can also be called SSTable information. It can be seen that the level 0 SSTable information shown in Table 8 includes the level 0 SSTable information shown in Table 6 and Table 7, and the level 1 SSTable information shown in Table 8 includes the level 1 SSTable information shown in Table 6 and Table 7.
  • the data storage unit information of the main column cluster of the first partition and the data storage unit information of the main column cluster of the second partition are both ordered sequences prefixed by the partition key. Because the partitions are divided by range partitioning, the key value ranges of the data storage unit information in each layer of the main column clusters of the two partitions do not overlap. Therefore, the data storage unit information of each layer of the second partition can be directly appended to the data storage unit information of the same layer of the first partition, thereby forming the data storage unit information of the main column cluster of a single partition.
  • the method of directly appending the data storage unit information of a layer in one partition to the data storage unit information of the same layer in another partition is referred to below as append merge.
  • the value of K1 and the value of K2 may be different.
  • in this case, in addition to the K layers of data storage unit information, the data storage unit information of the third partition may further include K' layers of data storage unit information, where K' = max(K1, K2) - min(K1, K2), max(K1, K2) represents the maximum of K1 and K2, and min(K1, K2) represents the minimum of K1 and K2.
  • the manifest file of the first partition includes three layers of SSTable information of the main column cluster, namely the layer 0 SSTable information to the layer 2 SSTable information.
  • the manifest file of the second partition includes two layers of SSTable information of the main column cluster, namely the layer 0 SSTable information and the layer 1 SSTable information.
  • the target manifest file includes three layers of SSTable information of the main column cluster, namely the layer 0 SSTable information to the layer 2 SSTable information.
  • the level 0 SSTable information in the target manifest file includes the level 0 SSTable information of the main column cluster in the manifest file of the first partition and the level 0 SSTable information of the main column cluster in the manifest file of the second partition.
  • the layer 1 SSTable information in the target manifest file includes the layer 1 SSTable information of the main column cluster in the manifest file of the first partition and the layer 1 SSTable information of the main column cluster in the manifest file of the second partition.
  • the layer 2 SSTable information in the target manifest file includes the layer 2 SSTable information of the main column cluster in the manifest file of the first partition.
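  • The append merge of the main column cluster described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `append_merge`, the list-of-lists representation, and the SSTable identifiers are all assumptions made for the example. It shows the case where the first partition has three layers and the second partition has two layers.

```python
from itertools import zip_longest


def append_merge(layers_a, layers_b):
    """Append-merge two per-layer lists of SSTable identifiers.

    Layer k of the result is layer k of the first partition followed by
    layer k of the second partition. A layer that exists in only one
    partition is carried over unchanged.
    """
    merged = []
    for layer_a, layer_b in zip_longest(layers_a, layers_b, fillvalue=[]):
        merged.append(list(layer_a) + list(layer_b))
    return merged


# Hypothetical identifiers: three layers in the first partition, two in the second.
first = [["f1.0.1", "f1.0.2"], ["f1.1.1", "f1.1.2"], ["f1.2.1"]]
second = [["f2.0.1", "f2.0.2"], ["f2.1.1", "f2.1.2"]]
third = append_merge(first, second)
```

  • Only identifiers are copied here; no SSTable content is read or rewritten, which is the point of merging at the metadata level.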
  • the order of the data storage unit information of the primary / secondary column clusters from the same partition in the combined data storage unit information of the primary / secondary column clusters does not change.
  • in the data storage unit information of the main column cluster of the third partition, the level 1 SSTables from the main column cluster of the first partition and the level 1 SSTables from the main column cluster of the second partition may be ordered as follows: SSTable f1.1.1, SSTable f1.1.2, SSTable f2.1.1, SSTable f2.1.2.
  • the merged data storage unit information from the main column cluster of another partition may be located before the data storage unit information of the main column cluster of the same partition.
  • the data storage unit information of the main column cluster of the third partition may also be shown in Table 9.
  • as shown in Table 9, in the data storage unit information of the main column cluster of the third partition, the level 1 SSTables from the main column cluster of the first partition and the level 1 SSTables from the main column cluster of the second partition are ordered as follows: SSTable f1.1.1, SSTable f2.1.1, SSTable f1.1.2, SSTable f2.1.2. It can be seen that although SSTable f2.1.1 is located between SSTable f1.1.1 and SSTable f1.1.2, SSTable f1.1.1 still precedes SSTable f1.1.2.
  • the sequence of the data storage unit information of the secondary column cluster is similar to the sequence of the data storage unit information of the main column cluster, and it is unnecessary to repeat them here.
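  • The ordering constraint can be checked mechanically: the entries of each source partition must form a subsequence of the merged layer. The helper below is a hypothetical sketch (the name `preserves_order` is an assumption) that tests this for the two level 1 orderings quoted above.

```python
def preserves_order(source, merged):
    """Return True if the items of `source` occur in `merged` in the same relative order."""
    remaining = iter(merged)
    # `item in remaining` advances the iterator, so each match must come
    # after the previous one.
    return all(item in remaining for item in source)


p1_level1 = ["f1.1.1", "f1.1.2"]
p2_level1 = ["f2.1.1", "f2.1.2"]

appended = ["f1.1.1", "f1.1.2", "f2.1.1", "f2.1.2"]
interleaved = ["f1.1.1", "f2.1.1", "f1.1.2", "f2.1.2"]
swapped = ["f1.1.2", "f1.1.1", "f2.1.1", "f2.1.2"]

appended_ok = preserves_order(p1_level1, appended) and preserves_order(p2_level1, appended)
interleaved_ok = preserves_order(p1_level1, interleaved) and preserves_order(p2_level1, interleaved)
swapped_ok = preserves_order(p1_level1, swapped)
```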
  • the metadata of the first partition further includes a set of write-ahead log information of the first partition.
  • the metadata of the second partition includes a set of write-ahead log information of the second partition.
  • the database server 1 may combine the write-ahead log information set of the first partition and the write-ahead log information set of the second partition to generate a write-ahead log information set of the third partition.
  • the write-ahead log information set of the third partition includes N pieces of write-ahead log information, the write-ahead log information set of the first partition includes N1 of the N pieces of write-ahead log information, and the write-ahead log information set of the second partition includes N2 of the N pieces of write-ahead log information, where N is a positive integer greater than or equal to 2, N1 and N2 are positive integers greater than or equal to 1, and the sum of N1 and N2 is N.
  • the database server 1 only combines the write-ahead log information included in the write-ahead log information set in the metadata of the first partition with the write-ahead log information included in the write-ahead log information set in the metadata of the second partition to form the write-ahead log information set in the metadata of the third partition.
  • the database server 1 does not need to read, from the first partition, the N1 write-ahead logs indicated by the N1 pieces of write-ahead log information in the write-ahead log information set of the first partition and write the N1 write-ahead logs into the merged partition.
  • the write-ahead log information may include an identifier of the write-ahead log and a time serial number of the write-ahead log. The size of the write-ahead log information is therefore usually on the order of KB, whereas the size of the write-ahead log itself is usually on the order of MB. Compared with reading and writing the write-ahead log, reading and writing only the write-ahead log information reduces the amount of data read and written, thereby reducing the overhead of the distributed database system.
  • the write-ahead log information can be the WAL file information stored in the manifest file.
  • the WAL file information includes the identification of the WAL file.
  • the time serial number of the WAL file may also be included.
  • take the case in which the WAL file information includes the identifier of the WAL file and the time serial number of the WAL file as an example.
  • the write-ahead log information set of the first partition may include the identifier of each of N1 WAL files and the time serial number of each of the N1 WAL files.
  • the write-ahead log information set of the second partition may include the identifier of each of N2 WAL files and the time serial number of each of the N2 WAL files.
  • the write-ahead log information set of the third partition includes the identifier and the time serial number of each of the N1 WAL files, and further includes the identifier and the time serial number of each of the N2 WAL files.
  • the order of the first write-ahead log information and the second write-ahead log information in the write-ahead log information set of the first partition is the same as their order in the write-ahead log information set of the third partition.
  • the first write-ahead log information and the second write-ahead log information are any two of the N1 pieces of write-ahead log information included in the write-ahead log information set of the first partition.
  • in other words, if the first write-ahead log information precedes the second write-ahead log information in the write-ahead log information set of the first partition, then the first write-ahead log information still precedes the second write-ahead log information in the write-ahead log information set of the third partition.
  • the order of the third write-ahead log information and the fourth write-ahead log information in the write-ahead log information set of the second partition is the same as their order in the write-ahead log information set of the third partition.
  • the third write-ahead log information and the fourth write-ahead log information are any two of the N2 pieces of write-ahead log information included in the write-ahead log information set of the second partition.
  • in other words, if the third write-ahead log information precedes the fourth write-ahead log information in the write-ahead log information set of the second partition, then the third write-ahead log information still precedes the fourth write-ahead log information in the write-ahead log information set of the third partition.
  • Table 10 shows the combination of WAL file information in the two manifest files.
  • the identification of the WAL file represents the WAL file information
  • the sequence of the identification of the WAL file identifies the time serial number of the WAL file.
  • the order of the identification of the WAL file shown in Table 10 is arranged according to the time serial number of the WAL file.
  • the time serial number of the WAL file on the left is lower than the time serial number of the WAL file on the right.
  • the manifest file 1 of the first partition includes two pieces of WAL file information. The identifiers of the two WAL files are f1.w.8 and f1.w.9, and the time serial number of f1.w.8 is lower than the time serial number of f1.w.9.
  • the manifest file 2 of the second partition includes two pieces of WAL file information. The identifiers of the two WAL files are f2.w.7 and f1.w.11, and the time serial number of f2.w.7 is lower than the time serial number of f1.w.11.
  • the manifest file 3 includes 4 WAL file information.
  • the four WAL file information comes from manifest file 1 and manifest file 2, respectively.
  • the sequence of the two WAL file information from the manifest file 1 has not changed, and f1.w.8 precedes f1.w.9.
  • the sequence of the two WAL file information from manifest file 2 has not changed, f2.w.7 precedes f1.w.11.
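  • One order-preserving way to combine the two WAL information sets is a stable merge on the time serial number, sketched below. The `WalInfo` record, the function name, and the numeric time serial numbers are assumptions for illustration; the source gives only the identifiers and their relative order within each manifest file. Only the small information records are combined, and the WAL files themselves are not read.

```python
from dataclasses import dataclass
from heapq import merge


@dataclass(frozen=True)
class WalInfo:
    file_id: str   # identifier of the WAL file
    time_seq: int  # time serial number (hypothetical values below)


def merge_wal_info(first, second):
    """Merge two WAL info lists that are each already sorted by time_seq.

    heapq.merge is stable, so entries from the same list keep their
    relative order in the result.
    """
    return list(merge(first, second, key=lambda w: w.time_seq))


manifest1 = [WalInfo("f1.w.8", 8), WalInfo("f1.w.9", 9)]
manifest2 = [WalInfo("f2.w.7", 7), WalInfo("f1.w.11", 11)]
manifest3 = merge_wal_info(manifest1, manifest2)
ids = [w.file_id for w in manifest3]
```

  • Plain concatenation of the two lists would satisfy the same ordering constraint; the time-ordered merge is just one possible choice.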
  • the database server 1 sends response information to the management server, where the response information includes the current file identifier of the third partition.
  • the management server updates the partition routing table.
  • the management server marks the first partition and the second partition as deleted, and records the left and right boundaries of the third partition and the current file identifier, and the status of the third partition is modified to be normal.
  • the partition routing table includes the mapping relationship between the third partition and the database server 1.
  • the specific implementation may be the mapping relationship between the identifier of the third partition and the address of the database server.
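  • The routing table update performed by the management server can be pictured with the following sketch. The `RouteEntry` fields, the dictionary layout, the half-open key ranges, and the assumption that the first partition is the left neighbor are all hypothetical; the source only states that the two partitions are marked as deleted and that the boundaries, current file identifier, and normal status of the third partition are recorded.

```python
from dataclasses import dataclass


@dataclass
class RouteEntry:
    left: str           # left boundary of the key range (assumed inclusive)
    right: str          # right boundary of the key range (assumed exclusive)
    server: str         # address of the home database server
    current_file: str   # identifier of the partition's current file
    status: str = "normal"


def apply_merge(table, first_id, second_id, third_id, server, current_file):
    """Mark the two source partitions as deleted and register the merged partition."""
    first, second = table[first_id], table[second_id]
    if first.right != second.left:
        raise ValueError("partitions are not adjacent")
    first.status = "deleted"
    second.status = "deleted"
    table[third_id] = RouteEntry(first.left, second.right, server, current_file)
    return table


routing = {
    "p1": RouteEntry("a", "m", "db-server-1", "current.p1"),
    "p2": RouteEntry("m", "z", "db-server-2", "current.p2"),
}
apply_merge(routing, "p1", "p2", "p3", "db-server-1", "current.p3")
```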
  • the management server sends a partition completion message to the database server 1 and the database server 2.
  • the partition completion message is used to indicate that the merging of the first partition and the second partition is complete.
  • the metadata of the third partition includes metadata of the first partition and metadata of the second partition. Taking a KVDB using the LSM-tree algorithm as an example, the storage server stores the WAL files, SSTables, and manifest files of the corresponding partitions. With the partition merging scheme provided by the embodiment of this application, the database server 1 can therefore access, according to the metadata of the third partition, the WAL files, SSTables, and manifest files of the first partition and the second partition stored in the storage server.
  • the first partition and the second partition can thus be merged without reading and writing the data entries of the first partition and the second partition, which reduces the amount of data read and written during the partition merge and improves the speed of the partition merge.
  • the business write operations of the two partitions will be temporarily frozen until the partition merge is completed.
  • the embodiment of the present application also reduces the freeze time of the business write operations of the partitions.
  • the database server 1 closes the database of the first partition, deletes the current file of the first partition and the metadata of the first partition, sends a success response to the management server, and returns a reroute error response for each pending write request.
  • the client After receiving the reroute error response message, the client sends a request to the management server to update the partition routing table, and then sends the request to the new home database server.
  • the database server 2 closes the database of the second partition, deletes the current file of the second partition and the metadata of the second partition, sends a success response to the management server, and returns a reroute error response for each pending write request.
  • the client After receiving the reroute error response message, the client sends a request to the management server to update the partition routing table, and then sends the request to the new home database server.
  • the management server updates the partition routing table, and deletes records of the first partition and the second partition.
  • the database server 1 starts a layer merging task in the background to reduce the number of layers of metadata of the third partition. Until the merge is completed, the third partition will not participate in the new partition merge.
  • the database server 1 may determine the data storage unit information of the secondary column cluster of the third partition according to the data storage unit information of the target column cluster.
  • the data storage unit information of the secondary column cluster of the third partition includes P layers of data storage unit information. The layer 1 data storage unit information of the P layers is the data storage unit information of the data storage units obtained by merging and re-sorting the data storage units corresponding to the layer 0 data storage unit information of the Q layers of data storage unit information.
  • each of the other layers of data storage unit information in the P layers is the data storage unit information of the data storage units obtained by merging and re-sorting the data storage units corresponding to at least two layers among the layer 1 to layer Q-1 data storage unit information of the Q layers of data storage unit information.
  • the process in which the database server 1 determines the data storage unit information of the secondary column cluster of the third partition according to the data storage unit information of the target column cluster may be referred to as a level compaction process.
  • PT1 layer 0 indicates the layer 0 data storage unit information in the data storage unit information of the secondary column cluster of the first partition, PT2 layer 0 indicates the layer 0 data storage unit information in the data storage unit information of the secondary column cluster of the second partition, PT3 layer 0 indicates the layer 0 data storage unit information in the data storage unit information of the secondary column cluster of the third partition, and so on.
  • Layer 0 indicates the layer 0 data storage unit information in the process of obtaining the data storage unit information of the secondary column cluster of the third partition according to the data storage unit information of the secondary column cluster of the first partition and the data storage unit information of the secondary column cluster of the second partition. Layer 1 indicates the layer 1 data storage unit information in the same process, and so on.
  • "real: 30MB" means that the actual amount of data in the current layer is 30 MB, and "max: 400MB" means that the maximum data amount set for this layer is 400 MB.
  • the layer 0 data storage unit information of the secondary column cluster of the first partition and the layer 0 data storage unit information of the secondary column cluster of the second partition are append-merged to obtain the layer 0 data storage unit information of the target secondary column cluster.
  • the layer 1 to layer 3 data storage unit information of the secondary column cluster of the first partition and the layer 1 to layer 3 data storage unit information of the secondary column cluster of the second partition are superimposed and merged to obtain the layer 1 to layer 6 data storage unit information of the target secondary column cluster.
  • the layer 0 data storage unit information of the data storage unit information of the target secondary column cluster includes the layer 0 data storage unit information of the data storage unit information of the secondary column cluster of the first partition and the layer 0 data storage unit information of the data storage unit information of the secondary column cluster of the second partition.
  • the layer 1 data storage unit information of the data storage unit information of the target secondary column cluster includes the layer 1 data storage unit information of the data storage unit information of the secondary column cluster of the first partition.
  • the layer 2 data storage unit information of the data storage unit information of the target secondary column cluster includes the layer 1 data storage unit information of the data storage unit information of the secondary column cluster of the second partition.
  • the layer 3 data storage unit information of the data storage unit information of the target secondary column cluster includes the layer 2 data storage unit information of the data storage unit information of the secondary column cluster of the first partition.
  • the layer 4 data storage unit information of the data storage unit information of the target secondary column cluster includes the layer 2 data storage unit information of the data storage unit information of the secondary column cluster of the second partition.
  • the layer 5 data storage unit information of the data storage unit information of the target secondary column cluster includes the layer 3 data storage unit information of the data storage unit information of the secondary column cluster of the first partition.
  • the layer 6 data storage unit information of the data storage unit information of the target secondary column cluster includes the layer 3 data storage unit information of the data storage unit information of the secondary column cluster of the second partition.
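  • The layer mapping listed above (layer 0 append-merged, layers 1 to 3 of the two partitions interleaved into target layers 1 to 6) can be sketched as follows. The function name and the placeholder layer contents are assumptions for illustration, and the sketch covers only the case in the text where both partitions have the same number of layers.

```python
def superimpose_merge(layers_a, layers_b):
    """Build the target layer list for the secondary column cluster.

    Layer 0 is the append merge of both layer 0 lists. Above layer 0,
    layer k of the first partition becomes target layer 2k-1 and layer k
    of the second partition becomes target layer 2k.
    """
    if len(layers_a) != len(layers_b):
        raise ValueError("this sketch assumes equal layer counts")
    target = [list(layers_a[0]) + list(layers_b[0])]
    for layer_a, layer_b in zip(layers_a[1:], layers_b[1:]):
        target.append(list(layer_a))
        target.append(list(layer_b))
    return target


# Placeholder contents: each partition has layers 0 to 3.
pt1 = [["pt1.L0"], ["pt1.L1"], ["pt1.L2"], ["pt1.L3"]]
pt2 = [["pt2.L0"], ["pt2.L1"], ["pt2.L2"], ["pt2.L3"]]
target = superimpose_merge(pt1, pt2)
```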
  • the layer 1 data storage unit information referred to below is the layer 1 data storage unit information of the data storage unit information of the target secondary column cluster, the layer 2 data storage unit information is the layer 2 data storage unit information of the data storage unit information of the target secondary column cluster, and so on.
  • the KVDB using the LSM-tree algorithm is also described as an example.
  • the data storage unit information is SSTable information.
  • the merge sort process is as follows:
  • the size of the second space is the sum of the sizes of SSTable 1 and SSTable 2. This second space is used to store the merged SSTable content.
  • Repeat step 4 until a pointer reaches the last data entry of SSTable 1 or SSTable 2;
  • SSTable 4 {Key12-Value, Key17-Value, Key19-Value}.
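  • The pointer-based merge that these steps refer to resembles a standard two-pointer merge of two key-sorted sequences. The sketch below is a generic illustration under stated assumptions, not a reconstruction of the omitted steps: the function name, the tuple representation, the sample keys, and the rule that the entry from the first argument wins on equal keys are all hypothetical.

```python
def merge_sstables(newer, older):
    """Two-pointer merge of two key-sorted lists of (key, value) entries.

    On equal keys the entry from `newer` is kept (an assumption of this
    sketch, not a rule stated in the text).
    """
    merged = []
    i = j = 0
    while i < len(newer) and j < len(older):
        if newer[i][0] < older[j][0]:
            merged.append(newer[i])
            i += 1
        elif newer[i][0] > older[j][0]:
            merged.append(older[j])
            j += 1
        else:
            merged.append(newer[i])
            i += 1
            j += 1
    # One input is exhausted; copy whatever remains of the other.
    merged.extend(newer[i:])
    merged.extend(older[j:])
    return merged


# Hypothetical sample entries.
sstable_a = [(1, "a1"), (3, "a3"), (5, "a5")]
sstable_b = [(2, "b2"), (3, "b3"), (6, "b6")]
result = merge_sstables(sstable_a, sstable_b)
```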
  • the layer merge process includes the following steps:
  • the first step is to merge and sort all the SSTables indicated by the layer 1 SSTable information and all the SSTables indicated by the layer 2 SSTable information, rewrite them into new SSTables, and record in the manifest file the SSTable information corresponding to the SSTables newly added to layer 2.
  • the original SSTable information of layer 1 and layer 2 and the SSTables indicated by the original SSTable information are marked as deleted.
  • the total read and write data amount is 200 MB × 2.
  • Table 11 is an illustration of layer 1 SSTable information and layer 2 SSTable information.
  • because time serial numbers are meaningless for layer 1 and higher layers, the time serial numbers are not included in Table 11 and Table 12.
  • the SSTables corresponding to the layer 1 SSTable information shown in Table 11 are f1a.1.1 and f1a.1.2, and the SSTables corresponding to the layer 2 SSTable information are f2a.1.1 and f2a.1.2.
  • the manifest file only includes newly created SSTable information, and the original SSTable information of Layer 1 and Layer 2 has been deleted.
  • the SSTables indicated by the SSTable information of the original layer 1 and layer 2 are also deleted.
  • the second step is to merge and sort the SSTables at layer 0, rewrite them into new SSTables, and record the SSTable information corresponding to the new SSTables in the manifest file, where the layer identifier in the SSTable information corresponding to the new SSTables is 1.
  • the specific merging and sorting process is similar to that of the first step and is not repeated here. Assume that all SSTables in layer 0 shown in FIG. 4 need to be read and written; the total read and write data amount is then 30 MB × 4.
  • the third step is to merge and sort all the SSTables indicated by the layer 3 SSTable information and all the SSTables indicated by the layer 4 SSTable information, rewrite them into new SSTables, and record in the manifest file the SSTable information corresponding to the SSTables newly added to layer 4.
  • the original SSTable information of layer 3 and layer 4 and the SSTables indicated by the original SSTable information are marked as deleted.
  • Layer 3 is given a read-only flag, and no new SSTables are written to it. Assume that all the SSTables indicated by the layer 3 SSTable information and all the SSTables indicated by the layer 4 SSTable information shown in FIG. 4 need to be read and written; the total read and write data amount is then 2 GB × 2.
  • the fourth step is to merge and sort all the SSTables indicated by the layer 5 SSTable information and all the SSTables indicated by the layer 6 SSTable information, rewrite them into new SSTables, and record in the manifest file the SSTable information corresponding to the SSTables newly added to layer 6.
  • the original SSTable information of layer 5 and layer 6 and the SSTables indicated by the original SSTable information are marked as deleted.
  • Layer 5 is given a read-only flag, and no new SSTables are written to it. Assume that all the SSTables indicated by the layer 5 SSTable information and all the SSTables indicated by the layer 6 SSTable information shown in FIG. 4 need to be read and written; the total read and write data amount is then 2 GB × 2.
  • the fifth step is to delete layers 3 and 5.
  • the specific operation is as follows: the layer ID of the SSTable information with the layer ID of 4 in the manifest file is changed to 3, and the layer ID of the SSTable information with the layer ID of 6 in the manifest file is changed to 4, so as to delete the original layer 3 and the original layer 5.
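  • The relabeling in the fifth step only touches layer identifiers in the manifest and moves no SSTable data. A minimal sketch, assuming a hypothetical manifest represented as a list of dictionaries with `id` and `layer` fields, and assuming that layers 3 and 5 no longer hold live SSTable information:

```python
def relabel_layers(sstable_infos, remap):
    """Return copies of the SSTable info records with layer ids rewritten per `remap`."""
    occupied = {info["layer"] for info in sstable_infos} - set(remap)
    clash = occupied & set(remap.values())
    if clash:
        # A destination layer still holds entries that are not being moved.
        raise ValueError(f"destination layers still in use: {sorted(clash)}")
    return [
        {**info, "layer": remap.get(info["layer"], info["layer"])}
        for info in sstable_infos
    ]


# Hypothetical manifest content after the first four steps.
manifest = [
    {"id": "s1", "layer": 1},
    {"id": "s2", "layer": 2},
    {"id": "s4", "layer": 4},
    {"id": "s6", "layer": 6},
]
manifest = relabel_layers(manifest, {4: 3, 6: 4})
layers = [info["layer"] for info in manifest]
```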
  • the sixth step is to merge and sort all the SSTables indicated by the SSTable information in the new layer 2 (that is, layer 2 obtained in the first step) and all the SSTables indicated by the SSTable information in the new layer 3 (that is, layer 3 obtained in the fifth step), rewrite them into new SSTables, and record in the manifest file the SSTable information corresponding to the SSTables newly added to the new layer 3. At the same time, the original SSTable information of the new layer 2 and the new layer 3 and the SSTables indicated by the original SSTable information are marked as deleted.
  • the new layer 2 is then deleted: the layer identifier of the SSTable information with the layer identifier 3 in the manifest file is changed to 2, and the layer identifier of the SSTable information with the layer identifier 4 in the manifest file is changed to 3.
  • assume that all the SSTables indicated by the SSTable information in the new layer 2 and all the SSTables indicated by the SSTable information in the new layer 3 need to be read and written, as shown in FIG. 4; the total read and write data amount is then 400 MB + 4 GB.
  • the expressions of the first step, the second step, the third step, the fourth step, the fifth step, and the sixth step are only for the convenience of distinguishing different steps, and are not a limitation on the order of the steps. From the description of the above six steps, it can be seen that the order of the first step, the third step, and the fourth step can be changed.
  • the number of layers of the data storage unit can be reduced, which can facilitate querying the data entries stored in the database.
  • the database server 1 may not perform layer merge processing.
  • the distributed database system provided in the embodiments of the present application may be used to store metadata of a distributed object storage system, metadata of a distributed file system, or metadata of a distributed block storage system.
  • FIG. 5 is a structural block diagram of a database server according to an embodiment of the present application.
  • the database server 500 includes a communication unit 501 and a processing unit 502.
  • the communication unit 501 is configured to receive a merge instruction sent by a management server, where the merge instruction is used to merge a first partition and a second partition into a third partition, and the first partition and the second partition are adjacent partitions.
  • the merge instruction includes the identifier of the current file of the first partition and the identifier of the current file of the second partition.
  • the current file of the first partition records the file identifier of the file storing the metadata of the first partition.
  • the current file of the second partition records the file identifier of the file storing the metadata of the second partition. The first partition runs on the database server, and the second partition runs on another database server.
  • the processing unit 502 is configured to:
  • the communication unit 501 is configured to receive a merge instruction sent by the management server, and the merge instruction is used to implement merging the first partition and the second partition into a third partition.
  • the processing unit 502 is configured to: obtain the metadata of the first partition and the metadata of the second partition, and merge the metadata of the first partition and the metadata of the second partition to generate the metadata of the third partition.
  • the database server 500 shown in FIG. 5 can perform various steps performed by the database server 1 shown in FIG. 3. For specific functions and beneficial effects of each unit in the database server 500 shown in FIG. 5, reference may be made to the method shown in FIG. 3, and it is unnecessary to repeat them here.
  • the processing unit 502 may be implemented by a processor, and the communication unit 501 may be implemented by a network interface card. In other embodiments, the communication unit 501 may also be implemented by a bus adapter. The specific implementation of the communication unit 501 may support one or more access protocols, for example, an Ethernet message protocol, an Infiniband protocol, and the like, which are not limited in the embodiment of the present invention. In another implementation, the communication unit 501 and the processing unit 502 may also be implemented by software, or both software and hardware.
  • FIG. 6 is a structural block diagram of a database server according to an embodiment of the present invention.
  • the database server 600 includes a processor 601 and a communication interface 602.
  • the processor 601 may be used to process data, control a database server, execute a software program, process data of the software program, and the like.
  • the communication interface 602 is mainly used for communication, for example, communication with a management server in a distributed database system.
  • a circuit having a transmitting and receiving function may be regarded as a communication interface 602 of a database server, and a processor having a processing function may be regarded as a processor 601 of the database server 600.
  • the communication interface 602 may be implemented by a network interface card. In other embodiments, the communication interface 602 may also be implemented by a bus adapter. The specific implementation of the communication interface 602 may support one or more access protocols, for example, an Ethernet message protocol, an Infiniband protocol, and the like, which are not limited in the embodiment of the present invention.
  • the processing unit may also be called a processor, a processing board, a processing module, a processing device, and the like.
  • the processor 601 and the communication interface 602 communicate with each other through an internal connection path, and transfer control and/or data signals.
  • the method disclosed in the foregoing embodiment of the present invention may be applied to the processor 601, or implemented by the processor 601.
  • the processor 601 may be an integrated circuit chip and has a signal processing capability.
  • each step of the above method may be completed by using an integrated logic circuit of hardware in the processor 601 or an instruction in the form of software.
  • the processor described in the embodiments of the present application may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in combination with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or may be performed by using a combination of hardware and software modules in the decoding processor.
  • Software modules can be located in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory or electrically erasable programmable memory, registers, or other storage media.
  • The storage medium is located in the memory, and the processor reads the instructions in the memory and completes the steps of the above method in combination with its hardware.
  • the processor 601 may be a combination of a central processing unit (CPU) and a memory, where the memory may store instructions for performing the method performed by the database server 1 in the method shown in FIG. 3.
  • the CPU may execute the instructions stored in the memory in combination with other hardware (for example, the communication interface 602) to complete the steps performed by the database server 1 in the method shown in FIG.
  • An embodiment of the present application further provides a chip, and the chip includes a transceiver unit and a processing unit.
  • the transceiver unit may be an input / output circuit or a communication interface;
  • the processing unit is a processor or a microprocessor or an integrated circuit integrated on the chip.
  • the chip can execute the method executed by the database server 1 in the foregoing method embodiment.
  • the embodiment of the present application further provides a computer-readable storage medium having computer instructions stored thereon.
  • When the computer instructions are executed, the method performed by the database server 1 in the foregoing method embodiment is performed.
  • the embodiment of the present application further provides a computer program product containing computer instructions, and when the computer instructions are executed, the method performed by the database server 1 in the foregoing method embodiment is executed.
  • FIG. 7 is a structural block diagram of a management server according to an embodiment of the present application.
  • the management server 700 includes a communication unit 701 and a processing unit 702.
  • the processing unit 702 is configured to create a third partition, and determine to merge the first partition and the second partition into the third partition.
  • the communication unit 701 is configured to send a merge instruction to the first database server, where the merge instruction is used to merge the first partition and the second partition into the third partition, and the first partition and the second partition are adjacent partitions; the merge instruction includes the identifier of the current file of the first partition and the identifier of the current file of the second partition; the current file of the first partition records a file identifier of a file storing metadata of the first partition; the current file of the second partition records a file identifier of a file storing metadata of the second partition; the first partition runs on the first database server, and the second partition runs on the second database server.
  • the processing unit 702 is configured to create a third partition, and determine to merge the first partition and the second partition into the third partition.
  • the communication unit 701 is configured to send a merge instruction to the first database server, where the merge instruction is used to merge the first partition and the second partition into the third partition, and the first partition and the second partition are adjacent partitions; the first partition runs on the first database server, and the second partition runs on the second database server.
  • the management server 700 shown in FIG. 7 may perform the steps performed by the management server shown in FIG. 3. For the specific functions and beneficial effects of each unit in the management server 700 shown in FIG. 7, reference may be made to the method shown in FIG. 3; details are not repeated here.
  • the processing unit 702 may be implemented by a processor, and the communication unit 701 may be implemented by a network interface card. In other embodiments, the communication unit 701 may also be implemented by a bus adapter.
  • the specific implementation of the communication unit 701 may support one or more access protocols, for example, an Ethernet message protocol, an Infiniband protocol, and the like, which are not limited in the embodiment of the present invention.
  • FIG. 8 is a structural block diagram of a management server according to an embodiment of the present invention.
  • the management server 800 includes a processor 801 and a communication interface 802.
  • the processor 801 may be used to process data, control the management server 800, execute software programs, process data of the software programs, and the like.
  • a circuit having a transmitting/receiving function can be regarded as the communication interface 802 of the management server 800, and a processor having a processing function can be regarded as the processor 801 of the management server 800.
  • for a specific description of the management server 800, reference may be made to the description of the database server 600; details are not described herein again.
  • An embodiment of the present application further provides a computer-readable storage medium having computer instructions stored thereon.
  • When the computer instructions are executed, the method performed by the management server in the foregoing method embodiment is performed.
  • An embodiment of the present application further provides a computer program product including computer instructions, and when the computer instructions are executed, the method performed by the management server in the foregoing method embodiment is executed.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division into units is only a division by logical function.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of this application essentially, or the part of it that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several computer instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method described in the embodiments of the present application.
  • the foregoing storage media include USB flash drives, removable hard disks, read-only memories (ROM), random access memories (RAM), magnetic disks, compact discs, and other media that can store computer instructions.

Abstract

A partition merging method and a database server. The method comprises: a first database server obtains the metadata of a first partition according to the identifier of the current file of the first partition, obtains the metadata of a second partition according to the identifier of the current file of the second partition, and generates the metadata of a third partition according to the metadata of the first partition and that of the second partition. The aforementioned technical solution can reduce the amount of data that is read and written, and improve a partition merging speed.

Description

Partition merging method and database server
This application claims priority to the Chinese patent application filed with the State Intellectual Property Office of China on August 14, 2018, with application number 201810919475.9 and entitled "Partition Processing Method for Distributed Database System Based on Log-Structured Merge-Tree", and to the Chinese patent application filed with the State Intellectual Property Office of China on September 29, 2018, with application number 201811147298.3 and entitled "Partition Merging Method and Database Server", both of which are incorporated herein by reference in their entirety.
Technical field
The present application relates to the field of information technology, and more particularly, to a partition merging method and a database server.
Background
In a distributed database system such as a key-value database (KVDB), partitions are usually used to manage the data entries of a data table in order to accommodate the ever-growing number of data entries. Each partition is served by one database server, which provides, for example, storage management. The ownership relationship between partitions and database servers is dynamically assigned by the management server of the distributed database system.
A distributed database system must be able to balance load dynamically according to cluster size, load, or other policies. Therefore, in some cases, two adjacent partitions need to be merged into one new partition.
Current schemes for merging adjacent partitions traverse and read out the data entries in one partition and then write them into the other partition. A partition usually holds a very large number of data entries, so reading and writing them during the merge adds substantial overhead.
Summary
The present application provides a partition merging method and a database server that can reduce the amount of data read and written.
In a first aspect, an embodiment of the present application provides a method for merging partitions in a distributed database system. The distributed database system includes a first database server, a second database server, and a management server; the first database server runs a first partition, and the second database server runs a second partition. The method includes: the first database server receives a merge instruction sent by the management server, where the merge instruction is used to merge the first partition and the second partition into a third partition, and the first partition and the second partition are adjacent partitions; the merge instruction includes an identifier of a current file of the first partition and an identifier of a current file of the second partition; the current file of the first partition records a file identifier of a file storing metadata of the first partition; the current file of the second partition records a file identifier of a file storing metadata of the second partition; the first database server obtains the metadata of the first partition according to the identifier of the current file of the first partition; the first database server obtains the metadata of the second partition according to the identifier of the current file of the second partition; and the first database server merges the metadata of the first partition and the metadata of the second partition to generate metadata of the third partition.
With the above technical solution, the first partition and the second partition can be merged into the third partition directly according to their metadata, without traversing and reading the data in the first partition and the second partition and writing it into the merged partition. This reduces the amount of data read and written and speeds up partition merging. In addition, while two partitions are being merged, service write operations on both partitions are temporarily frozen until the merge completes, so the embodiment of the present application also shortens the time for which service write operations on the partitions are frozen. Optionally, the load of the first database server is lighter than the load of the second database server. In the embodiment of the present application, the first partition and the second partition are merged into the third partition by merging their metadata to generate the metadata of the third partition; that is, the data entries of the first partition and of the second partition can be accessed through the metadata of the third partition. In other words, the metadata of the third partition corresponds to the data entries of the first partition and the data entries of the second partition, which together serve as the data entries of the third partition.
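The flow of the first aspect can be illustrated with a minimal sketch. It assumes an in-memory dictionary in place of real file storage, and every file identifier, field name, and the half-open key-range convention used for the adjacency check is hypothetical and not taken from the application.

```python
# Hypothetical in-memory stand-in for the storage that holds current files and
# metadata files. Every identifier and field name here is an assumption.
FILES = {
    "current-p1": {"metadata_file_id": "meta-p1"},
    "current-p2": {"metadata_file_id": "meta-p2"},
    "meta-p1": {"key_range": ("a", "m"), "units": ["unit-1", "unit-2"]},
    "meta-p2": {"key_range": ("m", "z"), "units": ["unit-3"]},
}


def load_metadata(current_file_id):
    """Follow a current file to the file that stores the partition metadata."""
    metadata_file_id = FILES[current_file_id]["metadata_file_id"]
    return FILES[metadata_file_id]


def merge_partitions(current_id_1, current_id_2):
    """Generate third-partition metadata from the two partitions' metadata."""
    meta1 = load_metadata(current_id_1)
    meta2 = load_metadata(current_id_2)
    # Assumed adjacency check on half-open ranges [start, end).
    if meta1["key_range"][1] != meta2["key_range"][0]:
        raise ValueError("partitions are not adjacent")
    return {
        "key_range": (meta1["key_range"][0], meta2["key_range"][1]),
        # Only references to data storage units are combined; no data entry
        # is read or rewritten.
        "units": meta1["units"] + meta2["units"],
    }


merged = merge_partitions("current-p1", "current-p2")
print(merged)
```

The point of the sketch is that the merge touches only the two small metadata records; the data storage units themselves are referenced, not copied.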
With reference to the first aspect, in a possible implementation of the first aspect, the metadata of the first partition includes data storage unit information of a secondary column cluster of the first partition, and the metadata of the second partition includes data storage unit information of a secondary column cluster of the second partition. That the first database server merges the metadata of the first partition and the metadata of the second partition to generate the metadata of the third partition specifically includes: the first database server merges the data storage unit information of the secondary column cluster of the first partition and the data storage unit information of the secondary column cluster of the second partition to generate data storage unit information of a target secondary column cluster, and determines data storage unit information of a secondary column cluster of the third partition according to the data storage unit information of the target secondary column cluster. In the above technical solution, when the first partition and the second partition are merged, only the data storage unit information of the two secondary column clusters is merged; the data entries stored in the corresponding data storage units need not be copied (that is, read and rewritten). This reduces the amount of data read and written, shortens the time for which the partitions are frozen, improves partition merging efficiency, and reduces service write congestion. Optionally, the distributed database system is a database system that uses a log-structured merge-tree (LSM-tree) algorithm, the data storage unit information is sorted string table (SSTable) information, the data storage unit is an SSTable, and the file storing the metadata of the first partition and the file storing the metadata of the second partition are both manifest files.
With reference to the first aspect, in a possible implementation of the first aspect, the first database server creates a current file for the third partition, and the current file of the third partition records a file identifier of a file storing the metadata of the third partition.
With reference to the first aspect, in a possible implementation of the first aspect, the data storage unit information of the secondary column cluster of the first partition includes P1 layers of data storage unit information, where P1 is a positive integer greater than or equal to 2; the data storage unit information of the secondary column cluster of the second partition includes P2 layers of data storage unit information, where P2 is a positive integer greater than or equal to 2; and the data storage unit information of the target secondary column cluster includes Q layers of data storage unit information, which include the data storage unit information of the secondary column cluster of the first partition and the data storage unit information of the secondary column cluster of the second partition. One layer of the Q layers includes one layer of the P1 layers of the first partition and one layer of the P2 layers of the second partition, and each of the other Q-1 layers includes either one layer of the P1 layers of the first partition or one layer of the P2 layers of the second partition, where Q is equal to P1 + P2 - 1.
With reference to the first aspect, in a possible implementation of the first aspect, layer 0 of the Q layers of data storage unit information includes layer 0 of the data storage unit information of the secondary column cluster of the first partition and layer 0 of the data storage unit information of the secondary column cluster of the second partition; layer 2×q-1 of the Q layers includes layer q of the P layers of data storage unit information among the P1 layers of the secondary column cluster of the first partition; and layer 2×q of the Q layers includes layer q of the P layers of data storage unit information among the P2 layers of the secondary column cluster of the second partition, where q = 1, ..., P-1, and P is equal to the minimum of P1 and P2 minus 1.
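The interleaved layout can be sketched as follows, under stated assumptions: each partition's secondary column cluster is modelled as a list of layers, each layer is a list of hypothetical data storage unit names, and layers beyond the range of q given in the text are placed by simply continuing the same alternation, which is a choice of this sketch rather than something the text specifies.

```python
from itertools import zip_longest


def interleave_layers(first, second):
    """Build target layers: shared layer 0, then alternate the deeper layers.

    first, second: lists of layers; each layer is a list of unit names.
    """
    target = [first[0] + second[0]]  # layer 0 holds both partitions' layer 0
    for layer_a, layer_b in zip_longest(first[1:], second[1:]):
        if layer_a is not None:
            target.append(layer_a)  # lands on target layer 2*q - 1
        if layer_b is not None:
            target.append(layer_b)  # lands on target layer 2*q
    return target


first = [["a0"], ["a1"], ["a2"]]   # P1 = 3 layers (hypothetical names)
second = [["b0"], ["b1"], ["b2"]]  # P2 = 3 layers
target = interleave_layers(first, second)
print(target)
```

With both partitions three layers deep, the target has 3 + 3 - 1 = 5 layers, and only lists of references are rearranged.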
With reference to the first aspect, in a possible implementation of the first aspect, layer 0 of the Q layers of data storage unit information includes layer 0 of the data storage unit information of the secondary column cluster of the first partition and layer 0 of the data storage unit information of the secondary column cluster of the second partition; layers 1 to P-1 of the Q layers are respectively layers 1 to P-1 of the P layers of data storage unit information among the P1 layers of the secondary column cluster of the first partition; and layers P to Q-1 of the Q layers are respectively layers 1 to P-1 of the P layers of data storage unit information among the P2 layers of the secondary column cluster of the second partition, where P is equal to the minimum of P1 and P2 minus 1.
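The stacked layout, in which the first partition's deeper layers come before the second partition's, can be sketched in the same hypothetical model (lists of layers holding made-up data storage unit names). The sketch stacks all deeper layers of each partition, which is an assumption where the text's index bounds leave the placement open.

```python
def stack_layers(first, second):
    """Shared layer 0, then the first partition's layers, then the second's."""
    return [first[0] + second[0]] + first[1:] + second[1:]


first = [["a0"], ["a1"], ["a2"]]   # hypothetical unit names
second = [["b0"], ["b1"], ["b2"]]
stacked = stack_layers(first, second)
print(stacked)
```

Compared with interleaving, this layout keeps each partition's layers contiguous; in both cases the target has P1 + P2 - 1 layers.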
With reference to the first aspect, in a possible implementation of the first aspect, the method further includes: the database server determines the data storage unit information of the secondary column cluster of the third partition according to the data storage unit information of the target secondary column cluster. The data storage unit information of the secondary column cluster of the third partition includes P layers of data storage unit information. Layer 1 of the P layers is the data storage unit information of the data storage units obtained by merging and re-sorting the data storage units corresponding to layer 0 of the Q layers of data storage unit information, and each of layers 2 to P-1 of the P layers is the data storage unit information of the data storage units obtained by merging and re-sorting the data storage units corresponding to at least two of layers 1 to Q-1 of the Q layers of data storage unit information.
In the above technical solution, while the first partition and the second partition are being merged, only the data storage units that undergo merging and re-sorting need to be read and written, so the amount of data read and written is small. This shortens the time for which the partitions are frozen, improves partition merging efficiency, and reduces service congestion. Furthermore, the above technical solution reduces the number of layers of data storage unit information in the third partition, which facilitates subsequent operations such as lookups on the data in the third partition.
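The merge-and-re-sort step can be illustrated with a small sketch. It assumes each data storage unit is an individually sorted list of (key, value) pairs with distinct keys; the keys and values are made up, and a real compaction would also resolve duplicate keys and deletions and split its output into several units, none of which is shown here.

```python
import heapq


def merge_and_resort(units):
    """Merge several individually sorted units into one sorted unit."""
    return list(heapq.merge(*units))


# Two hypothetical layer-0 units, one from each source partition.
layer0_units = [
    [("idx:b", 1), ("idx:k", 2)],
    [("idx:a", 3), ("idx:z", 4)],
]
new_unit = merge_and_resort(layer0_units)
print(new_unit)
```

Only the units fed into such a merge are read and rewritten; units that are not selected stay where they are and are merely referenced by the new metadata.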
With reference to the first aspect, in a possible implementation of the first aspect, the prefix of the entry key value in each piece of data storage unit information in the data storage unit information of the secondary column cluster is a non-partition key value.
With reference to the first aspect, in a possible implementation of the first aspect, the metadata of the first partition includes data storage unit information of a secondary column cluster of the first partition, and the metadata of the second partition includes data storage unit information of a secondary column cluster of the second partition. That the first database server merges the metadata of the first partition and the metadata of the second partition to generate the metadata of the third partition specifically includes: the first database server merges the data storage unit information of the secondary column cluster of the first partition and the data storage unit information of the secondary column cluster of the second partition to generate the data storage unit information of the secondary column cluster of the third partition. Further, for the manner of generating the data storage unit information of the secondary column cluster of the third partition, reference may be made to the manner of generating the data storage unit information of the target secondary column cluster in the foregoing implementations of the first aspect.
With reference to the first aspect, in a possible implementation of the first aspect, the metadata of the first partition further includes a write-ahead log information set of the first partition, and the metadata of the second partition further includes a write-ahead log information set of the second partition. The method further includes: the database server merges the write-ahead log information set of the first partition and the write-ahead log information set of the second partition to generate a write-ahead log information set of the third partition. The write-ahead log information set of the third partition includes N pieces of write-ahead log information, namely the N1 pieces of write-ahead log information in the write-ahead log information set of the first partition and the N2 pieces of write-ahead log information in the write-ahead log information set of the second partition, where N is a positive integer greater than or equal to 2, N1 and N2 are positive integers greater than or equal to 1, and the sum of N1 and N2 is N. In the above technical solution, when the first partition and the second partition are merged, only the write-ahead log information sets of the two partitions are merged, and the corresponding write-ahead logs need not be copied. This reduces the amount of data read and written, shortens the time for which the partitions are frozen, improves partition merging efficiency, and reduces service congestion.
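As a minimal illustration, assuming each piece of write-ahead log information is just a reference (here a made-up file name) to a log that stays where it is, merging the two sets amounts to combining the references:

```python
def merge_wal_sets(first_set, second_set):
    """Combine references to write-ahead logs; the logs are not copied."""
    return list(first_set) + list(second_set)


wal_p1 = ["wal-p1-0001", "wal-p1-0002"]  # N1 = 2 hypothetical references
wal_p2 = ["wal-p2-0001"]                 # N2 = 1
wal_p3 = merge_wal_sets(wal_p1, wal_p2)  # N = N1 + N2 = 3
print(wal_p3)
```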
结合第一方面,在第一方面的一种可能的实现方式中,该第一分区的元数据还包括该第一分区的主列簇的数据存放单元信息,该第二分区的元数据还包括该第二分区的主列簇的数据存放单元信息,该方法还包括:该数据库服务器合并该第一分区的主列簇的数据存放单元信息和该第二分区的主列簇的数据存放单元信息生成该第三分区的主列簇的数据存放单元信息。上述技术方案中,在对第一分区和第二分区进行合并时,可以只对第一分区的主列簇的数据存放单元信息和第二分区的主列簇的数据存放单元信息进行合并,而无需复制相应的数据存放单元中的数据条目,这样可以减少数据的读写量,从而减少因对分区被冻结的时间,提高分区合并效率,减少业务拥塞。With reference to the first aspect, in a possible implementation manner of the first aspect, the metadata of the first partition further includes data storage unit information of a main column cluster of the first partition, and the metadata of the second partition further includes The data storage unit information of the main column cluster of the second partition, the method further comprises: the database server merging the data storage unit information of the main column cluster of the first partition and the data storage unit information of the main column cluster of the second partition Generate data storage unit information of the main column cluster of the third partition. In the above technical solution, when the first partition and the second partition are merged, only the data storage unit information of the main column cluster of the first partition and the data storage unit information of the main column cluster of the second partition may be merged, and There is no need to copy the data entries in the corresponding data storage unit, which can reduce the amount of data read and write, thereby reducing the time that the partition is frozen, improve the efficiency of partition consolidation, and reduce business congestion.
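The metadata-only merge described in the two implementations above can be illustrated with a minimal sketch. The class names, field names, and file identifiers below are hypothetical and chosen only for illustration; the point is that the third partition's metadata is built by combining references, while the referenced WAL files and data storage units are never read or copied.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WalInfo:
    """Hypothetical record describing one write-ahead log file."""
    file_id: str   # identifier of the WAL file in shared storage
    sequence: int  # time sequence number of the WAL file


@dataclass
class PartitionMetadata:
    """Hypothetical, simplified partition metadata (references only)."""
    wal_infos: list = field(default_factory=list)      # write-ahead log information set
    primary_units: list = field(default_factory=list)  # data storage unit info of the primary column family


def merge_metadata(first, second):
    """Combine the references of two partitions; no file content is touched."""
    return PartitionMetadata(
        wal_infos=first.wal_infos + second.wal_infos,
        primary_units=first.primary_units + second.primary_units,
    )


p1 = PartitionMetadata([WalInfo("wal-1", 7)], ["sst-a", "sst-b"])
p2 = PartitionMetadata([WalInfo("wal-2", 9)], ["sst-c"])
p3 = merge_metadata(p1, p2)
print(len(p3.wal_infos), p3.primary_units)
```

Here N1 = 1 and N2 = 1, so the merged set holds N = 2 pieces of write-ahead log information.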
With reference to the first aspect, in a possible implementation of the first aspect, the data storage unit information of the primary column family of the first partition includes K1 layers of data storage unit information, where K1 is a positive integer greater than or equal to 1, and the data storage unit information of the primary column family of the second partition includes K2 layers of data storage unit information, where K2 is a positive integer greater than or equal to 1. The data storage unit information of the primary column family of the third partition includes K layers of data storage unit information. The k-th layer of the K layers of the third partition includes the k-th layer among K layers of the K1 layers of data storage unit information and the k-th layer among K layers of the K2 layers of data storage unit information, where K is the minimum of K1 and K2. The entry key values of any piece of data storage unit information in the k-th layer of the K1 layers do not overlap the entry key values of any piece of data storage unit information in the k-th layer of the K2 layers.
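As a rough illustration of this layer-by-layer combination, the following sketch represents each piece of data storage unit information only by its (minimum, maximum) entry key range, which is an assumption made for brevity. It follows the statement that K is the minimum of K1 and K2; how any layers beyond K are handled is not stated in this passage, so the sketch leaves them out.

```python
def ranges_overlap(a, b):
    """Return True if the closed key ranges a=(min, max) and b=(min, max) share a key."""
    return a[0] <= b[1] and b[0] <= a[1]


def merge_layers(first_layers, second_layers):
    """Merge layer k of both partitions for k = 1..K, with K = min(K1, K2).

    Each argument is a list of layers; each layer is a list of (min_key, max_key).
    """
    k_total = min(len(first_layers), len(second_layers))
    merged = []
    for k in range(k_total):
        # The text requires that same-layer key ranges of the two partitions do not overlap.
        for a in first_layers[k]:
            for b in second_layers[k]:
                if ranges_overlap(a, b):
                    raise ValueError(f"layer {k + 1}: key ranges {a} and {b} overlap")
        merged.append(sorted(first_layers[k] + second_layers[k]))
    return merged


first = [[("A|a", "A|k")], [("A|a", "A|w")]]  # K1 = 2 (hypothetical key ranges)
second = [[("A|x", "B|c")]]                   # K2 = 1
merged = merge_layers(first, second)
print(merged)
```

Because adjacent partitions cover disjoint partition key ranges and entry keys are prefixed by the partition key, the overlap check is expected to pass in practice; it is kept here only to make the stated condition explicit.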
With reference to the first aspect, in a possible implementation of the first aspect, the prefix of the entry key value in each piece of data storage unit information in the data storage unit information of the primary column family is the partition key value.
In a second aspect, a database server is provided. The database server includes units configured to perform the first aspect or any possible implementation of the first aspect.
In a third aspect, a database server is provided. The database server includes a processor and a communication interface. The processor, in combination with the communication interface, implements the first aspect or any possible implementation of the first aspect.
In a fourth aspect, an embodiment of this application provides a computer storage medium that stores computer instructions. When a database server runs the computer instructions, the first aspect or any possible implementation of the first aspect is implemented.
In a fifth aspect, this application provides a computer program product containing instructions. When the computer instructions in the computer program product are run on a database server, the database server is caused to perform the first aspect or any possible implementation of the first aspect.
In a sixth aspect, this application provides a partition merging method in a distributed database system. The distributed database system includes a first database server, a second database server, and a management server. The first database server runs a first partition, and the second database server runs a second partition. The method includes: the management server creates a third partition and determines to merge the first partition and the second partition into the third partition; and the management server sends a merge instruction to the first database server, where the merge instruction is used to merge the first partition and the second partition into the third partition, and the first partition and the second partition are adjacent partitions. The merge instruction contains an identifier of a current file of the first partition and an identifier of a current file of the second partition. The current file of the first partition records the file identifier of the file that stores the metadata of the first partition. The current file of the second partition records the file identifier of the file that stores the metadata of the second partition.
With reference to the sixth aspect, in a possible implementation of the sixth aspect, the management server receives a response message sent by the first database server, where the response message contains an identifier of a current file of the third partition.
With reference to the sixth aspect, in a possible implementation of the sixth aspect, the management server establishes a mapping relationship between the third partition and the first database server. After creating the third partition, the management server updates the partition routing table so that the table contains the mapping relationship between the third partition and the first database server. In a specific implementation, this may be a mapping between the identifier of the third partition and the address of the database server.
With reference to the sixth aspect, in a possible implementation of the sixth aspect, the management server determines, according to the loads of the first database server and the second database server, that the first database server is to merge the first partition and the second partition into the third partition, where the load of the first database server is lighter than the load of the second database server.
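The load-based choice and the routing table update can be sketched as follows. This is only an illustration under stated assumptions: the function name, the partition identifiers, the addresses, and the representation of load as a single number are all hypothetical.

```python
def plan_merge(routing, loads, first_id, second_id, third_id):
    """Pick the lighter-loaded owner as the merging server and map the new partition to it.

    routing: partition identifier -> database server address
    loads:   database server address -> load value (smaller means lighter)
    """
    owners = [routing[first_id], routing[second_id]]
    executor = min(owners, key=lambda address: loads[address])
    updated = dict(routing)       # leave the caller's table untouched
    updated[third_id] = executor  # mapping between the third partition and the merging server
    return executor, updated


routing = {"partition-1": "10.0.0.1:27021", "partition-2": "10.0.0.2:27021"}
loads = {"10.0.0.1:27021": 0.35, "10.0.0.2:27021": 0.80}
executor, updated = plan_merge(routing, loads, "partition-1", "partition-2", "partition-12")
print(executor, updated["partition-12"])
```

In the terminology of the sixth aspect, the server returned as `executor` plays the role of the first database server and receives the merge instruction.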
In a seventh aspect, a management server is provided. The management server includes units configured to perform the sixth aspect or any possible implementation of the sixth aspect.
In an eighth aspect, a management server is provided. The management server includes a processor and a communication interface. The processor, in combination with the communication interface, implements the sixth aspect or any possible implementation of the sixth aspect.
In a ninth aspect, an embodiment of this application provides a computer storage medium that stores computer instructions. When a management server runs the computer instructions, the sixth aspect or any possible implementation of the sixth aspect is implemented.
In a tenth aspect, this application provides a computer program product containing instructions. When the computer instructions in the computer program product are run on a management server, the management server is caused to implement the sixth aspect or any possible implementation of the sixth aspect.
In an eleventh aspect, this application provides a distributed database system. The distributed database system includes a first database server, a second database server, and a management server. The first database server is configured to implement the first aspect or any possible implementation of the first aspect, and the management server is configured to implement the sixth aspect or any possible implementation of the sixth aspect.
In a twelfth aspect, an embodiment of this application provides a partition merging method in a distributed database system. The distributed database system includes a first database server, a second database server, and a management server. The first database server runs a first partition, and the second database server runs a second partition. The method includes: the first database server receives a merge instruction sent by the management server, where the merge instruction is used to merge the first partition and the second partition into a third partition, and the first partition and the second partition are adjacent partitions; and the first database server obtains, based on the first partition, the metadata of the first partition and the metadata of the second partition, and merges the metadata of the first partition and the metadata of the second partition to generate the metadata of the third partition.
With reference to the twelfth aspect, in a possible implementation of the twelfth aspect, the merge instruction contains an identifier of a current file of the first partition and an identifier of a current file of the second partition. The current file of the first partition records the file identifier of the file that stores the metadata of the first partition. The current file of the second partition records the file identifier of the file that stores the metadata of the second partition. That the first database server obtains the metadata of the first partition and the metadata of the second partition specifically includes: obtaining the metadata of the first partition according to the identifier of the current file of the first partition, and obtaining the metadata of the second partition according to the identifier of the current file of the second partition.
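The two-step lookup (current file, then metadata file) can be sketched as below. The in-memory dictionary stands in for shared storage, and the file names, the JSON encoding of the metadata, and the field names of the merge instruction are assumptions made only for this illustration.

```python
import json


def load_partition_metadata(storage, current_file_id):
    """Follow the current file to the metadata file and parse it.

    storage: dict mapping file identifier -> file content (stand-in for shared storage).
    """
    metadata_file_id = storage[current_file_id].strip()  # the current file records the metadata file's identifier
    return json.loads(storage[metadata_file_id])


storage = {
    "CURRENT-p1": "MANIFEST-p1-000004\n",
    "MANIFEST-p1-000004": json.dumps({"wal": ["wal-1"], "sst": ["sst-a", "sst-b"]}),
    "CURRENT-p2": "MANIFEST-p2-000009\n",
    "MANIFEST-p2-000009": json.dumps({"wal": ["wal-2"], "sst": ["sst-c"]}),
}
# Hypothetical shape of the merge instruction sent by the management server.
merge_instruction = {"first_current": "CURRENT-p1", "second_current": "CURRENT-p2"}

first_meta = load_partition_metadata(storage, merge_instruction["first_current"])
second_meta = load_partition_metadata(storage, merge_instruction["second_current"])
third_meta = {name: first_meta[name] + second_meta[name] for name in ("wal", "sst")}
print(third_meta)
```

Passing only the current file identifiers keeps the merge instruction small, since the first database server can resolve everything else from shared storage.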
For other possible implementations of the twelfth aspect, refer to any possible implementation of the first aspect.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of partitions.
FIG. 2 is a schematic diagram of the architecture of a KVDB.
FIG. 3 is a schematic flowchart of a method for processing a partition according to an embodiment of this application.
FIG. 4 is a schematic diagram of a layer merging process.
FIG. 5 is a structural block diagram of a database server according to an embodiment of this application.
FIG. 6 is a structural block diagram of a database server according to an embodiment of the present invention.
FIG. 7 is a structural block diagram of a management server according to an embodiment of this application.
FIG. 8 is a structural block diagram of a management server according to an embodiment of the present invention.
DETAILED DESCRIPTION
The technical solutions in this application are described below with reference to the accompanying drawings.
In this application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may represent the following cases: A exists alone, both A and B exist, and B exists alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following items" or a similar expression refers to any combination of these items, including any combination of a single item or multiple items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be single or multiple. In addition, in the embodiments of this application, words such as "first" and "second" do not limit quantity or execution order.
It should be noted that, in this application, words such as "exemplary" or "for example" are used to indicate an example, an illustration, or an explanation. Any embodiment or design described as "exemplary" or "for example" in this application should not be construed as being preferred over or more advantageous than other embodiments or designs. Rather, words such as "exemplary" or "for example" are intended to present a related concept in a concrete manner.
To help a person skilled in the art better understand the embodiments of this application, some basic concepts of the distributed database system in the embodiments of this application are introduced first.
A KVDB is a database that uses key-value storage. The data in the database is organized, indexed, and stored as key-value pairs. Common KVDBs include RocksDB and LevelDB.
Partition key value: in partitioning technology, the value of one column field of a data entry in a data table, or the ordered combination of the values of several column fields, is usually fixed as the basis for determining the partition. This value is called the partition key value (partition key). The partition in which a data entry is located can be uniquely determined from the partition key value of the data entry.
The partition key value is described more specifically below with reference to Table 1.
Table 1
bn   on   ver   crt             dlc
A    d    1     1501232852986   20
A    d    2     1501232852986   20
B    x    1     1506710238983   20
Table 1 shows the five column fields {bn, on, ver, crt, dlc} of three data entries. For example, the values of the five column fields of the first data entry are {A, d, 1, 1501232852986, 20}. The partition key value may be the value of one of the five column fields, or the ordered combination of the values of several column fields. For example, assume that the partition key value is the combined value of the column fields {bn, on}. In the three data entries above, the column fields {bn, on} of both the first data entry and the second data entry are {A, d}. In other words, the partition key value of both the first and the second data entry is {A, d}. The column fields {bn, on} of the third data entry are {B, x}. In other words, the partition key value of the third data entry is {B, x}.
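The derivation of the partition key values from Table 1 can be written out as a short sketch. The dictionary representation of a data entry and the function name are illustrative choices, not part of the described system.

```python
# The three data entries of Table 1.
ROWS = [
    {"bn": "A", "on": "d", "ver": 1, "crt": 1501232852986, "dlc": 20},
    {"bn": "A", "on": "d", "ver": 2, "crt": 1501232852986, "dlc": 20},
    {"bn": "B", "on": "x", "ver": 1, "crt": 1506710238983, "dlc": 20},
]

# Column fields whose ordered combination forms the partition key value.
PARTITION_KEY_FIELDS = ("bn", "on")


def partition_key(row, fields=PARTITION_KEY_FIELDS):
    """Return the ordered combination of the chosen column field values."""
    return tuple(row[name] for name in fields)


keys = [partition_key(row) for row in ROWS]
print(keys)  # [('A', 'd'), ('A', 'd'), ('B', 'x')]
```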
For ease of description, the values of the column field on are represented by letters in Table 1. In practical applications, the value of the column field on may be a specific number, for example, 0109 or 0208.
In the embodiments of this application, the data entries in a partition are arranged in the natural order of their partition key values. It is still assumed that the partition key value is the combined value of the column fields {bn, on}. FIG. 1 is a schematic diagram of multiple partitions whose partition key value is {bn, on}. For the five partitions shown in FIG. 1, a point on a partition boundary may be assigned according to either a left-open, right-closed principle or a left-closed, right-open principle. Under the left-open, right-closed principle, the data entry whose partition key value is bn=A, on=w is a data entry in partition 1. Under the left-closed, right-open principle, that data entry is a data entry in partition 2. From the partitions shown in FIG. 1 and the partition key values of the three data entries in Table 1, it can be seen that the first and second data entries are data entries in partition 1 and the third data entry is a data entry in partition 2. To ensure that the data entries in a partition are arranged in the natural order of the partition key values, the data table may be split by range partitioning (range partition). Other splitting methods may also yield partitions whose data entries are arranged in the natural order of the partition key values. Therefore, the embodiments of this application do not limit how the data table is split, provided that after the split the data entries in each partition are arranged in the natural order of the partition key values. As can be seen from the partitions shown in FIG. 1, the partition key values in partition 1 and partition 2 are arranged in natural order and are contiguous, and such partitions are called adjacent partitions. In the embodiments of this application, the KVDB has the data entries of a data table managed by different database servers at partition granularity. That is, a database server provides access to the data entries of a partition, which is also described as the database server serving the partition or the database server running the partition.
Partition routing table: the partition routing table may include the following information: a partition identifier and the address of the database server to which the partition belongs. It may also include a partition root index file identifier, the left boundary of the partition, the right boundary of the partition, the partition status, and the like. The address of the database server may be an Internet Protocol (IP) address or an identifier of the database server, which is not limited in this embodiment of the present invention. The partition routing table is also called a partition view. For example, Table 2 is a partition routing table for the multiple partitions shown in FIG. 1. The partition root index file identifier is the file name of the manifest file of each partition. The partition status indicates the current state of the partition, for example, a normal service state, a splitting state, a merging state, or an isolated state. "Normal" in Table 2 indicates that the partition is currently in the normal service state.
Table 2
Figure PCTCN2019097559-appb-000001 (Table 2 appears as an image in the original publication.)
$Min represents the unbounded lower boundary (negative infinity), and $Max represents the unbounded upper boundary (positive infinity). By querying the partition routing table with the partition key value of a data entry as input, the partition to which the data entry belongs and the database server to which that partition belongs can be obtained. Take the third data entry in Table 1 as an example. The partition key value of this data entry is {B, x}. According to the partition routing table shown in Table 2, the partition to which this data entry belongs is partition 2, and the address of the database server to which it belongs is 8.11.234.2:27021.
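A routing lookup of this kind can be sketched with a binary search over the right boundaries, assuming the left-open, right-closed principle. The boundary {A, w} and the address of partition 2 come from the text; the boundary {C, a}, the reduction to three partitions, and the other two addresses are placeholders invented for this example, because Table 2 is only available as an image.

```python
from bisect import bisect_left

# Rows of a simplified routing table: (right boundary, partition id, server address).
# The last row has no finite right boundary and stands for $Max.
ROUTING = [
    (("A", "w"), 1, "8.11.234.1:27021"),  # address is a placeholder
    (("C", "a"), 2, "8.11.234.2:27021"),  # boundary is a placeholder
    (None, 3, "8.11.234.3:27021"),        # address is a placeholder
]
RIGHT_BOUNDS = [row[0] for row in ROUTING[:-1]]


def route(partition_key):
    """Return (partition id, server address) for a partition key value.

    bisect_left finds the first right boundary >= key, so a key equal to a
    boundary falls into the partition on its left (right-closed).
    """
    index = bisect_left(RIGHT_BOUNDS, partition_key)
    _, partition_id, address = ROUTING[index]
    return partition_id, address


print(route(("B", "x")))  # (2, '8.11.234.2:27021')
print(route(("A", "w")))  # (1, '8.11.234.1:27021')
```

Switching to the left-closed, right-open principle would only require replacing `bisect_left` with `bisect_right`.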
Partition merging in the embodiments of this application always refers to the merging of adjacent partitions, for example, merging partition 1 and partition 2, merging partition 3 and partition 4, or merging partition 4 and partition 5.
A primary index entry includes the partition key value. In addition to the partition key value, a primary index entry may include the values of other column fields of the data entry. The partition key value and the values of one or more of the other column fields may together form the primary index entry. Still taking Table 1 as an example, a primary index entry may include the five column fields {bn, on, ver, crt, dlc} shown in Table 1, where the primary index entry key value corresponds to the column fields {bn, on, ver}. {bn, on} may be called the prefix of the primary index entry. It can be seen that {bn, on} is the partition key value. Therefore, this primary index entry is a primary index entry prefixed by the partition key value. An entry key value is a unique index of a data entry and may include multiple column fields of the data entry.
A secondary index entry may also be called an auxiliary index entry. In a distributed database system, to support complex query scenarios, one or more secondary index entries may be created for each data entry. A secondary index entry consists of a secondary index column field and the column fields corresponding to the primary index entry key value. The format of a secondary index entry may be the secondary index column field followed by the column fields corresponding to the primary index entry key value. In this case, the secondary index entry is prefixed by a value that is not the partition key value. The format of a secondary index entry may also be the column fields corresponding to the primary index entry key value followed by the secondary index column field. In this case, the secondary index entry is prefixed by the partition key value. When each data entry has multiple secondary index entries, different secondary index entries use different secondary index column fields. For example, assume that each data entry has two secondary index entries. The secondary index column field of one may be dlc, and the secondary index column field of the other may be crt.
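The two secondary index entry formats can be contrasted with a small sketch built on the first data entry of Table 1. The function name and the boolean switch are illustrative assumptions.

```python
# Column fields that make up the primary index entry key value in the Table 1 example.
PRIMARY_KEY_FIELDS = ("bn", "on", "ver")


def secondary_key(row, index_field, partition_prefixed=False):
    """Build a secondary index entry key in one of the two described formats."""
    primary = tuple(row[name] for name in PRIMARY_KEY_FIELDS)
    if partition_prefixed:
        # Primary key fields first: the key is prefixed by the partition key value.
        return primary + (row[index_field],)
    # Secondary index column field first: the prefix is not the partition key value.
    return (row[index_field],) + primary


row = {"bn": "A", "on": "d", "ver": 1, "crt": 1501232852986, "dlc": 20}
print(secondary_key(row, "dlc"))                           # (20, 'A', 'd', 1)
print(secondary_key(row, "crt", partition_prefixed=True))  # ('A', 'd', 1, 1501232852986)
```

The choice of format matters for merging: keys prefixed by the partition key value stay disjoint between adjacent partitions, whereas keys led by the secondary index column field can interleave across partitions.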
Each partition may include one primary index entry set or one or more secondary index entry sets. The primary index entry set consists of all primary index entries in the partition. The primary index entries in the primary index entry set are sorted by primary index entry key value.
The secondary index entries in a secondary index entry set are sorted by secondary index entry key value. If a partition includes multiple secondary index entry sets, each secondary index entry set consists of secondary index entries that have the same secondary index column field. Assume that each data entry has two secondary index entries, where the secondary index column field of one is dlc and the secondary index column field of the other is crt. The partition may then include two secondary index entry sets: one includes all secondary index entries in the partition whose secondary index column field is crt, and the other includes all secondary index entries in the partition whose secondary index column field is dlc.
Multiple column fields with the same characteristics may form a column family. As described above, the primary index entries in a primary index entry set have the same column fields. Therefore, a primary index entry set may be regarded as a column family, which may be called the primary index column family. Similarly, the secondary index entries in a secondary index entry set also have the same column fields. Therefore, a secondary index entry set may also be regarded as a column family, which may be called a secondary index column family.
Some embodiments of this application are described using a KVDB that adopts the log-structured merge-tree (LSM-tree) algorithm as an example. Therefore, the LSM-tree is briefly introduced below.
Write-ahead logging (WAL) file: when a data entry is inserted into the KVDB, the data entry is first written to the WAL file and, after that write succeeds, inserted into the memory table (MemTable).
MemTable: a MemTable corresponds to a WAL file and is the ordered in-memory organization of the content of that WAL file. The MemTable provides the structure for writing, deleting, and reading key-value data (data entries). Inside the MemTable, data entries are stored in order of entry key value.
Immutable MemTable: when the memory space occupied by a MemTable reaches an upper limit, the data entries stored in memory in order of entry key value need to be dumped to a sorted string table (SSTable), and no new data entries are written to the corresponding WAL file. At this point, the MemTable is frozen into an immutable MemTable, and a new MemTable is generated at the same time. Newly arriving data entries are recorded in a new WAL file and in the newly generated MemTable. The data entries in an immutable MemTable cannot be changed. In other words, the data entries in an immutable MemTable can only be read and cannot be written or deleted.
An SSTable is the unit in which KVDB data is stored. The entry key values in each SSTable are ordered. Each immutable MemTable yields one SSTable after compaction. The process of compacting an immutable MemTable into an SSTable may be called minor compaction.
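The write path described above (WAL first, then MemTable, freeze on reaching a limit, minor compaction into an SSTable) can be sketched in a few lines. This is a toy model under stated assumptions, not the behavior of any particular KVDB: the class name is hypothetical, the limit counts entries instead of bytes, the WAL is a plain list, and ordering is applied only when the frozen table is flushed.

```python
class TinyStore:
    """Toy model of the WAL -> MemTable -> immutable MemTable -> SSTable path."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # entry count used in place of a memory limit
        self.wal = []        # current WAL file, modeled as a list of records
        self.memtable = {}   # current MemTable
        self.immutable = []  # frozen MemTables awaiting minor compaction
        self.level0 = []     # SSTables, each a list of (key, value) sorted by key

    def put(self, key, value):
        self.wal.append((key, value))  # 1. write to the WAL first
        self.memtable[key] = value     # 2. then insert into the MemTable
        if len(self.memtable) >= self.memtable_limit:
            self.immutable.append(self.memtable)  # freeze into an immutable MemTable
            self.memtable, self.wal = {}, []      # start a new MemTable and a new WAL

    def minor_compaction(self):
        """Turn each immutable MemTable into one sorted SSTable."""
        while self.immutable:
            frozen = self.immutable.pop(0)
            self.level0.append(sorted(frozen.items()))


store = TinyStore(memtable_limit=2)
store.put("b", 1)
store.put("a", 2)  # reaches the limit, so the MemTable is frozen
store.put("c", 3)  # goes to the new WAL and the new MemTable
store.minor_compaction()
print(store.level0, store.memtable, store.wal)
```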
The KVDB stores SSTable files in different levels, level 0 to level n, where n is a positive integer greater than or equal to 1. Level 0 includes multiple SSTable files, each of which is obtained by performing minor compaction on one immutable MemTable. In other words, minor compaction is performed on multiple immutable MemTables to obtain the corresponding SSTables. The entry key values of different SSTables among these SSTables may overlap. After a certain condition is met, the SSTables in level 0 are merged with the SSTables in level 1, and the SSTables obtained after the merge are the SSTables stored in level 1.
Each level from level 1 to level n maintains a specified number of SSTables, and the entry key values of all SSTables within a level do not overlap. When the SSTables in a level meet a certain condition, an SSTable in that level may be selected and merged with the SSTables in the next level (that is, the level whose number is one greater; for example, the next level of level 1 is level 2, the next level of level 2 is level 3, and so on). After the merge, the selected SSTable is deleted. The merge of SSTables between two levels may be called major compaction.
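A much simplified sketch of major compaction follows. It assumes that an entry in the upper level is newer than an entry with the same key in the lower level, which is the usual LSM-tree convention but is not stated in this passage, and it writes the merged result as a single SSTable, whereas a real system would typically split the output. The function name and the sample data are hypothetical.

```python
def major_compaction(upper_table, lower_level):
    """Merge one selected SSTable into the next level.

    upper_table: sorted list of (key, value) selected from level i.
    lower_level: list of SSTables in level i+1 whose key ranges do not overlap.
    Returns the new content of level i+1; the caller then deletes upper_table.
    """
    low, high = upper_table[0][0], upper_table[-1][0]
    overlapping = [t for t in lower_level if t[0][0] <= high and low <= t[-1][0]]
    untouched = [t for t in lower_level if t not in overlapping]
    merged = {}
    for table in overlapping:
        merged.update(table)    # values already in the lower level
    merged.update(upper_table)  # assumed newer values from the upper level win
    new_table = sorted(merged.items())
    return sorted(untouched + [new_table], key=lambda t: t[0][0])


upper = [("b", "new"), ("d", "x")]
lower = [[("a", 1), ("b", "old")], [("m", 5), ("n", 6)]]
result = major_compaction(upper, lower)
print(result)
```

Only the lower-level SSTables whose key range intersects the selected SSTable are rewritten, which is what keeps the key ranges within the level non-overlapping after the merge.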
Manifest file: the manifest file records WAL file information and SSTable information. More specifically, the WAL file information recorded in the manifest file includes the identifier of the WAL file and the time sequence number of the WAL file. The SSTable information recorded in the manifest file includes one or more of the following: the column family to which the SSTable belongs, the level to which the SSTable belongs, the identifier of the SSTable, the time sequence number of the SSTable, the size of the SSTable, the minimum entry key of the SSTable, and the maximum entry key of the SSTable.
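The manifest contents listed above could be modelled roughly as follows. This is only a sketch: all class and field names (`WalFileInfo`, `SSTableRecord`, `Manifest`, `levels`) are assumptions made for illustration, and the source does not prescribe any on-disk or in-memory layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WalFileInfo:
    file_id: str          # identifier of the WAL file
    sequence_number: int  # time sequence number of the WAL file


@dataclass
class SSTableRecord:
    column_family: str    # column family the SSTable belongs to
    level: int            # level the SSTable belongs to
    sstable_id: str       # identifier of the SSTable
    sequence_number: int  # time sequence number of the SSTable
    size_bytes: int       # size of the SSTable
    min_key: str          # minimum entry key
    max_key: str          # maximum entry key


@dataclass
class Manifest:
    wal_files: List[WalFileInfo] = field(default_factory=list)
    sstables: List[SSTableRecord] = field(default_factory=list)

    def levels(self, column_family: str) -> List[List[str]]:
        """Group the SSTable identifiers of one column family by level."""
        records = [r for r in self.sstables if r.column_family == column_family]
        depth = max((r.level for r in records), default=-1) + 1
        grouped: List[List[str]] = [[] for _ in range(depth)]
        for r in records:
            grouped[r.level].append(r.sstable_id)
        return grouped


manifest = Manifest(
    wal_files=[WalFileInfo("wal-1", 10)],
    sstables=[
        SSTableRecord("secondary", 0, "s1", 11, 4096, "a", "m"),
        SSTableRecord("secondary", 1, "s2", 7, 8192, "a", "z"),
        SSTableRecord("primary", 0, "s3", 12, 4096, "b", "c"),
    ],
)
```

Grouping by level is convenient because the partition merge described later operates on whole levels of SSTable information rather than on individual SSTables.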
The KVDB is described below with reference to FIG. 2.
The KVDB 200 shown in FIG. 2 includes a management server 210, a database server 221, a database server 222, and a database server 223. The KVDB 200 may further include a storage server 231, a storage server 232, and a storage server 233.
The database server 221, the database server 222, and the database server 223 may be collectively referred to as a distributed database service cluster. The storage server 231, the storage server 232, and the storage server 233 may provide a distributed shared storage pool for the KVDB. In a specific implementation, centralized storage may also provide storage resources for the KVDB; for example, a storage array may provide the storage resources.
The management server 210 is responsible for specifying the ownership relationship between partitions and database servers. The management server 210 also maintains the partition routing table.
The operations described above, namely freezing a MemTable into an immutable MemTable, minor compaction, and major compaction, can all be performed by a database server.
In a distributed database system, one database server is responsible for the storage management of the data entries of a partition. Taking storage servers as an example, the WAL files, SSTables, and manifest file that a database server generates for a partition can be persisted on a storage server, and the database server can access them there. MemTables and immutable MemTables are kept in the memory of the database server.
FIG. 3 is a schematic flowchart of a method for processing partitions according to an embodiment of this application. The method shown in FIG. 3 can be applied to a KVDB based on a log-structured merge tree (LSM-tree). Partition 1 is served by database server 1, and partition 2 is served by database server 2.
301. According to a balancing policy, the management server determines to merge the adjacent first partition and second partition, marks partition 1 and partition 2 as being in the merging state in the partition routing table, and persists the partition routing table.
The balancing policy may be based on the number of data entries in a partition, the access hotness of a partition, the load of the database server running the partition, and so on.
302. The management server notifies database server 1 to prepare the first partition for merging.
303. The management server notifies database server 2 to prepare the second partition for merging.
304. Database server 2 stops compaction tasks, sets the second partition to read-only, suspends write requests, and sends a success response to the management server.
Specifically, if database server 2 has compaction tasks that have not yet started, it does not perform them. If database server 2 is performing a compaction task, it stops performing compaction tasks after the ongoing task is complete. In other words, after receiving the prepare-to-merge notification from the management server, database server 2 no longer changes the content stored for the second partition.
305. Database server 1 stops compaction tasks, sets the first partition to read-only, suspends write requests, and sends a success response to the management server.
Step 305 is similar to step 304 and is not described again here.
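The handling of the prepare-to-merge notification in steps 304 and 305 might look like the following sketch. The `PartitionService` class, its attributes, and the `"success"` return value are hypothetical; the source only states the behaviour (drop unstarted compaction tasks, let a running one finish, set the partition to read-only, suspend writes, and respond).

```python
from typing import Callable, Dict, List, Optional, Tuple


class PartitionService:
    """Hypothetical per-partition state held by a database server."""

    def __init__(self) -> None:
        self.pending_compactions: List[Callable[[], None]] = []
        self.running_compaction: Optional[Callable[[], None]] = None
        self.read_only = False
        self.suspended_writes: List[Tuple[str, str]] = []
        self.memtable: Dict[str, str] = {}

    def write(self, key: str, value: str) -> bool:
        """Apply a write, or suspend it while the partition is read-only."""
        if self.read_only:
            self.suspended_writes.append((key, value))
            return False
        self.memtable[key] = value
        return True

    def prepare_merge(self) -> str:
        """Handle the prepare-to-merge notification."""
        self.pending_compactions.clear()   # tasks not yet started never run
        if self.running_compaction is not None:
            self.running_compaction()      # let the ongoing task complete
            self.running_compaction = None
        self.read_only = True              # later writes are suspended
        return "success"


svc = PartitionService()
log: List[str] = []
svc.pending_compactions.append(lambda: log.append("pending"))
svc.running_compaction = lambda: log.append("running")
accepted_before = svc.write("k1", "v1")
response = svc.prepare_merge()
accepted_after = svc.write("k2", "v2")
```

In this sketch a suspended write is simply queued; what happens to the queue after the merge completes is outside what the steps above describe.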
306. The management server creates a third partition in the distributed database system and marks the third partition as being in the initial state in the partition routing table.
The management server creates the third partition in the distributed database system; a specific implementation may be to generate an identifier for the third partition and add the identifier to the partition routing table. For example, in a specific implementation of this embodiment, the management server may determine, based on the loads of database server 1 and database server 2, that the more lightly loaded server runs the third partition. If database server 1 has the lighter load, the third partition runs on database server 1, and database server 1 serves the third partition. Optionally, the management server may also instruct another database server (for example, database server 2 or database server 3) to merge the first partition and the second partition into the third partition, that is, to run the third partition on that other database server. In another implementation, one of database server 1 and database server 2 is selected at random to run the third partition, or is selected by a specific algorithm; for example, the server may be determined by taking a hash, or the identifier (such as the partition number) of a partition managed by the database server, modulo the total number of partitions. The management server determines the database server that runs the third partition according to one of the foregoing implementations.
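The three selection approaches mentioned above (lightest load, random choice, modulo-based choice) can be sketched as follows. The function names and the load values are hypothetical. In particular, the source does not say how the remainder of the modulo operation maps to a server, so the final `% len(servers)` step in `pick_by_modulo` is an assumption.

```python
import random
from typing import Dict, List, Optional


def pick_by_load(loads: Dict[str, float]) -> str:
    """Return the server with the lightest load."""
    return min(loads, key=lambda name: loads[name])


def pick_at_random(servers: List[str], rng: Optional[random.Random] = None) -> str:
    """Return one of the candidate servers at random."""
    return (rng or random).choice(servers)


def pick_by_modulo(servers: List[str], partition_number: int,
                   total_partitions: int) -> str:
    """Pick a server from a partition number taken modulo the partition count.

    Mapping the remainder onto the candidate list is an assumption of
    this sketch.
    """
    remainder = partition_number % total_partitions
    return servers[remainder % len(servers)]


candidates = ["database server 1", "database server 2"]
by_load = pick_by_load({"database server 1": 0.2, "database server 2": 0.7})
by_modulo = pick_by_modulo(candidates, partition_number=5, total_partitions=8)
by_chance = pick_at_random(candidates, random.Random(0))
```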
307. The management server sends a merge instruction to database server 1. The merge instruction is used to merge the first partition and the second partition into the third partition.
In this embodiment of the application, the first partition and the second partition are merged into the third partition by merging the metadata of the first partition with the metadata of the second partition to generate the metadata of the third partition; that is, the data entries of the first partition and of the second partition can be accessed through the metadata of the third partition. Specifically, in one implementation, the merge instruction contains the identifier of the current file of the first partition and the identifier of the current file of the second partition. The current file of the first partition records the file identifier of the file that stores the metadata of the first partition. The current file of the second partition records the file identifier of the file that stores the metadata of the second partition. Taking a KVDB that uses the LSM-tree algorithm as an example, the manifest file stores the metadata of a partition, so the file identifier recorded in the current file may be the identifier of the manifest file.
308. Database server 1 creates a database corresponding to the third partition.
To create the database corresponding to the third partition, one implementation is to start a new database process or database instance; another implementation is to use a currently running database process or database instance as the database of the third partition.
309. Database server 1 obtains the metadata of the first partition and the metadata of the second partition.
In a specific implementation, database server 1 reads the current file of the first partition to obtain the metadata of the first partition, reads the current file of the second partition to obtain the metadata of the second partition, and loads the metadata of both partitions into the memory of database server 1.
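The indirection from the merge instruction to the partition metadata (current file identifier, then manifest file identifier, then metadata) can be sketched as below. The dictionary `storage` stands in for the shared storage pool, and every file name in it is made up for illustration.

```python
from typing import Dict, List


def read_partition_metadata(storage: Dict[str, object],
                            current_file_id: str) -> object:
    """Follow a current file to the manifest it names and return the metadata."""
    manifest_id = storage[current_file_id]   # current file records the manifest id
    return storage[manifest_id]              # manifest holds the partition metadata


# Hypothetical shared storage: current files point at manifest files.
storage: Dict[str, object] = {
    "p1/CURRENT": "p1/MANIFEST-7",
    "p1/MANIFEST-7": {"secondary": [["a0"], ["a1"]]},
    "p2/CURRENT": "p2/MANIFEST-3",
    "p2/MANIFEST-3": {"secondary": [["b0"], ["b1"]]},
}

# The merge instruction carries the two current-file identifiers.
merge_instruction: List[str] = ["p1/CURRENT", "p2/CURRENT"]
in_memory = [read_partition_metadata(storage, cid) for cid in merge_instruction]
```

Because only file identifiers travel in the merge instruction, database server 1 does not need database server 2 to hand over any data in this variant; it reads the second partition's metadata directly from shared storage.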
In another implementation, database server 1 obtains from the management server the address of database server 2, which runs the second partition. Database server 1 then obtains the metadata of the second partition from database server 2, or obtains information about that metadata from database server 2 and uses the information to obtain the metadata. The information about the metadata of the second partition may be the file identifier of the file that stores the metadata of the second partition.
310. Database server 1 merges the metadata of the first partition with the metadata of the second partition to generate the metadata of the third partition.
Specifically, database server 1 generates the metadata of the third partition. Further, database server 1 creates a current file for the third partition; this current file records the file identifier of the file that stores the metadata of the third partition.
The metadata of the first partition includes data storage unit information of the secondary column family of the first partition. The metadata of the second partition includes data storage unit information of the secondary column family of the second partition. Taking a KVDB that uses the LSM-tree algorithm as an example, the data storage unit information is SSTable information, and the data storage unit is an SSTable.
In the first partition and the second partition, index entries prefixed with the partition key (including primary index entries and secondary index entries) and secondary index entries prefixed with a non-partition key are organized and stored in different column families. The data storage unit information of the primary column family is the information about how the index entries prefixed with the partition key (primary and secondary) are organized and stored. The data storage unit information of the secondary column family is the information about how the secondary index entries prefixed with a non-partition key are organized and stored.
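The split between the two column families depends only on whether an index is prefixed with the partition key. A minimal sketch, in which the function name, the column names, and the `"primary"`/`"secondary"` labels are all hypothetical:

```python
from typing import Sequence


def column_family_for(index_columns: Sequence[str], partition_key: str) -> str:
    """Choose the column family for an index from its leading column.

    Indexes whose first column is the partition key (primary or secondary
    indexes) go to the primary column family; secondary indexes led by any
    other column go to the secondary column family.
    """
    return "primary" if index_columns[0] == partition_key else "secondary"


primary_index = column_family_for(["user_id"], partition_key="user_id")
prefixed_secondary = column_family_for(["user_id", "created_at"], partition_key="user_id")
other_secondary = column_family_for(["email", "user_id"], partition_key="user_id")
```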
Database server 1 may merge the data storage unit information of the secondary column family of the first partition with the data storage unit information of the secondary column family of the second partition to generate data storage unit information of a target secondary column family. Optionally, in some embodiments, the data storage unit information of the target secondary column family may be used as the data storage unit information of the secondary column family of the third partition. Optionally, in other embodiments, database server 1 may determine the data storage unit information of the secondary column family of the third partition according to the data storage unit information of the target secondary column family.
The data storage unit information of the secondary column family of the first partition includes P1 levels of data storage unit information, where P1 is a positive integer greater than or equal to 2. The data storage unit information of the secondary column family of the second partition includes P2 levels of data storage unit information, where P2 is a positive integer greater than or equal to 2. The data storage unit information of the target secondary column family includes Q levels of data storage unit information, and the Q levels together contain the data storage unit information of the secondary column family of the first partition and that of the second partition. One of the Q levels includes one level of the P1 levels and one level of the P2 levels; each of the other Q-1 levels includes one level of the P1 levels or one level of the P2 levels. Q is equal to P1 + P2 - 1.
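The value of Q follows from a simple count: one target level combines one level from each partition, and each of the remaining levels of either partition becomes a target level of its own.

```latex
Q \;=\; \underbrace{1}_{\text{shared level}}
   \;+\; \underbrace{(P_1 - 1)}_{\text{remaining levels of partition 1}}
   \;+\; \underbrace{(P_2 - 1)}_{\text{remaining levels of partition 2}}
   \;=\; P_1 + P_2 - 1
```

For instance, P1 = P2 = 2 gives Q = 3, and P1 = P2 = 4 gives Q = 7.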
Optionally, in some embodiments, level 0 of the Q levels of data storage unit information includes level 0 of the data storage unit information of the secondary column family of the first partition and level 0 of the data storage unit information of the secondary column family of the second partition. Level 2×q-1 of the Q levels includes level q of P levels of data storage unit information taken from the P1 levels, and level 2×q of the Q levels includes level q of P levels of data storage unit information taken from the P2 levels, where q = 1, ..., P-1 and P is the smaller of P1 and P2 minus 1.
Optionally, in some embodiments, the P levels taken from the P1 levels of data storage unit information may be level 0 to level P-1 of the P1 levels. Similarly, the P levels taken from the P2 levels of data storage unit information may be level 1 to level P of the P2 levels.
Optionally, in some embodiments, the P levels taken from the P1 levels of data storage unit information may be the last level to the P-th level from the last of the P1 levels. Similarly, the P levels taken from the P2 levels may be the last level to the P-th level from the last of the P2 levels.
Optionally, in some embodiments, the P levels taken from the P1 levels of data storage unit information may be P intermediate levels of the P1 levels, for example level 2 to level P+1 of the P1 levels. Similarly, the P levels taken from the P2 levels may be P intermediate levels of the P2 levels, for example level 2 to level P+1 of the P2 levels.
Taking a KVDB that uses the LSM-tree algorithm as an example, as described above, the data storage unit information is SSTable information and the data storage unit is an SSTable. The manifest file of the first partition includes P1 levels of SSTable information of the secondary column family; the index entries of these P1 levels of SSTable information are secondary index entries prefixed with a non-partition key. The manifest file of the second partition includes P2 levels of SSTable information of the secondary column family; the index entries of these P2 levels of SSTable information are secondary index entries prefixed with a non-partition key. A target manifest file includes Q levels of SSTable information of the secondary column family; the index entries of these Q levels of SSTable information are secondary index entries prefixed with a non-partition key. Optionally, in some embodiments, the target manifest file may be the manifest file of the third partition. Optionally, in other embodiments, the target manifest file may be used to determine the manifest file of the third partition.
Assume that P1 and P2 are both 2. In this case, Q = 2 + 2 - 1 = 3. The manifest file of the first partition includes two levels of SSTable information of the secondary column family, namely level 0 and level 1 SSTable information. The manifest file of the second partition includes two levels of SSTable information of the secondary column family, namely level 0 and level 1 SSTable information. The target manifest file includes three levels of SSTable information of the secondary column family, namely level 0 to level 2 SSTable information.
The level 0 SSTable information in the target manifest file includes the level 0 SSTable information in the manifest file of the first partition and the level 0 SSTable information in the manifest file of the second partition.
The level 1 SSTable information in the target manifest file includes the level 1 SSTable information in the manifest file of the first partition.
The level 2 SSTable information in the target manifest file includes the level 1 SSTable information in the manifest file of the second partition.
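The P1 = P2 = 2 example can be reproduced with a short sketch of the interleaved arrangement. The function `merge_interleaved` and the SSTable identifiers are hypothetical, and the sketch assumes both partitions have the same number of levels, which is the case in the example above; it is not presented as the method of the application for unequal depths.

```python
from typing import List

Levels = List[List[str]]  # one list of SSTable identifiers per level


def merge_interleaved(first: Levels, second: Levels) -> Levels:
    """Merge per-level SSTable information by interleaving the two partitions.

    Target level 0 holds level 0 of both partitions. For q >= 1, level q of
    the first partition becomes target level 2q-1 and level q of the second
    partition becomes target level 2q. Equal depth is assumed.
    """
    if len(first) != len(second):
        raise ValueError("this sketch assumes both partitions have equal depth")
    target: Levels = [first[0] + second[0]]
    for q in range(1, len(first)):
        target.append(list(first[q]))    # target level 2q-1
        target.append(list(second[q]))   # target level 2q
    return target


first_partition = [["a0-1", "a0-2"], ["a1-1", "a1-2"]]
second_partition = [["b0-1", "b0-2"], ["b1-1", "b1-2"]]
target = merge_interleaved(first_partition, second_partition)
```

Only SSTable information is rearranged; no SSTable is read or rewritten. Keeping each source level in its own target level also preserves the rule that the entry keys of SSTables within a level at or below level 1 do not overlap, even though secondary-index key ranges of the two partitions may overlap each other.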
The merging of the data storage unit information of the secondary column families is described below with reference to Table 3, Table 4, and Table 5.
Table 3
[Table image: Figure PCTCN2019097559-appb-000002, Figure PCTCN2019097559-appb-000003]
Table 3 shows the two levels of data storage unit information included in the data storage unit information of the secondary column family of the first partition; each of the two levels includes two pieces of data storage unit information. Table 3 takes a KVDB that uses the LSM-tree algorithm as an example, so the data storage unit information may also be called SSTable information.
Table 4
[Table image: Figure PCTCN2019097559-appb-000004]
Table 4 shows the two levels of data storage unit information included in the data storage unit information of the secondary column family of the second partition; each of the two levels includes two pieces of data storage unit information. Table 4 takes a KVDB that uses the LSM-tree algorithm as an example, so the data storage unit information may also be called SSTable information.
Table 5
[Table image: Figure PCTCN2019097559-appb-000005, Figure PCTCN2019097559-appb-000006]
Table 5 shows the three levels of data storage unit information included in the data storage unit information of the target secondary column family. Level 0 includes four pieces of data storage unit information, and level 1 and level 2 each include two pieces of data storage unit information. Table 5 takes a KVDB that uses the LSM-tree algorithm as an example, so the data storage unit information may also be called SSTable information. It can be seen that the level 0 SSTable information in Table 5 includes the level 0 SSTable information in Table 3 and Table 4, the level 1 SSTable information in Table 5 includes the level 1 SSTable information in Table 3, and the level 2 SSTable information in Table 5 includes the level 1 SSTable information in Table 4. In other words, the level 0 SSTable information of the target manifest file obtained by the merge includes the level 0 SSTable information in the manifest file of the first partition and the level 0 SSTable information in the manifest file of the second partition; the level 1 SSTable information of the target manifest file includes the level 1 SSTable information in the manifest file of the first partition; and the level 2 SSTable information of the target manifest file includes the level 1 SSTable information in the manifest file of the second partition.
Optionally, in other embodiments, level 2×q-1 of the Q levels of data storage unit information includes level q of the P levels of data storage unit information of the secondary column family of the second partition, and level 2×q of the Q levels of data storage unit information includes level q of the P levels of data storage unit information of the secondary column family of the first partition.
Assume that P1 and P2 are both 2. In this case, Q = 2 + 2 - 1 = 3. The manifest file of the first partition includes level 0 SSTable information and level 1 SSTable information. The manifest file of the second partition includes level 0 SSTable information and level 1 SSTable information. The target manifest file includes three levels of SSTable information of the secondary column family, namely level 0 to level 2 SSTable information.
The level 0 SSTable information in the target manifest file includes the level 0 SSTable information in the manifest file of the first partition and the level 0 SSTable information in the manifest file of the second partition.
The level 1 SSTable information in the target manifest file includes the level 1 SSTable information in the manifest file of the second partition.
The level 2 SSTable information in the target manifest file includes the level 1 SSTable information in the manifest file of the first partition.
In the foregoing embodiments, the SSTable information from the first partition and the SSTable information from the second partition are interleaved. In other embodiments, the SSTable information from the first partition and the SSTable information from the second partition may instead be stacked.
Optionally, in some embodiments, level 0 of the Q levels of data storage unit information includes level 0 of the data storage unit information of the first partition and level 0 of the data storage unit information of the second partition. Level 1 to level P-1 of the Q levels are, respectively, level 1 to level P-1 of the P levels of data storage unit information of the first partition; level P to level Q-1 of the Q levels are, respectively, level 1 to level P-1 of the P levels of data storage unit information of the second partition.
Assume that P1 and P2 are both 4. In this case, Q = 4 + 4 - 1 = 7. The manifest file of the first partition includes four levels of SSTable information of the secondary column family, namely level 0 to level 3 SSTable information. The manifest file of the second partition includes four levels of SSTable information of the secondary column family, namely level 0 to level 3 SSTable information. The target manifest file includes seven levels of SSTable information of the secondary column family, namely level 0 to level 6 SSTable information.
The level 0 SSTable information in the target manifest file includes the level 0 SSTable information in the manifest file of the first partition and the level 0 SSTable information in the manifest file of the second partition.
The level 1 SSTable information in the target manifest file includes the level 1 SSTable information in the manifest file of the first partition.
The level 2 SSTable information in the target manifest file includes the level 2 SSTable information in the manifest file of the first partition.
The level 3 SSTable information in the target manifest file includes the level 3 SSTable information in the manifest file of the first partition.
The level 4 SSTable information in the target manifest file includes the level 1 SSTable information in the manifest file of the second partition.
The level 5 SSTable information in the target manifest file includes the level 2 SSTable information in the manifest file of the second partition.
The level 6 SSTable information in the target manifest file includes the level 3 SSTable information in the manifest file of the second partition.
Optionally, in other embodiments, the layer-0 data storage unit information in the Q layers of data storage unit information includes the layer-0 data storage unit information of the first partition and the layer-0 data storage unit information of the second partition. Layers 1 to P-1 of the Q layers are, respectively, layers 1 to P-1 of the P layers of data storage unit information of the second partition, and layers P to Q-1 of the Q layers are, respectively, layers 1 to P-1 of the P layers of data storage unit information of the first partition.
Assume that P_1 and P_2 are both 4, so that Q = 4 + 4 - 1 = 7. In this case, the manifest file of the first partition includes layer-0 SSTable information to layer-3 SSTable information, and the manifest file of the second partition includes layer-0 SSTable information to layer-3 SSTable information. The target manifest file includes 7 layers of SSTable information of the secondary column family, namely layer-0 SSTable information to layer-6 SSTable information.
The layer-0 SSTable information in the target manifest file includes the layer-0 SSTable information in the manifest file of the first partition and the layer-0 SSTable information in the manifest file of the second partition.
The layer-1 SSTable information in the target manifest file includes the layer-1 SSTable information in the manifest file of the second partition.
The layer-2 SSTable information in the target manifest file includes the layer-2 SSTable information in the manifest file of the second partition.
The layer-3 SSTable information in the target manifest file includes the layer-3 SSTable information in the manifest file of the second partition.
The layer-4 SSTable information in the target manifest file includes the layer-1 SSTable information in the manifest file of the first partition.
The layer-5 SSTable information in the target manifest file includes the layer-2 SSTable information in the manifest file of the first partition.
The layer-6 SSTable information in the target manifest file includes the layer-3 SSTable information in the manifest file of the first partition.
As can be seen, when the data storage unit information of the secondary column families of adjacent partitions is merged, the layer-0 data storage unit information of the two partitions is merged directly into a single layer of the merged secondary column family. In other words, the layer-0 data storage unit information of the second partition is appended directly to the layer-0 data storage unit information of the first partition. This merging manner may be referred to as append merge. By contrast, the data storage unit information in the layers other than layer 0 is merged by stacking the layers of the two partitions on top of one another. This merging manner is referred to below as overlay merge.
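The two merging manners for equal layer counts can be sketched in a few lines of Python. This is only an illustrative sketch, not the method as implemented by any particular database: the function name `merge_secondary_equal_depth`, the `second_first` flag, and the SSTable names `a0`, `b0`, and so on are assumptions introduced here. Each manifest is modelled as a list of layers, and each layer as a list of SSTable identifiers.

```python
from typing import List

Layer = List[str]  # one layer = a list of SSTable identifiers (hypothetical names)


def merge_secondary_equal_depth(first: List[Layer], second: List[Layer],
                                second_first: bool = False) -> List[Layer]:
    """Merge two P-layer manifests into Q = 2P - 1 layers (sketch, P_1 == P_2 only)."""
    if len(first) != len(second):
        raise ValueError("this sketch only covers P_1 == P_2")
    # Append merge: layer 0 of the second partition is appended to layer 0 of the first.
    merged = [first[0] + second[0]]
    # Overlay merge: the remaining layers are stacked, never combined with each other.
    upper, lower = (second, first) if second_first else (first, second)
    merged.extend(list(layer) for layer in upper[1:])
    merged.extend(list(layer) for layer in lower[1:])
    return merged


first = [["a0"], ["a1"], ["a2"], ["a3"]]    # P_1 = 4
second = [["b0"], ["b1"], ["b2"], ["b3"]]   # P_2 = 4
merged = merge_secondary_equal_depth(first, second)
swapped = merge_secondary_equal_depth(first, second, second_first=True)
print(len(merged))  # 7
print(merged)
print(swapped)
```

With `second_first=False` the layers of the first partition occupy target layers 1 to 3, and with `second_first=True` the layers of the second partition do, which corresponds to the two embodiments described above. Only identifiers are moved between lists; no SSTable content is touched.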
In the foregoing examples, P_1 and P_2 have the same value. In some cases, P_1 and P_2 may differ. In that case, 2×P layers of the Q layers of data storage unit information may include P layers of the P_1 layers of data storage unit information and P layers of the P_2 layers of data storage unit information. The Q-2×P layers of the Q layers of data storage unit information may include P' layers of data storage unit information, where P' = max(P_1, P_2) - min(P_1, P_2), max(P_1, P_2) denotes the larger of P_1 and P_2, and min(P_1, P_2) denotes the smaller of P_1 and P_2. In other words, P' equals the larger of P_1 and P_2 minus the smaller of P_1 and P_2. Equivalently, P' = |P_1 - P_2|, that is, the absolute value of the difference between P_1 and P_2.
For example, assume P_1 = 5 and P_2 = 3, so that Q = 5 + 3 - 1 = 7. In this case, the manifest file of the first partition includes 5 layers of SSTable information of the secondary column family, namely layer-0 SSTable information to layer-4 SSTable information. The manifest file of the second partition includes 3 layers of SSTable information of the secondary column family, namely layer-0 SSTable information to layer-2 SSTable information. The target manifest file includes 7 layers of SSTable information of the secondary column family, namely layer-0 SSTable information to layer-6 SSTable information.
The layer-0 SSTable information in the target manifest file includes the layer-0 SSTable information in the manifest file of the first partition and the layer-0 SSTable information in the manifest file of the second partition.
The layer-1 SSTable information in the target manifest file includes the layer-1 SSTable information in the manifest file of the first partition.
The layer-2 SSTable information in the target manifest file includes the layer-1 SSTable information in the manifest file of the second partition.
The layer-3 SSTable information in the target manifest file includes the layer-2 SSTable information in the manifest file of the first partition.
The layer-4 SSTable information in the target manifest file includes the layer-2 SSTable information in the manifest file of the second partition.
The layer-5 SSTable information in the target manifest file includes the layer-3 SSTable information in the manifest file of the first partition.
The layer-6 SSTable information in the target manifest file includes the layer-4 SSTable information in the manifest file of the first partition.
In other words, the P' layers of data storage unit information may be the last P' layers of the Q layers of data storage unit information.
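The layer mapping in this example can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than a definitive implementation: the function name `interleave_with_tail` and the `("first", layer)` tuples are hypothetical, and the rule "first partition before second partition within each interleaved pair" is taken from the P_1 = 5, P_2 = 3 example only. The behaviour for P_1 < P_2 is an assumption.

```python
from typing import List, Tuple

Ref = Tuple[str, int]  # (source partition, source layer), hypothetical representation


def interleave_with_tail(p1: int, p2: int) -> List[List[Ref]]:
    """For each target layer, list the source layers that feed it (leftover layers last)."""
    p = min(p1, p2)
    # Layer 0 is append-merged from both partitions.
    layout: List[List[Ref]] = [[("first", 0), ("second", 0)]]
    # Layers 1..p-1 of both partitions are stacked alternately.
    for layer in range(1, p):
        layout.append([("first", layer)])
        layout.append([("second", layer)])
    # The P' = |p1 - p2| leftover layers of the deeper partition come last.
    deeper, depth = ("first", p1) if p1 >= p2 else ("second", p2)
    for layer in range(p, depth):
        layout.append([(deeper, layer)])
    return layout


layout = interleave_with_tail(5, 3)
for target_layer, sources in enumerate(layout):
    print(target_layer, sources)
```

The resulting list has P_1 + P_2 - 1 entries, and its last |P_1 - P_2| entries all come from the partition with more layers.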
Optionally, in other embodiments, the P' layers of data storage unit information may instead be the first P' layers of the Q layers of data storage unit information.
For example, assume P_1 = 5 and P_2 = 3, so that Q = 5 + 3 - 1 = 7. In this case, the manifest file of the first partition includes 5 layers of SSTable information of the secondary column family, namely layer-0 SSTable information to layer-4 SSTable information. The manifest file of the second partition includes 3 layers of SSTable information of the secondary column family, namely layer-0 SSTable information to layer-2 SSTable information. The target manifest file includes 7 layers of SSTable information of the secondary column family, namely layer-0 SSTable information to layer-6 SSTable information.
The layer-0 SSTable information in the target manifest file includes the layer-0 SSTable information in the manifest file of the first partition and the layer-0 SSTable information in the manifest file of the second partition.
The layer-1 SSTable information in the target manifest file includes the layer-1 SSTable information in the manifest file of the first partition.
The layer-2 SSTable information in the target manifest file includes the layer-2 SSTable information in the manifest file of the first partition.
The layer-3 SSTable information in the target manifest file includes the layer-3 SSTable information in the manifest file of the first partition.
The layer-4 SSTable information in the target manifest file includes the layer-1 SSTable information in the manifest file of the second partition.
The layer-5 SSTable information in the target manifest file includes the layer-4 SSTable information in the manifest file of the first partition.
The layer-6 SSTable information in the target manifest file includes the layer-2 SSTable information in the manifest file of the second partition.
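The variant in which the leftover layers come first can be sketched as follows. Again this is only a hedged illustration: the name `interleave_with_head` and the tuple representation are hypothetical, and the rule that the deeper partition's layer precedes the shallower partition's layer within each interleaved pair is inferred from the P_1 = 5, P_2 = 3 example above, not stated as a general rule in the text.

```python
from typing import List, Tuple

Ref = Tuple[str, int]  # (source partition, source layer), hypothetical representation


def interleave_with_head(p1: int, p2: int) -> List[List[Ref]]:
    """For each target layer, list the source layers that feed it (leftover layers first)."""
    p = min(p1, p2)
    extra = abs(p1 - p2)  # P' in the text
    deeper, shallower = ("first", "second") if p1 >= p2 else ("second", "first")
    # Layer 0 is append-merged from both partitions.
    layout: List[List[Ref]] = [[("first", 0), ("second", 0)]]
    # The P' leftover layers are taken from the top of the deeper partition.
    for layer in range(1, extra + 1):
        layout.append([(deeper, layer)])
    # The remaining layers of both partitions are stacked alternately.
    for offset in range(1, p):
        layout.append([(deeper, extra + offset)])
        layout.append([(shallower, offset)])
    return layout


head_layout = interleave_with_head(5, 3)
for target_layer, sources in enumerate(head_layout):
    print(target_layer, sources)
```

In both variants every source layer above layer 0 ends up in exactly one target layer, so the total is still P_1 + P_2 - 1 layers.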
The metadata of the first partition further includes data storage unit information of the primary column family of the first partition. The metadata of the second partition includes data storage unit information of the primary column family of the second partition.
Database server 1 may merge the data storage unit information of the primary column family of the first partition with the data storage unit information of the primary column family of the second partition to generate data storage unit information of the primary column family of a third partition.
The data storage unit information of the primary column family of the first partition includes K_1 layers of data storage unit information, where K_1 is a positive integer greater than or equal to 1. The data storage unit information of the primary column family of the second partition includes K_2 layers of data storage unit information, where K_2 is a positive integer greater than or equal to 1. The metadata of the third partition includes data storage unit information of the primary column family of the third partition, which includes K layers of data storage unit information, where K is the smaller of K_1 and K_2. The k-th layer of the data storage unit information of the primary column family of the third partition includes the k-th layer of the K layers taken from the K_1 layers and the k-th layer of the K layers taken from the K_2 layers. The entry key values of any piece of data storage unit information in the k-th layer of the K_1 layers do not overlap with the entry key values of any piece of data storage unit information in the k-th layer of the K_2 layers.
M_k1 denotes the number of pieces of data storage unit information in the k-th layer of the K layers of data storage unit information of the primary column family of the first partition. For example, M_01 = 2 indicates that level 0 of the data storage unit information of the primary column family of the first partition includes 2 pieces of data storage unit information. M_k2 denotes the number of pieces of data storage unit information in the k-th layer of the K layers of data storage unit information of the primary column family of the second partition. For example, M_02 = 2 indicates that level 0 of the data storage unit information of the primary column family of the second partition includes 2 pieces of data storage unit information. The number of pieces of data storage unit information in the k-th layer of the primary column family of the third partition is the sum of M_k1 and M_k2. For example, if M_01 = 2 and M_02 = 2, level 0 of the data storage unit information of the primary column family of the third partition includes 4 pieces of data storage unit information.
Taking a KVDB that uses the LSM-tree algorithm as an example, the SSTable information stored in the manifest file corresponds to the K layers of SSTable information. The merging of the data storage unit information of the primary column families is described below with reference to Table 6, Table 7, and Table 8.
Table 6
Figure PCTCN2019097559-appb-000007
Table 6 shows that the data storage unit information of the primary column family of the first partition includes two layers of data storage unit information, each of which includes 2 pieces of data storage unit information. Table 6 is based on the example of a KVDB that uses the LSM-tree algorithm, so the data storage unit information may also be referred to as SSTable information.
Table 7
Figure PCTCN2019097559-appb-000008
Figure PCTCN2019097559-appb-000009
Table 7 shows that the data storage unit information of the primary column family of the second partition includes two layers of data storage unit information, each of which includes 2 pieces of data storage unit information. Table 7 is based on the example of a KVDB that uses the LSM-tree algorithm, so the data storage unit information may also be referred to as SSTable information.
Table 8
Figure PCTCN2019097559-appb-000010
Table 8 shows that the data storage unit information of the primary column family of the third partition includes two layers of data storage unit information, each of which includes 4 pieces of data storage unit information. Table 8 is based on the example of a KVDB that uses the LSM-tree algorithm, so the data storage unit information may also be referred to as SSTable information. As can be seen, the level 0 SSTable information in Table 8 includes the level 0 SSTable information in Table 6 and Table 7, and the level 1 SSTable information in Table 8 includes the level 1 SSTable information in Table 6 and Table 7.
The data storage unit information of the primary column family of the first partition and that of the second partition are each an ordered sequence prefixed with the partition key value. Because the partitions are divided by range, none of the key values in any layer of one partition's primary column family overlaps in range with the key values of the other partition. The data storage unit information of each layer of the second partition can therefore be appended directly to the data storage unit information of the same layer of the first partition, forming the data storage unit information of the primary column family of a single partition. For ease of description, this manner of appending one layer of one partition's data storage unit information directly to the same layer of another partition's data storage unit information is referred to below as append merge.
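A minimal Python sketch of append merge for a single layer is given below, including the non-overlap condition that makes it safe. The `SSTableInfo` fields `smallest` and `largest`, the key strings such as `p1/a`, and the function names are assumptions made for illustration. Only the SSTable names f1.1.1, f1.1.2, f2.1.1, and f2.1.2 appear in the text.

```python
from typing import List, NamedTuple


class SSTableInfo(NamedTuple):
    name: str
    smallest: str  # smallest entry key covered (hypothetical field)
    largest: str   # largest entry key covered (hypothetical field)


def overlaps(a: SSTableInfo, b: SSTableInfo) -> bool:
    """True if the closed key ranges of a and b intersect."""
    return a.smallest <= b.largest and b.smallest <= a.largest


def append_merge_layer(first_layer: List[SSTableInfo],
                       second_layer: List[SSTableInfo]) -> List[SSTableInfo]:
    """Append the second partition's layer to the first partition's layer."""
    for a in first_layer:
        for b in second_layer:
            if overlaps(a, b):
                raise ValueError(f"key ranges of {a.name} and {b.name} overlap")
    return list(first_layer) + list(second_layer)


# Keys are prefixed with a partition key ("p1", "p2"), so ranges cannot overlap.
first_layer = [SSTableInfo("f1.1.1", "p1/a", "p1/m"), SSTableInfo("f1.1.2", "p1/n", "p1/z")]
second_layer = [SSTableInfo("f2.1.1", "p2/a", "p2/m"), SSTableInfo("f2.1.2", "p2/n", "p2/z")]
merged_layer = append_merge_layer(first_layer, second_layer)
print([t.name for t in merged_layer])
```

The merged layer holds M_k1 + M_k2 entries, here 2 + 2 = 4, and the explicit overlap check is only a safeguard in this sketch, since range partitioning already rules overlaps out.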
As with the data storage unit information of the secondary column family, K_1 and K_2 may differ. In that case, in addition to the K layers of data storage unit information, the data storage unit information of the third partition may further include K' layers of data storage unit information, where K' = max(K_1, K_2) - min(K_1, K_2), max(K_1, K_2) denotes the larger of K_1 and K_2, and min(K_1, K_2) denotes the smaller of K_1 and K_2. In other words, K' equals the larger of K_1 and K_2 minus the smaller of K_1 and K_2. Equivalently, K' = |K_1 - K_2|, that is, the absolute value of the difference between K_1 and K_2. If K_1 is greater than K_2, the K' layers are the last K' layers of the K_1 layers of data storage unit information of the first partition. If K_2 is greater than K_1, the K' layers are the last K' layers of the K_2 layers of data storage unit information of the second partition.
For example, assume K_1 = 3 and K_2 = 2, so that K = 2 and K' = 1. In this case, the manifest file of the first partition includes 3 layers of SSTable information of the primary column family, namely layer-0 SSTable information to layer-2 SSTable information. The manifest file of the second partition includes 2 layers of SSTable information of the primary column family, namely layer-0 SSTable information and layer-1 SSTable information. The target manifest file includes 3 layers of SSTable information of the primary column family, namely layer-0 SSTable information to layer-2 SSTable information.
The layer-0 SSTable information in the target manifest file includes the layer-0 SSTable information of the primary column family in the manifest file of the first partition and the layer-0 SSTable information of the primary column family in the manifest file of the second partition.
The layer-1 SSTable information in the target manifest file includes the layer-1 SSTable information of the primary column family in the manifest file of the first partition and the layer-1 SSTable information of the primary column family in the manifest file of the second partition.
The layer-2 SSTable information in the target manifest file includes the layer-2 SSTable information of the primary column family in the manifest file of the first partition.
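The K_1 = 3, K_2 = 2 case can be sketched in Python as a per-layer append followed by copying the leftover layers of the deeper partition. This is an illustrative sketch only: the function name `merge_primary` and the SSTable names for level 0 and level 2 (`f1.0.1`, `f1.2.1`, and so on) are hypothetical placeholders.

```python
from typing import List

Layer = List[str]  # one layer = a list of SSTable identifiers


def merge_primary(first: List[Layer], second: List[Layer]) -> List[Layer]:
    """Append-merge the K = min(K_1, K_2) shared layers, then keep the K' leftover layers."""
    k = min(len(first), len(second))
    # Shared layers: the second partition's layer is appended to the first partition's layer.
    merged = [first[i] + second[i] for i in range(k)]
    # Leftover layers exist only in the deeper partition and are carried over unchanged.
    deeper = first if len(first) > len(second) else second
    merged.extend(list(layer) for layer in deeper[k:])
    return merged


first_primary = [["f1.0.1", "f1.0.2"], ["f1.1.1", "f1.1.2"], ["f1.2.1"]]  # K_1 = 3
second_primary = [["f2.0.1", "f2.0.2"], ["f2.1.1", "f2.1.2"]]             # K_2 = 2
merged_primary = merge_primary(first_primary, second_primary)
print([len(layer) for layer in merged_primary])  # [4, 4, 1]
```

The merged manifest has max(K_1, K_2) layers, which contrasts with the secondary column family, where the layers above layer 0 are stacked and the layer count grows to P_1 + P_2 - 1.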
The relative order of data storage unit information that comes from the primary or secondary column family of the same partition also remains unchanged within the merged data storage unit information of the primary or secondary column family.
Take the data storage unit information of the primary column family as an example. As shown in Table 6, Table 7, and Table 8, SSTable f1.1.1 precedes SSTable f1.1.2 before the merge, and SSTable f1.1.1 still precedes SSTable f1.1.2 after the merge.
As shown in Table 8, in the data storage unit information of the primary column family of the third partition, the level 1 SSTables from the primary column family of the first partition and the level 1 SSTables from the primary column family of the second partition appear in the following order: SSTable f1.1.1, SSTable f1.1.2, SSTable f2.1.1, SSTable f2.1.2. In some embodiments, it is sufficient that the relative order of the data storage unit information from the primary column family of the same partition does not change. In other words, after the merge, data storage unit information from the primary column family of the other partition may be located before data storage unit information from the primary column family of that partition. For example, the data storage unit information of the primary column family of the third partition may alternatively be as shown in Table 9.
Table 9
Figure PCTCN2019097559-appb-000011
Figure PCTCN2019097559-appb-000012
As shown in Table 9, in the data storage unit information of the primary column family of the third partition, the level 1 SSTables from the primary column family of the first partition and the level 1 SSTables from the primary column family of the second partition appear in the following order: SSTable f1.1.1, SSTable f2.1.1, SSTable f1.1.2, SSTable f2.1.2. As can be seen, although SSTable f2.1.1 lies between SSTable f1.1.1 and SSTable f1.1.2, SSTable f1.1.1 still precedes SSTable f1.1.2.
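The ordering requirement amounts to saying that each partition's list must remain a subsequence of the merged list. A small Python check, given here as a sketch with the hypothetical helper name `preserves_order`, makes this concrete for the two level 1 orderings described above.

```python
from typing import List


def preserves_order(source: List[str], merged: List[str]) -> bool:
    """True if every item of source appears in merged, in the same relative order."""
    remaining = iter(merged)
    # `item in remaining` advances the iterator, so later items must occur further right.
    return all(item in remaining for item in source)


first_level1 = ["f1.1.1", "f1.1.2"]
second_level1 = ["f2.1.1", "f2.1.2"]
table8_order = ["f1.1.1", "f1.1.2", "f2.1.1", "f2.1.2"]
table9_order = ["f1.1.1", "f2.1.1", "f1.1.2", "f2.1.2"]
bad_order = ["f1.1.2", "f1.1.1", "f2.1.1", "f2.1.2"]  # swaps the first partition's SSTables

for order in (table8_order, table9_order, bad_order):
    print(order, preserves_order(first_level1, order) and preserves_order(second_level1, order))
```

Both described orderings pass the check, while an ordering that swaps two SSTables of the same partition does not.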
The ordering of the data storage unit information of the secondary column family is similar to that of the primary column family and is not described again here.
The metadata of the first partition further includes a write-ahead log information set of the first partition. The metadata of the second partition includes a write-ahead log information set of the second partition.
Database server 1 may merge the write-ahead log information set of the first partition with the write-ahead log information set of the second partition to generate a write-ahead log information set of the third partition.
The write-ahead log information set of the third partition includes N pieces of write-ahead log information. The write-ahead log information set of the first partition includes N_1 of the N pieces of write-ahead log information, and the write-ahead log information set of the second partition includes N_2 of the N pieces of write-ahead log information, where N is a positive integer greater than or equal to 2, N_1 and N_2 are positive integers greater than or equal to 1, and the sum of N_1 and N_2 is N.
In other words, in the foregoing embodiment, database server 1 merges only the write-ahead log information in the first write-ahead log information set, which is in the metadata of the first partition, with the write-ahead log information in the second write-ahead log information set, which is in the metadata of the second partition, into a third write-ahead log information set in the third metadata. Database server 1 does not read the N_1 write-ahead logs indicated by the N_1 pieces of write-ahead log information in the first write-ahead log information set from the first partition, and does not write the N_1 write-ahead logs into the merged partition. That is, partition merging only requires merging the write-ahead log information in the metadata, without reading or writing the write-ahead logs that the write-ahead log information indicates. Write-ahead log information may include an identifier of a write-ahead log and a time sequence number of the write-ahead log, so its size is typically on the order of kilobytes, whereas the size of a write-ahead log is typically on the order of megabytes. Compared with reading and writing the write-ahead logs, reading and writing only the write-ahead log information therefore reduces the amount of data read and written, which lowers the overhead of the distributed database system.
Again taking a KVDB that uses the LSM-tree algorithm as an example, the write-ahead log information may be the WAL file information stored in the manifest file. As described above, the WAL file information includes an identifier of a WAL file and, optionally, a time sequence number of the WAL file. Taking WAL file information that includes both the identifier and the time sequence number as an example, the write-ahead log information set of the first partition may include the identifiers of N_1 WAL files and the time sequence number of each of the N_1 WAL files, and the write-ahead log information set of the second partition may include the identifiers of N_2 WAL files and the time sequence number of each of the N_2 WAL files. The write-ahead log information set of the third partition includes the identifiers of the N_1 WAL files and the time sequence number of each of the N_1 WAL files, and also includes the identifiers of the N_2 WAL files and the time sequence number of each of the N_2 WAL files.
In some embodiments, the relative order of first write-ahead log information and second write-ahead log information in the write-ahead log information set of the first partition is the same as their relative order in the write-ahead log information set of the third partition. The first write-ahead log information and the second write-ahead log information are any two of the N_1 pieces of write-ahead log information included in the write-ahead log information set of the first partition. In other words, if the first write-ahead log information precedes the second write-ahead log information in the write-ahead log information set of the first partition, the first write-ahead log information still precedes the second write-ahead log information in the write-ahead log information set of the third partition.
Similarly, the relative order of third write-ahead log information and fourth write-ahead log information in the write-ahead log information set of the second partition is the same as their relative order in the write-ahead log information set of the third partition. The third write-ahead log information and the fourth write-ahead log information are any two of the N_2 pieces of write-ahead log information included in the write-ahead log information set of the second partition. In other words, if the third write-ahead log information precedes the fourth write-ahead log information in the write-ahead log information set of the second partition, the third write-ahead log information still precedes the fourth write-ahead log information in the write-ahead log information set of the third partition.
Table 10 illustrates how the WAL file information in two manifest files is merged.
Table 10
First partition | Second partition | Third partition
f1.w.9, f1.w.8 | f2.w.11, f2.w.7 | f1.w.9, f2.w.11, f1.w.8, f2.w.7
In Table 10, each WAL file identifier stands for the corresponding WAL file information, and the order of the identifiers reflects the time sequence numbers of the WAL files. The identifiers in Table 10 are arranged by time sequence number: an identifier on the right has a lower time sequence number than the identifier to its left.
As shown in Table 10, manifest file 1 of the first partition includes two pieces of WAL file information, whose WAL file identifiers are f1.w.8 and f1.w.9, and the time sequence number of f1.w.8 is lower than that of f1.w.9. Manifest file 2 of the second partition includes two pieces of WAL file information, whose WAL file identifiers are f2.w.7 and f2.w.11, and the time sequence number of f2.w.7 is lower than that of f2.w.11.
As can be seen from Table 10, merging the two pieces of WAL file information of the first partition with the two pieces of WAL file information of the second partition yields the WAL file information in manifest file 3 of the third partition. Manifest file 3 includes four pieces of WAL file information, which come from manifest file 1 and manifest file 2. Table 10 also shows that the relative order of the two pieces of WAL file information from manifest file 1 is unchanged: f1.w.8 still precedes f1.w.9 in time-sequence order. Similarly, the relative order of the two pieces of WAL file information from manifest file 2 is unchanged: f2.w.7 still precedes f2.w.11.
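The order-preserving merge shown in Table 10 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names `merge_wal_lists` and `keeps_relative_order` are hypothetical, and the alternating interleave is only one assumed way to combine the lists that happens to reproduce the third column of Table 10. The text itself requires only that each partition's internal order be kept.

```python
from itertools import zip_longest


def merge_wal_lists(first, second):
    """Interleave two WAL identifier lists, keeping each list's internal order.

    Assumption: entries are taken alternately from the two lists; the source
    only requires that relative order within each list is preserved.
    """
    merged = []
    for a, b in zip_longest(first, second):
        if a is not None:
            merged.append(a)
        if b is not None:
            merged.append(b)
    return merged


def keeps_relative_order(source, merged):
    """Return True if `source` appears in `merged` as a subsequence."""
    remaining = iter(merged)
    return all(item in remaining for item in source)


# WAL identifiers as listed in Table 10 (left = higher time sequence number).
manifest1 = ["f1.w.9", "f1.w.8"]
manifest2 = ["f2.w.11", "f2.w.7"]
manifest3 = merge_wal_lists(manifest1, manifest2)
print(manifest3)
```

The subsequence check expresses the ordering property stated above: any two entries from the same source manifest keep their relative order in the merged manifest.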
311. Database server 1 sends response information to the management server, where the response information includes the current file identifier of the third partition.
312. The management server updates the partition routing table.
Specifically, the management server marks the first partition and the second partition as deleted, records the left and right boundaries and the current file identifier of the third partition, and sets the state of the third partition to normal. The partition routing table contains the mapping between the third partition and database server 1; in a specific implementation, this may be a mapping between the identifier of the third partition and the address of database server 1.
313. The management server sends a partition completion message to database server 1 and database server 2, where the partition completion message indicates that the merging of the first partition and the second partition is complete.
The metadata of the third partition contains the metadata of the first partition and the metadata of the second partition. Take a KVDB that uses the LSM-tree algorithm as an example: the storage server stores the WAL files, SSTables, and manifest files of each partition. With the partition merging solution provided in the embodiments of this application, database server 1 can use the metadata of the third partition to access the WAL files, SSTables, and manifest files of the first partition and the second partition stored on the storage server. The first partition and the second partition are therefore merged without reading out and rewriting the data entries of either partition, which reduces the amount of data read and written during the merge and speeds up partition merging. In addition, service write operations on the two partitions are temporarily frozen during the merge until it completes, so the embodiments of this application also shorten the time for which service write operations on the partitions are frozen.
314. Database server 1 closes the database of the first partition, deletes the current file of the first partition and the metadata of the first partition, sends a success response to the management server, and returns a re-routing error response for the suspended write requests. After receiving the re-routing error response, the client sends a new request to the management server to update its partition routing table, and then sends the write request to the database server that now owns the partition.
315. Database server 2 closes the database of the second partition, deletes the current file of the second partition and the metadata of the second partition, sends a success response to the management server, and returns a re-routing error response for the suspended write requests. After receiving the re-routing error response, the client sends a new request to the management server to update its partition routing table, and then sends the write request to the database server that now owns the partition.
316. The management server updates the partition routing table and deletes the records of the first partition and the second partition.
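The two routing-table updates performed by the management server in steps 312 and 316 can be sketched as follows. This is only an illustrative model under stated assumptions: the dictionary layout, the field names, the partition and server labels, the boundary values, and the helper names `record_merge` and `purge_deleted` are all hypothetical and do not come from the application.

```python
def record_merge(table, first, second, third, server, left, right, current_file):
    """Step 312 (sketch): mark the source partitions deleted, register the merged one."""
    table[first]["state"] = "deleted"
    table[second]["state"] = "deleted"
    table[third] = {
        "server": server,              # address of the owning database server
        "state": "normal",
        "left": left,                  # left boundary of the merged partition
        "right": right,                # right boundary of the merged partition
        "current_file": current_file,  # current file identifier from step 311
    }


def purge_deleted(table):
    """Step 316 (sketch): remove the records of partitions marked deleted."""
    for partition_id in [p for p, rec in table.items() if rec["state"] == "deleted"]:
        del table[partition_id]


# Hypothetical routing table before the merge.
routing_table = {
    "PT1": {"server": "db-server-1", "state": "normal"},
    "PT2": {"server": "db-server-2", "state": "normal"},
}
record_merge(routing_table, "PT1", "PT2", "PT3", "db-server-1", "a", "z", "current-3")
states_after_312 = {p: rec["state"] for p, rec in routing_table.items()}
purge_deleted(routing_table)
```

Keeping the deleted records until step 316 mirrors the two-phase update in the text: the old partitions stay visible, in the deleted state, until both database servers have confirmed that they closed their databases.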
317. Database server 1 starts a level compaction task in the background to reduce the number of layers in the metadata of the third partition. Until the compaction completes, the third partition does not take part in any new partition merge.
As described above, in some embodiments database server 1 may determine the data storage unit information of the secondary column family of the third partition from the data storage unit information of the target column family. The data storage unit information of the secondary column family of the third partition includes P layers of data storage unit information. Among these P layers, the layer-1 data storage unit information is the data storage unit information of the data storage units obtained by merging and re-sorting the data storage units that correspond to the layer-0 data storage unit information of the Q layers of data storage unit information. Each of layer 2 through layer P-1 is the data storage unit information of the data storage units obtained by merging and re-sorting the data storage units that correspond to at least two of layer 1 through layer Q-1 of the Q layers of data storage unit information.
The process in which database server 1 determines the data storage unit information of the secondary column family of the third partition from the data storage unit information of the target column family may be called level compaction.
The level compaction process is described below with reference to FIG. 4.
In FIG. 4, PT1 layer 0 denotes the layer-0 data storage unit information in the data storage unit information of the secondary column family of the first partition, PT2 layer 0 denotes the layer-0 data storage unit information in the data storage unit information of the secondary column family of the second partition, PT3 layer 0 denotes the layer-0 data storage unit information in the data storage unit information of the secondary column family of the third partition, and so on. Layer 0 denotes the layer-0 data storage unit information during the process of obtaining the data storage unit information of the secondary column family of the third partition from the data storage unit information of the secondary column families of the first partition and the second partition; layer 1 denotes the layer-1 data storage unit information during that process, and so on. "real: 30MB" means that the actual amount of data in the current level is 30 MB, and "max: 400MB" means that the maximum amount of data configured for that layer is 400 MB.
The layer-0 data storage unit information of the secondary column family of the first partition and the layer-0 data storage unit information of the secondary column family of the second partition are append-merged to obtain the layer-0 data storage unit information of the target secondary column family. The layer-1 to layer-3 data storage unit information of the secondary column family of the first partition and the layer-1 to layer-3 data storage unit information of the secondary column family of the second partition are stack-merged to obtain the layer-1 to layer-6 data storage unit information of the target secondary column family.
As shown in FIG. 4, the layer-0 data storage unit information of the target secondary column family includes the layer-0 data storage unit information of the secondary column family of the first partition and the layer-0 data storage unit information of the secondary column family of the second partition.
The layer-1 data storage unit information of the target secondary column family includes the layer-1 data storage unit information of the secondary column family of the first partition.
The layer-2 data storage unit information of the target secondary column family includes the layer-1 data storage unit information of the secondary column family of the second partition.
The layer-3 data storage unit information of the target secondary column family includes the layer-2 data storage unit information of the secondary column family of the first partition.
The layer-4 data storage unit information of the target secondary column family includes the layer-2 data storage unit information of the secondary column family of the second partition.
The layer-5 data storage unit information of the target secondary column family includes the layer-3 data storage unit information of the secondary column family of the first partition.
The layer-6 data storage unit information of the target secondary column family includes the layer-3 data storage unit information of the secondary column family of the second partition.
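The layer mapping listed above (append-merge for layer 0, stack-merge for the higher layers) can be sketched as follows. This is a minimal sketch assuming that both partitions have the same number of layers and that each layer is modelled as a plain list of SSTable names; the function name `stack_levels` and the sample SSTable names are hypothetical.

```python
def stack_levels(pt1_levels, pt2_levels):
    """Build the target layers from two partitions' layers.

    pt1_levels / pt2_levels: lists indexed by layer number; each element is a
    list of SSTable names in that layer. Assumes equal layer counts.
    """
    # Layer 0: append merge, the entries of both partitions share one layer.
    target = [list(pt1_levels[0]) + list(pt2_levels[0])]
    # Layers 1 and above: stack merge, layer k of the first partition becomes
    # target layer 2k-1 and layer k of the second partition becomes layer 2k.
    for layer1, layer2 in zip(pt1_levels[1:], pt2_levels[1:]):
        target.append(list(layer1))
        target.append(list(layer2))
    return target


# Hypothetical SSTable names for layers 0..3 of each partition.
pt1 = [["p1.l0.a"], ["p1.l1.a"], ["p1.l2.a"], ["p1.l3.a"]]
pt2 = [["p2.l0.a"], ["p2.l1.a"], ["p2.l2.a"], ["p2.l3.a"]]
target_levels = stack_levels(pt1, pt2)
print(len(target_levels))
```

No SSTable is read or rewritten here: only the per-layer lists of SSTable information are rearranged, which is why this stage of the merge involves metadata alone.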
For ease of description, unless otherwise specified, "layer-1 data storage unit information" below refers to the layer-1 data storage unit information of the target secondary column family; similarly, "layer-2 data storage unit information" refers to the layer-2 data storage unit information of the target secondary column family, and so on.
For ease of description, a KVDB that uses the LSM-tree algorithm is again taken as an example. Accordingly, the data storage unit information is SSTable information.
A merge sort example is introduced first. Assume that SSTable 1 stores {Key01-Value, Key10-Value, Key19-Value} and that SSTable 2 stores {Key08-Value, Key12-Value, Key17-Value}. Whether the Value data is identical does not matter, because entries are sorted by Key.
The merge sort process is as follows:
1) Allocate a first space and read the contents of SSTable 1 and SSTable 2 into it;
2) Allocate a second space whose size is the sum of the sizes of SSTable 1 and SSTable 2; the second space holds the merged SSTable contents;
3) Set two pointers, initially positioned at the first data entry of SSTable 1 and of SSTable 2 respectively;
4) Compare the data entries of SSTable 1 and SSTable 2 that the two pointers point to, place the entry with the smaller key into the second space, and advance that pointer to the next position; the data entries stored in the second space form the merged SSTable;
5) Repeat step 4 until one of the pointers reaches the last data entry of SSTable 1 or SSTable 2;
6) Copy all remaining data entries of the other SSTable directly to the end of the merged SSTable;
7) The sorted contents are: {Key01-Value, Key08-Value, Key10-Value, Key12-Value, Key17-Value, Key19-Value};
8) Write the above contents into two new SSTables, SSTable 3 and SSTable 4:
SSTable 3: {Key01-Value, Key08-Value, Key10-Value}
SSTable 4: {Key12-Value, Key17-Value, Key19-Value}.
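Steps 3) to 8) above can be sketched as follows. This is a minimal in-memory sketch that models an SSTable as a key-sorted list of (key, value) tuples; the function names `merge_sstables` and `split_in_two` are hypothetical, and handling of duplicate keys, which the text does not discuss, is not addressed.

```python
def merge_sstables(table1, table2):
    """Two-pointer merge of two key-sorted lists of (key, value) entries."""
    merged = []
    i = j = 0
    # Steps 4-5: repeatedly take the entry with the smaller key.
    while i < len(table1) and j < len(table2):
        if table1[i][0] <= table2[j][0]:
            merged.append(table1[i])
            i += 1
        else:
            merged.append(table2[j])
            j += 1
    # Step 6: copy whatever remains in the table that was not exhausted.
    merged.extend(table1[i:])
    merged.extend(table2[j:])
    return merged


def split_in_two(entries):
    """Step 8: write the merged entries into two new SSTables of similar size."""
    half = (len(entries) + 1) // 2
    return entries[:half], entries[half:]


sstable1 = [("Key01", "Value"), ("Key10", "Value"), ("Key19", "Value")]
sstable2 = [("Key08", "Value"), ("Key12", "Value"), ("Key17", "Value")]
merged = merge_sstables(sstable1, sstable2)
sstable3, sstable4 = split_in_two(merged)
print([key for key, _ in sstable3], [key for key, _ in sstable4])
```

The zero-padded keys compare correctly as strings, so plain string comparison reproduces the key order used in the example.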
The level compaction process includes the following steps:
First step: merge-sort all SSTables indicated by the layer-1 SSTable information with all SSTables indicated by the layer-2 SSTable information, write the result into new SSTables, and record in the manifest file the SSTable information of the SSTables newly added to layer 2. At the same time, mark the original SSTable information of layer 1 and layer 2, and the SSTables it indicates, as deleted. Assuming that all SSTables indicated by the layer-1 SSTable information and by the layer-2 SSTable information shown in FIG. 4 need to be read and written, the total amount of data read and written is 200 MB × 2.
Specifically, assume that Table 11 illustrates the layer-1 SSTable information and the layer-2 SSTable information.
Table 11
Figure PCTCN2019097559-appb-000013
Figure PCTCN2019097559-appb-000014
Note that time sequence numbers are meaningless for layer 1 and higher layers, so Table 11 and Table 12 do not include them.
In Table 11, the SSTables corresponding to the layer-1 SSTable information are f1a.1.1 and f1a.1.2, and the SSTables corresponding to the layer-2 SSTable information are f2a.1.1 and f2a.1.2. New SSTables f3a.1.1 and f3a.1.2 are created, and the contents of SSTables f1a.1.1 and f2a.1.1 are merge-sorted and written into them (see the merge sort example above). Similarly, new SSTables f3a.1.3 and f3a.1.4 are created, and the contents of SSTables f1a.1.2 and f2a.1.2 are merge-sorted and written into them. The SSTable information of the new SSTables is recorded in the manifest file. In this case, the SSTable information recorded in the manifest file may be as shown in Table 12.
Table 12
Figure PCTCN2019097559-appb-000015
As shown in Table 12, the manifest file includes only the newly created SSTable information; the SSTable information of the original layer 1 and layer 2 has been deleted. Accordingly, the SSTables indicated by the SSTable information of the original layer 1 and layer 2 have also been deleted.
After this merge, layer 1 contains no data and the amount of data in layer 2 has doubled.
In the example above, all SSTables indicated by the layer-1 SSTable information overlap with all SSTables indicated by the layer-2 SSTable information, so they must be re-sorted and written into new SSTables. When the key ranges of two SSTables to be merged do not overlap, the contents of those SSTables do not need to be rewritten.
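The overlap condition mentioned above can be sketched as a simple range test. This is a minimal sketch assuming each SSTable is summarised by its smallest and largest key as an inclusive (min_key, max_key) pair; the function name `key_ranges_overlap` and that representation are hypothetical.

```python
def key_ranges_overlap(range_a, range_b):
    """Return True if two inclusive (min_key, max_key) ranges share any key."""
    (a_min, a_max), (b_min, b_max) = range_a, range_b
    return a_min <= b_max and b_min <= a_max


# Ranges taken from the merge sort example: SSTable 1 and SSTable 2 overlap,
# while the resulting SSTable 3 and SSTable 4 do not.
needs_rewrite = key_ranges_overlap(("Key01", "Key19"), ("Key08", "Key17"))
can_skip_rewrite = not key_ranges_overlap(("Key01", "Key10"), ("Key12", "Key19"))
print(needs_rewrite, can_skip_rewrite)
```

When the test returns False, only the layer identifier recorded for the SSTable needs to change, so the read and write volume estimated for each step is an upper bound.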
Second step: merge-sort the layer-0 SSTables, write the result into new SSTables, and record in the manifest file the SSTable information of the new SSTables, with the layer identifier in that SSTable information set to 1. The merge sort itself is similar to that of the first step and is not described again. Assuming that all layer-0 SSTables shown in FIG. 4 need to be read and written, the total amount of data read and written is 30 MB × 4.
Third step: merge-sort all SSTables indicated by the layer-3 SSTable information with all SSTables indicated by the layer-4 SSTable information, write the result into new SSTables, and record in the manifest file the SSTable information of the SSTables newly added to layer 4. At the same time, mark the original SSTable information of layer 3 and layer 4, and the SSTables it indicates, as deleted. Layer 3 is given a read-only flag, and no new SSTables are written to it. Assuming that all SSTables indicated by the layer-3 SSTable information and by the layer-4 SSTable information shown in FIG. 4 need to be read and written, the total amount of data read and written is 2 GB × 2.
Fourth step: merge-sort all SSTables indicated by the layer-5 SSTable information with all SSTables indicated by the layer-6 SSTable information, write the result into new SSTables, and record in the manifest file the SSTable information of the SSTables newly added to layer 6. At the same time, mark the original SSTable information of layer 5 and layer 6, and the SSTables it indicates, as deleted. Layer 5 is given a read-only flag, and no new SSTables are written to it. Assuming that all SSTables indicated by the layer-5 SSTable information and by the layer-6 SSTable information shown in FIG. 4 need to be read and written, the total amount of data read and written is 2 GB × 2.
Fifth step: delete layer 3 and layer 5. Specifically, change the layer identifier of the SSTable information whose layer identifier is 4 in the manifest file to 3, and change the layer identifier of the SSTable information whose layer identifier is 6 to 4, thereby removing the original layer 3 and the original layer 5.
Sixth step: merge-sort all SSTables indicated by the SSTable information of the new layer 2 (that is, the layer 2 obtained in the first step) with all SSTables indicated by the SSTable information of the new layer 3 (that is, the layer 3 obtained in the fifth step), write the result into new SSTables, and record in the manifest file the SSTable information of the SSTables newly added to the new layer 3. At the same time, mark the original SSTable information of the new layer 2 and the new layer 3, and the SSTables it indicates, as deleted. Then delete the new layer 2: change the layer identifier of the SSTable information whose layer identifier is 3 in the manifest file to 2, and change the layer identifier of the SSTable information whose layer identifier is 4 to 3. Assuming that all SSTables indicated by the SSTable information of the new layer 2 and of the new layer 3 shown in FIG. 4 need to be read and written, the total amount of data read and written is 400 MB + 4 GB.
Through the first to sixth steps above, the data storage unit information of the secondary column family of the third partition, and the data storage units corresponding to that information, are obtained.
It can be understood that the terms first step, second step, third step, fourth step, fifth step, and sixth step are used only to distinguish the steps and do not limit their order. As the description of the six steps shows, the order of the first step, the third step, and the fourth step may be changed.
Level compaction reduces the number of layers of data storage units, which makes it easier to query the data entries stored in the database.
Of course, in some embodiments, database server 1 may not perform level compaction.
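The six steps can be traced with a small simulation that tracks only which original layers end up in which final layer. This is a sketch under stated assumptions: layers are modelled as sets of provenance tags such as "PT1.L1" instead of real SSTables, and the helper names `merge_into`, `relabel`, and `level_compaction` are hypothetical.

```python
def merge_into(levels, src, dst):
    """Merge-sort layer `src` into layer `dst`; layer `src` then holds nothing."""
    levels[dst] = levels.get(dst, set()) | levels.pop(src, set())


def relabel(levels, mapping):
    """Rewrite layer identifiers in the manifest according to `mapping` (old -> new)."""
    return {mapping.get(layer, layer): tags for layer, tags in levels.items()}


def level_compaction(levels):
    levels = {layer: set(tags) for layer, tags in levels.items()}
    merge_into(levels, 1, 2)                # first step: layer 1 into layer 2
    merge_into(levels, 0, 1)                # second step: layer 0 into the empty layer 1
    merge_into(levels, 3, 4)                # third step: layer 3 into layer 4
    merge_into(levels, 5, 6)                # fourth step: layer 5 into layer 6
    levels = relabel(levels, {4: 3, 6: 4})  # fifth step: remove layers 3 and 5
    merge_into(levels, 2, 3)                # sixth step: new layer 2 into new layer 3
    return relabel(levels, {3: 2, 4: 3})    # then remove the new layer 2


# Target layers after the stack merge: layer 0 holds both partitions' layer 0,
# odd layers come from the first partition, even layers from the second.
stacked = {
    0: {"PT1.L0", "PT2.L0"},
    1: {"PT1.L1"}, 2: {"PT2.L1"},
    3: {"PT1.L2"}, 4: {"PT2.L2"},
    5: {"PT1.L3"}, 6: {"PT2.L3"},
}
compacted = level_compaction(stacked)
print(sorted(compacted))
```

In this trace, the seven stacked layers collapse to layers 1 to 3, with layer 0 left empty for new writes. Layer 1 descends from the original layer 0, and each higher layer descends from at least two of the original layers 1 to 6, which matches the description of the P layers given earlier.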
The distributed database system provided in the embodiments of this application may be used to store metadata of a distributed object storage system, metadata of a distributed file system, or metadata of a distributed block storage system.
FIG. 5 is a structural block diagram of a database server according to an embodiment of this application. As shown in FIG. 5, the database server 500 includes a communication unit 501 and a processing unit 502.
The communication unit 501 is configured to receive a merge instruction sent by a management server, where the merge instruction is used to merge a first partition and a second partition into a third partition, and the first partition and the second partition are adjacent partitions. The merge instruction contains the identifier of the current file of the first partition and the identifier of the current file of the second partition. The current file of the first partition records the file identifier of the file that stores the metadata of the first partition, and the current file of the second partition records the file identifier of the file that stores the metadata of the second partition. The first partition runs on the database server, and the second partition runs on another database server.
The processing unit 502 is configured to:
obtain the metadata of the first partition according to the identifier of the current file of the first partition;
obtain the metadata of the second partition according to the identifier of the current file of the second partition; and
merge the metadata of the first partition and the metadata of the second partition to generate the metadata of the third partition.
In another implementation of the database server 500 shown in FIG. 5, the communication unit 501 is configured to receive a merge instruction sent by the management server, where the merge instruction is used to merge the first partition and the second partition into a third partition, and the first partition and the second partition are adjacent partitions. The processing unit 502 is configured to obtain the metadata of the first partition and the metadata of the second partition according to the first partition, and to merge the metadata of the first partition and the metadata of the second partition to generate the metadata of the third partition.
The database server 500 shown in FIG. 5 can perform the steps performed by database server 1 shown in FIG. 3. For the specific functions and beneficial effects of the units in the database server 500, refer to the method shown in FIG. 3; details are not repeated here.
In one possible implementation, the processing unit 502 may be implemented by a processor, and the communication unit 501 may be implemented by a network interface card. In other embodiments, the communication unit 501 may instead be implemented by a bus adapter. The communication unit 501 may support one or more access protocols, for example the Ethernet packet protocol or the InfiniBand protocol, which is not limited in the embodiments of the present invention. In another implementation, the communication unit 501 and the processing unit 502 may also be implemented by software, or by a combination of software and hardware.
FIG. 6 is a structural block diagram of a database server according to an embodiment of the present invention. As shown in FIG. 6, the database server 600 includes a processor 601 and a communication interface 602. The processor 601 may be used to process data, control the database server, execute software programs, process the data of those programs, and so on. The communication interface 602 is mainly used for communication, for example, with a management server in a distributed database system.
In the embodiments of this application, a circuit having a transceiver function may be regarded as the communication interface 602 of the database server 600, and a processor having a processing function may be regarded as the processor 601 of the database server 600. In some embodiments, the communication interface 602 may be implemented by a network interface card. In other embodiments, the communication interface 602 may instead be implemented by a bus adapter. The communication interface 602 may support one or more access protocols, for example, an Ethernet packet protocol or the InfiniBand protocol; the embodiments of the present invention place no limit on this. The processing unit may also be referred to as a processor, a processing board, a processing module, a processing apparatus, or the like.
The processor 601 and the communication interface 602 communicate with each other through an internal connection path to transfer control and/or data signals.
The methods disclosed in the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 601. The processor 601 may be an integrated circuit chip with signal processing capability. During implementation, each step of the foregoing methods may be completed by an integrated logic circuit of hardware in the processor 601 or by instructions in the form of software.
The processor described in the embodiments of this application may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. A software module may reside in a storage medium that is mature in the art, such as random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in a memory; the processor reads the instructions in the memory and completes the steps of the foregoing methods in combination with its hardware.
Optionally, in some embodiments, the processor 601 may be a combination of a central processing unit (CPU) and a memory, where the memory may store instructions for performing the method performed by the database server 1 in the method shown in FIG. 3. The CPU may execute the instructions stored in the memory and, in combination with other hardware (for example, the communication interface 602), complete the steps performed by the database server 1 in the method shown in FIG. 3. For the specific working process and beneficial effects, refer to the description of the embodiment shown in FIG. 3.
An embodiment of this application further provides a chip that includes a transceiver unit and a processing unit. The transceiver unit may be an input/output circuit or a communication interface; the processing unit is a processor, a microprocessor, or an integrated circuit integrated on the chip. The chip can perform the method performed by the database server 1 in the foregoing method embodiments.
An embodiment of this application further provides a computer-readable storage medium storing computer instructions which, when executed, perform the method performed by the database server 1 in the foregoing method embodiments.
An embodiment of this application further provides a computer program product containing computer instructions which, when executed, perform the method performed by the database server 1 in the foregoing method embodiments.
FIG. 7 is a structural block diagram of a management server according to an embodiment of this application. As shown in FIG. 7, the management server 700 includes a communication unit 701 and a processing unit 702.
The processing unit 702 is configured to create a third partition and to determine that a first partition and a second partition are to be merged into the third partition.
The communication unit 701 is configured to send a merge instruction to a first database server. The merge instruction is used to merge the first partition and the second partition into the third partition, where the first partition and the second partition are adjacent partitions. The merge instruction contains the identifier of the current file of the first partition and the identifier of the current file of the second partition. The current file of the first partition records the file identifier of the file that stores the metadata of the first partition; the current file of the second partition records the file identifier of the file that stores the metadata of the second partition. The first partition runs on the first database server, and the second partition runs on a second database server.
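The content of the merge instruction described above can be pictured with a minimal Python sketch. The class and field names (`Partition`, `MergeInstruction`, `build_merge_instruction`) are hypothetical, and the key-range representation and the adjacency test (one partition's end key equals the other's start key) are assumptions made for illustration; the text only states that the instruction carries the two current-file identifiers and that the partitions are adjacent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    partition_id: str
    start_key: str        # inclusive lower bound (assumed representation)
    end_key: str          # exclusive upper bound (assumed representation)
    current_file_id: str  # identifier of the partition's current file


@dataclass(frozen=True)
class MergeInstruction:
    # The instruction carries the identifiers of both current files.
    first_current_file_id: str
    second_current_file_id: str


def build_merge_instruction(first: Partition, second: Partition) -> MergeInstruction:
    """Build the instruction the management server sends to the first database server."""
    # Assumption: "adjacent" means the two key ranges touch end to start.
    if first.end_key != second.start_key:
        raise ValueError("partitions are not adjacent")
    return MergeInstruction(first.current_file_id, second.current_file_id)
```

Because the instruction carries only file identifiers, the receiving server can locate the second partition's metadata through shared storage without contacting the server that runs that partition.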
In another implementation of the management server 700, the processing unit 702 is configured to create a third partition and to determine that a first partition and a second partition are to be merged into the third partition.
The communication unit 701 is configured to send a merge instruction to a first database server. The merge instruction is used to merge the first partition and the second partition into the third partition, where the first partition and the second partition are adjacent partitions; the first partition runs on the first database server, and the second partition runs on a second database server.
The management server 700 shown in FIG. 7 can perform each step performed by the management server in the method shown in FIG. 3. For the specific functions and beneficial effects of each unit in the management server 700, refer to the method shown in FIG. 3; details are not repeated here.
In one possible implementation, the processing unit 702 may be implemented by a processor, and the communication unit 701 may be implemented by a network interface card. In other embodiments, the communication unit 701 may instead be implemented by a bus adapter. The communication unit 701 may support one or more access protocols, for example, an Ethernet packet protocol or the InfiniBand protocol; the embodiments of the present invention place no limit on this.
FIG. 8 is a structural block diagram of a management server according to an embodiment of the present invention. As shown in FIG. 8, the management server 800 includes a processor 801 and a communication interface 802. The processor 801 may be used to process data, control the management server 800, execute software programs, process the data of those programs, and so on. In the embodiments of this application, a circuit having a transceiver function may be regarded as the communication interface 802 of the database server, and a processor having a processing function may be regarded as the processor 801 of the database server. For a detailed description of the management server 800, refer to the description of the database server 600; details are not repeated here.
An embodiment of this application further provides a computer-readable storage medium storing computer instructions which, when executed, perform the method performed by the management server in the foregoing method embodiments.
An embodiment of this application further provides a computer program product containing computer instructions which, when executed, perform the method performed by the management server in the foregoing method embodiments.
A person skilled in the art will recognize that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of an embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes computer instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store computer instructions, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Claims (34)

  1. A partition merging method in a distributed database system, wherein the distributed database system comprises a first database server, a second database server, and a management server, the first database server runs a first partition, the second database server runs a second partition, and the first partition and the second partition are adjacent partitions; the method comprising:
    receiving, by the first database server, a merge instruction sent by the management server, wherein the merge instruction is used to merge the first partition and the second partition into a third partition; the merge instruction contains an identifier of a current file of the first partition and an identifier of a current file of the second partition; the current file of the first partition records a file identifier of a file that stores metadata of the first partition; and the current file of the second partition records a file identifier of a file that stores metadata of the second partition;
    obtaining, by the first database server, the metadata of the first partition according to the identifier of the current file of the first partition;
    obtaining, by the first database server, the metadata of the second partition according to the identifier of the current file of the second partition; and
    merging, by the first database server, the metadata of the first partition and the metadata of the second partition to generate metadata of the third partition.
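For orientation only, the steps recited above can be sketched in Python as follows. The in-memory `files` dictionary stands in for shared storage, and the names `load_metadata`, `merge_partition_metadata`, and `metadata_file_id` are hypothetical. The merge shown here simply concatenates the entries of each metadata field, which is an assumption and not the layered merge recited in the dependent claims.

```python
def load_metadata(files, current_file_id):
    """Follow the current file to the file that stores the partition metadata."""
    # The current file records the identifier of the metadata file.
    metadata_file_id = files[current_file_id]["metadata_file_id"]
    return files[metadata_file_id]


def merge_partition_metadata(files, first_current_file_id, second_current_file_id):
    """Load both partitions' metadata and combine them into new metadata."""
    first = load_metadata(files, first_current_file_id)
    second = load_metadata(files, second_current_file_id)
    merged = {}
    # Placeholder merge (assumption): concatenate the entries of each field.
    for field in sorted(set(first) | set(second)):
        merged[field] = list(first.get(field, [])) + list(second.get(field, []))
    return merged
```

The point of the indirection is that the first database server needs only the two current-file identifiers from the merge instruction to reach both partitions' metadata.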
  2. The method according to claim 1, wherein the metadata of the first partition comprises data storage unit information of a secondary column cluster of the first partition, and the metadata of the second partition comprises data storage unit information of a secondary column cluster of the second partition;
    the merging, by the first database server, of the metadata of the first partition and the metadata of the second partition to generate the metadata of the third partition specifically comprises:
    merging, by the first database server, the data storage unit information of the secondary column cluster of the first partition and the data storage unit information of the secondary column cluster of the second partition to generate data storage unit information of a target secondary column cluster; and
    determining data storage unit information of a secondary column cluster of the third partition according to the data storage unit information of the target secondary column cluster.
  3. The method according to claim 2, wherein the data storage unit information of the secondary column cluster of the first partition comprises P1 layers of data storage unit information, where P1 is a positive integer greater than or equal to 2;
    the data storage unit information of the secondary column cluster of the second partition comprises P2 layers of data storage unit information, where P2 is a positive integer greater than or equal to 2;
    the data storage unit information of the target secondary column cluster comprises Q layers of data storage unit information, and the Q layers of data storage unit information comprise the data storage unit information of the secondary column cluster of the first partition and the data storage unit information of the secondary column cluster of the second partition, wherein
    one layer of the Q layers of data storage unit information comprises one layer of the P1 layers of data storage unit information of the secondary column cluster of the first partition and one layer of the P2 layers of data storage unit information of the secondary column cluster of the second partition; and
    each layer of the other Q-1 layers of the Q layers of data storage unit information comprises one layer of the P1 layers of data storage unit information of the secondary column cluster of the first partition or one layer of the P2 layers of data storage unit information of the secondary column cluster of the second partition, where Q equals P1 + P2 - 1.
  4. The method according to claim 3, wherein layer 0 of the Q layers of data storage unit information comprises layer 0 of the data storage unit information of the secondary column cluster of the first partition and layer 0 of the data storage unit information of the secondary column cluster of the second partition;
    layer 2×q-1 of the Q layers of data storage unit information comprises layer q of P layers among the P1 layers of data storage unit information of the secondary column cluster of the first partition, and layer 2×q of the Q layers of data storage unit information comprises layer q of P layers among the P2 layers of data storage unit information of the secondary column cluster of the second partition, where q = 1, ..., P-1, and P is the minimum of P1 and P2 minus 1.
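The interleaved layer arrangement recited above can be illustrated with a short Python sketch. To sidestep the wording of the bound on P, the sketch assumes both partitions have the same number of layers (P1 = P2), so that Q = 2 × P1 - 1; each layer is modeled as a plain list of unit descriptors, and the function name `interleave_layers` is hypothetical.

```python
def interleave_layers(first, second):
    """Build the target layers: merged layer 0, then alternating upper layers.

    first and second are lists of layers; each layer is a list of unit
    descriptors. Assumption: both inputs have the same layer count (>= 2).
    """
    if len(first) != len(second):
        raise ValueError("sketch assumes equal layer counts")
    if len(first) < 2:
        raise ValueError("each partition needs at least 2 layers")
    # Layer 0 of the target holds layer 0 of both partitions.
    target = [list(first[0]) + list(second[0])]
    for q in range(1, len(first)):
        target.append(list(first[q]))   # target layer 2*q - 1
        target.append(list(second[q]))  # target layer 2*q
    return target
```

With three layers per partition, the result has 3 + 3 - 1 = 5 layers, matching Q = P1 + P2 - 1 from claim 3.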
  5. The method according to claim 3, wherein layer 0 of the Q layers of data storage unit information comprises layer 0 of the data storage unit information of the secondary column cluster of the first partition and layer 0 of the data storage unit information of the secondary column cluster of the second partition;
    layers 1 through P-1 of the Q layers of data storage unit information are, respectively, layers 1 through P-1 of P layers among the P1 layers of data storage unit information of the secondary column cluster of the first partition; and
    layers P through Q-1 of the Q layers of data storage unit information are, respectively, layers 1 through P-1 of P layers among the P2 layers of data storage unit information of the secondary column cluster of the second partition, where P is the minimum of P1 and P2 minus 1.
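The stacked arrangement recited above, in which the first partition's upper layers are followed by the second partition's upper layers, can be sketched in Python as follows. As a simplifying assumption, both partitions are given the same number of layers P, so that the target has Q = 2 × P - 1 layers; the function name `stack_layers` and the list-of-lists model are hypothetical.

```python
def stack_layers(first, second):
    """Build the target layers: merged layer 0, then first's and second's upper layers.

    first and second are lists of layers; each layer is a list of unit
    descriptors. Assumption: both inputs have the same layer count (>= 2).
    """
    if len(first) != len(second):
        raise ValueError("sketch assumes equal layer counts")
    if len(first) < 2:
        raise ValueError("each partition needs at least 2 layers")
    layer0 = list(first[0]) + list(second[0])
    from_first = [list(layer) for layer in first[1:]]    # target layers 1 .. P-1
    from_second = [list(layer) for layer in second[1:]]  # target layers P .. Q-1
    return [layer0] + from_first + from_second
```

Compared with interleaving, this ordering keeps each partition's upper layers contiguous in the target; which ordering suits a later compaction better is not stated in the text.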
  6. The method according to claim 4 or 5, further comprising: determining, by the database server, the data storage unit information of the secondary column cluster of the third partition according to the data storage unit information of the target secondary column cluster, wherein the data storage unit information of the secondary column cluster of the third partition comprises P layers of data storage unit information; layer 1 of those P layers is the data storage unit information of the data storage units obtained by merging and re-sorting the data storage units corresponding to layer 0 of the Q layers of data storage unit information; and each of layers 2 through P-1 of those P layers is the data storage unit information of the data storage units obtained by merging and re-sorting the data storage units corresponding to at least two layers among layers 1 through Q-1 of the Q layers of data storage unit information.
  7. The method according to any one of claims 2 to 6, wherein the prefix of the entry key value in each piece of data storage unit information in the data storage unit information of the secondary column cluster is a non-partition key value.
  8. The method according to any one of claims 1 to 7, wherein the metadata of the first partition further comprises a write-ahead log information set of the first partition, and the metadata of the second partition further comprises a write-ahead log information set of the second partition; the method further comprising:
    merging, by the database server, the write-ahead log information set of the first partition and the write-ahead log information set of the second partition to generate a write-ahead log information set of the third partition, wherein the write-ahead log information set of the third partition comprises the write-ahead log information in the write-ahead log information set of the first partition and the write-ahead log information in the write-ahead log information set of the second partition, where N is a positive integer greater than or equal to 2, N1 and N2 are positive integers greater than or equal to 1, and the sum of N1 and N2 is N.
  9. The method according to any one of claims 1 to 8, wherein the metadata of the first partition further comprises data storage unit information of a main column cluster of the first partition, and the metadata of the second partition further comprises data storage unit information of a main column cluster of the second partition; the method further comprising: merging, by the database server, the data storage unit information of the main column cluster of the first partition and the data storage unit information of the main column cluster of the second partition to generate data storage unit information of a main column cluster of the third partition.
  10. The method according to claim 9, wherein the data storage unit information of the main column cluster of the first partition comprises K1 layers of data storage unit information, where K1 is a positive integer greater than or equal to 1;
    the data storage unit information of the main column cluster of the second partition comprises K2 layers of data storage unit information, where K2 is a positive integer greater than or equal to 1; and
    the data storage unit information of the main column cluster of the third partition comprises K layers of data storage unit information, wherein layer k of the K layers of data storage unit information of the main column cluster of the third partition comprises layer k of K layers among the K1 layers of data storage unit information and layer k of K layers among the K2 layers of data storage unit information, where K is the minimum of K1 and K2, and the entry key values of any piece of data storage unit information in layer k of the K1 layers of data storage unit information do not overlap the entry key values of any piece of data storage unit information in layer k of the K2 layers of data storage unit information.
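The layer-by-layer merge of the main column cluster recited above can be illustrated with a minimal Python sketch. Each data storage unit is modeled, as an assumption, by an inclusive `(min_key, max_key)` tuple describing the entry keys it covers; the function names are hypothetical. The sketch covers only the first K = min(K1, K2) layers, since the claim does not address any deeper layers of the larger input.

```python
def ranges_overlap(a, b):
    """Return True when two inclusive (min_key, max_key) ranges share a key."""
    return a[0] <= b[1] and b[0] <= a[1]


def merge_main_layers(first, second):
    """Merge same-numbered layers of two partitions' main column clusters.

    first and second are lists of layers; each layer is a list of
    (min_key, max_key) tuples (assumed representation).
    """
    k_layers = min(len(first), len(second))  # K = min(K1, K2)
    merged = []
    for k in range(k_layers):
        # Units in layer k of the two partitions must not overlap in key range.
        for a in first[k]:
            for b in second[k]:
                if ranges_overlap(a, b):
                    raise ValueError(f"overlapping key ranges in layer {k}")
        merged.append(sorted(list(first[k]) + list(second[k])))
    return merged
```

The non-overlap condition is what lets same-numbered layers be combined by reference alone: per claim 11 the entry keys are prefixed by the partition key, so units from two distinct partitions cover disjoint ranges and no data needs to be rewritten.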
  11. The method according to claim 9 or 10, wherein the prefix of the entry key value in each piece of data storage unit information in the data storage unit information of the main column cluster is a partition key value.
  12. A database server, wherein the database server comprises:
    a communication unit, configured to receive a merge instruction sent by a management server, wherein the merge instruction is used to merge a first partition and a second partition into a third partition, and the first partition and the second partition are adjacent partitions; the merge instruction contains an identifier of a current file of the first partition and an identifier of a current file of the second partition; the current file of the first partition records a file identifier of a file that stores metadata of the first partition; the current file of the second partition records a file identifier of a file that stores metadata of the second partition; the first partition runs on the database server, and the second partition runs on another database server; and
    a processing unit, configured to:
    obtain the metadata of the first partition according to the identifier of the current file of the first partition;
    obtain the metadata of the second partition according to the identifier of the current file of the second partition; and
    merge the metadata of the first partition and the metadata of the second partition to generate metadata of the third partition.
  13. The database server according to claim 12, wherein the metadata of the first partition comprises data storage unit information of a secondary column cluster of the first partition, and the metadata of the second partition comprises data storage unit information of a secondary column cluster of the second partition;
    the processing unit is specifically configured to merge the data storage unit information of the secondary column cluster of the first partition and the data storage unit information of the secondary column cluster of the second partition to generate data storage unit information of a target secondary column cluster; and
    to determine data storage unit information of a secondary column cluster of the third partition according to the data storage unit information of the target secondary column cluster.
  14. The database server according to claim 13, wherein the data storage unit information of the secondary column cluster of the first partition comprises P1 layers of data storage unit information, where P1 is a positive integer greater than or equal to 2;
    the data storage unit information of the secondary column cluster of the second partition comprises P2 layers of data storage unit information, where P2 is a positive integer greater than or equal to 2;
    the data storage unit information of the target secondary column cluster comprises Q layers of data storage unit information, and the Q layers of data storage unit information comprise the data storage unit information of the secondary column cluster of the first partition and the data storage unit information of the secondary column cluster of the second partition, wherein
    one layer of the Q layers of data storage unit information comprises one layer of the P1 layers of data storage unit information of the secondary column cluster of the first partition and one layer of the P2 layers of data storage unit information of the secondary column cluster of the second partition; and
    each layer of the other Q-1 layers of the Q layers of data storage unit information comprises one layer of the P1 layers of data storage unit information of the secondary column cluster of the first partition or one layer of the P2 layers of data storage unit information of the secondary column cluster of the second partition, where Q equals P1 + P2 - 1.
  15. The database server according to claim 14, wherein the layer-0 data storage unit information in the Q layers of data storage unit information comprises the layer-0 data storage unit information of the data storage unit information of the secondary column family of the first partition and the layer-0 data storage unit information of the data storage unit information of the secondary column family of the second partition;
    the layer-(2×q-1) data storage unit information in the Q layers of data storage unit information comprises the layer-q data storage unit information of P layers of data storage unit information among the P1 layers of data storage unit information comprised in the data storage unit information of the secondary column family of the first partition, and the layer-(2×q) data storage unit information in the Q layers of data storage unit information comprises the layer-q data storage unit information of P layers of data storage unit information among the P2 layers of data storage unit information comprised in the data storage unit information of the secondary column family of the second partition, where q = 1, ..., P-1, and the value of P is the minimum of P1 and P2 minus 1.
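The interleaved layer arrangement recited in this claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes each partition's secondary column family is modeled as a list of layers, each layer being a list of hypothetical data storage unit identifiers, and it assumes that the shared layers are interleaved up to the shallower partition's depth, with any leftover deeper layers appended at the end. The function name `interleave_layers` is invented for illustration.

```python
def interleave_layers(first, second):
    """Build the target layer list from two partitions' layer lists.

    `first` and `second` are lists of layers; index 0 is layer 0 and each
    layer is a list of data storage unit identifiers (hypothetical model).
    """
    # Layer 0 of the target holds the layer-0 units of both partitions.
    target = [list(first[0]) + list(second[0])]
    shared = min(len(first), len(second))
    # Layer 2q-1 comes from the first partition, layer 2q from the second.
    for q in range(1, shared):
        target.append(list(first[q]))
        target.append(list(second[q]))
    # Assumption: deeper layers that exist in only one input are appended.
    target.extend(list(layer) for layer in first[shared:])
    target.extend(list(layer) for layer in second[shared:])
    return target


first = [["a0"], ["a1"], ["a2"]]   # P1 = 3 layers
second = [["b0"], ["b1"]]          # P2 = 2 layers
target = interleave_layers(first, second)
print(len(target))  # Q = P1 + P2 - 1
```

Under these assumptions the result always has P1+P2-1 layers, because only the two layer-0 lists are combined into a single layer.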
  16. The database server according to claim 14, wherein the layer-0 data storage unit information in the Q layers of data storage unit information comprises the layer-0 data storage unit information of the data storage unit information of the secondary column family of the first partition and the layer-0 data storage unit information of the data storage unit information of the secondary column family of the second partition;
    the layer-1 to layer-(P-1) data storage unit information in the Q layers of data storage unit information is, respectively, the layer-1 to layer-(P-1) data storage unit information of P layers of data storage unit information among the P1 layers of data storage unit information comprised in the data storage unit information of the secondary column family of the first partition; and
    the layer-P to layer-(Q-1) data storage unit information in the Q layers of data storage unit information is, respectively, the layer-1 to layer-(P-1) data storage unit information of P layers of data storage unit information among the P2 layers of data storage unit information comprised in the data storage unit information of the secondary column family of the second partition, where the value of P is the minimum of P1 and P2 minus 1.
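The stacked arrangement recited in this claim, in which the first partition's deeper layers are followed by the second partition's deeper layers, can be sketched in Python as follows. This is only an illustration under stated assumptions: layers are modeled as lists of hypothetical data storage unit identifiers, all deeper layers of each partition are carried over, and the function name `concatenate_layers` is invented here.

```python
def concatenate_layers(first, second):
    """Stack the deeper layers of `first` above those of `second`.

    Index 0 of each input is layer 0; each layer is a list of data
    storage unit identifiers (hypothetical model).
    """
    # Layer 0 of the target combines the layer-0 units of both partitions.
    target = [list(first[0]) + list(second[0])]
    # Deeper layers of the first partition keep their relative order.
    target.extend(list(layer) for layer in first[1:])
    # Deeper layers of the second partition follow them.
    target.extend(list(layer) for layer in second[1:])
    return target


target = concatenate_layers([["a0"], ["a1"], ["a2"]], [["b0"], ["b1"]])
print(target)
```

No data storage unit is rewritten in this step; only the layer bookkeeping changes, and the layer count again works out to P1+P2-1.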
  17. The database server according to claim 15 or 16, wherein the processing unit is further configured to determine the data storage unit information of the secondary column family of the third partition according to the data storage unit information of the target secondary column family, wherein the data storage unit information of the secondary column family of the third partition comprises P layers of data storage unit information; the layer-1 data storage unit information among the P layers of data storage unit information is the data storage unit information of the data storage units obtained by merging and re-sorting the data storage units corresponding to the layer-0 data storage unit information of the Q layers of data storage unit information; and each layer of the layer-2 to layer-(P-1) data storage unit information among the P layers of data storage unit information is the data storage unit information of the data storage units obtained by merging and re-sorting the data storage units corresponding to at least two layers of data storage unit information among the layer-1 to layer-(Q-1) data storage unit information of the Q layers of data storage unit information.
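The merge-and-re-sort step in this claim resembles compaction in a log-structured store. The following Python sketch is a simplified illustration, not the claimed method: it assumes each data storage unit is a list of (key, value) entries already sorted by key, groups the deeper target layers two at a time (folding an odd leftover layer into the last group), leaves layer 0 of the result empty, and does not deduplicate repeated keys. The names `merge_units` and `compact` are hypothetical.

```python
import heapq


def merge_units(units):
    """Merge several key-sorted units into one key-sorted unit."""
    return list(heapq.merge(*units, key=lambda entry: entry[0]))


def compact(target):
    """Derive the third partition's layers from the target layers.

    `target` is a list of layers; each layer is a list of units and each
    unit is a key-sorted list of (key, value) entries. At least two deeper
    layers are expected, as implied by P1 >= 2 and P2 >= 2.
    """
    # Layer 1 of the result is the merged content of target layer 0.
    result = [[], [merge_units(target[0])]]
    deeper = target[1:]
    # Group the deeper layers in pairs so each output layer draws on at
    # least two target layers.
    groups = [deeper[i:i + 2] for i in range(0, len(deeper) - 1, 2)]
    if len(deeper) % 2 == 1 and groups:
        groups[-1].append(deeper[-1])
    for group in groups:
        units = [unit for layer in group for unit in layer]
        result.append([merge_units(units)])
    return result


target = [
    [[(1, "a"), (5, "e")], [(2, "b")]],  # layer 0: two units
    [[(3, "c")]],                        # layer 1
    [[(4, "d")]],                        # layer 2
]
print(compact(target))
```

A production compaction would also resolve multiple versions of the same key and split large outputs into several units; both are omitted here to keep the sketch short.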
  18. The database server according to any one of claims 13 to 17, wherein a prefix of the entry key value in each piece of data storage unit information in the data storage unit information of the secondary column family is a non-partition key value.
  19. The database server according to any one of claims 12 to 18, wherein the metadata of the first partition further comprises a write-ahead log information set of the first partition, and the metadata of the second partition further comprises a write-ahead log information set of the second partition; and the processing unit is further configured to merge the write-ahead log information set of the first partition and the write-ahead log information set of the second partition to generate a write-ahead log information set of the third partition, wherein the write-ahead log information set of the third partition comprises the write-ahead log information in the write-ahead log information set of the first partition and the write-ahead log information in the write-ahead log information set of the second partition, where N is a positive integer greater than or equal to 2, N1 and N2 are positive integers greater than or equal to 1, and the sum of N1 and N2 is N.
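Merging the write-ahead log information sets amounts to taking their union. The Python sketch below illustrates this under two assumptions: each piece of write-ahead log information is modeled as a hypothetical string identifier, and N1, N2 and N are read as the number of entries in the first, second and merged sets, which the claim text does not state explicitly.

```python
def merge_wal_sets(first_wals, second_wals):
    """Return the write-ahead log information set of the merged partition."""
    merged = list(first_wals)
    # Add an entry only if it is not already present, so the result
    # remains a set while preserving a stable order.
    for wal in second_wals:
        if wal not in merged:
            merged.append(wal)
    return merged


first_wals = ["wal-a-1", "wal-a-2"]   # N1 = 2
second_wals = ["wal-b-1"]             # N2 = 1
merged = merge_wal_sets(first_wals, second_wals)
print(len(merged))  # N = N1 + N2 when the inputs are disjoint
```

Only log metadata is combined; the log contents themselves are not copied or replayed in this sketch.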
  20. The database server according to any one of claims 12 to 19, wherein the metadata of the first partition further comprises data storage unit information of a main column family of the first partition, and the metadata of the second partition further comprises data storage unit information of a main column family of the second partition; and the processing unit is further configured to merge the data storage unit information of the main column family of the first partition and the data storage unit information of the main column family of the second partition to generate data storage unit information of a main column family of the third partition.
  21. The database server according to claim 20, wherein the data storage unit information of the main column family of the first partition comprises K1 layers of data storage unit information, where K1 is a positive integer greater than or equal to 1;
    the data storage unit information of the main column family of the second partition comprises K2 layers of data storage unit information, where K2 is a positive integer greater than or equal to 1; and
    the data storage unit information of the main column family of the third partition comprises K layers of data storage unit information, wherein the layer-k data storage unit information among the K layers of data storage unit information comprised in the data storage unit information of the main column family of the third partition comprises the layer-k data storage unit information of K layers of data storage unit information among the K1 layers of data storage unit information and the layer-k data storage unit information of K layers of data storage unit information among the K2 layers of data storage unit information, where K is the minimum of K1 and K2, and the entry key values of any piece of data storage unit information in the layer-k data storage unit information of the K1 layers of data storage unit information do not overlap the entry key values of any piece of data storage unit information in the layer-k data storage unit information of the K2 layers of data storage unit information.
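Because main column family keys are prefixed by the partition key and the two partitions are adjacent, same-numbered layers can be combined without rewriting data. The Python sketch below illustrates the layer-by-layer merge and the non-overlap condition. The `UnitInfo` record with `min_key`, `max_key` and `file_id` fields is a hypothetical model of data storage unit information, and the handling of layers beyond K = min(K1, K2) is not covered by the claim and is left out.

```python
from typing import List, NamedTuple


class UnitInfo(NamedTuple):
    """Hypothetical model of one piece of data storage unit information."""
    min_key: str
    max_key: str
    file_id: str


def overlaps(a: UnitInfo, b: UnitInfo) -> bool:
    """Return True when the two key ranges share at least one key."""
    return a.min_key <= b.max_key and b.min_key <= a.max_key


def merge_main_layers(first: List[List[UnitInfo]],
                      second: List[List[UnitInfo]]) -> List[List[UnitInfo]]:
    """Combine layer k of both partitions into layer k of the result."""
    merged = []
    # zip stops at the shorter input, so K = min(K1, K2) layers result.
    for layer_a, layer_b in zip(first, second):
        for a in layer_a:
            for b in layer_b:
                if overlaps(a, b):
                    raise ValueError("key ranges overlap: %s, %s"
                                     % (a.file_id, b.file_id))
        merged.append(sorted(layer_a + layer_b, key=lambda u: u.min_key))
    return merged


first = [[UnitInfo("a", "c", "f1")], [UnitInfo("a", "f", "f2")]]  # K1 = 2
second = [[UnitInfo("g", "h", "s1")]]                             # K2 = 1
print(merge_main_layers(first, second))
```

The overlap check makes the precondition explicit: if it fails, the inputs are not a valid pair of adjacent partitions under this model.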
  22. The database server according to claim 20 or 21, wherein a prefix of the entry key value in each piece of data storage unit information in the data storage unit information of the main column family is a partition key value.
  23. A database server, comprising a communication interface and a processor, wherein the communication interface communicates with the processor;
    the communication interface is configured to receive a merge instruction sent by a management server, the merge instruction being used to merge a first partition and a second partition into a third partition, wherein the first partition and the second partition are adjacent partitions; the merge instruction contains an identifier of a current file of the first partition and an identifier of a current file of the second partition; the current file of the first partition records a file identifier of a file storing metadata of the first partition; the current file of the second partition records a file identifier of a file storing metadata of the second partition; the first partition runs on the database server, and the second partition runs on another database server;
    the processor is configured to:
    obtain the metadata of the first partition according to the identifier of the current file of the first partition;
    obtain the metadata of the second partition according to the identifier of the current file of the second partition; and
    merge the metadata of the first partition and the metadata of the second partition to generate metadata of the third partition.
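The processor steps above follow a two-level lookup: the current file names the metadata file, and the metadata file holds the partition's metadata. The Python sketch below walks through that flow using an in-memory dictionary in place of shared storage. Every field name (`metadata_file_id`, `start_key`, `end_key`, `wal`) and both function names are assumptions made for illustration; the claim does not define a file layout.

```python
def load_metadata(files, current_file_id):
    """Follow a partition's current file to its metadata file."""
    current = files[current_file_id]
    return files[current["metadata_file_id"]]


def merge_metadata(files, first_current_id, second_current_id):
    """Merge the metadata of two adjacent partitions into a third."""
    first = load_metadata(files, first_current_id)
    second = load_metadata(files, second_current_id)
    # Assumption: adjacency means the first range ends where the second starts.
    if first["end_key"] != second["start_key"]:
        raise ValueError("partitions are not adjacent")
    return {
        "start_key": first["start_key"],
        "end_key": second["end_key"],
        "wal": first["wal"] + second["wal"],
    }


files = {
    "cur-1": {"metadata_file_id": "meta-1"},
    "cur-2": {"metadata_file_id": "meta-2"},
    "meta-1": {"start_key": "a", "end_key": "m", "wal": ["wal-1"]},
    "meta-2": {"start_key": "m", "end_key": "z", "wal": ["wal-2"]},
}
print(merge_metadata(files, "cur-1", "cur-2"))
```

Because only metadata is read and combined, the server that performs the merge does not need to host both partitions; it needs only to resolve both current-file identifiers in storage it can reach.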
  24. The database server according to claim 23, wherein the metadata of the first partition comprises data storage unit information of a secondary column family of the first partition, and the metadata of the second partition comprises data storage unit information of a secondary column family of the second partition;
    the processor is specifically configured to merge the data storage unit information of the secondary column family of the first partition and the data storage unit information of the secondary column family of the second partition to generate data storage unit information of a target secondary column family; and
    determine data storage unit information of a secondary column family of the third partition according to the data storage unit information of the target secondary column family.
  25. The database server according to claim 24, wherein the data storage unit information of the secondary column family of the first partition comprises P1 layers of data storage unit information, where P1 is a positive integer greater than or equal to 2;
    the data storage unit information of the secondary column family of the second partition comprises P2 layers of data storage unit information, where P2 is a positive integer greater than or equal to 2;
    the data storage unit information of the target secondary column family comprises Q layers of data storage unit information, and the Q layers of data storage unit information comprise the data storage unit information of the secondary column family of the first partition and the data storage unit information of the secondary column family of the second partition, wherein
    one layer of the Q layers of data storage unit information comprises one layer of the P1 layers of data storage unit information comprised in the data storage unit information of the secondary column family of the first partition and one layer of the P2 layers of data storage unit information comprised in the data storage unit information of the secondary column family of the second partition; and
    a layer of data storage unit information among Q-1 layers of the Q layers of data storage unit information comprises either one layer of the P1 layers of data storage unit information comprised in the data storage unit information of the secondary column family of the first partition or one layer of the P2 layers of data storage unit information comprised in the data storage unit information of the secondary column family of the second partition, where Q is equal to P1+P2-1.
  26. The database server according to claim 25, wherein the layer-0 data storage unit information in the Q layers of data storage unit information comprises the layer-0 data storage unit information of the data storage unit information of the secondary column family of the first partition and the layer-0 data storage unit information of the data storage unit information of the secondary column family of the second partition;
    the layer-(2×q-1) data storage unit information in the Q layers of data storage unit information comprises the layer-q data storage unit information of P layers of data storage unit information among the P1 layers of data storage unit information comprised in the data storage unit information of the secondary column family of the first partition, and the layer-(2×q) data storage unit information in the Q layers of data storage unit information comprises the layer-q data storage unit information of P layers of data storage unit information among the P2 layers of data storage unit information comprised in the data storage unit information of the secondary column family of the second partition, where q = 1, ..., P-1, and the value of P is the minimum of P1 and P2 minus 1.
  27. The database server according to claim 25, wherein the layer-0 data storage unit information in the Q layers of data storage unit information comprises the layer-0 data storage unit information of the data storage unit information of the secondary column family of the first partition and the layer-0 data storage unit information of the data storage unit information of the secondary column family of the second partition;
    the layer-1 to layer-(P-1) data storage unit information in the Q layers of data storage unit information is, respectively, the layer-1 to layer-(P-1) data storage unit information of P layers of data storage unit information among the P1 layers of data storage unit information comprised in the data storage unit information of the secondary column family of the first partition; and
    the layer-P to layer-(Q-1) data storage unit information in the Q layers of data storage unit information is, respectively, the layer-1 to layer-(P-1) data storage unit information of P layers of data storage unit information among the P2 layers of data storage unit information comprised in the data storage unit information of the secondary column family of the second partition, where the value of P is the minimum of P1 and P2 minus 1.
  28. The database server according to claim 26 or 27, wherein the processor is further configured to determine the data storage unit information of the secondary column family of the third partition according to the data storage unit information of the target secondary column family, wherein the data storage unit information of the secondary column family of the third partition comprises P layers of data storage unit information; the layer-1 data storage unit information among the P layers of data storage unit information is the data storage unit information of the data storage units obtained by merging and re-sorting the data storage units corresponding to the layer-0 data storage unit information of the Q layers of data storage unit information; and each layer of the layer-2 to layer-(P-1) data storage unit information among the P layers of data storage unit information is the data storage unit information of the data storage units obtained by merging and re-sorting the data storage units corresponding to at least two layers of data storage unit information among the layer-1 to layer-(Q-1) data storage unit information of the Q layers of data storage unit information.
  29. The database server according to any one of claims 24 to 28, wherein a prefix of the entry key value in each piece of data storage unit information in the data storage unit information of the secondary column family is a non-partition key value.
  30. The database server according to any one of claims 23 to 29, wherein the metadata of the first partition further comprises a write-ahead log information set of the first partition, and the metadata of the second partition further comprises a write-ahead log information set of the second partition; and the processor is further configured to merge the write-ahead log information set of the first partition and the write-ahead log information set of the second partition to generate a write-ahead log information set of the third partition, wherein the write-ahead log information set of the third partition comprises the write-ahead log information in the write-ahead log information set of the first partition and the write-ahead log information in the write-ahead log information set of the second partition, where N is a positive integer greater than or equal to 2, N1 and N2 are positive integers greater than or equal to 1, and the sum of N1 and N2 is N.
  31. The database server according to any one of claims 23 to 30, wherein the metadata of the first partition further comprises data storage unit information of a main column family of the first partition, and the metadata of the second partition further comprises data storage unit information of a main column family of the second partition; and the processor is further configured to merge the data storage unit information of the main column family of the first partition and the data storage unit information of the main column family of the second partition to generate data storage unit information of a main column family of the third partition.
  32. The database server according to claim 31, wherein the data storage unit information of the main column family of the first partition comprises K1 layers of data storage unit information, where K1 is a positive integer greater than or equal to 1;
    the data storage unit information of the main column family of the second partition comprises K2 layers of data storage unit information, where K2 is a positive integer greater than or equal to 1; and
    the data storage unit information of the main column family of the third partition comprises K layers of data storage unit information, wherein the layer-k data storage unit information among the K layers of data storage unit information comprised in the data storage unit information of the main column family of the third partition comprises the layer-k data storage unit information of K layers of data storage unit information among the K1 layers of data storage unit information and the layer-k data storage unit information of K layers of data storage unit information among the K2 layers of data storage unit information, where K is the minimum of K1 and K2, and the entry key values of any piece of data storage unit information in the layer-k data storage unit information of the K1 layers of data storage unit information do not overlap the entry key values of any piece of data storage unit information in the layer-k data storage unit information of the K2 layers of data storage unit information.
  33. The database server according to claim 31 or 32, wherein a prefix of the entry key value in each piece of data storage unit information in the data storage unit information of the main column family is a partition key value.
  34. A computer-readable storage medium, wherein the computer-readable storage medium contains computer instructions, and when a database server runs the computer instructions, the database server is caused to perform the method according to any one of claims 1 to 11.
PCT/CN2019/097559 2018-08-14 2019-07-24 Partition merging method and database server WO2020034818A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19849439.5A EP3825866A4 (en) 2018-08-14 2019-07-24 Partition merging method and database server
US17/171,706 US11762881B2 (en) 2018-08-14 2021-02-09 Partition merging method and database server

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201810919475.9 2018-08-14
CN201810919475 2018-08-14
CN201811147298.3A CN110825794B (en) 2018-08-14 2018-09-29 Partition merging method and database server
CN201811147298.3 2018-09-29

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/171,706 Continuation US11762881B2 (en) 2018-08-14 2021-02-09 Partition merging method and database server

Publications (1)

Publication Number Publication Date
WO2020034818A1 true WO2020034818A1 (en) 2020-02-20

Family

ID=69525127

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/097559 WO2020034818A1 (en) 2018-08-14 2019-07-24 Partition merging method and database server

Country Status (1)

Country Link
WO (1) WO2020034818A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593436A (en) * 2013-11-12 2014-02-19 华为技术有限公司 File merging method and device
CN106156168A (en) * 2015-04-16 2016-11-23 华为技术有限公司 The method of data is being inquired about and across subregion inquiry unit in partitioned data base
CN108959510A (en) * 2018-06-27 2018-12-07 阿里巴巴集团控股有限公司 A kind of partition level connection method of distributed data base and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
See also references of EP3825866A4 *
ZHANG QIAN : "Research on High Efficient Data Mining Algorithm Under the Distributed Environment", CHINA MASTER'S THESES FULL-TEXT DATABASE, 1 March 2018 (2018-03-01), pages 1 - 61, XP055778717 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19849439

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019849439

Country of ref document: EP

Effective date: 20210219